# Number Processing and Data Processing in Practice


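1. Number Processing and Data Processing in Practice

1.1 Input file processing example

Input: numbers.txt

Contains 10 lines of numbers, one integer per line: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Output: processed_numbers.txt

Contains the processed result: the same numbers, written one per line

Implementation: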

import sys

def process_numbers():
    output_path = "processed_numbers.txt"
    try:
        # Read one integer per line, skipping blank lines
        with open("numbers.txt", "r", encoding="utf-8") as input_file:
            numbers = [int(line) for line in input_file if line.strip()]
    except FileNotFoundError:
        print("File not found; please check the path!", file=sys.stderr)
        return
    except ValueError as e:
        print(f"Invalid number in input: {e}", file=sys.stderr)
        return
    # Write the numbers back out, one per line
    with open(output_path, "w", encoding="utf-8") as output_file:
        for num in numbers:
            output_file.write(f"{num}\n")
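
The parsing step of this example can also be kept separate from the file I/O, which makes it easy to check on its own. The following is a minimal sketch under that assumption; `parse_numbers` is a hypothetical helper name, not part of the original example, and it assumes one integer per line with blank lines ignored.

```python
from typing import Iterable, List

def parse_numbers(lines: Iterable[str]) -> List[int]:
    """Convert text lines to integers, skipping blank lines.

    parse_numbers is a hypothetical helper introduced for illustration.
    """
    numbers = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore empty lines, e.g. a trailing blank line
        try:
            numbers.append(int(text))
        except ValueError:
            # Say which line is malformed instead of failing without context
            raise ValueError(f"line {line_no}: not an integer: {text!r}") from None
    return numbers

if __name__ == "__main__":
    sample = ["2\n", "3\n", "\n", "4\n"]
    print(parse_numbers(sample))
```

Because the helper accepts any iterable of strings, an open file object can be passed to it directly.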

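1.2 Converting CSV data to Excel example

Input: data.csv

Contains 100 rows of data with a header such as: name,age, score

Output: processed_data.xlsx

Contains the same data in Excel format, with the columns: name,age, score

Implementation: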

import pandas as pd

def process_data():
    output_file = "processed_data.xlsx"
    try:
        df = pd.read_csv("data.csv")
        # Writing .xlsx requires an Excel engine such as openpyxl to be installed
        df.to_excel(output_file, index=False)
        print("Data saved to the Excel file!")
    except FileNotFoundError:
        print("File not found; please check the path!")
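
A header such as `name,age, score` has a space after the second comma, so a parser that keeps it would produce a column named " score". The sketch below uses only the standard-library `csv` module to show one way of dropping that space; `read_rows` is a hypothetical helper and the sample row is made up for illustration. pandas' `read_csv` accepts a `skipinitialspace` option with the same meaning.

```python
import csv
import io

def read_rows(csv_text: str) -> list:
    """Parse CSV text into a list of dicts, ignoring spaces after commas.

    read_rows is a hypothetical helper introduced for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return list(reader)

if __name__ == "__main__":
    # Made-up sample data; only the header mirrors the one shown above
    sample = "name,age, score\nAlice,30, 88\n"
    for row in read_rows(sample):
        print(row)
```

All values come back as strings here; converting `age` and `score` to numbers is left to the caller.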

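1.3 Removing duplicate lines from a text file example

Input: file.txt

Contains 1000 lines of text, one entry per line, some of them repeated

Output: cleaned_text.txt

Contains each distinct non-empty line once, in order of first appearance

Implementation: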

def remove_duplicates(file_path):
    output_path = "cleaned_text.txt"
    try:
        with open(file_path, "r", encoding="utf-8") as input_file:
            lines = input_file.readlines()
        unique_lines = []
        seen = set()
        for line in lines:
            line = line.strip()
            # Keep only non-empty lines that have not been seen before
            if line and line not in seen:
                unique_lines.append(line)
                seen.add(line)
        with open(output_path, "w", encoding="utf-8") as output_file:
            for line in unique_lines:
                output_file.write(line + "\n")
    except Exception as e:
        print(f"Processing failed: {e}")

# Example call
remove_duplicates("file.txt")
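
The de-duplication step itself can be written more compactly. The sketch below relies on the fact that Python dictionaries keep insertion order (guaranteed since Python 3.7); `unique_in_order` is a hypothetical name, and it assumes, like the example above, that lines are compared after stripping surrounding whitespace and that empty lines are dropped.

```python
from typing import Iterable, List

def unique_in_order(lines: Iterable[str]) -> List[str]:
    """Return distinct non-empty lines, keeping the first occurrence of each.

    unique_in_order is a hypothetical helper introduced for illustration.
    """
    stripped = (line.strip() for line in lines)
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(s for s in stripped if s))

if __name__ == "__main__":
    print(unique_in_order(["a\n", "b\n", "a\n", "\n", "c\n", "b\n"]))
```

Both versions hold every distinct line in memory, which is fine for a 1000-line file but worth reconsidering for very large inputs.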

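1.4 Removing special characters from a text file example

Input: input.txt

Contains 1000 lines of text, e.g.: abc def 123 | 456 | 789

Output: output.txt

Contains the same text with special characters removed, e.g.: abc def 123 456 789

Implementation: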

import re

def remove_special_characters(file_path):
    output_path = "output.txt"
    try:
        with open(file_path, "r", encoding="utf-8") as input_file:
            lines = input_file.readlines()
        cleaned_lines = []
        for line in lines:
            # Drop everything except letters, digits, underscores, and whitespace
            line = re.sub(r"[^\w\s]", "", line)
            # Collapse the gaps left behind into single spaces
            cleaned_lines.append(" ".join(line.split()))
        with open(output_path, "w", encoding="utf-8") as output_file:
            for line in cleaned_lines:
                output_file.write(line + "\n")
    except Exception as e:
        print(f"Processing failed: {e}")

# Example call
remove_special_characters("input.txt")
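
What counts as a "special character" depends on the data. As a hedged sketch, the per-line transformation can be expressed as a small function with an allow-list: `strip_special` and its `keep` parameter are hypothetical names introduced here, and the definition of "special" (anything other than letters, digits, underscores, and whitespace) is an assumption.

```python
import re

def strip_special(text: str, keep: str = "") -> str:
    """Remove special characters from text, except those listed in keep.

    strip_special and keep are hypothetical names introduced for illustration.
    """
    # Character class: word characters, whitespace, plus any extra allowed ones
    pattern = rf"[^\w\s{re.escape(keep)}]"
    cleaned = re.sub(pattern, "", text)
    # Collapse runs of whitespace left behind by the removed characters
    return " ".join(cleaned.split())

if __name__ == "__main__":
    print(strip_special("abc def 123 | 456 | 789"))
    print(strip_special("v1.2 | ok!", keep="."))
```

Passing `keep="."` or `keep="-"` preserves version numbers or hyphenated words that would otherwise be mangled.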

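2. Summary

The examples above show the core logic of data processing and file operations:
File processing: read, parse, and output data
Data cleaning: remove duplicates and handle special characters
File output: save the result and make sure the path is correct

Whether you use Python or Java, the core logic follows the same approach: clear variable definitions, layered logic (read, process, output), and explanatory comments. The project runs standalone without depending on a framework and can be implemented in 1 to 3 days.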
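
The read, process, output layering can be sketched as three small functions. This is one possible arrangement, not a prescribed design: `read_lines`, `write_lines`, and `run_pipeline` are hypothetical names, and creating the output directory before writing is an assumed way of making sure the path is valid.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

def read_lines(path: Path) -> List[str]:
    """Read step: return the file's lines without trailing newlines."""
    return path.read_text(encoding="utf-8").splitlines()

def write_lines(path: Path, lines: List[str]) -> None:
    """Output step: create the target directory if needed, then write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")

def run_pipeline(src: Path, dst: Path,
                 process: Callable[[List[str]], List[str]]) -> None:
    """Chain the three layers: read, process, write."""
    write_lines(dst, process(read_lines(src)))

if __name__ == "__main__":
    # Demo in a temporary directory so no real files are touched
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.txt"
        dst = Path(tmp) / "out" / "result.txt"
        src.write_text("b\na\n", encoding="utf-8")
        run_pipeline(src, dst, sorted)
        print(dst.read_text(encoding="utf-8"))
```

Any function that maps a list of lines to a list of lines can be passed as `process`, so de-duplication and character cleaning fit the same frame.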

