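1. Number and Data Processing in Practice
1.1 Input file processing example
Input: numbers.txt
Contains 10 lines of numbers, one per line, e.g.: 2 3 4 5 6 7 8 9 10 11
Output: processed_numbers.txt
Contains the processed result, e.g. the parsed values [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], written one per line
Code: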
def process_numbers():
    input_path = "numbers.txt"
    output_path = "processed_numbers.txt"
    try:
        # Read one integer per line, skipping blank lines
        with open(input_path, "r", encoding="utf-8") as input_file:
            numbers = [int(line) for line in input_file if line.strip()]
        # Write each number on its own line
        with open(output_path, "w", encoding="utf-8") as out:
            for num in numbers:
                out.write(f"{num}\n")
    except FileNotFoundError:
        print("File not found; please check the path!")
    except ValueError as e:
        # Raised by int() when a line is not a valid integer
        print(f"Invalid number in input: {e}")
    # The output file is left in place so the result can be inspected
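The parsing step can also be separated from the file I/O so that it is easy to check on its own. The following is a minimal sketch rather than the original implementation: `parse_numbers` is a hypothetical helper name, and it assumes the numbers may be separated by any whitespace, which covers both one number per line and a single line such as `2 3 4`.

```python
def parse_numbers(text):
    """Parse whitespace-separated integers from a string.

    Hypothetical helper: accepts one number per line or several per line.
    Raises ValueError if any token is not a valid integer.
    """
    return [int(token) for token in text.split()]


if __name__ == "__main__":
    sample = "2\n3\n4\n5 6 7\n"
    print(parse_numbers(sample))  # [2, 3, 4, 5, 6, 7]
```

Keeping the parser free of file access means a malformed token surfaces as a `ValueError` at the call site, where the caller can decide whether to skip it or abort.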
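1.2 CSV-to-Excel data processing example
Input: data.csv
Contains 100 rows of data with the columns: name, age, score
Output: processed_data.xlsx
Contains the processed data with the same columns: name, age, score
Code: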
import pandas as pd

def process_data():
    output_file = "processed_data.xlsx"
    try:
        df = pd.read_csv("data.csv")
        # Writing .xlsx needs an Excel engine such as openpyxl to be installed
        df.to_excel(output_file, index=False)
        print("Data saved to the Excel file!")
    except FileNotFoundError:
        print("File not found; please check the path!")
    # The output file is left in place so the result can be opened afterwards
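Before converting, it can help to confirm that the CSV really has the expected columns. The sketch below uses only the standard library `csv` module and is an illustration, not part of the original code: `load_rows` and `REQUIRED_COLUMNS` are hypothetical names, and the required columns are assumed from the sample header `name,age, score` (note the stray space before `score`, which the sketch strips).

```python
import csv
import io

REQUIRED_COLUMNS = ("name", "age", "score")  # assumed from the sample header


def load_rows(csv_text):
    """Read CSV text and return one dict per row with stripped values.

    Raises ValueError if a required column is missing from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalise header names so " score" and "score" are treated the same
    header = [name.strip() for name in (reader.fieldnames or [])]
    reader.fieldnames = header
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Short rows yield None for absent fields, hence the "or ''" fallback
    return [
        {col: (row.get(col) or "").strip() for col in REQUIRED_COLUMNS}
        for row in reader
    ]


if __name__ == "__main__":
    print(load_rows("name,age, score\nAlice,30, 88.5\n"))
```

Values are returned as strings; converting `age` and `score` to numbers is left to the caller.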
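1.3 Removing duplicate lines from a text file
Input: file.txt
Contains 1000 lines of text, e.g.: 1 2 3 4 5 6 7 8 9 10 11
Output: cleaned_text.txt
Contains each distinct line once, in order of first appearance, e.g.: 1 2 3 4 5 6 7 8 9 10 11
Code: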
def remove_duplicates(file_path):
    output_path = "cleaned_text.txt"
    try:
        with open(file_path, "r", encoding="utf-8") as input_file:
            lines = input_file.readlines()
        unique_lines = []
        seen = set()
        for line in lines:
            line = line.strip()
            # Keep only non-empty lines that have not been seen before
            if line and line not in seen:
                unique_lines.append(line)
                seen.add(line)
        with open(output_path, "w", encoding="utf-8") as out:
            for line in unique_lines:
                out.write(line + "\n")
    except Exception as e:
        print(f"Processing failed: {e}")

# Example call
remove_duplicates("file.txt")
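Order-preserving deduplication can also be written more compactly. The sketch below is an alternative to the explicit `seen` set, not the original method: it relies on the fact that Python dictionaries keep insertion order (guaranteed since Python 3.7), and `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(lines):
    """Return stripped, non-empty lines with duplicates removed.

    The first occurrence of each line is kept, in its original position.
    """
    stripped = (line.strip() for line in lines)
    # dict.fromkeys keeps the first occurrence of every key, in order
    return list(dict.fromkeys(line for line in stripped if line))


if __name__ == "__main__":
    print(unique_in_order(["a\n", "b\n", "a\n", "\n", "c\n", "b\n"]))
    # ['a', 'b', 'c']
```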
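1.4 Removing special characters from a text file
Input: input.txt
Contains 1000 lines of text, e.g.: abc def 123 | 456 | 789
Output: output.txt
Contains the text with special characters removed, e.g.: abc def 123 456 789
Code: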
import re

def remove_special_characters(file_path):
    output_path = "output.txt"
    try:
        with open(file_path, "r", encoding="utf-8") as input_file:
            lines = input_file.readlines()
        cleaned_lines = []
        for line in lines:
            # Drop everything except letters, digits, underscores, and whitespace
            line = re.sub(r"[^\w\s]", "", line)
            # Collapse the gaps left behind into single spaces
            cleaned_lines.append(" ".join(line.split()))
        with open(output_path, "w", encoding="utf-8") as out:
            for line in cleaned_lines:
                out.write(line + "\n")
    except Exception as e:
        print(f"Processing failed: {e}")

# Example call
remove_special_characters("input.txt")
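What counts as a "special character" depends on the data, so the per-line cleaning step is worth isolating and making configurable. The following is a minimal sketch under stated assumptions: `clean_line` and its `keep` parameter are hypothetical, and "special" is taken to mean anything other than letters, digits, underscores, and whitespace, unless explicitly listed in `keep`.

```python
import re


def clean_line(line, keep=""):
    """Remove special characters from one line and collapse whitespace.

    Assumption: a character is "special" unless it is a letter, digit,
    underscore, whitespace, or appears in `keep`.
    """
    # re.escape makes characters such as '-' or ']' safe inside the class
    pattern = re.compile(rf"[^\w\s{re.escape(keep)}]")
    without_special = pattern.sub("", line)
    # Removing characters can leave double spaces; normalise them
    return " ".join(without_special.split())


if __name__ == "__main__":
    print(clean_line("abc def 123 | 456 | 789"))  # abc def 123 456 789
    print(clean_line("a-b | c.d", keep="-."))     # a-b c.d
```

Passing `keep="-."` preserves hyphens and periods, which is useful when the text contains dates or decimal numbers.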
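2. Summary
The examples above show the core logic of data processing and file handling:
– File processing: read, parse, and output data
– Data cleaning: remove duplicates and handle special characters
– File output: save the result and make sure the path is correct
Whether you use Python or Java, the core logic follows the same approach: clearly defined variables, layered logic (read, process, output), and explanatory comments. The project runs standalone without depending on a framework (section 1.2 additionally needs pandas), and it can be implemented in 1 to 3 days.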
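The read, process, output layering can be sketched as three small functions plus one that wires them together. This is an illustration under stated assumptions, not code from the examples above: the function names are hypothetical, and the processing stage (strip lines, drop empty ones) is just a placeholder for whichever cleaning step is needed.

```python
from pathlib import Path


def read_lines(path):
    """Read stage: return the file's lines without trailing newlines."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def process_lines(lines):
    """Process stage (placeholder): strip each line and drop empty ones."""
    return [line.strip() for line in lines if line.strip()]


def write_lines(path, lines):
    """Output stage: write one entry per line."""
    text = "".join(line + "\n" for line in lines)
    Path(path).write_text(text, encoding="utf-8")


def run(input_path, output_path):
    """Wire the three stages together."""
    write_lines(output_path, process_lines(read_lines(input_path)))
```

Because only `read_lines` and `write_lines` touch the file system, the processing stage can be swapped or tested without creating any files.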