Use Python to analyze CSV data: count identical values in a table, tally how many times each distinct option appears in a given column, and export the result as a new table.
It appears there was an issue executing the Python code due to an internal error. However, I can guide you through the process of splitting a large CSV file into smaller files, each containing 1,000,000 rows, using Python and pandas. Here’s how you can do it:
- Import pandas: First, you need to import the pandas library. If you don’t have pandas installed, you can install it using pip:

  ```
  pip install pandas
  ```

- Read and Split the CSV File:
  - Use `pandas.read_csv` with the `chunksize` parameter set to 1,000,000. This will read the file in chunks of 1,000,000 rows at a time.
  - Iterate over these chunks, and for each chunk, save it as a new CSV file.

- Example Code:
  ```python
  import pandas as pd

  def split_csv(file_path, chunk_size=1000000):
      # Read the large CSV file with the specified chunksize
      chunk_generator = pd.read_csv(file_path, chunksize=chunk_size)
      for i, chunk in enumerate(chunk_generator):
          # Define the name for the smaller chunk files
          new_file_name = f'chunk_{i+1}.csv'
          # Save the chunk to a new file
          chunk.to_csv(new_file_name, index=False)
          print(f'Chunk {i+1} saved as {new_file_name}')

  # Replace '/path/to/your/large/file.csv' with the path to your large CSV file
  split_csv('/path/to/your/large/file.csv')
  ```
In this code:

- Replace `'/path/to/your/large/file.csv'` with the path to your large CSV file.
- The function `split_csv` reads your large CSV file in chunks and saves each chunk as a new file named `chunk_1.csv`, `chunk_2.csv`, and so on.
Remember, this process might take some time depending on the size of the CSV file and the performance of your computer.
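Coming back to the original question of counting how many times each option appears in a column and exporting the counts as a new table, here is a minimal sketch using pandas `value_counts`. The column name `category` and the output file name `category_counts.csv` are placeholder assumptions; substitute your own, and read your real file with `pd.read_csv` instead of building the example DataFrame.

```python
import pandas as pd

# Small example table; in practice, load your file with:
# df = pd.read_csv('/path/to/your/file.csv')
df = pd.DataFrame({'category': ['A', 'B', 'A', 'C', 'A', 'B']})

# Count how many times each distinct value appears in the column
counts = df['category'].value_counts().reset_index()
counts.columns = ['category', 'count']  # normalize column names across pandas versions

# Export the counts as a new table
counts.to_csv('category_counts.csv', index=False)
print(counts)
```

This writes a two-column table (the distinct value and its occurrence count), sorted from most to least frequent.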