How to Sort Dates in Excel with Pandas

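Applications such as Excel handle tabular data well, but some operations call for a more capable toolset. Sorting date-based entries can be difficult when the source data is inconsistent, for example when dates are stored as text, or when the sort needs several criteria. Pandas, a Python library built for data manipulation and analysis, fills this gap. This article shows how to use Pandas to sort date values from an Excel sheet correctly, with an explanation of each piece of code.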

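Installing Pandas and OpenPyXL

Before turning to the code, install Pandas in your environment, along with OpenPyXL, which Pandas uses to read and write .xlsx files. Python's package installer, pip, handles both in one command:

pip install pandas openpyxl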

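First, here is the general procedure that every method below follows:

Import the required Python libraries.
Load the CSV file into a DataFrame.
Sort the DataFrame by the specified column or criteria.
(Optional) Save the sorted DataFrame to a CSV file.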
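The steps above can be sketched end to end as follows. This is a minimal illustration, not a definitive implementation: an in-memory string stands in for the CSV file on disk, and the column names Date and Value are assumptions taken from the sample data used later in the article.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (assumed columns: Date, Value)
csv_text = """Date,Value
2023-01-01,100
2022-01-01,200
2023-05-01,150
2022-05-01,250
"""

# Steps 1-2: import pandas and load the CSV into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: sort by the date column, comparing the values as datetimes
sorted_df = df.sort_values(by="Date", key=pd.to_datetime)

# Step 4 (optional): serialize the sorted rows back to CSV text
csv_out = sorted_df.to_csv(index=False)
print(csv_out)
```

With a real file, pass its path to read_csv and to_csv instead of the in-memory text.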

Example

The following example builds a small pandas DataFrame holding the sample data that the sorting methods in this article operate on.

import pandas as pd

data = {
   'Date': ['2023-01-01', '2022-01-01', '2023-05-01', '2022-05-01'],
   'Year': [2023, 2022, 2023, 2022],
   'Month': [1, 1, 5, 5],
   'Day': [1, 1, 1, 1],
   'Value': [100, 200, 150, 250]
}
df = pd.DataFrame(data)
print(df)

Output

         Date  Year  Month  Day  Value
0  2023-01-01  2023      1    1    100
1  2022-01-01  2022      1    1    200
2  2023-05-01  2023      5    1    150
3  2022-05-01  2022      5    1    250

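sort_values(by='Date', key=pd.to_datetime)

This technique involves three key steps:

a. Import pandas and load the CSV file into a DataFrame.
b. Sort the DataFrame with sort_values. The 'by' argument names the column to sort on (here 'Date'), and 'key' converts the 'Date' strings to datetime objects so that rows are compared as dates rather than as text.
c. The DataFrame is now sorted by date, from the earliest entry to the latest.

Example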

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Date' column
data.sort_values(by='Date', key=pd.to_datetime, inplace=True)

# Print the sorted DataFrame
print(data)

Output

         Date  Value
1  2022-01-01    200
3  2022-05-01    250
0  2023-01-01    100
2  2023-05-01    150
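The 'key' argument matters most when dates are stored as text in a format that does not sort correctly as plain strings. The sketch below uses hypothetical month/day/year data (not from the original article) to compare a plain text sort with a datetime-keyed sort.

```python
import pandas as pd

# Hypothetical data with dates stored as month/day/year text
df = pd.DataFrame({
    "Date": ["12/01/2022", "01/15/2023", "05/01/2022"],
    "Value": [1, 2, 3],
})

# Plain string comparison orders rows by the leading month digits
as_text = df.sort_values(by="Date")

# Parsing with an explicit format makes the sort compare real dates
as_dates = df.sort_values(
    by="Date",
    key=lambda col: pd.to_datetime(col, format="%m/%d/%Y"),
)

print(as_text["Date"].tolist())
print(as_dates["Date"].tolist())
```

The text sort puts January 2023 first because "01" sorts before "05" and "12", while the keyed sort returns the rows in chronological order.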

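sort_values(by=['Year', 'Month', 'Day'])

This method works as follows:

Start by importing pandas and loading the CSV data into a DataFrame.
Use the sort_values method to sort the DataFrame. This time the sort uses several columns, passed as a list to 'by'. The DataFrame is sorted by 'Year' first, then by 'Month' within each year, and then by 'Day' within each month.
The DataFrame is now sorted by year, month, and day.

Example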

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Year', 'Month' and 'Day' columns
data.sort_values(by=['Year', 'Month', 'Day'], inplace=True)

# Print the sorted DataFrame
print(data)

Output

   Year  Month  Day  Value
1  2022      1    1    200
3  2022      5    1    250
0  2023      1    1    100
2  2023      5    1    150
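When the date parts live in separate columns, another option is to assemble them into a single datetime column and sort on that. The sketch below is one possible approach, not part of the original method. It relies on pd.to_datetime accepting a DataFrame whose columns are named year, month, and day, so the columns are renamed to lowercase first.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2023, 2022, 2023, 2022],
    "Month": [1, 1, 5, 5],
    "Day": [1, 1, 1, 1],
    "Value": [100, 200, 150, 250],
})

# pd.to_datetime can assemble dates from columns named year, month, day
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["Date"] = pd.to_datetime(parts)

# Sorting on the assembled column gives the same order as the three-column sort
sorted_df = df.sort_values(by="Date")
print(sorted_df[["Date", "Value"]])
```

The assembled Date column is also convenient for later steps such as filtering by date range.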

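sort_values(by='Date', key=pd.to_datetime) with set_index('Date')

This approach differs slightly from the others. The steps are:

First, import pandas and load the CSV data into a DataFrame, as in the previous methods.
Next, sort the DataFrame with sort_values. DataFrame.sort_values always requires 'by', so the rows are again sorted on 'Date', with 'key' converting the strings to datetime objects to ensure correct ordering. Then call set_index to make 'Date' the index.
The DataFrame is now indexed by date, in chronological order.

Example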

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')

# Sort the DataFrame by the 'Date' column
data.sort_values(by='Date', key=pd.to_datetime, inplace=True)

# Set 'Date' as the DataFrame's index
data.set_index('Date', inplace=True)

# Print the sorted DataFrame
print(data)

Output

            Value
Date
2022-01-01    200
2022-05-01    250
2023-01-01    100
2023-05-01    150
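Note that set_index keeps whatever values the column holds, so if 'Date' is still text the index is text as well. The sketch below shows one way to get a true DatetimeIndex: convert the column before sorting and indexing. The data is the sample from this article. The year-based lookup at the end is an added illustration of what a date index allows.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2023-01-01", "2022-01-01", "2023-05-01", "2022-05-01"],
    "Value": [100, 200, 150, 250],
})

# Parse first so the index becomes a DatetimeIndex rather than text labels
df["Date"] = pd.to_datetime(df["Date"])
indexed = df.sort_values(by="Date").set_index("Date")

print(type(indexed.index).__name__)

# With a sorted DatetimeIndex, a year string selects every row in that year
print(indexed.loc["2023"])
```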

sort_index()

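This technique works as follows:

Start by importing pandas and loading the CSV file into a DataFrame.
Use the sort_index method to sort the DataFrame by its index. Called with no arguments, it sorts the rows by index in ascending order, so the result is chronological only when the index holds the dates.
The DataFrame is now sorted by its index.

Example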

import pandas as pd

# Read the CSV file into a DataFrame, using the parsed 'Date' column as the index
data = pd.read_csv('filename.csv', index_col='Date', parse_dates=True)

# Sort the DataFrame by its index
data.sort_index(inplace=True)

# Print the sorted DataFrame
print(data)
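As a self-contained illustration of sort_index on a date index, the sketch below builds the sample data directly instead of reading a file and sorts it in both directions. The 'ascending' parameter is part of sort_index. The variable names are chosen here for illustration.

```python
import pandas as pd

# Sample data with the dates placed directly in the index
df = pd.DataFrame(
    {"Value": [100, 200, 150, 250]},
    index=pd.to_datetime(["2023-01-01", "2022-01-01", "2023-05-01", "2022-05-01"]),
)

# Default order is ascending: earliest date first
oldest_first = df.sort_index()

# ascending=False reverses the order: latest date first
newest_first = df.sort_index(ascending=False)

print(oldest_first)
print(newest_first)
```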

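In all of the code examples, replace 'filename.csv' with the actual path to your CSV file. To load an Excel workbook directly, use pd.read_excel in place of pd.read_csv.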
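The examples in this article read CSV files, but the same sort can be applied to an .xlsx workbook. The sketch below is one way to do it, under stated assumptions: the helper name sort_excel_by_date, its parameters, and the default column name 'Date' are hypothetical, and the workbook is assumed to keep its data on the first sheet. It requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def sort_excel_by_date(src_path, dst_path, date_column="Date"):
    """Read the first sheet of a workbook, sort it by date, and save the result."""
    # pandas needs an Excel engine such as openpyxl to read .xlsx files
    df = pd.read_excel(src_path)

    # Convert to datetime so the sort is chronological even if the cells hold text
    df[date_column] = pd.to_datetime(df[date_column])
    df = df.sort_values(by=date_column).reset_index(drop=True)

    # index=False keeps the row labels out of the output sheet
    df.to_excel(dst_path, index=False)
    return df
```

A call might look like sort_excel_by_date('dates.xlsx', 'dates_sorted.xlsx'), where both file names are placeholders for your own paths.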

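Conclusion

This article walked through several ways to put date-based data from Excel in chronological order with Pandas. The pandas library can sort the rows of a CSV file by various criteria, such as column values or the index. Saving the sorted DataFrame back to a CSV file preserves the structured result for further analysis or later processing.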