在 Pandas DataFrame 中将列类型从字符串转换为日期格式

在 Pandas DataFrame 中将列类型从字符串转换为日期格式

当我们在Python的Pandas DataFrame中处理数据时,经常会遇到时间序列数据。Pandas是一个强大的工具,可以处理Python中的时间序列数据,我们可能需要将给定数据集中的字符串转换为日期时间格式。

在本教程中,我们将学习如何将字符串转换为日期时间格式,格式为”dd/mm/yy”。如果日期不符合所需格式,用户将无法对日期执行任何基于时间序列的操作。为了解决这个问题,我们需要将日期转换为所需的日期时间格式。

在Python中转换数据类型格式的不同方法

在本节中,我们将讨论可以用于将Pandas DataFrame列的数据类型从字符串转换为日期时间的不同方法:

方法1:使用pandas.to_datetime()函数

在这种方法中,我们将使用”pandas.to_datetime()”函数来将Pandas DataFrame列的数据类型进行转换。

示例:

import pandas as pnd

# Creating the dataframe
data_frame = pnd.DataFrame({'Date':['12/05/2021', '11/21/2018', '01/12/2020'],
                'Event':['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
                'Cost':[15400, 7000, 25000]})

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# Here, we are checking the data type of the 'Date' column
data_frame.info()

输出:

The data is: 
         Date           Event                     Cost
0  12/05/2021    Music- Dance  15400
1  11/21/2018   Poetry- Songs   7000
2  01/12/2020  Theatre- Drama  25000

RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Date    3 non-null      object 

1   Event   3 non-null      object
 2   Cost    3 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes

在输出中,我们可以看到DataFrame中”Date”列的数据类型是”object”,这意味着它是一个字符串。现在,我们将使用”pnd.to_datetime()”函数将数据类型转换为日期时间格式:

import pandas as pnd

# Creating the dataframe
data_frame = pnd.DataFrame({'Date':['12/05/2021', '11/21/2018', '01/12/2020'],
                'Event':['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
                'Cost':[15400, 7000, 25000]})

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# For converting the 'Date' column of DataFrame into datetime format
data_frame['Date'] = pnd.to_datetime(data_frame['Date'])

# Here, we are checking the data type of the 'Date' column
data_frame.info()

输出:

The data is: 
         Date           Event                     Cost
0  12/05/2021    Music- Dance  15400
1  11/21/2018   Poetry- Songs   7000
2  01/12/2020  Theatre- Drama  25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
 ---  ------  --------------  -----         
 0   Date    3 non-null      datetime64[ns] 

1   Event   3 non-null      object        
 2   Cost    3 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes

现在,我们可以看到DataFrame中“Data”列的格式已经被更改为日期时间格式。

方法2:使用DataFrame.astype()函数。

在这种方法中,我们将使用“DataFrame.astype()”函数将Pandas DataFrame列的数据类型进行转换。

示例:

import pandas as pnd

# Creating the dataframe
data_frame = pnd.DataFrame({'Date':['12/05/2021', '11/21/2018', '01/12/2020'],
                'Event':['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
                'Cost':[15400, 7000, 25000]})

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# Here, we are checking the data type of the 'Date' column
data_frame.info()

输出:

The data is: 
         Date           Event                     Cost
0  12/05/2021    Music- Dance  15400
1  11/21/2018   Poetry- Songs   7000
2  01/12/2020  Theatre- Drama  25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Date    3 non-null      object 

1   Event   3 non-null      object
 2   Cost    3 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes

在输出中,我们可以看到DataFrame的”Date”列的数据类型是”object”,这意味着它是一个字符串。现在,我们将使用”Data_Frame.astype()”函数将数据类型转换为日期时间格式:

import pandas as pnd

# Creating the dataframe
data_frame = pnd.DataFrame({'Date':['12/05/2021', '11/21/2018', '01/12/2020'],
                'Event':['Music- Dance', 'Poetry- Songs', 'Theatre- Drama'],
                'Cost':[15400, 7000, 25000]})

# Print the dataframe
print ("The data is: ") 
print (data_frame)
# For converting the 'Date' column of DataFrame into datetime format
data_frame['Date'] = data_frame['Date'].astype('datetime64[ns]')

# Here, we are checking the data type of the 'Date' column
data_frame.info()

输出:

The data is: 
         Date           Event   Cost
0  12/05/2021    Music- Dance  15400
1  11/21/2018   Poetry- Songs   7000
2  01/12/2020  Theatre- Drama  25000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    3 non-null      datetime64[ns] 

1   Event   3 non-null      object        
 2   Cost    3 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes

现在,通过使用data_frame [‘Date’] .astype(’datetime64 [ns])。可以看到DataFrame中“Data”列的格式已更改为datetime格式。

方法3:

假设我们在DataFrame列中有一个以“yymmdd”格式的日期,并且我们必须将其从字符串转换为datetime格式。

示例:

import pandas as pnd

# Now, we will initialize the nested list with Dataset
play_list = [['210302', 67000], ['210901', 62000], ['210706', 61900],
            ['210402', 59000], ['210802', 74000], 
            ['210804', 54050], ['210109', 57650], ['210509', 67300], ['210209', 76600]]

# Creating a pandas DataFrame
data_frame = pnd.DataFrame(play_list,columns = ['Date','Patient Number'])

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# Here, we are checking the data type of the 'Date' column
print (data_frame.dtypes)

输出:

The data is: 
     Date          Patient Number
0  210302           67000
1  210901           62000
2  210706           61900
3  210402           59000
4  210802           74000
5  210804           54050
6  210109           57650
7  210509           67300
8  210209           76600
 Date              object 

Patient Number     int64
dtype: object

在输出中,我们可以看到DataFrame中“Date”列的数据类型是“object”,这意味着它是字符串。现在,我们将使用“data_frame[‘Date’] = pnd.to_datetime(data_frame[‘Date’], format = ‘%y%m%d’)”函数将数据类型转换为日期时间格式。

import pandas as pnd

# Now, we will initialize the nested list with Dataset
play_list = [['210302', 67000], ['210901', 62000], ['210706', 61900], 
            ['210402', 59000], ['210802', 74000], 
            ['210804', 54050], ['210109', 57650], ['210509', 67300], ['210209', 76600]]

# creating a pandas dataframe
data_frame = pnd.DataFrame(play_list,columns = ['Date','Patient Number'])

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# For converting the 'Date' column of DataFrame into datetime format
data_frame['Date'] = pnd.to_datetime(data_frame['Date'], format = '%y%m%d')

# Here, we are checking the data type of the 'Date' column
print (data_frame.dtypes)

输出:

The data is: 
     Date         Patient Number
0  210302           67000
1  210901           62000
2  210706           61900
3  210402           59000
4  210802           74000
5  210804           54050
6  210109           57650
7  210509           67300
8  210209           76600
 Date              datetime64[ns] 

Patient Number             int64
dtype: object

在上面的代码中,我们使用”pnd.to_datetime(data_frame[‘Date’], format = ‘%y%m%d’)”函数将列”Date”的数据类型由”object”更改为”datetime64[ns]”。

方法4:

我们可以使用”pandas.to_datetime()”函数将多个列从”string”格式转换为”datetime”格式,即”YYYYMMDD”格式。

# Initializing the nested list with Data set
Dataset_list = [['20210612', 54000, '20210812'], 
               ['20210814', 65000, '20210614'], 
               ['20210316', 71500, '20210316'], 
               ['20210519', 45000, '20210119'], 
               ['20210221', 98000, '20210221'], 
               ['20210124', 23000, '20210724'], 
               ['20210929', 12000, '20210924']] 

# creating a pandas dataframe
data_frame = pnd.DataFrame(
  Dataset_list, columns = ['Treatment_starting_Date',
                         'Patients Number',
                         'Treatment_ending_Date'])

# Print the dataframe
print ("The data is: ") 
print (data_frame)

# Here, we are checking the data type of the 'Date' column
print (data_frame.dtypes)

输出:

The data is: 
  Treatment_starting_Date   Patients Number     Treatment_ending_Date
0   20210612                54000                20210812
1   20210814                65000                20210614
2   20210316                71500                20210316
3   20210519                45000                20210119
4   20210221                98000                20210221
5   20210124                23000                20210724
6   20210929                12000                20210924
 Treatment_starting_Date    object 

Patients Number             int64
 Treatment_ending_Date      object 

dtype: object

在这里,输出中我们可以看到DataFrame中“Date”列的数据类型是“object”,这意味着它是一个字符串。现在,我们将使用“pnd.to_datetime(data_frame[”], format = ‘%y%m%d’)”函数将“Date”列的数据类型转换为日期时间格式。

import pandas as pnd

# Initializing the nested list with Data set
Dataset_list = [['20210612', 54000, '20210812'],
               ['20210814', 65000, '20210614'],
               ['20210316', 71500, '20210316'],
               ['20210519', 45000, '20210119'],
               ['20210221', 98000, '20210221'],
               ['20210124', 23000, '20210724'],
               ['20210929', 12000, '20210924']]

# creating a pandas dataframe
data_frame = pnd.DataFrame(
  Dataset_list, columns = ['Treatment_starting_Date',
                         'Patients Number',
                         'Treatment_ending_Date'])

# Print the dataframe
print ("The data is: ") 
print (data_frame)


# For converting the multiple columns of DataFrame into datetime format
data_frame['Treatment_starting_Date'] = pnd.to_datetime(
                          data_frame['Treatment_starting_Date'],
                          format = '%Y%m%d'
)
data_frame['Treatment_ending_Date'] = pnd.to_datetime(
                          data_frame['Treatment_ending_Date'],
                          format = '%Y%m%d'
)

# Here, we are checking the data type of the 'Date' column
print (data_frame.dtypes)

输出:

The data is: 
  Treatment_starting_Date  Patients Number Treatment_ending_Date
0                20210612            54000              20210812
1                20210814            65000              20210614
2                20210316            71500              20210316
3                20210519            45000              20210119
4                20210221            98000              20210221
5                20210124            23000              20210724
6                20210929            12000              20210924
 Treatment_starting_Date    datetime64[ns] 

Patients Number                     int64
 Treatment_ending_Date      datetime64[ns] 

dtype: object

在上面的输出中,我们可以看到通过使用”pnd.to_datetime()”函数,”Treatment_starting_Date”和”Treatment_ending_Date”的数据类型已被更改为datetime格式。

结论

在本教程中,我们学习了将Pandas DataFrame的列类型从字符串转换为datetime的不同方法。

Camera课程

Python教程

Java教程

Web教程

数据库教程

图形图像教程

办公软件教程

Linux教程

计算机教程

大数据教程

开发工具教程