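How to Merge DataFrames in Pandas
Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. In Pandas, a DataFrame is a two-dimensional data structure in which data is aligned in tabular form, in rows and columns.
In this article, we will see how to merge DataFrames in Python using the merge() method. The syntax is as follows: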
dataframe.merge(right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
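Here,
Parameter | Value | Description |
---|---|---|
right | DataFrame or named Series | The object to merge with |
how | 'left', 'right', 'outer', 'inner', 'cross'; default 'inner' | The type of merge to perform |
on | string or list | Column or index level names to join on; they must exist in both DataFrames |
left_on | string or list | Column or index level names to join on in the left DataFrame |
right_on | string or list | Column or index level names to join on in the right DataFrame |
left_index | True, False | Whether to use the index of the left DataFrame as the join key |
right_index | True, False | Whether to use the index of the right DataFrame as the join key |
sort | True, False | Whether to sort the result by the join keys |
suffixes | list of two strings; default ('_x', '_y') | Suffixes appended to overlapping column names from the left and right DataFrames |
copy | True, False | Whether to copy the data into the result |
indicator | True, False, or string | Whether to add a column (named _merge by default) that records which DataFrame each row came from |
validate | string | Checks that the merge is of the given type: 'one_to_one', 'one_to_many', 'many_to_one', or 'many_to_many' |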
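As a small illustration of left_on, right_on, and suffixes, the sketch below joins two frames whose key columns have different names. The frames players and scores and the column Name are hypothetical, made up for this example only.

```python
import pandas as pd

# Hypothetical frames: the key column is named differently on each side
players = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
scores = pd.DataFrame({'Name': ['Steve', 'Kane'], 'Age': [31, 27]})

# Match Player (left) against Name (right). The overlapping Age column
# receives the custom suffixes instead of the default _x and _y.
res = players.merge(scores, how='inner',
                    left_on='Player', right_on='Name',
                    suffixes=('_left', '_right'))
print(res)
```

Only Steve is present on both sides, so the result has a single row, with both key columns kept and the two Age columns renamed.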
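Merge DataFrames with merge() using the keys from the right DataFrame
To merge DataFrames, we use the merge() method. Setting the how parameter to 'right' uses only the keys from the right DataFrame, similar to a SQL right outer join. Because on is not specified, merge() joins on every column the two DataFrames share, here both Player and Age.
Example
import pandas as pd
# Create dictionaries
dct1 = {'Player':['Steve','David'], 'Age':[29, 25]}
dct2 = {'Player':['Steve','Kane'], 'Age':[31, 27]}
# Create DataFrames from the dictionaries using pd.DataFrame()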
df1 = pd.DataFrame(dct1)
df2 = pd.DataFrame(dct2)
print("DataFrame1 = \n",df1)
print("\nDataFrame2 = \n",df2)
# Combining DataFrames using the merge() method
res = df1.merge(df2, how='right')
print("\nCombined DataFrames = \n",res)
Output
DataFrame1 =
Player Age
0 Steve 29
1 David 25
DataFrame2 =
Player Age
0 Steve 31
1 Kane 27
Combined DataFrames =
Player Age
0 Steve 31
1 Kane 27
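The example above matches rows on both shared columns. As a hedged variation, the sketch below passes on='Player' so that only the player name is the key; joining on Player alone is an assumption made for illustration, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
df2 = pd.DataFrame({'Player': ['Steve', 'Kane'], 'Age': [31, 27]})

# Right join on Player only (assumed key for this sketch).
# Age is no longer a key, so it appears twice as Age_x and Age_y.
res = df1.merge(df2, how='right', on='Player')
print(res)
```

Kane exists only in the right DataFrame, so his Age_x value is NaN, while David, who exists only in the left DataFrame, is dropped.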
Merge DataFrames with merge() using the keys from the left DataFrame
To merge DataFrames, we use the merge() method. Setting the how parameter to 'left' uses only the keys from the left DataFrame, similar to a SQL left outer join.
Example
import pandas as pd
# Create dictionaries
dct1 = {'Player':['Steve','David'], 'Age':[29, 25]}
dct2 = {'Player':['Steve','Kane'], 'Age':[31, 27]}
# Create DataFrames from the dictionaries using pd.DataFrame()
df1 = pd.DataFrame(dct1)
df2 = pd.DataFrame(dct2)
print("DataFrame1 = \n",df1)
print("\nDataFrame2 = \n",df2)
# Combining DataFrames using the merge() method
# The how parameter is set to left
res = df1.merge(df2, how='left')
print("\nCombined DataFrames = \n",res)
Output
DataFrame1 =
Player Age
0 Steve 29
1 David 25
DataFrame2 =
Player Age
0 Steve 31
1 Kane 27
Combined DataFrames =
Player Age
0 Steve 29
1 David 25
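The validate parameter listed in the syntax can guard a left join against accidentally duplicated keys. The sketch below shows one possible use, assuming a join on Player only; the frame dup is hypothetical and exists just to trigger the check.

```python
import pandas as pd

df1 = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
df2 = pd.DataFrame({'Player': ['Steve', 'Kane'], 'Age': [31, 27]})

# Player is unique on both sides, so the one-to-one check passes
res = df1.merge(df2, how='left', on='Player', validate='one_to_one')
print(res)

# Hypothetical right frame with a duplicated key: the same check now fails
dup = pd.DataFrame({'Player': ['Steve', 'Steve'], 'Age': [31, 32]})
try:
    df1.merge(dup, how='left', on='Player', validate='one_to_one')
    check_failed = False
except pd.errors.MergeError as exc:
    check_failed = True
    print("validate rejected the merge:", exc)
```

Without validate, the second merge would silently return two rows for Steve, which is easy to miss in larger data.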
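Merge DataFrames using the union of keys from both DataFrames
To merge DataFrames, we use the merge() method. Setting the how parameter to 'outer' uses the union of keys from both DataFrames, similar to a SQL full outer join. Because on is not specified, the key is the combination of Player and Age, so Steve appears once for each of his two ages.
Example
import pandas as pd
# Create dictionaries
dct1 = {'Player':['Steve','David'], 'Age':[29, 25]}
dct2 = {'Player':['Steve','Kane'], 'Age':[31, 27]}
# Create DataFrames from the dictionaries using pd.DataFrame()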
df1 = pd.DataFrame(dct1)
df2 = pd.DataFrame(dct2)
print("DataFrame1 = \n",df1)
print("\nDataFrame2 = \n",df2)
# Combining DataFrames using the merge() method
# The how parameter is set to outer, i.e. the union of keys
res = df1.merge(df2, how='outer')
print("\nCombined DataFrames = \n",res)
Output
DataFrame1 =
Player Age
0 Steve 29
1 David 25
DataFrame2 =
Player Age
0 Steve 31
1 Kane 27
Combined DataFrames =
Player Age
0 Steve 29
1 David 25
2 Steve 31
3 Kane 27
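To see which DataFrame each row of an outer join came from, the indicator parameter listed in the syntax can help. The sketch below is a minimal illustration, assuming a join on Player only rather than on both shared columns; the helper variable sources is introduced here for readability.

```python
import pandas as pd

df1 = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
df2 = pd.DataFrame({'Player': ['Steve', 'Kane'], 'Age': [31, 27]})

# indicator=True adds a _merge column with the values
# 'both', 'left_only', or 'right_only'
res = df1.merge(df2, how='outer', on='Player', indicator=True)
print(res)

# Map each player to the side(s) the row came from
sources = dict(zip(res['Player'], res['_merge'].astype(str)))
print(sources)
```

With Player as the only key, the union contains three players: Steve is marked both, David left_only, and Kane right_only.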
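Merge DataFrames using the intersection of keys from both DataFrames
To merge DataFrames, we use the merge() method. Setting the how parameter to 'inner' uses the intersection of keys from both DataFrames, similar to a SQL inner join. Because on is not specified, merge() joins on both shared columns, Player and Age; no row has the same pair of values in both DataFrames, so the result is empty.
Example
import pandas as pd
# Create dictionaries
dct1 = {'Player':['Steve','David'], 'Age':[29, 25]}
dct2 = {'Player':['Steve','Kane'], 'Age':[31, 27]}
# Create DataFrames from the dictionaries using pd.DataFrame()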
df1 = pd.DataFrame(dct1)
df2 = pd.DataFrame(dct2)
print("DataFrame1 = \n",df1)
print("\nDataFrame2 = \n",df2)
# Combining DataFrames using the merge() method
# The how parameter is set to inner
res = df1.merge(df2, how='inner')
print("\nCombined DataFrames = \n",res)
Output
DataFrame1 =
Player Age
0 Steve 29
1 David 25
DataFrame2 =
Player Age
0 Steve 31
1 Kane 27
Combined DataFrames =
Empty DataFrame
Columns: [Player, Age]
Index: []
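As a contrast to the empty result above, the sketch below joins on Player alone. Choosing Player as the only key is an assumption made for illustration; with it, Steve matches even though his Age differs between the two DataFrames.

```python
import pandas as pd

df1 = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
df2 = pd.DataFrame({'Player': ['Steve', 'Kane'], 'Age': [31, 27]})

# Inner join on Player only (assumed key for this sketch):
# keep just the players present in both frames
res = df1.merge(df2, how='inner', on='Player')
print(res)
```

The result has one row for Steve, with his two ages in the Age_x and Age_y columns.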
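Merge DataFrames as a Cartesian product of both DataFrames
To merge DataFrames, we use the merge() method. Setting the how parameter to 'cross' creates the Cartesian product of the two DataFrames, pairing every row of the left DataFrame with every row of the right one; no join key is used:
Example
import pandas as pd
# Create dictionaries
dct1 = {'Player':['Steve','David'], 'Age':[29, 25]}
dct2 = {'Player':['Steve','Kane'], 'Age':[31, 27]}
# Create DataFrames from the dictionaries using pd.DataFrame()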
df1 = pd.DataFrame(dct1)
df2 = pd.DataFrame(dct2)
print("DataFrame1 = \n",df1)
print("\nDataFrame2 = \n",df2)
# Combining DataFrames using the merge() method
# The how parameter is set to cross i.e. cartesian product
res = df1.merge(df2, how='cross')
print("\nCombined DataFrames = \n",res)
Output
DataFrame1 =
Player Age
0 Steve 29
1 David 25
DataFrame2 =
Player Age
0 Steve 31
1 Kane 27
Combined DataFrames =
Player_x Age_x Player_y Age_y
0 Steve 29 Steve 31
1 Steve 29 Kane 27
2 David 25 Steve 31
3 David 25 Kane 27
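A quick way to sanity-check a cross merge is to compare row counts: the result should have as many rows as the product of the two input lengths. The sketch below assumes pandas 1.2 or later, where how='cross' is available, and uses custom suffixes chosen only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Player': ['Steve', 'David'], 'Age': [29, 25]})
df2 = pd.DataFrame({'Player': ['Steve', 'Kane'], 'Age': [31, 27]})

# Every column overlaps, so each one gets a suffix
res = df1.merge(df2, how='cross', suffixes=('_left', '_right'))
print(res)

# Cartesian product: rows(result) == rows(df1) * rows(df2)
print(len(res) == len(df1) * len(df2))
```

Because the row count grows multiplicatively, a cross merge of two large DataFrames can become very large very quickly.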