What does the align() method do in a pandas Series?

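The align method of a pandas Series lines up two Series objects on a shared set of index labels. A Series has only one axis (its index), so the alignment is controlled mainly by the join parameter, with axis and the other parameters as optional refinements.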

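Unlike merging two Series objects, align does not combine them: it reindexes both to a common index and returns them as two separate objects. In the signature this article follows, the method accepts 10 parameters: "other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None". Recent pandas releases deprecate some of the fill-related ones (method, limit, fill_axis, broadcast_axis), so check the documentation for your version. Of these, other, join and axis matter most, because they determine how the output Series objects are aligned.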
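To make the difference from merging concrete, here is a minimal sketch. The Series names and values (left, right) are hypothetical and chosen only for illustration. It shows that align returns a tuple of two new Series that share one index, while the inputs stay untouched.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# align() returns a tuple of two new Series; the inputs are not modified.
left_aligned, right_aligned = left.align(right)

# Both results share one index: the union of labels under the default outer join.
print(left_aligned.index.equals(right_aligned.index))
print(list(left_aligned.index))

# The values stay in two separate objects; nothing is merged into one Series.
print(left_aligned)
print(right_aligned)
```

Labels that exist in only one input appear in both results, with NaN on the side that lacked them.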

Example 1

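import pandas as pd

# Two Series whose index labels only partly overlap
s1 = pd.Series([8, 4, 2, 1], index=[5, 3, 4, 2])
s2 = pd.Series([15, 12, 10, 11], index=[1, 2, 4, 5])
print(s1)
print(s2)

# With no extra arguments, align() performs an outer join on the index labels
a, b = s1.align(s2)
print("Output for align method")
print(a)
print(b)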

Explanation

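s1 and s2 are two pandas Series objects whose index labels are [5, 3, 4, 2] and [1, 2, 4, 5] respectively. We call align on them with no arguments other than the second Series, and it returns two new Series objects as its result.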

Output

5    8
3    4
4    2
2    1
dtype: int64
1    15
2    12
4    10
5    11
dtype: int64

Output for align method

1    NaN
2    1.0
3    4.0
4    2.0
5    8.0
dtype: float64
1    15.0
2    12.0
3    NaN
4    10.0
5    11.0
dtype: float64

The four Series objects shown above are s1, s2, a and b. The first two are s1 and s2; the last two are the ones that align produces with its default parameters.

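In a, the index labels of s1 have been reordered so that they line up with those of s2: both a and b now share the index [1, 2, 3, 4, 5].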

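A label 1 has been added to s1 and a label 3 has been added to s2, and their values are filled with NaN. This happens because the default join is an outer join on the index labels. The NaN values are also why both results have dtype float64 instead of int64.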
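If NaN is not wanted for the labels that the outer join adds, the fill_value parameter from the signature can supply a replacement. The following is a minimal sketch using the same s1 and s2; the choice of 0 as the fill value is an assumption made only for illustration.

```python
import pandas as pd

s1 = pd.Series([8, 4, 2, 1], index=[5, 3, 4, 2])
s2 = pd.Series([15, 12, 10, 11], index=[1, 2, 4, 5])

# fill_value replaces the NaN that the outer join would otherwise insert.
# The value 0 is an arbitrary choice for this sketch.
a_filled, b_filled = s1.align(s2, fill_value=0)

print(a_filled)
print(b_filled)

# Neither result contains missing values any more.
print(a_filled.isna().any(), b_filled.isna().any())
```

Whether 0 is a sensible fill depends on what the values mean; for data where a missing label should stay visibly missing, keeping NaN is usually the safer choice.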

Example 2

import pandas as pd

# Same data as in Example 1
s1 = pd.Series([8, 4, 2, 1], index=[5, 3, 4, 2])
s2 = pd.Series([15, 12, 10, 11], index=[1, 2, 4, 5])
print(s1)
print(s2)

# join='right' keeps only the index labels of the right-hand Series (s2)
a, b = s1.align(s2, join='right')
print("Output of align method with join parameter.")
print(a)
print(b)

Explanation

This time we run the same example with the join parameter set to 'right'. Note the differences in the output block below.

Output

5    8
3    4
4    2
2    1
dtype: int64
1    15
2    12
4    10
5    11
dtype: int64

Output of align method with join parameter.

1    NaN
2    1.0
4    2.0
5    8.0
dtype: float64
1    15
2    12
4    10
5    11
dtype: int64

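Only the labels found in the right-hand Series (s2) are kept, so the index label 3 no longer appears. This is because we performed a right join on the two Series. Label 1 exists only in s2, so a holds NaN there and becomes float64, while b is s2 unchanged and keeps dtype int64.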
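For comparison, the remaining join options can be tried on the same data. The sketch below loops over the four values that join accepts and collects the index each one produces. The dictionary name indexes is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

s1 = pd.Series([8, 4, 2, 1], index=[5, 3, 4, 2])
s2 = pd.Series([15, 12, 10, 11], index=[1, 2, 4, 5])

# Collect the shared index produced by each join option.
indexes = {}
for how in ["outer", "inner", "left", "right"]:
    a, b = s1.align(s2, join=how)
    # Both results always carry the same index, whatever the join type.
    assert a.index.equals(b.index)
    indexes[how] = list(a.index)
    print(how, indexes[how])
```

In short, 'outer' uses every label from either Series, 'inner' only the labels present in both, 'left' the labels of s1, and 'right' the labels of s2.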