Pandas: How to Use a Regular Expression in the filter() Method to Retrieve Rows of a Series

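By passing the regex parameter to the filter() method, we can use a regular expression to retrieve rows of a Series. The basic job of the pandas Series.filter() method is to subset the rows of a Series according to their index labels; it does not look at the values.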

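The regex parameter defines the search pattern (a regular expression). Rows whose index label matches the pattern, in the sense of re.search, are kept in the result.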
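To make the matching rule concrete, here is a minimal sketch. The Series and its fruit-name labels are made up for illustration and do not appear in the examples of this article. It shows that the pattern is searched for anywhere inside each label unless it is anchored with ^ or $.

```python
import pandas as pd

# Hypothetical labels, chosen only to illustrate the matching rule.
s = pd.Series([10, 20, 30, 40], index=["apple", "banana", "avocado", "cherry"])

# Unanchored pattern: keeps every label that contains "an" anywhere.
contains_an = s.filter(regex="an")

# Anchored at the start: keeps only labels that begin with "a".
starts_with_a = s.filter(regex="^a")

# Anchored at the end: keeps only labels that end with "y".
ends_with_y = s.filter(regex="y$")

print(contains_an)
print(starts_with_a)
print(ends_with_y)
```

Only the index labels are tested against the pattern; the values 10, 20, 30, 40 play no part in the selection.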

Example 1

In the following example, we create a Series from a list of integers and build its index labels with the pandas date_range function.

# importing pandas package
import pandas as pd

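# Build a DatetimeIndex with 10 entries spaced 10 h 30 min 40 s apart.
# Note: pandas 2.2 and later deprecate the 'H' alias in favor of 'h'.
index = pd.date_range('2021-08-1', periods=10, freq='10H30min40s')

# Create a pandas Series with the date-time index
series = pd.Series([1,2,3,4,5,6,7,8,9,10], index=index)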

print(series)

print("Output: ")
# Apply the filter method with regex
print(series.filter(regex='40$'))

Explanation

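Here, we use the regex parameter to specify a search pattern that selects some of the rows of the pandas Series. The pattern '40$' matches index labels whose string form ends in 40, that is, timestamps whose seconds field is 40.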

Output

The output is as follows:

2021-08-01 00:00:00     1
2021-08-01 10:30:40     2
2021-08-01 21:01:20     3
2021-08-02 07:32:00     4
2021-08-02 18:02:40     5
2021-08-03 04:33:20     6
2021-08-03 15:04:00     7
2021-08-04 01:34:40     8
2021-08-04 12:05:20     9
2021-08-04 22:36:00    10
Freq: 37840S, dtype: int64

Output:
2021-08-01 10:30:40    2
2021-08-02 18:02:40    5
2021-08-04 01:34:40    8
Freq: 113520S, dtype: int64

In the output block above, we can see that only the rows whose index timestamp has 40 in the seconds field were kept.
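The regex is applied to the text form of each timestamp label, not to the datetime fields themselves. The sketch below rebuilds the Series from Example 1 and compares the regex filter with a boolean mask on the seconds component. Passing the step as a pd.Timedelta instead of a frequency string, and the names step, labels_as_text, by_regex, and by_seconds, are choices made for this illustration rather than part of the original example.

```python
import pandas as pd

# Same spacing as Example 1, expressed as a Timedelta instead of a frequency string.
step = pd.Timedelta(hours=10, minutes=30, seconds=40)
index = pd.date_range("2021-08-01", periods=10, freq=step)
series = pd.Series(range(1, 11), index=index)

# The text that the pattern is searched in: the string form of each label.
labels_as_text = [str(label) for label in index]
print(labels_as_text[:3])

# Regex on the label text: keep labels whose string form ends in "40".
by_regex = series.filter(regex="40$")

# Alternative that works on the datetime component directly.
by_seconds = series[series.index.second == 40]

print(by_regex)
print(by_seconds)
```

Both selections keep the same three rows here. The mask on index.second states the intent more directly, while the regex version depends on how the timestamps are rendered as text.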

Example 2

Let us take another Series and select the rows whose index labels contain a space.

# importing pandas package
import pandas as pd

countries = ['Brazil','Canada','New Zealand','Iceland', 'India', 'Sri Lanka', 'United States']
capitals = ['Belmopan','Ottawa','Wellington','Reykjavik', 'New Delhi','Colombo', 'Washington D.C']

# Create a pandas Series with the country names as index labels
series = pd.Series(capitals, index=countries)

print(series)

print("Output: ")
# Apply the filter method with regex
print(series.filter(regex='. .'))

Output

The output is as follows:

Brazil                 Belmopan
Canada                   Ottawa
New Zealand          Wellington
Iceland               Reykjavik
India                 New Delhi
Sri Lanka               Colombo
United States    Washington D.C
dtype: object

Output:
New Zealand          Wellington
Sri Lanka               Colombo
United States    Washington D.C
dtype: object

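In the above output, only the rows labeled "New Zealand", "Sri Lanka", and "United States" were retained from the Series. The pattern '. .' matches any character, followed by a space, followed by any character, so it selects labels that contain a space between two other characters.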
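As a point of comparison, the following sketch contrasts the '. .' pattern with a plain substring test on the index. The shortened Series and the label "Peru " with a trailing space are hypothetical and were added only to show where the two approaches differ.

```python
import pandas as pd

# Hypothetical data; "Peru " deliberately ends with a space.
capitals = pd.Series(
    ["Ottawa", "Wellington", "Colombo", "Lima"],
    index=["Canada", "New Zealand", "Sri Lanka", "Peru "],
)

# '. .' needs a character on both sides of the space.
by_filter = capitals.filter(regex=". .")

# Plain substring test on the labels: any space counts, even a trailing one.
mask = capitals.index.str.contains(" ", regex=False)
by_mask = capitals[mask]

print(by_filter)
print(by_mask)
```

The regex filter skips "Peru " because nothing follows its space, while the substring mask keeps it. Which behavior is preferable depends on whether stray leading or trailing spaces should count as a match.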