How to Select Rows in a Pandas DataFrame Based on Conditions

In this tutorial, we will learn how to select rows from a Pandas DataFrame based on conditions, using Python.

Rows can be selected by comparing the values of a specific column using the '>', '<', '==', '>=', '<=' and '!=' operators.
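
As a minimal sketch of what these operators do, the snippet below applies three of them to a column and prints the boolean result for each row. The three-row frame and the variable names are hypothetical and exist only for illustration; the conventional alias `pd` is used here instead of the `pnd` alias found in the tutorial's listings.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
df = pd.DataFrame({"Name_1": ["Anuj", "Ashu", "Ray"],
                   "Percentage_1": [88, 62, 70]})

# A comparison on a column returns a boolean Series, one entry per row
greater = (df["Percentage_1"] > 70).tolist()
equal = (df["Percentage_1"] == 70).tolist()
not_equal = (df["Percentage_1"] != 70).tolist()

print(greater)    # [True, False, False]
print(equal)      # [False, False, True]
print(not_equal)  # [True, True, False]
```

Passing such a boolean Series to the DataFrame keeps only the rows marked True, which is what every condition in this tutorial relies on.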

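Conditions

We will go through several conditions that can be applied to a Pandas DataFrame.

Condition 1

Use the basic method (boolean indexing) to select all rows of the DataFrame in which 'Percentage_1' is greater than 70.

Code: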

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

# Then we will select rows based on condition
result_DataFrame = Data_Frame[Data_Frame['Percentage_1'] > 70]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
5    John     24          ADS            78
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89
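
Note that the result above keeps the original row labels (0, 2, 3, 5, 8, 9). The following sketch, which is not part of the original listing, shows this and how the rows could be renumbered with `reset_index(drop=True)` if consecutive labels are wanted. Only two of the tutorial's columns are reproduced, and the names `df`, `result` and `renumbered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_1": ["Anuj", "Ashu", "Yashi", "Mark", "Joshua",
               "John", "Ray", "Lilly", "Rose", "Rachel"],
    "Percentage_1": [88, 62, 85, 71, 55, 78, 70, 66, 71, 89],
})

# Boolean indexing keeps the index labels of the matching rows
result = df[df["Percentage_1"] > 70]
print(result.index.tolist())      # [0, 2, 3, 5, 8, 9]

# drop=True discards the old labels instead of adding them as a column
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```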

Condition 2

Use "loc[]" to select all rows of the DataFrame in which 'Percentage_1' is greater than 70.

Code:

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

# Then select rows based on the condition, this time using loc[]
result_DataFrame = Data_Frame.loc[Data_Frame['Percentage_1'] > 70]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
5    John     24          ADS            78
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89
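
One practical difference from the basic method is that `loc[]` also accepts a column selection after the row condition. The sketch below assumes a reduced three-column copy of the tutorial's data; the names `df` and `selected` are illustrative and do not come from the original listing.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_1": ["Anuj", "Ashu", "Yashi", "Mark", "Joshua",
               "John", "Ray", "Lilly", "Rose", "Rachel"],
    "Age_1": [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
    "Percentage_1": [88, 62, 85, 71, 55, 78, 70, 66, 71, 89],
})

# First argument: row condition; second argument: columns to keep
selected = df.loc[df["Percentage_1"] > 70, ["Name_1", "Percentage_1"]]

print(selected.shape)             # (6, 2)
print(selected.columns.tolist())  # ['Name_1', 'Percentage_1']
```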

Condition 3

Use "loc[]" to select all rows of the DataFrame in which 'Percentage_1' is not equal to 71.

Code:

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

# Then select rows based on the condition, using loc[] with the != operator
result_DataFrame = Data_Frame.loc[Data_Frame['Percentage_1'] != 71]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
    Name_1  Age_1 Subjects_1  Percentage_1
0    Anuj     23       DBMS            88
1    Ashu     24        ADS            62
2   Yashi     21       ASPM            85
4  Joshua     21       MFCS            55
5    John     24        ADS            78
6     Ray     25       ASPM            70
7   Lilly     22        TOC            66
9  Rachel     22       OOPS            89

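Next, we will learn how to use the "isin()" method to select the rows whose column value is present in a list.

Condition 4

Use the basic method to select all rows of the given DataFrame whose 'Subjects_1' value is present in the 'Subjects_2' list.

Code: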

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

Subjects_2 = ['ASPM', 'ADS', 'TOC']

# Then select rows based on the condition, using the isin() method
result_DataFrame = Data_Frame[Data_Frame['Subjects_1'].isin(Subjects_2)]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
   Name_1  Age_1 Subjects_1  Percentage_1
1   Ashu     24        ADS            62
2  Yashi     21       ASPM            85
5   John     24        ADS            78
6    Ray     25       ASPM            70
7  Lilly     22        TOC            66
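
To make the intermediate step visible, the sketch below prints the boolean mask that `isin()` returns before it is used for selection. It reuses the tutorial's 'Subjects_1' values in a one-column frame; the names `df`, `subjects_2` and `mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Subjects_1": ["DBMS", "ADS", "ASPM", "BCM", "MFCS",
                   "ADS", "ASPM", "TOC", "Data Mining", "OOPS"],
})
subjects_2 = ["ASPM", "ADS", "TOC"]

# isin() tests each value for membership in the list
mask = df["Subjects_1"].isin(subjects_2)

print(mask.tolist())
# [False, True, True, False, False, True, True, True, False, False]
print(int(mask.sum()))  # 5 rows match
```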

Condition 5

Use "loc[]" to select all rows of the given DataFrame whose 'Subjects_1' value is present in the 'Subjects_2' list.

Code:

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

Subjects_2 = ['ASPM', 'ADS', 'TOC']

# Then select rows based on the condition, using isin() inside loc[]
result_DataFrame = Data_Frame.loc[Data_Frame['Subjects_1'].isin(Subjects_2)]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
   Name_1  Age_1 Subjects_1  Percentage_1
1   Ashu     24        ADS            62
2  Yashi     21       ASPM            85
5   John     24        ADS            78
6    Ray     25       ASPM            70
7  Lilly     22        TOC            66

Condition 6

Use "loc[]" to select all rows of the given DataFrame whose 'Subjects_1' value is not present in the 'Subjects_2' list.

Code:

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 24, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 62, 85, 71, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

Subjects_2 = ['ASPM', 'ADS', 'TOC']

# Then select rows based on the condition, using ~ to negate isin() inside loc[]
result_DataFrame = Data_Frame.loc[~Data_Frame['Subjects_1'].isin(Subjects_2)]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     24          ADS            62
2   Yashi     21         ASPM            85
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
3    Mark     19          BCM            71
4  Joshua     21         MFCS            55
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89
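
The `~` operator inverts a boolean mask element by element, so the mask and its negation split the rows into two groups with no overlap. A small sketch of that property, using a two-column copy of the tutorial's data and illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "Name_1": ["Anuj", "Ashu", "Yashi", "Mark", "Joshua",
               "John", "Ray", "Lilly", "Rose", "Rachel"],
    "Subjects_1": ["DBMS", "ADS", "ASPM", "BCM", "MFCS",
                   "ADS", "ASPM", "TOC", "Data Mining", "OOPS"],
})
subjects_2 = ["ASPM", "ADS", "TOC"]

mask = df["Subjects_1"].isin(subjects_2)
inside = df.loc[mask]     # subject is in the list
outside = df.loc[~mask]   # subject is not in the list

print(outside["Name_1"].tolist())
# ['Anuj', 'Mark', 'Joshua', 'Rose', 'Rachel']
print(len(inside) + len(outside) == len(df))  # True
```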

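Next, we will learn how to use the "&" operator to select rows based on conditions on multiple columns.

Condition 7

Use the basic method to select all rows of the given DataFrame in which 'Percentage_1' is equal to 71 and the 'Subjects_1' value is present in the 'Subjects_2' list.

Code: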

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 21, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 71, 71, 82, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

Subjects_2 = ['ASPM', 'ADS', 'TOC']

# Then select rows that satisfy both conditions, combining == and isin() with &
result_DataFrame = Data_Frame[(Data_Frame['Percentage_1'] == 71) &
                              Data_Frame['Subjects_1'].isin(Subjects_2)]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     21          ADS            71
2   Yashi     21         ASPM            71
3    Mark     19          BCM            82
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
   Name_1  Age_1 Subjects_1  Percentage_1
1   Ashu     21        ADS            71
2  Yashi     21       ASPM            71
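
In Python, `&` binds more tightly than comparison operators such as `==`, which is why the comparison in the listing above is wrapped in parentheses. One way to sidestep the issue, sketched here with illustrative names (`is_71`, `in_list`) that are not in the original listing, is to give each condition its own variable before combining them.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_1": ["Anuj", "Ashu", "Yashi", "Mark", "Joshua",
               "John", "Ray", "Lilly", "Rose", "Rachel"],
    "Subjects_1": ["DBMS", "ADS", "ASPM", "BCM", "MFCS",
                   "ADS", "ASPM", "TOC", "Data Mining", "OOPS"],
    "Percentage_1": [88, 71, 71, 82, 55, 78, 70, 66, 71, 89],
})
subjects_2 = ["ASPM", "ADS", "TOC"]

# Each condition is evaluated on its own, so no parentheses are needed
is_71 = df["Percentage_1"] == 71
in_list = df["Subjects_1"].isin(subjects_2)

# & keeps a row only when both masks are True for it
result = df[is_71 & in_list]
print(result["Name_1"].tolist())  # ['Ashu', 'Yashi']
```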

Condition 8

Use "loc[]" to select all rows of the given DataFrame in which 'Percentage_1' is equal to 71 and the 'Subjects_1' value is present in the 'Subjects_2' list.

Code:

# First, import pandas
import pandas as pnd
record_1 = {

 'Name_1': ['Anuj', 'Ashu', 'Yashi', 'Mark', 'Joshua', 'John', 'Ray', 'Lilly', 'Rose', 'Rachel' ],
 'Age_1': [23, 21, 21, 19, 21, 24, 25, 22, 23, 22],
 'Subjects_1': ['DBMS', 'ADS', 'ASPM', 'BCM', 'MFCS', 'ADS', 'ASPM', 'TOC', 'Data Mining', 'OOPS'],
 'Percentage_1': [88, 71, 71, 82, 55, 78, 70, 66, 71, 89] }

# Now, we are creating a dataframe
Data_Frame = pnd.DataFrame(record_1, columns = ['Name_1', 'Age_1', 'Subjects_1', 'Percentage_1'])

print("Given DataFrame: \n", Data_Frame) 

Subjects_2 = ['ASPM', 'ADS', 'TOC']

# Then select rows that satisfy both conditions, combining == and isin() with & inside loc[]
result_DataFrame = Data_Frame.loc[(Data_Frame['Percentage_1'] == 71) &
                              Data_Frame['Subjects_1'].isin(Subjects_2)]

print('\nFollowing is the Result DataFrame: \n', result_DataFrame)

Output:

Given DataFrame: 
    Name_1  Age_1   Subjects_1  Percentage_1
0    Anuj     23         DBMS            88
1    Ashu     21          ADS            71
2   Yashi     21         ASPM            71
3    Mark     19          BCM            82
4  Joshua     21         MFCS            55
5    John     24          ADS            78
6     Ray     25         ASPM            70
7   Lilly     22          TOC            66
8    Rose     23  Data Mining            71
9  Rachel     22         OOPS            89

Following is the Result DataFrame: 
   Name_1  Age_1 Subjects_1  Percentage_1
1   Ashu     21        ADS            71
2  Yashi     21       ASPM            71
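
Conditions 7 and 8 print the same rows. As a quick check, the sketch below compares the two selection styles with `DataFrame.equals()` on a reduced copy of the data; the variable names are illustrative and the check is an addition, not part of the original listings.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_1": ["Anuj", "Ashu", "Yashi", "Mark", "Joshua",
               "John", "Ray", "Lilly", "Rose", "Rachel"],
    "Subjects_1": ["DBMS", "ADS", "ASPM", "BCM", "MFCS",
                   "ADS", "ASPM", "TOC", "Data Mining", "OOPS"],
    "Percentage_1": [88, 71, 71, 82, 55, 78, 70, 66, 71, 89],
})
subjects_2 = ["ASPM", "ADS", "TOC"]

mask = (df["Percentage_1"] == 71) & df["Subjects_1"].isin(subjects_2)

basic = df[mask]        # basic method
via_loc = df.loc[mask]  # loc[] with the same mask

# equals() compares values, labels and dtypes of the two frames
print(basic.equals(via_loc))  # True
```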