Finding the Percentile Rank of a Column in a Pandas DataFrame

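Finding a percentile rank is a common way to compare values within a single dataset. The percentile rank of a value is the percentage of values in the dataset that are less than or equal to it. For example, if a student's score is greater than or equal to 80% of all the scores, that student's percentile rank is 80, and the score sits at the 80th percentile.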
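To make the definition concrete, here is a minimal sketch that computes a percentile rank by hand, using the "less than or equal to" convention. The function name percentile_rank_of and the sample scores are illustrative assumptions, not part of any library.

```python
def percentile_rank_of(value, values):
    """Return the percentage of entries in `values` that are <= `value`."""
    count_at_or_below = sum(1 for v in values if v <= value)
    return 100.0 * count_at_or_below / len(values)

# Hypothetical class scores, used only for illustration
scores = [55, 68, 70, 88, 92]

# 4 of the 5 scores are less than or equal to 88
rank_of_88 = percentile_rank_of(88, scores)
print(rank_of_88)  # 80.0
```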

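To find the percentile rank of a column in a Pandas DataFrame, we can use the "rank()" method that pandas provides on Series and DataFrame objects. NumPy's "percentile()" function is closely related, but it works in the opposite direction: it returns the value found at a given percentile.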

Python Programs to Find the Percentile Rank of a Column in Pandas

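Before going further, a quick word on pandas. It is an open-source Python library used mainly for data analysis and manipulation. Its central data structure, the DataFrame, is a two-dimensional labeled table. Pandas handles relational and labeled data through operations such as cleaning, filtering, grouping, aggregating, and merging.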
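As a brief illustration of two of the operations just listed, the sketch below builds a small DataFrame, filters it, and computes a grouped aggregate. The data and the "Class" column are hypothetical and appear only in this sketch.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "Name": ["Ram", "Shyam", "Shrey", "Mohan"],
    "Class": ["A", "A", "B", "B"],
    "Score": [75, 82, 68, 90],
})

# Filtering: keep the rows with a score of at least 75
high_scores = df[df["Score"] >= 75]

# Grouping and aggregating: mean score per class
mean_by_class = df.groupby("Class")["Score"].mean()

print(high_scores)
print(mean_by_class)
```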

Now let's move on to the example programs.

Example 1

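The following example uses NumPy's "percentile()" function. Keep in mind that this function does not return a percentile rank: given a percentile, it returns the value found at that position in the data.

Approach

First, import the pandas and numpy packages.
Create a DataFrame named "df" with two columns, "Name" and "Score".
Next, call "percentile()" with the "Score" column as the data array and the same "Score" column as the requested percentiles. Each score is therefore read as a percentile between 0 and 100, and the function returns the score located at that percentile. The optional "method" parameter (available under this name in NumPy 1.22 and later) selects how to resolve a percentile that falls between two data points. Here it is set to "nearest", which returns the closest actual data point.
Finally, assign the result to a new column named "Per_Rank" and display the DataFrame with "print()". Despite its name, this column holds scores rather than percentile ranks. For example, Ram's value of 88 is the score found at the 75th percentile.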

# import the required packages
import pandas as pd
import numpy as np

# define a sample DataFrame
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [75, 82, 68, 90, 88]}
df = pd.DataFrame(data)

# read each score as a percentile (0-100) and look up the score found there;
# 'nearest' returns the closest actual data point
df['Per_Rank'] = np.percentile(df['Score'], df['Score'], method='nearest')

# show the result
print(df)

Output

    Name  Score  Per_Rank
0    Ram     75        88
1  Shyam     82        88
2  Shrey     68        88
3  Mohan     90        90
4  Navya     88        90
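Since np.percentile() maps a percentile to a value, going the other way with NumPy, from each score to its percentile rank, needs a different calculation. One option is to count how many scores are less than or equal to each score. The sketch below is one possible approach under that convention, not the method used in the example above. The column name "Pct_Rank" is chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Shyam", "Shrey", "Mohan", "Navya"],
    "Score": [75, 82, 68, 90, 88],
})

scores = df["Score"].to_numpy()

# Broadcasting builds a 5x5 boolean matrix in which row i marks
# every score that is less than or equal to scores[i].
at_or_below = scores[None, :] <= scores[:, None]

# Percentage of scores at or below each score
df["Pct_Rank"] = at_or_below.sum(axis=1) * 100 / len(scores)

print(df)
```

For distinct scores such as these, the result matches df["Score"].rank(pct=True) * 100, which is the approach shown in the next example.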

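Example 2

The following example uses the pandas "rank()" method to find the percentile rank.

Approach

First, import the pandas package under the alias "pd".
Create a Pandas DataFrame with two columns, "Name" and "Score".
Next, define a function named "percentile_rank()" that takes one parameter, "column". Inside it, call the built-in "rank()" method with the "pct" parameter set to True, so that each rank is returned as a fraction between 0 and 1 instead of a position.
Now pass df['Score'] to "percentile_rank()" and store the result in a new column named "Per_Rank".
Finally, display the result with "print()".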

# import the required package
import pandas as pd

# define a sample DataFrame
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [55, 92, 68, 70, 88]}
df = pd.DataFrame(data)

# user-defined function that returns the percentile rank of a column
def percentile_rank(column):
    return column.rank(pct=True)

# call the function and store the result in a new column
df['Per_Rank'] = percentile_rank(df['Score'])

# show the result
print(df)

Output

    Name  Score  Per_Rank
0    Ram     55       0.2
1  Shyam     92       1.0
2  Shrey     68       0.4
3  Mohan     70       0.6
4  Navya     88       0.8
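The scores in this example are all distinct. When a column contains ties, "rank()" also accepts a "method" parameter that decides which rank the tied values receive. The sketch below compares the default ("average") with "max" on hypothetical scores that contain a tie. The "max" setting corresponds to the "less than or equal to" reading of a percentile rank.

```python
import pandas as pd

# Hypothetical scores containing a tie (two students scored 70)
scores = pd.Series([55, 70, 70, 88],
                   index=["Ram", "Shyam", "Shrey", "Mohan"])

# Default: tied values share the average of the ranks they span
avg_rank = scores.rank(pct=True)

# 'max': tied values all receive the highest rank in the tie
max_rank = scores.rank(pct=True, method="max")

print(pd.DataFrame({"Score": scores, "average": avg_rank, "max": max_rank}))
```

Multiplying either result by 100 expresses the rank as a percentage instead of a fraction.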

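Example 3

In this example, we modify the code from the previous example: instead of a "Score" column, the DataFrame defines a column named "Balance", and the "rank()" method is applied to it.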

# import the required package
import pandas as pd

# define a sample DataFrame
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Balance': [5500, 9200, 6800, 7000, 8800]}
df = pd.DataFrame(data)

# user-defined function that returns the percentile rank of a column
def percentile_rank(column):
    return column.rank(pct=True)

# call the function and store the result in a new column
df['Per_Rank'] = percentile_rank(df['Balance'])

# show the result
print(df)

Output

    Name  Balance  Per_Rank
0    Ram     5500       0.2
1  Shyam     9200       1.0
2  Shrey     6800       0.4
3  Mohan     7000       0.6
4  Navya     8800       0.8
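Because "rank()" is also available on a DataFrame, several numeric columns can be ranked in one call. The sketch below assumes a DataFrame that holds both a "Score" and a "Balance" column. The "_Per_Rank" suffix and the variable names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Shyam", "Shrey", "Mohan", "Navya"],
    "Score": [55, 92, 68, 70, 88],
    "Balance": [5500, 9200, 6800, 7000, 8800],
})

# Rank each numeric column independently, as a fraction between 0 and 1
numeric_cols = ["Score", "Balance"]
ranks = df[numeric_cols].rank(pct=True).add_suffix("_Per_Rank")

# Attach the rank columns to the original data (aligned on the index)
result = df.join(ranks)

print(result)
```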