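How to Get the Correlation Between Two Columns in Pandas
In Pandas, the .corr() method of a Series returns the correlation between two columns. By default it computes the Pearson correlation coefficient. The example below shows how to apply it.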
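To make concrete what the default (Pearson) coefficient measures, here is a minimal sketch that computes it directly from its definition and compares it with the value returned by .corr(). The helper name pearson and the use of NumPy are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd

x = pd.Series([5, 2, 7, 0])
y = pd.Series([4, 7, 5, 1])

def pearson(a, b):
    """Hypothetical helper: Pearson correlation from its definition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()  # deviations from the mean of a
    db = b - b.mean()  # deviations from the mean of b
    # Sum of cross-deviations, normalized by the product of the deviation norms.
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

manual = pearson(x, y)
builtin = x.corr(y)  # Pearson by default
print(round(manual, 2), round(builtin, 2))  # both round to 0.41
```

Because the formula is symmetric in its two inputs, swapping the columns gives the same value, and a column compared with itself gives 1.0.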
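Steps
- Create a DataFrame df, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.
- Print the input DataFrame df.
- Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want to find.
- Use df[col1].corr(df[col2]) to compute the correlation between col1 and col2, and store the result in the variable corr.
- Print the correlation value corr.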
Example
import pandas as pd

# Build a small DataFrame with three numeric columns.
df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)
print(f"Input DataFrame is:\n{df}")

# Correlation between two different columns.
col1, col2 = "x", "y"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# A column is perfectly correlated with itself.
col1, col2 = "x", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

col1, col2 = "x", "z"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# Swapping the columns gives the same value as for x and y.
col1, col2 = "y", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")
Output
Input DataFrame is:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Correlation between x and y is: 0.41
Correlation between x and x is: 1.0
Correlation between x and z is: 0.72
Correlation between y and x is: 0.41
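When every pair of columns is of interest, calling .corr() on the DataFrame itself returns the full correlation matrix in one step. The following is a minimal sketch using the same data as above; the variable name corr_matrix is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1],
    }
)

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))

# A single pair can be looked up in the matrix by column label.
print(round(corr_matrix.loc["x", "z"], 2))  # 0.72, matching df["x"].corr(df["z"])
```

The matrix is symmetric with 1.0 on the diagonal, which matches the pairwise results printed above.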