Pivot Tables in Pandas

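This article introduces pivot tables in Pandas. A pivot table reshapes a column-oriented dataset (a DataFrame) into a new summary table, presenting the data in a form that is easier to read or more useful for analysis.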

Using Pivot Tables

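The pivot_table() method restructures data: it regroups the rows under new row and column labels and aggregates the values that fall into the same cell. The source DataFrame is not modified; a new DataFrame is returned. The commonly used parameters of pivot_table() are: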

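index: the column name, or list of column names, whose values become the row labels (index) of the result.
values: the column name, or list of column names, to aggregate. Columns that are not listed here and are not used in index or columns are dropped from the result.
columns: the column name, or list of column names, whose values become the column labels of the result. It is optional.
aggfunc: the aggregation function; the default is the mean. It determines how multiple rows that fall into the same cell are combined.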
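To make the four parameters concrete, here is a minimal sketch on a tiny hand-written table. The sales frame and its column names (region, product, amount) are hypothetical and chosen only for illustration; they are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical sales records, made up purely for illustration.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north"],
    "product": ["apple", "pear", "apple", "apple", "apple"],
    "amount": [10, 20, 30, 40, 50],
})

# index   -> values of "region" become the row labels
# columns -> values of "product" become the column labels
# values  -> "amount" is the column being aggregated
# aggfunc -> rows landing in the same cell are combined by summing
summary = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
)
print(summary)
```

The north/apple cell holds 60 (10 + 50) because two rows share that combination, while south/pear is NaN because no such row exists.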

Let's use the following example dataset to illustrate pivot_table():

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})
print(df)

# Output (columns D and E are random, so your values will differ):
#         A  B    C         D         E
# 0     one  A  foo -0.054861 -0.753484
# 1     one  B  foo  0.293260  0.142676
# 2     two  C  foo -1.940675 -0.772475
# 3   three  A  bar  0.940166  0.531625
# 4     one  B  bar  0.230134  0.703728
# 5     one  C  bar -0.910558 -0.680082
# 6     two  A  foo -0.611765  0.566199
# 7   three  B  foo -0.423926  0.118055
# 8     one  C  foo -0.363204  1.185141
# 9     one  A  bar -0.078361  0.632515
# 10    two  B  bar  0.308853  0.089213
# 11  three  C  bar -1.854204  0.583568

Next, we pivot this dataset into a DataFrame that summarizes column D.

# Passing the string 'sum' avoids the FutureWarning that recent pandas versions raise for np.sum.
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                       columns=['C'], aggfunc='sum')
print(table)

The result is:

C              bar       foo
A     B
one   A  -0.078361 -0.054861
      B   0.230134  0.293260
      C  -0.910558 -0.363204
three A   0.940166       NaN
      B        NaN -0.423926
      C  -1.854204       NaN
two   A        NaN -0.611765
      B   0.308853       NaN
      C        NaN -1.940675

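In the result, each cell holds the aggregated value (here, the sum of D) for one combination of the index values (A, B) and the column value (C). Combinations that do not occur in the data appear as NaN. To aggregate several columns at once, pass a list to values, for example values=['D', 'E'].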
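As a sketch of aggregating more than one column, the snippet below passes a list to values and also uses fill_value to replace the NaN cells. The seeded generator is an assumption added here so that the random numbers are repeatable; the article's own example is unseeded.

```python
import numpy as np
import pandas as pd

# Same column layout as the article's df. The seed is an assumption added
# here so the random numbers are repeatable between runs.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 3,
    "B": ["A", "B", "C"] * 4,
    "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
    "D": rng.standard_normal(12),
    "E": rng.standard_normal(12),
})

# A list for `values` aggregates several columns at once.
# fill_value replaces the NaN cells left by missing (A, B, C) combinations.
multi = pd.pivot_table(
    df,
    values=["D", "E"],
    index=["A", "B"],
    columns="C",
    aggfunc="sum",
    fill_value=0,
)
print(multi)
```

The result has two column levels: the outer level is the aggregated column (D or E) and the inner level is the value of C (bar or foo).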

Sorting by Column with sort_values

Data in Pandas can be sorted with the sort_values() method, which sorts a DataFrame by one or more specified columns in ascending or descending order.

We use the same example dataset as before:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})
print(df)

# Output:
#         A  B    C         D         E
# 0     one  A  foo -0.054861 -0.753484
# 1     one  B  foo  0.293260  0.142676
# 2     two  C  foo -1.940675 -0.772475
# 3   three  A  bar  0.940166  0.531625
# 4     one  B  bar  0.230134  0.703728
# 5     one  C  bar -0.910558 -0.680082
# 6     two  A  foo -0.611765  0.566199
# 7   three  B  foo -0.423926  0.118055
# 8     one  C  foo -0.363204  1.185141
# 9     one  A  bar -0.078361  0.632515
# 10    two  B  bar  0.308853  0.089213
# 11  three  C  bar -1.854204  0.583568

We can sort the dataset by column D in ascending order:

df_sort = df.sort_values(by='D')
print(df_sort)

The output is:

#         A  B    C         D         E
# 2     two  C  foo -1.940675 -0.772475
# 11  three  C  bar -1.854204  0.583568
# 5     one  C  bar -0.910558 -0.680082
# 6     two  A  foo -0.611765  0.566199
# 7   three  B  foo -0.423926  0.118055
# 8     one  C  foo -0.363204  1.185141
# 9     one  A  bar -0.078361  0.632515
# 0     one  A  foo -0.054861 -0.753484
# 4     one  B  bar  0.230134  0.703728
# 1     one  B  foo  0.293260  0.142676
# 10    two  B  bar  0.308853  0.089213
# 3   three  A  bar  0.940166  0.531625
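Descending order works the same way, via ascending=False. The sketch below uses a small hypothetical scores frame (not from the article) that contains one missing value, to show where NaN ends up and how ignore_index renumbers the rows.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with one missing value, for illustration only.
scores = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [2.5, np.nan, 1.0, 3.0],
})

# Descending order; rows with NaN go last by default (na_position="last").
desc = scores.sort_values(by="score", ascending=False)

# ignore_index=True renumbers the result 0..n-1 instead of keeping the old labels.
desc_reset = scores.sort_values(by="score", ascending=False, ignore_index=True)

print(desc)
print(desc_reset)
```

In desc the original row labels are kept (3, 0, 2, 1), which is why sorted output in this article shows a shuffled index column.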

We can also sort by several columns at once, giving each its own priority and direction:

df_sort_multi = df.sort_values(by=['A', 'B', 'D'], ascending=[True, False, True])
print(df_sort_multi)

The output is:

#         A  B    C         D         E
# 5     one  C  bar -0.910558 -0.680082
# 8     one  C  foo -0.363204  1.185141
# 4     one  B  bar  0.230134  0.703728
# 1     one  B  foo  0.293260  0.142676
# 9     one  A  bar -0.078361  0.632515
# 0     one  A  foo -0.054861 -0.753484
# 11  three  C  bar -1.854204  0.583568
# 7   three  B  foo -0.423926  0.118055
# 3   three  A  bar  0.940166  0.531625
# 2     two  C  foo -1.940675 -0.772475
# 10    two  B  bar  0.308853  0.089213
# 6     two  A  foo -0.611765  0.566199
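Because column A holds strings, it is compared alphabetically, so 'three' sorts before 'two'. If a different order is wanted, one option is the key argument of sort_values. The sketch below assumes a hypothetical rank mapping and a small made-up frame; neither appears in the article.

```python
import pandas as pd

# Small hypothetical frame, for illustration only.
df_small = pd.DataFrame({
    "A": ["two", "one", "three", "one"],
    "B": ["A", "C", "B", "A"],
    "D": [0.3, -0.9, 0.1, -0.1],
})

# Plain sort: strings in A compare alphabetically, so 'three' precedes 'two'.
# Within equal A, rows are ordered by B descending.
alphabetical = df_small.sort_values(by=["A", "B"], ascending=[True, False])

# Hypothetical ranking used to order A as one < two < three.
rank = {"one": 1, "two": 2, "three": 3}

# key receives the column as a Series and must return a Series of sort keys.
# kind="mergesort" is a stable sort, so tied rows keep their original order.
numeric = df_small.sort_values(
    by="A", key=lambda col: col.map(rank), kind="mergesort"
)

print(alphabetical)
print(numeric)
```

The first result has row labels in the order 1, 3, 2, 0, and the second in the order 1, 3, 0, 2.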

Summary

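The pivot_table() and sort_values() methods in Pandas help us reshape and sort datasets, which makes the data easier to read and work with during analysis and statistics. Pandas offers many other functions and methods as well, so it is worth studying the Pandas API documentation in detail to process and analyze data more efficiently and accurately.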