Pandas: How to Create a Pivot Table in Python

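A pivot table is a powerful data-analysis tool that lets you summarize and aggregate data along different dimensions. In Python, you can create pivot tables with the pandas library, which provides flexible and efficient tools for data manipulation and analysis.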

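To create a pivot table in pandas, you first need to load your dataset into a pandas DataFrame. You can load data from many sources, such as CSV files, Excel spreadsheets, and SQL databases.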
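As a minimal sketch of the loading step, the snippet below reads a small CSV with pd.read_csv(). To keep it self-contained, the CSV text (the first three rows of the sample data used later) is held in a string and wrapped in io.StringIO; with a real file you would pass its path instead. Excel files and SQL queries are loaded in a similar way with pd.read_excel() and pd.read_sql().

```python
import io

import pandas as pd

# illustrative CSV content; in practice you would usually pass a file path
csv_text = """Product,Category,Quantity,Amount
Litchi,Fruit,8,270
Broccoli,Vegetable,5,239
Banana,Fruit,3,617
"""

# read_csv accepts a path or any file-like object, such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 4)
print(df)
```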

Once the data is loaded into a DataFrame, you can create a pivot table with the pandas pivot_table() method. Its syntax is as follows:

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, ...)

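pivot_table() accepts several parameters: index, the column(s) whose values form the rows of the pivot table; columns, the column(s) whose values become the pivot table's columns; and values, the column(s) to aggregate. The aggfunc parameter sets the aggregation function, such as 'sum', 'mean', 'max', or 'min' (the default is 'mean'). When pivot_table() is called as a DataFrame method, that DataFrame is the data source; the module-level pd.pivot_table() instead takes the DataFrame as its first argument.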
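The sketch below illustrates these parameters on a small, made-up DataFrame (the Region column and its values are hypothetical, not part of this article's data). index sets the rows, columns spreads a second key across the columns, values picks the column to aggregate, and aggfunc chooses the aggregation. It also uses fill_value for a Region/Category combination that has no rows, and checks that the module-level pd.pivot_table() gives the same result as the method.

```python
import pandas as pd

# small illustrative dataset (hypothetical; not the article's sales data)
sales = pd.DataFrame({
    'Region': ['North', 'South', 'South', 'South', 'North'],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Vegetable', 'Fruit'],
    'Amount': [100, 30, 70, 20, 40],
})

# rows = Region, columns = Category, cells = sum of Amount;
# North has no Vegetable rows, so that cell is filled with 0
table = sales.pivot_table(index='Region', columns='Category',
                          values='Amount', aggfunc='sum', fill_value=0)
print(table)

# the module-level function takes the DataFrame as its first argument
same = pd.pivot_table(sales, index='Region', columns='Category',
                      values='Amount', aggfunc='sum', fill_value=0)
print(table.equals(same))  # True

# without aggfunc, the default 'mean' is used
means = sales.pivot_table(index='Region', values='Amount')
print(means)
```

Here North/Fruit is 100 + 40 = 140 and South/Fruit is 30 + 70 = 100, while the default mean gives 70 for North and 40 for South.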

Before diving into pivoting with pivot_table(), let's first create the DataFrame we will work with.

DataFrame in Pandas

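In pandas, a DataFrame is a two-dimensional labeled data structure whose columns can have different types. It is the primary pandas data structure for data manipulation and analysis.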

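You can think of a DataFrame as a spreadsheet or an SQL table, with rows and columns. It makes data easy to process and manipulate, including indexing, selecting, filtering, merging, and grouping.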
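Several of the operations mentioned above can be sketched briefly as follows, using the first three rows of the sample data that appears later in this article. The snippet shows per-column dtypes, column selection, boolean filtering, and grouping with groupby(); it is only an illustration, not a complete tour of the DataFrame API.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Litchi', 'Broccoli', 'Banana'],
    'Category': ['Fruit', 'Vegetable', 'Fruit'],
    'Quantity': [8, 5, 3],
    'Amount': [270, 239, 617],
})

# each column keeps its own dtype (text columns vs. integer columns)
print(df.dtypes)

# selection: one column gives a Series, a list of columns gives a DataFrame
amounts = df['Amount']
subset = df[['Product', 'Amount']]

# filtering: keep only the rows where the boolean mask is True
fruit = df[df['Category'] == 'Fruit']
print(fruit)

# grouping: total Amount per Category
totals = df.groupby('Category')['Amount'].sum()
print(totals)
```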

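Consider the code below. It uses a Python dictionary to create a DataFrame object named df with four columns: 'Product', 'Category', 'Quantity', and 'Amount'. Each key of the dictionary is a column name, and its value is a list holding that column's values.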

Example

# importing pandas library
import pandas as pd

# creating a DataFrame from a dictionary:
# each key is a column name ('Product', 'Category', 'Quantity', 'Amount')
# and each list holds that column's values
df = pd.DataFrame({
   'Product': ['Litchi', 'Broccoli', 'Banana', 'Banana', 'Beans', 'Orange', 'Mango', 'Banana'],
   'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Fruit'],
   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})

# print the dataframe
print(df)

Output

When you run this code, it prints the following to the terminal:

    Product   Category  Quantity  Amount
0    Litchi      Fruit         8     270
1  Broccoli  Vegetable         5     239
2    Banana      Fruit         3     617
3    Banana      Fruit         4     384
4     Beans  Vegetable         5     626
5    Orange      Fruit         9     610
6     Mango      Fruit        11      62
7    Banana      Fruit         8      90

Creating a Pivot Table with Pandas

Now let's use the pivot_table() method to create a pivot table of total sales per product. Consider the code shown below.

Example

# importing pandas library
import pandas as pd

# creating a DataFrame from a dictionary:
# each key is a column name ('Product', 'Category', 'Quantity', 'Amount')
# and each list holds that column's values
df = pd.DataFrame({
   'Product': ['Litchi', 'Broccoli', 'Banana', 'Banana', 'Beans', 'Orange', 'Mango', 'Banana'],
   'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Fruit'],
   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})

# creating a pivot table of total sales (sum of Amount) per product
pivot = df.pivot_table(index=['Product'], values=['Amount'], aggfunc='sum')
print(pivot)

# print the dataframe
print(df)

Explanation

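The code creates a DataFrame object named df with four columns, 'Product', 'Category', 'Quantity', and 'Amount', built from a Python dictionary.
It then calls pivot_table() to build a pivot table that groups the sales data by product and computes each product's total sales (the sum of its Amount values).
Finally, it prints the pivot table to show the total sales per product, and then prints the original DataFrame to show the raw data the pivot table was built from.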
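For a single grouping key and a single aggregation, the same totals can also be computed with groupby(). The sketch below, using the article's data, checks that both approaches agree; it also shows margins=True, which adds an 'All' row with the grand total when summarizing by Category instead of Product. This is one illustrative comparison, not the only way to build such summaries.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Litchi', 'Broccoli', 'Banana', 'Banana', 'Beans', 'Orange', 'Mango', 'Banana'],
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Fruit'],
    'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
    'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})

pivot = df.pivot_table(index=['Product'], values=['Amount'], aggfunc='sum')

# equivalent groupby: group rows by Product, then sum the Amount column
grouped = df.groupby('Product')[['Amount']].sum()

print(pivot.to_dict() == grouped.to_dict())  # True
print(pivot.loc['Banana', 'Amount'])  # 1091 (617 + 384 + 90)

# summarizing along another dimension, with an 'All' row for the grand total
by_category = df.pivot_table(index='Category', values='Amount',
                             aggfunc='sum', margins=True)
print(by_category)
```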

Output

When executed, the code prints the following to the terminal:

          Amount
Product
Banana      1091
Beans        626
Broccoli     239
Litchi       270
Mango         62
Orange       610
    Product   Category  Quantity  Amount
0    Litchi      Fruit         8     270
1  Broccoli  Vegetable         5     239
2    Banana      Fruit         3     617
3    Banana      Fruit         4     384
4     Beans  Vegetable         5     626
5    Orange      Fruit         9     610
6     Mango      Fruit        11      62
7    Banana      Fruit         8      90