Implementing the Apriori Algorithm in Python


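The Apriori algorithm is a machine-learning algorithm used to discover association patterns between products. Its most common use is recommending items based on the products a user already has in their shopping cart. Walmart, in particular, has used this algorithm to recommend products to its customers.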
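To make the idea concrete, here is a minimal from-scratch sketch of how Apriori finds frequent itemsets, written in plain Python so it runs without mlxtend. It assumes transactions are given as sets of item names; the function name apriori_sketch and the toy baskets are hypothetical. It shows the two ideas the algorithm is built on: itemsets grow by one item per level, and a candidate is discarded as soon as any of its subsets is infrequent, because a superset can never be more frequent than its subsets.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): support} for every
    itemset whose support (fraction of transactions containing it) is at
    least min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Count the baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Only surviving candidates have their support counted.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Toy transactions (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
result = apriori_sketch(transactions, min_support=0.5)
```

Here {milk, butter} appears in only one of four baskets, so {bread, milk, butter} is pruned without its support ever being counted. The mlxtend apriori function used in this tutorial does the same job on a 0/1 DataFrame.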

Dataset: Online Retail data (Online_Retail.xlsx)

Implementing the algorithm in Python

Step 1: Import the required libraries

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

Step 2: Load and explore the data

# Load the data (reading .xlsx files requires the openpyxl package)
data1 = pd.read_excel('Online_Retail.xlsx')
data1.head()

Input:

# here, we will explore the columns of the data
data1.columns

Output:

Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='object')

Input:

# Now, we will explore the different regions of transactions
data1.Country.unique()

Output:

array(['United Kingdom', 'France', 'Australia', 'Netherlands', 'Germany',
       'Norway', 'EIRE', 'Switzerland', 'Spain', 'Poland', 'Portugal',
       'Italy', 'Belgium', 'Lithuania', 'Japan', 'Iceland',
       'Channel Islands', 'Denmark', 'Cyprus', 'Sweden', 'Austria',
       'Israel', 'Finland', 'Bahrain', 'Greece', 'Hong Kong', 'Singapore',
       'Lebanon', 'United Arab Emirates', 'Saudi Arabia',
       'Czech Republic', 'Canada', 'Unspecified', 'Brazil', 'USA',
       'European Community', 'Malta', 'RSA'], dtype=object)
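Because support is measured as a fraction of invoices, it helps to know how many distinct invoices each country contributes before choosing a min_support value for it. With the DataFrame above this could be checked with data1.groupby('Country')['InvoiceNo'].nunique(); the sketch below shows the same count in plain Python over hypothetical dict rows that use the dataset's column names (the function name invoices_per_country is illustrative).

```python
from collections import Counter


def invoices_per_country(rows):
    """Hypothetical helper: count distinct invoice numbers per country.
    Each row is a dict with at least 'Country' and 'InvoiceNo' keys."""
    # A (country, invoice) pair is counted once, however many lines it has.
    seen = {(r["Country"], r["InvoiceNo"]) for r in rows}
    return Counter(country for country, _ in seen)


# Hypothetical rows: invoice "1" has two product lines but counts once.
rows = [
    {"InvoiceNo": "1", "Country": "France"},
    {"InvoiceNo": "1", "Country": "France"},
    {"InvoiceNo": "2", "Country": "France"},
    {"InvoiceNo": "3", "Country": "Sweden"},
]
counts = invoices_per_country(rows)
```

A country with few invoices needs a higher min_support for an itemset to be backed by more than a handful of purchases, while a country with many invoices can use a lower one.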

Step 3: Clean the data

# Strip leading and trailing spaces from the product descriptions
data1['Description'] = data1['Description'].str.strip()

# Drop rows that have no invoice number, then store invoice numbers as strings
data1.dropna(axis = 0, subset = ['InvoiceNo'], inplace = True)
data1['InvoiceNo'] = data1['InvoiceNo'].astype('str')

# Drop cancelled transactions (their invoice numbers contain the letter 'C')
data1 = data1[~data1['InvoiceNo'].str.contains('C')]
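The cleaning rules can be sketched in plain Python on a few hypothetical rows, which makes it easy to see what each line keeps or drops. The function clean_records and the sample rows are illustrative assumptions, not part of the dataset. One ordering detail carries over to the pandas code: dropna runs before astype('str'), because converting first would turn a missing invoice number into the string 'nan', which dropna would no longer remove.

```python
def clean_records(records):
    """Hypothetical helper applying the Step 3 rules to a list of dict rows:
    strip descriptions, drop rows with no invoice number, and drop
    cancellations (invoice numbers containing 'C')."""
    cleaned = []
    for row in records:
        if row.get("InvoiceNo") is None:
            continue  # like dropna(subset=['InvoiceNo'])
        invoice = str(row["InvoiceNo"])  # like astype('str')
        if "C" in invoice:
            continue  # like ~str.contains('C')
        desc = row.get("Description")
        cleaned.append({
            **row,
            "InvoiceNo": invoice,
            # like str.strip(); non-string values are left unchanged
            "Description": desc.strip() if isinstance(desc, str) else desc,
        })
    return cleaned


# Hypothetical rows: one normal sale, one cancellation, one missing invoice.
rows = [
    {"InvoiceNo": 536365, "Description": "  RED MUG ", "Quantity": 6},
    {"InvoiceNo": "C536379", "Description": "RED MUG", "Quantity": -1},
    {"InvoiceNo": None, "Description": "BLUE MUG", "Quantity": 1},
]
cleaned = clean_records(rows)
```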

Step 4: Split the data by transaction region (country)

# Transactions done in France
basket1_France = (data1[data1['Country'] == "France"]
        .groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum().unstack().reset_index().fillna(0)
        .set_index('InvoiceNo'))

# Transactions done in the United Kingdom
basket1_UK = (data1[data1['Country'] == "United Kingdom"]
        .groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum().unstack().reset_index().fillna(0)
        .set_index('InvoiceNo'))

# Transactions done in Portugal
basket1_Por = (data1[data1['Country'] == "Portugal"]
        .groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum().unstack().reset_index().fillna(0)
        .set_index('InvoiceNo'))

# Transactions done in Sweden
basket1_Sweden = (data1[data1['Country'] == "Sweden"]
        .groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum().unstack().reset_index().fillna(0)
        .set_index('InvoiceNo'))
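Each basket1_* table has one row per invoice and one column per product, holding the total quantity of that product on that invoice, with 0 where the product was not bought. A plain-Python sketch of the same reshaping for one country, assuming hypothetical dict rows with the dataset's column names (build_basket is an illustrative name):

```python
from collections import defaultdict


def build_basket(rows, country):
    """Hypothetical helper mirroring groupby(['InvoiceNo', 'Description'])
    ['Quantity'].sum().unstack().fillna(0) for a single country.
    Returns (sorted product columns, {invoice: [quantity per column]})."""
    totals = defaultdict(lambda: defaultdict(int))
    products = set()
    for r in rows:
        if r["Country"] != country:
            continue
        # Sum quantities of the same product on the same invoice.
        totals[r["InvoiceNo"]][r["Description"]] += r["Quantity"]
        products.add(r["Description"])
    columns = sorted(products)
    # Products missing from an invoice become 0, like fillna(0).
    matrix = {inv: [totals[inv].get(p, 0) for p in columns] for inv in sorted(totals)}
    return columns, matrix


# Hypothetical rows: invoice "1" lists MUG twice, which gets summed.
rows = [
    {"InvoiceNo": "1", "Description": "MUG", "Quantity": 2, "Country": "France"},
    {"InvoiceNo": "1", "Description": "MUG", "Quantity": 1, "Country": "France"},
    {"InvoiceNo": "1", "Description": "PLATE", "Quantity": 4, "Country": "France"},
    {"InvoiceNo": "2", "Description": "PLATE", "Quantity": 1, "Country": "France"},
    {"InvoiceNo": "3", "Description": "MUG", "Quantity": 5, "Country": "Sweden"},
]
columns, matrix = build_basket(rows, "France")
```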

Step 5: Hot-encode the data

# Define the hot-encoding function that turns purchase quantities
# into the 0/1 values expected by mlxtend's apriori function
def hot_encode1(P):
    # 1 if at least one unit was bought on the invoice, otherwise 0
    return 1 if P >= 1 else 0

# Encode each country's basket
# (in pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map)
basket1_France = basket1_France.applymap(hot_encode1)
basket1_UK = basket1_UK.applymap(hot_encode1)
basket1_Por = basket1_Por.applymap(hot_encode1)
basket1_Sweden = basket1_Sweden.applymap(hot_encode1)
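After encoding, every cell records only whether a product was bought on an invoice. The support that min_support is compared against is then the share of rows in which all of an itemset's columns are 1; for example, min_support = 0.05 keeps itemsets that appear on at least 5% of a country's invoices. A small sketch with hypothetical quantities (hot_encode and itemset_support are illustrative names):

```python
def hot_encode(quantity):
    # 1 if at least one unit was bought, otherwise 0 (zero or negative).
    return 1 if quantity >= 1 else 0


def itemset_support(columns, matrix, itemset):
    """Hypothetical helper: fraction of invoices (rows of a 0/1 matrix)
    in which every product in `itemset` is 1."""
    idx = [columns.index(p) for p in itemset]
    rows = list(matrix.values())
    hits = sum(all(row[i] == 1 for i in idx) for row in rows)
    return hits / len(rows)


# Hypothetical summed quantities per invoice (columns: MUG, PLATE, SPOON).
columns = ["MUG", "PLATE", "SPOON"]
raw = {"1": [3, 4, 0], "2": [0, 1, 0], "3": [2, 0, -1], "4": [1, 2, 1]}
encoded = {inv: [hot_encode(q) for q in row] for inv, row in raw.items()}
```

With these numbers MUG is on three of four invoices (support 0.75) and MUG with PLATE on two (support 0.5).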

Step 6: Build the model and analyze the results

a) France:

# Build the model: itemsets that appear on at least 5% of invoices
frq_items1 = apriori(basket1_France, min_support = 0.05, use_colnames = True)

# Collect the inferred rules in a dataframe, keeping rules with lift >= 1
rules1 = association_rules(frq_items1, metric = "lift", min_threshold = 1)
rules1 = rules1.sort_values(['confidence', 'lift'], ascending = [False, False])
print(rules1.head())
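The rules table is built from three quantities. For a rule A → B, support is the share of invoices containing both A and B, confidence is support(A ∪ B) / support(A), and lift is confidence / support(B). A lift above 1 means the two sides occur together more often than they would if they were independent, which is why min_threshold = 1 on the lift metric is used as the filter. Below is a plain-Python sketch of generating, filtering, and sorting such rules from a hypothetical dict of itemset supports; rules_from_itemsets is an illustrative name, and it assumes every subset of each itemset is also in the dict, which holds for Apriori output.

```python
from itertools import combinations


def rules_from_itemsets(frequent, min_lift=1.0):
    """Hypothetical helper: build rules (antecedent, consequent, support,
    confidence, lift) from {frozenset: support}, keep those with
    lift >= min_lift, and sort by confidence, then lift, descending."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                c = itemset - a
                confidence = supp / frequent[a]
                lift = confidence / frequent[c]
                if lift >= min_lift:
                    rules.append((a, c, supp, confidence, lift))
    rules.sort(key=lambda rule: (rule[3], rule[4]), reverse=True)
    return rules


# Hypothetical supports, e.g. as produced by an Apriori run.
frequent = {
    frozenset({"MUG"}): 0.75,
    frozenset({"PLATE"}): 0.5,
    frozenset({"MUG", "PLATE"}): 0.5,
}
rules = rules_from_itemsets(frequent)
```

Both rules have the same lift (about 1.33), so the sort by confidence puts PLATE → MUG (confidence 1.0) ahead of MUG → PLATE (about 0.67), mirroring sort_values(['confidence', 'lift'], ascending = [False, False]).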

b) United Kingdom:

frq_items = apriori(basket1_UK, min_support = 0.01, use_colnames = True)
rules = association_rules(frq_items, metric ="lift", min_threshold = 1)
rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules.head())


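Looking more closely at the rules for the UK transactions, we find that UK customers buy tea plates in several different colors together. A likely reason is that the British love tea and tend to collect plates of different colors for different occasions.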

c) Portugal:

frq_items1 = apriori(basket1_Por, min_support = 0.05, use_colnames = True)
rules1 = association_rules(frq_items1, metric ="lift", min_threshold = 1)
rules1 = rules1.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules1.head())

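Analyzing the association rules for the Portuguese transactions shows tiffin sets (Knick Knack Tins) paired with colored pencils. Both items typically belong to primary-school children, who need them at school for carrying lunch and for creative work respectively, so it makes sense that they are bought together.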

d) Sweden:

frq_items1 = apriori(basket1_Sweden, min_support = 0.05, use_colnames = True)
rules1 = association_rules(frq_items1, metric ="lift", min_threshold = 1)
rules1 = rules1.sort_values(['confidence', 'lift'], ascending =[False, False])
print(rules1.head())

Analyzing the rules above, we find that girls' and boys' cutlery sets are bought together.
