How to Generate and Plot a Classification Dataset in Python Using Scikit-learn

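Scikit-learn provides the make_classification() function, which generates random classification datasets with a configurable number of informative features, clusters per class, and classes. In this tutorial, we will learn how to generate and plot classification datasets in Python using Scikit-learn.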
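A minimal sketch of what the function returns, assuming scikit-learn and NumPy are installed. make_classification() returns a feature matrix X of shape (n_samples, n_features) and a label vector y; n_samples defaults to 100 and n_classes to 2. The seed random_state=0 is an arbitrary choice that the examples below do not use, so each of their runs draws a different dataset; fixing it makes the draw reproducible:

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the two-informative-feature example, plus a fixed seed
params = dict(n_features=2, n_redundant=0, n_informative=2,
              n_clusters_per_class=1, random_state=0)

X1, y1 = make_classification(**params)
X2, y2 = make_classification(**params)  # same seed, so the same data

print(X1.shape)       # (100, 2): n_samples defaults to 100
print(y1.shape)       # (100,)
print(np.unique(y1))  # [0 1]: n_classes defaults to 2
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))  # True
```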

Classification dataset with one informative feature and one cluster per class

To generate and plot a classification dataset with one informative feature and one cluster per class, follow these steps:

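Step 1 - Import sklearn.datasets.make_classification and the matplotlib library.

Step 2 - Create the data points X and y with one informative feature (n_informative=1) and one cluster per class (n_clusters_per_class=1).

Step 3 - Plot the dataset with matplotlib.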

Example

In the following example, we generate and plot a classification dataset with one informative feature and one cluster per class.

# Importing libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Creating the classification dataset with one informative feature and one cluster per class
X, y = make_classification(n_features=2, n_redundant=0, n_informative=1, n_clusters_per_class=1)

# Plotting the dataset
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.title("Classification dataset with one informative feature and one cluster per class", fontsize=12)

plt.scatter(X[:, 0], X[:, 1], marker="o", c=y, s=40, edgecolor="k")
plt.show()

Output

It will produce the following output:

[Figure: scatter plot "Classification dataset with one informative feature and one cluster per class", points colored by class]
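In this dataset only one of the two columns carries class information, but make_classification() shuffles the columns by default, so the plot does not show which one. A minimal sketch, assuming shuffle=False, n_samples=1000 and random_state=0 (none of which appear in the original example): according to the scikit-learn documentation, unshuffled output puts the informative features first and fills the remaining columns with random noise, so the per-class means should differ clearly only in column 0.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the column order fixed: informative features first,
# then redundant and repeated ones, then pure-noise features
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                           n_informative=1, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

# Difference between the per-class means of each column
gap = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
print("informative column 0 gap:", round(float(abs(gap[0])), 2))
print("noise column 1 gap:", round(float(abs(gap[1])), 2))
```

A large gap in column 0 and a near-zero gap in column 1 indicate that only the first feature separates the classes.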

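Classification dataset with two informative features and one cluster per class

To generate and plot a classification dataset with two informative features and one cluster per class, follow these steps:

Step 1 - Import sklearn.datasets.make_classification and the matplotlib library.

Step 2 - Create the data points X and y with two informative features (n_informative=2) and one cluster per class (n_clusters_per_class=1).

Step 3 - Plot the dataset with matplotlib.

Example

In the following example, we generate and plot a classification dataset with two informative features and one cluster per class.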

# Importing libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Creating the classification dataset with two informative features and one cluster per class
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1)

# Plotting the dataset
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.title("Classification dataset with two informative features and one cluster per class", fontsize=12)
plt.scatter(X[:, 0], X[:, 1], marker="o", c=y, s=40, edgecolor="k")
plt.show()

Output

It will produce the following output:

[Figure: scatter plot "Classification dataset with two informative features and one cluster per class", points colored by class]
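All of the examples pass n_redundant=0 explicitly. This matters because make_classification() defaults to n_redundant=2 and raises a ValueError when n_informative + n_redundant + n_repeated exceeds n_features. A minimal sketch of both cases, assuming an arbitrary random_state=0:

```python
from sklearn.datasets import make_classification

# Accepted: 2 informative + 0 redundant + 0 repeated features = 2 = n_features
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, random_state=0)
print(X.shape)  # (100, 2)

# Rejected: leaving n_redundant at its default of 2 asks for 2 + 2 = 4 features
error = None
try:
    make_classification(n_features=2, n_informative=2,
                        n_clusters_per_class=1, random_state=0)
except ValueError as exc:
    error = exc
print("ValueError raised:", error is not None)  # True
```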

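Classification dataset with two informative features and two clusters per class

To generate and plot a classification dataset with two informative features and two clusters per class, follow these steps:

Step 1 - Import sklearn.datasets.make_classification and the matplotlib library.

Step 2 - Create the data points X and y with two informative features (n_informative=2) and two clusters per class (n_clusters_per_class=2).

Step 3 - Plot the dataset with matplotlib.

Example

In the following example, we generate and plot a classification dataset with two informative features and two clusters per class.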

# Importing libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Creating the classification dataset with two informative features and two clusters per class
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=2)

# Plotting the dataset
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.title("Classification dataset with two informative features and two clusters per class", fontsize=12)
plt.scatter(X[:, 0], X[:, 1], marker="o", c=y, s=40, edgecolor="k")
plt.show()

Output

It will produce the following output:

[Figure: scatter plot "Classification dataset with two informative features and two clusters per class", points colored by class]
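Two clusters per class only work here because there are two informative features: make_classification() places the cluster centers on vertices of an n_informative-dimensional hypercube and raises a ValueError unless n_classes * n_clusters_per_class <= 2 ** n_informative. A minimal sketch that checks this rule against the library; fits_hypercube is a hypothetical helper written for illustration, not part of scikit-learn, and random_state=0 is arbitrary:

```python
from sklearn.datasets import make_classification


def fits_hypercube(n_classes, n_clusters_per_class, n_informative):
    """Hypothetical helper: the cluster-count rule make_classification enforces."""
    return n_classes * n_clusters_per_class <= 2 ** n_informative


results = {}
for n_informative in (1, 2):
    try:
        # 2 classes x 2 clusters per class = 4 clusters
        make_classification(n_features=2, n_redundant=0,
                            n_informative=n_informative,
                            n_clusters_per_class=2, random_state=0)
        accepted = True
    except ValueError:
        accepted = False
    results[n_informative] = (fits_hypercube(2, 2, n_informative), accepted)
    print(n_informative, results[n_informative])
```

With one informative feature there are only 2 vertices for 4 clusters, so the call fails; with two there are exactly 4.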

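Multi-class classification dataset

To generate and plot a multi-class classification dataset with two informative features and one cluster per class, follow these steps:

Step 1 - Import sklearn.datasets.make_classification and the matplotlib library.

Step 2 - Create the data points X and y with two informative features (n_informative=2), one cluster per class (n_clusters_per_class=1), and three classes (n_classes=3).

Step 3 - Plot the dataset with matplotlib.

Example

In the following example, we generate and plot a multi-class classification dataset with two informative features and one cluster per class.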

# Importing libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Creating the multi-class classification dataset with two informative features and one cluster per class
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, n_classes=3)

# Plotting the dataset
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.title("Multi-class classification dataset with two informative features and one cluster per class", fontsize=12)
plt.scatter(X[:, 0], X[:, 1], marker="o", c=y, s=40, edgecolor="k")
plt.show()
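With n_classes=3 the labels in y are 0, 1 and 2, and with the default weights=None the samples are split almost evenly across the classes; the weights parameter skews those proportions. A minimal sketch that counts labels with np.bincount, assuming flip_y=0.0 (which disables the small fraction of randomly reassigned labels, 0.01 by default), random_state=0, and the illustrative weights [0.6, 0.3, 0.1]:

```python
import numpy as np
from sklearn.datasets import make_classification

common = dict(n_features=2, n_redundant=0, n_informative=2,
              n_clusters_per_class=1, n_classes=3,
              flip_y=0.0, random_state=0)  # flip_y=0.0: no randomly reassigned labels

# Default weights=None: samples are split almost evenly across the 3 classes
X, y = make_classification(**common)
counts = np.bincount(y)
print(counts)  # one count per label 0, 1, 2; they sum to 100

# weights skews the class proportions (here roughly 60% / 30% / 10%)
Xw, yw = make_classification(n_samples=1000, weights=[0.6, 0.3, 0.1], **common)
weighted_counts = np.bincount(yw)
print(weighted_counts)
```

The same scatter-plot code above works unchanged for the weighted dataset, since c=yw still maps each label to a color.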