Best Python Libraries for Machine Learning
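Machine learning is the science of programming computers so that they can learn from different kinds of data. Arthur Samuel defined it as the "field of study that gives computers the ability to learn without being explicitly programmed." Machine learning is used to solve a wide range of real-world problems.
In the past, practitioners carried out machine learning tasks by coding every algorithm by hand from the underlying mathematical and statistical formulas.
Compared with using Python's libraries, frameworks, and modules, that process was time-consuming, inefficient, and tedious. Today, Python is one of the most popular and productive languages for machine learning. It has displaced many other languages in this field, largely because its rich collection of libraries makes the work simpler and easier to pick up.
This tutorial covers the following Python libraries for machine learning: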
- NumPy
- SciPy
- Scikit-learn
- Theano
- TensorFlow
- Keras
- PyTorch
- Pandas
- Matplotlib
NumPy
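NumPy is one of the most widely used Python libraries. It provides large multi-dimensional arrays and matrices, together with a large collection of high-level mathematical functions that operate on them. In machine learning it serves mainly as the foundation for basic scientific computing, and it is widely used for linear algebra, Fourier transforms, and random number generation. Higher-level libraries such as TensorFlow also use NumPy internally to manipulate tensors.
Example: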
import numpy as nup
# Then, create two arrays of rank 2
K = nup.array([[2, 4], [6, 8]])
R = nup.array([[1, 3], [5, 7]])
# Then, create two arrays of rank 1
P = nup.array([10, 12])
S = nup.array([9, 11])
# Then, we will print the Inner product of vectors
print ("Inner product of vectors: ", nup.dot(P, S), "\n")
# Then, we will print the Matrix and Vector product
print ("Matrix and Vector product: ", nup.dot(K, P), "\n")
# Now, we will print the Matrix and matrix product
print ("Matrix and matrix product: ", nup.dot(K, R))
Output:
Inner product of vectors: 222
Matrix and Vector product: [ 68 156]
Matrix and matrix product: [[22 34]
[46 74]]
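The NumPy description above also mentions Fourier transforms and random numbers, which the dot-product example does not touch. The following is a minimal sketch of both, assuming the `numpy.fft` module and the `numpy.random.default_rng` generator; the 5 Hz signal, the sampling rate, and the seed are arbitrary values chosen only for illustration.

```python
import numpy as np

# Sample a 5 Hz sine wave for one second at 100 samples per second
# (both numbers are illustrative assumptions)
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

# Real-input FFT, plus the frequency in Hz that each output bin represents
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
peak_freq = freqs[np.argmax(spectrum)]
print("Dominant frequency (Hz):", peak_freq)

# Seeded random number generator, so the draws are reproducible
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)
print("Sample mean of the noise:", noise.mean())
```

The strongest bin of the spectrum should sit at the frequency of the sine wave, and the sample mean of the noise should be close to zero.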
SciPy
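SciPy is popular among machine learning developers because it contains many modules for optimization, linear algebra, integration, and statistics. The SciPy library is not the same thing as the SciPy stack: the library is one of the core packages that make up the stack. SciPy is also useful for image processing tasks.
Example 1: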
from scipy import signal as sg
import numpy as nup
K = nup.arange(45).reshape(9, 5)
domain_1 = nup.identity(3)
print (K, end = 'KK')
print (sg.order_filter (K, domain_1, 1))
Output:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]] KK [[ 0. 1. 2. 3. 0.]
[ 5. 6. 7. 8. 3.]
[10. 11. 12. 13. 8.]
[15. 16. 17. 18. 13.]
[20. 21. 22. 23. 18.]
[25. 26. 27. 28. 23.]
[30. 31. 32. 33. 28.]
[35. 36. 37. 38. 33.]
[ 0. 35. 36. 37. 38.]]
Example 2:
from scipy.signal import chirp as cp
from scipy.signal import spectrogram as sp
import matplotlib.pyplot as plot
import numpy as nup
t_T = nup.linspace(3, 10, 300)
w_W = cp(t_T, f0 = 4, f1 = 2, t1 = 5, method = 'linear')
plot.plot(t_T, w_W)
plot.title ("Linear Chirp")
plot.xlabel("Time in Seconds")
plot.show()
Output: a line plot titled "Linear Chirp" that shows the waveform against time in seconds.
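The SciPy description lists optimization, linear algebra, integration, and statistics, while the two examples above only use the signal module. As a rough sketch of those four areas, assuming the `scipy.optimize`, `scipy.linalg`, `scipy.integrate`, and `scipy.stats` modules, the snippet below solves one small problem with each. The functions and matrices are made up for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Optimization: minimize (x - 3)^2, whose minimum is at x = 3
opt_result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)
print("Minimizer:", opt_result.x)

# Linear algebra: solve the 2x2 system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print("Solution of A x = b:", solution)

# Integration: integrate sin(x) over [0, pi]; the exact value is 2
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print("Integral of sin on [0, pi]:", area)

# Statistics: cumulative probability of a standard normal at 1.96
prob = stats.norm.cdf(1.96)
print("P(Z <= 1.96):", prob)
```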
Scikit-learn
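Scikit-learn is a Python library that implements classical machine learning algorithms. It is built on top of two fundamental Python libraries, NumPy and SciPy. Scikit-learn is popular among machine learning developers because it supports both supervised and unsupervised learning algorithms. The library can also be used for data analysis and data mining.
Example: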
from sklearn import datasets as ds
from sklearn import metrics as mt
from sklearn.tree import DecisionTreeClassifier as dtc
# load the iris datasets
dataset_1 = ds.load_iris()
# fit a CART model to the data
model_1 = dtc()
model_1.fit(dataset_1.data, dataset_1.target)
print(model_1)
# make predictions
expected_1 = dataset_1.target
predicted_1 = model_1.predict(dataset_1.data)
# summarize the fit of the model
print (mt.classification_report(expected_1, predicted_1))
print(mt.confusion_matrix(expected_1, predicted_1))
Output:
DecisionTreeClassifier()
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150
[[50 0 0]
[ 0 50 0]
[ 0 0 50]]
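The example above scores the model on the same samples it was trained on, which is why every metric reads 1.00. The sketch below, which assumes the `train_test_split`, `accuracy_score`, and `KMeans` interfaces of scikit-learn, holds out part of the iris data for evaluation and also shows one unsupervised algorithm, since the description mentions both kinds of learning. The 30% split, the seeds, and the cluster count are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Supervised learning: keep 30% of the samples aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Held-out accuracy:", test_accuracy)

# Unsupervised learning: group the same measurements into 3 clusters
# without looking at the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
sizes = sorted(int((cluster_labels == c).sum()) for c in range(3))
print("Cluster sizes:", sizes)
```

The held-out accuracy is usually a little lower than the score on the training data, and it is the more honest estimate of how the model behaves on unseen samples.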
Theano
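Theano is a well-known Python library for defining, evaluating, and optimizing mathematical expressions, especially those that involve multi-dimensional arrays.
It does this efficiently by optimizing how the CPU and GPU are used. Because machine learning relies heavily on mathematics and statistics, Theano makes these numerical operations easier to carry out.
It also performs extensive unit testing and self-verification to detect and diagnose many kinds of errors. Theano is a powerful library that has been used in large-scale, computationally intensive scientific projects, yet it is simple and approachable enough for individuals to use in their own projects.
Example: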
import theano as th
import theano.tensor as Tt
k = Tt.dmatrix('k')
r = 1 / (1 + Tt.exp(-k))
logistic_1 = th.function([k], r)
logistic_1([[0, 1], [-1, -2]])
Output:
array([[0.5       , 0.73105858],
       [0.26894142, 0.11920292]])
TensorFlow
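TensorFlow is an open-source Python library for high-performance numerical computation. It is a popular library developed by the Google Brain team. TensorFlow is a framework for defining and running computations that involve tensors. It can be used to train and run deep neural networks, which in turn power many kinds of artificial intelligence applications.
Example: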
import tensorflow as tsf
# Initialize two constants
K_1 = tsf.constant([2, 4, 6, 8])
K_2 = tsf.constant([1, 3, 5, 7])
# Multiply
result = tsf.multiply(K_1, K_2)
# TensorFlow 2.x executes operations eagerly, so no Session is needed
# (tsf.Session is a TensorFlow 1.x API and is not available at the top level in 2.x)
# Print the result as a NumPy array
print(result.numpy())
Output:
[ 2 12 30 56]
Keras
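Keras is a high-level neural network API that can run on top of TensorFlow, CNTK, and Theano. It is a very well-known Python library among machine learning developers. It runs seamlessly on both CPU and GPU, is simple to use for machine learning beginners and for designing neural networks, and is also used for fast prototyping.
Example: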
import numpy as nup
from tensorflow import keras as ks
from tensorflow.keras import layers as ls
number_classes = 10
input_shapes = (28, 28, 1)
# Here, we will import the data, and split it between train and test sets
(x_1_train, y_1_train), (x_2_test, y_2_test) = ks.datasets.mnist.load_data()
# now, we will Scale images to the [0, 1] range
x_1_train = x_1_train.astype("float32") / 255
x_2_test = x_2_test.astype("float32") / 255
# we have to make sure that the images have shape (28, 28, 1)
x_1_train = nup.expand_dims(x_1_train, -1)
x_2_test = nup.expand_dims(x_2_test, -1)
print ("x_train shape:", x_1_train.shape)
print (x_1_train.shape[0], "Training samples")
print (x_2_test.shape[0], "Testing samples")
# Then we will convert class vectors to binary class matrices
y_1_train = ks.utils.to_categorical(y_1_train, number_classes)
y_2_test = ks.utils.to_categorical(y_2_test, number_classes)
model_1 = ks.Sequential(
    [
        ks.Input(shape = input_shapes),
        ls.Conv2D(32, kernel_size = (3, 3), activation = "relu"),
        ls.MaxPooling2D(pool_size = (2, 2)),
        ls.Conv2D(64, kernel_size = (3, 3), activation = "relu"),
        ls.MaxPooling2D(pool_size = (2, 2)),
        ls.Flatten(),
        ls.Dropout(0.5),
        ls.Dense(number_classes, activation = "softmax"),
    ]
)
model_1.summary()
Output:
x_train shape: (60000, 28, 28, 1)
60000 Training samples
10000 Testing samples
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0
_________________________________________________________________
dense (Dense)                (None, 10)                16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
_________________________________________________________________
PyTorch
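PyTorch is an open-source Python library for machine learning based on Torch, a machine learning library implemented in C. It comes with many tools and libraries that support computer vision, natural language processing (NLP), and many other machine learning applications. The library also lets users run computations on tensors with GPU acceleration.
Example: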
import torch as tch
d_type = tch.float
device_1 = tch.device("cpu")
# Use device = tch.device("cuda:0") for GPU
# Here, N_1 is batch size; D_in_1 is input dimension;
# H_1 is hidden dimension; D_out_1 is output dimension.
N_1 = 62
D_in_1 = 1000
H_1 = 110
D_out_1 = 11
# Now, we will create random input and output data
K = tch.randn(N_1, D_in_1, device = device_1, dtype = d_type)
R = tch.randn(N_1, D_out_1, device = device_1, dtype = d_type)
# Then, we will Randomly initialize weights
K_1 = tch.randn(D_in_1, H_1, device = device_1, dtype = d_type)
K_2 = tch.randn(H_1, D_out_1, device = device_1, dtype = d_type)
learning_rate_1 = 1e-6
for Q in range(500):
    # Forward pass: compute the predicted y
    h_1 = K.mm(K_1)
    h_relu_1 = h_1.clamp(min = 0)
    y_pred_1 = h_relu_1.mm(K_2)
    # Compute and print the loss
    loss = (y_pred_1 - R).pow(2).sum().item()
    print (Q, loss)
    # Backprop to compute the gradients of the loss with respect to K_1 and K_2
    grad_y_pred = 2.0 * (y_pred_1 - R)
    grad_K_2 = h_relu_1.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(K_2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h_1 < 0] = 0
    grad_K_1 = K.t().mm(grad_h)
    # Update the weights using gradient descent
    K_1 -= learning_rate_1 * grad_K_1
    K_2 -= learning_rate_1 * grad_K_2
Output (the exact numbers differ on every run because the data and weights are random):
0 35089116.0
1 33087792.0
2 42227192.0
3 56113208.0
4 61125684.0
5 45541204.0
6 21011108.0
7 6972017.0
8 2523046.5
9 1342124.5
10 950067.5625
11 753290.25
12 620475.875
13 519006.71875
14 437975.9375
15 372063.125
16 317840.8125
17 272874.46875
18 235348.421875
.
.
.
497 7.426088268402964e-05
498 7.348413055296987e-05
499 7.258950790856034e-05
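The example above derives every gradient by hand. PyTorch can also compute them automatically through its autograd mechanism. The following is a minimal sketch of the same two-layer network using `requires_grad=True` and `loss.backward()`. The smaller layer sizes, the seed, the learning rate, and the step count are assumptions picked to keep the run short, not values from the example above.

```python
import torch

torch.manual_seed(0)

# Small random regression problem (all sizes are illustrative)
N, D_in, H, D_out = 16, 20, 10, 2
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# requires_grad=True asks autograd to track operations on the weights
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 5e-5
losses = []
for step in range(200):
    # Forward pass: linear layer, ReLU, linear layer
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Autograd fills w1.grad and w2.grad with d(loss)/d(weight)
    loss.backward()

    # Update the weights without recording the update in the graph,
    # then reset the gradients so they do not accumulate across steps
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

print("First loss:", losses[0])
print("Last loss:", losses[-1])
```

The forward pass is the same as in the manual version. Only the backward pass changes, because `loss.backward()` replaces the hand-written gradient formulas.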
Pandas
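Pandas is a Python library used mainly for data analysis. Before training a machine learning model, users must prepare the dataset, and Pandas makes this work easier because it is designed specifically for data extraction and preparation. It offers high-level data structures and a wide variety of tools for analyzing data in detail.
Example: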
import pandas as pad
data_1 = {"Countries": ["Bhutan", "Cape Verde", "Chad", "Estonia", "Guinea", "Kenya", "Libya", "Mexico"],
"capital": ["Thimphu", "Praia", "N'Djamena", "Tallinn", "Conakry", "Nairobi", "Tripoli", "Mexico City"],
"Currency": ["Ngultrum", "Cape Verdean escudo", "CFA Franc", "Estonia Kroon; Euro", "Guinean franc", "Kenya shilling", "Libyan dinar", "Mexican peso"],
"population": [20.4, 143.5, 12.52, 135.7, 52.98, 76.21, 34.28, 54.32] }
data_1_table = pad.DataFrame(data_1)
print(data_1_table)
Output:
    Countries      capital             Currency  population
0      Bhutan      Thimphu             Ngultrum       20.40
1  Cape Verde        Praia  Cape Verdean escudo      143.50
2        Chad    N'Djamena            CFA Franc       12.52
3     Estonia      Tallinn  Estonia Kroon; Euro      135.70
4      Guinea      Conakry        Guinean franc       52.98
5       Kenya      Nairobi       Kenya shilling       76.21
6       Libya      Tripoli         Libyan dinar       34.28
7      Mexico  Mexico City         Mexican peso       54.32
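The Pandas description stresses preparing a dataset before training. The following is a small sketch of typical preparation steps, assuming the standard `DataFrame.dropna` and `DataFrame.sort_values` methods. The table is toy data invented purely for illustration, and the column names are hypothetical.

```python
import pandas as pd

# Toy records invented for illustration; NaN marks a missing value
raw = pd.DataFrame(
    {
        "city": ["A", "B", "C", "D"],
        "area_km2": [10.0, 20.0, float("nan"), 40.0],
        "residents": [1000.0, 3000.0, 1500.0, float("nan")],
    }
)

# Drop rows that contain missing values
clean = raw.dropna().copy()

# Derive a new feature from two existing columns
clean["density"] = clean["residents"] / clean["area_km2"]

# Order the rows by the derived feature, largest first
dense_first = clean.sort_values("density", ascending=False)
print(dense_first)
print("Mean density:", clean["density"].mean())
```

Rows C and D are dropped because each has a missing value, so only A and B remain when the density is computed.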
Matplotlib
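Matplotlib is a Python library for data visualization. Developers use it to visualize data and the patterns in it. It is a 2D plotting library for creating two-dimensional graphs and charts.
Its pyplot module is used for drawing plots and provides functions for controlling line styles, font properties, axis formatting, and more. Matplotlib supports many kinds of graphs and charts, such as histograms, error bar charts, and bar charts.
Example 1: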
import matplotlib.pyplot as plot
import numpy as nup
# Prepare the data
K = nup.linspace(2, 4, 8)
R = nup.linspace(5, 7, 9)
Q = nup.linspace(0, 1, 3)
# Plot the data
plot.plot(K, K, label = 'K')
plot.plot(R, R, label = 'R')
plot.plot(Q, Q, label = 'Q')
# Add a legend
plot.legend()
# Show the plot
plot.show()
Output: a figure with three straight line segments, labeled K, R, and Q in the legend.
Example 2:
import matplotlib.pyplot as plot
# Creating dataset-1
K_1 = [8, 4, 6, 3, 5, 10,
13, 16, 12, 21]
R_1 = [11, 6, 13, 15, 17, 5,
3, 2, 8, 19]
# Creating dataset2
K_2 = [6, 9, 18, 14, 16, 15,
11, 16, 12, 20]
R_2 = [16, 4, 10, 13, 18,
20, 6, 2, 17, 15]
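plot.scatter(K_1, R_1, c = "Black",
             linewidths = 2,
             marker = "s",
             edgecolor = "Brown",
             s = 50)
plot.scatter(K_2, R_2, c = "Purple",
             linewidths = 2,
             marker = "^",
             edgecolor = "Grey",
             s = 200)
# The module was imported as "plot", so the same alias is used here
plot.xlabel ("X-axis")
plot.ylabel ("Y-axis")
print ("Scatter Plot")
plot.show()
Output: the text "Scatter Plot" printed to the console, followed by a scatter plot with black square markers for the first dataset and larger purple triangle markers for the second.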
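The Matplotlib description names histograms, error bar charts, and bar charts, none of which appear in the two examples above. The following is a short sketch of all three on one figure, assuming the `Axes.hist`, `Axes.bar`, and `Axes.errorbar` methods. The data, the colors, and the output file name `plot_types.png` are arbitrary choices, and the non-interactive "Agg" backend is selected so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use a GUI window
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 500 draws from a standard normal distribution
rng = np.random.default_rng(0)
samples = rng.normal(size=500)

fig, (ax_hist, ax_bar, ax_err) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram of the samples with 20 bins
ax_hist.hist(samples, bins=20, color="steelblue")
ax_hist.set_title("Histogram")

# Bar chart of three made-up categories
categories = ["a", "b", "c"]
heights = [3, 7, 5]
ax_bar.bar(categories, heights, color="darkorange")
ax_bar.set_title("Bar chart")

# Line plot with a constant error bar of +/- 1.5 on every point
x = np.arange(5)
y = x ** 2
ax_err.errorbar(x, y, yerr=1.5, fmt="o-", capsize=3)
ax_err.set_title("Error bars")

fig.tight_layout()
fig.savefig("plot_types.png")  # hypothetical output file name
```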
Conclusion
In this tutorial, we discussed the Python libraries used for machine learning tasks and showed examples for each of them.