Rainfall Prediction Using Machine Learning with Pandas

Machine learning lets us predict rainfall with a variety of algorithms, including Random Forest and XGBoost.

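No single algorithm is best for rainfall prediction; each has its strengths and weaknesses. As a rule of thumb, Random Forest works well on small datasets, while XGBoost tends to be more efficient on large ones.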

In the same way, other algorithms can be matched to the requirements of a project.

Our goal is to build a machine learning model for rainfall prediction based on Random Forest.

Steps

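Import the required libraries: Pandas, NumPy, scikit-learn, and Matplotlib.
Load the historical rainfall data into a Pandas DataFrame.
Preprocess the data by dropping any unnecessary columns and handling missing values, if there are any.
Split the data into training and test sets.
Choose a machine learning algorithm for the prediction, such as Random Forest or XGBoost. This example uses Random Forest because it suits the chosen dataset well.
Train the algorithm on the training set.
Use the trained model to predict the rainfall for a given month and year.
Evaluate the model's accuracy.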
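
The preprocessing step above can be sketched on a toy table. This is a minimal illustration, not the article's dataset: the column names ANNUAL and YEAR and all values are assumptions made up for the example. It shows dropping an unused column and two possible ways of handling missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a monthly rainfall table; all values are made up.
df = pd.DataFrame({
    "DIVISION": ["A", "A", "B", "B"],
    "YEAR": [2000, 2001, 2000, 2001],
    "JAN": [10.0, np.nan, 5.0, 7.0],
    "FEB": [12.0, 14.0, np.nan, 9.0],
    "ANNUAL": [22.0, 14.0, 5.0, 16.0],  # assumed aggregate column we do not need
})

# Drop a column that is not used as a feature.
clean = df.drop(columns=["ANNUAL"])

# Count missing values per column before deciding how to handle them.
missing_counts = clean.isna().sum()

# Option 1: fill gaps with 0 (the choice made in this article's example).
filled_zero = clean.fillna(0)

# Option 2: fill gaps with each month's median instead.
month_cols = ["JAN", "FEB"]
filled_median = clean.copy()
filled_median[month_cols] = clean[month_cols].fillna(clean[month_cols].median())

print(missing_counts)
print(filled_median)
```

Filling with 0 treats a missing reading as no rainfall, which can pull predictions down; a median fill is one possible alternative, and which is appropriate depends on why the values are missing.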

Example

# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv("Rainfall_dataset.csv")
df.head()

# Fill missing values with 0
df.fillna(value=0, inplace=True)

# Keep only the rows for one division
grouped = df.groupby(df.DIVISION)
UP = grouped.get_group("EAST UTTAR PRADESH")

UP.head()
UP.hist(figsize=(12,12))
# Build features and target: February to April rainfall predicts May
data = np.asarray(UP[['FEB', 'MAR', 'APR','MAY']])
print(np.shape(data))
X = data[:,0:3]
y = data[:,3]

# Generalize: every three consecutive months predict the following month
# (this replaces the X and y defined above)
data = np.asarray(UP[['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']])
print(np.shape(data))

X = None; y = None
for i in range(data.shape[1]-3):
   if X is None:
      X = data[:, i:i+3]
      y = data[:, i+3]
   else:
      X = np.concatenate((X, data[:, i:i+3]), axis=0)
      y = np.concatenate((y, data[:, i+3]), axis=0)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
np.shape(X_test)

# Train the model (on the full dataset, not only the training split)
rf = RandomForestRegressor(n_estimators = 100, max_depth=10, n_jobs=1)
rf.fit(X, y)

# Predict on the same full dataset
y_pred = rf.predict(X)

# Evaluate the model
print(mean_absolute_error(y, y_pred))
print(y_pred)

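The data is loaded from the Rainfall_dataset.csv file into a Pandas DataFrame, missing values are filled with 0, and the rows for the EAST UTTAR PRADESH division are selected. The rainfall values for February, March, and April are first extracted into one array as features, with the May values stored separately in another array as the target. The code then generalizes this: every run of three consecutive months becomes a feature row, and the following month becomes its target. Finally, the resulting arrays are split into training and test sets.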
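
The windowing used in the example can be isolated into a small helper to see what it produces. This is a sketch on made-up numbers; the function name make_windows and the parameter window are hypothetical and not part of the original code.

```python
import numpy as np

def make_windows(monthly, window=3):
    """Stack every run of `window` consecutive columns as features,
    with the column that follows each run as the target."""
    n_months = monthly.shape[1]
    X_parts, y_parts = [], []
    for start in range(n_months - window):
        X_parts.append(monthly[:, start:start + window])
        y_parts.append(monthly[:, start + window])
    return np.concatenate(X_parts, axis=0), np.concatenate(y_parts, axis=0)

# Made-up data: 2 years x 12 months, holding the values 0..23.
monthly = np.arange(24, dtype=float).reshape(2, 12)
X, y = make_windows(monthly)

print(X.shape, y.shape)  # (18, 3) (18,)
print(X[0], y[0])        # [0. 1. 2.] 3.0
```

With 12 monthly columns and a window of 3 there are 9 window positions, so each year of data contributes 9 training rows.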

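A Random Forest regression model is trained on the entire dataset and then used to make predictions on that same dataset. The predicted values are stored in an array. The model's performance is evaluated with the mean absolute error between the actual rainfall values loaded from the dataset and the predicted values, computed with the mean_absolute_error() function. Because the model is scored on the data it was trained on, the reported figure is a training error; the test split produced by train_test_split() is not used.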
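
For comparison, a held-out evaluation would fit on the training split only and score on the test split. The sketch below uses synthetic arrays as a stand-in for the rainfall data, so its numbers say nothing about the real dataset; random_state=0 on the regressor is an added assumption to make runs repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the windowed rainfall arrays (not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))
y = X.mean(axis=1) + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
rf.fit(X_train, y_train)  # fit on the training rows only

# Score separately on rows the model has seen and rows it has not.
train_mae = mean_absolute_error(y_train, rf.predict(X_train))
test_mae = mean_absolute_error(y_test, rf.predict(X_test))
print(f"train MAE: {train_mae:.2f}")
print(f"test MAE:  {test_mae:.2f}")
```

The training error is typically the lower of the two, which is why the test error is usually the more honest estimate of how the model would do on new months.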

Output

25.71495399881942   //This is the mean absolute error (MAE) between the actual values y and the predicted values y_pred 

[18.15560485 28.51579025 18.42870772 ...  3.45343635  6.94081644
  8.22604943]  //These are the predicted values stored in the y_pred.
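
To make the error figure concrete, the mean absolute error is the average of the absolute differences between actual and predicted values. A tiny hand-checkable sketch with made-up numbers:

```python
import numpy as np

# Made-up values for illustration only.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 40.0])

# MAE = mean of |actual - predicted| = (2 + 2 + 3 + 0) / 4
mae = np.mean(np.abs(y_true - y_hat))
print(mae)  # 1.75
```

The result is in the same units as the rainfall values themselves, so an MAE of about 25.7 means the predictions are off by roughly that amount on average.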

Note: In the example above, rainfall is predicted for East Uttar Pradesh; you can choose any other state or division.
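
Switching to another division only requires changing the group that is selected. A minimal sketch on a toy table follows; "DIVISION X" is a placeholder name, and the real dataset's division labels should be listed first to find the exact spelling.

```python
import pandas as pd

# Hypothetical miniature of the dataset; "DIVISION X" is a placeholder.
df = pd.DataFrame({
    "DIVISION": ["EAST UTTAR PRADESH", "DIVISION X",
                 "EAST UTTAR PRADESH", "DIVISION X"],
    "YEAR": [2000, 2000, 2001, 2001],
    "JAN": [10.0, 3.0, 12.0, 4.0],
})

# List the available divisions before picking one.
divisions = sorted(df["DIVISION"].unique())
print(divisions)

# Select the rows for the chosen division, as the example does.
division = "DIVISION X"
subset = df.groupby("DIVISION").get_group(division)

# Boolean filtering is an equivalent way to pick the same rows.
filtered = df[df["DIVISION"] == division]
print(subset)
```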

Make sure to download the dataset from the link above in order to reproduce the output.