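Python: How to Get a Dictionary-Like Object from a Dataset Using Scikit-learn
The dataset loaders in the Scikit-learn Python library return a dictionary-like object. Some of its most useful attributes are: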
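- data: the data to learn from.
- target: the regression target.
- DESCR: the description of the dataset.
- target_names: the names of the target(s) in the dataset.
- feature_names: the names of the features in the dataset.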
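As a minimal sketch of what "dictionary-like" means here: the returned object can be read either by key or by attribute, and both forms give back the same underlying value. The snippet assumes the bundled load_diabetes regression dataset as a stand-in, because it ships with Scikit-learn and needs no download; the variable names are illustrative only.

```python
from sklearn.datasets import load_diabetes

# Assumption: load_diabetes is used only as a small bundled stand-in dataset.
dataset = load_diabetes()

# Key access and attribute access refer to the same underlying array.
same_object = dataset["data"] is dataset.data

# The feature matrix has one row per sample and one column per feature.
n_samples, n_features = dataset.data.shape

print(same_object)
print(n_samples, n_features)
print(dataset.feature_names)
```

Attribute access is usually the more readable choice, while key access is handy when the attribute name is held in a variable.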
Example 1
In the example below, we load the California housing dataset and list the keys of its dictionary-like object.
# Import necessary libraries
import sklearn
import pandas as pd
from sklearn.datasets import fetch_california_housing
# Loading the California housing dataset
housing = fetch_california_housing()
# Print the keys of the dictionary-like object
print(housing.keys())
Output
It produces the following output:
dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
Example 2
We can also inspect the individual attributes of the dictionary-like object in more detail:
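# Import necessary libraries
import sklearn
import pandas as pd
from sklearn.datasets import fetch_california_housing
# Loading the California housing dataset
housing = fetch_california_housing()
# Shape of the feature matrix: (samples, features)
print(housing.data.shape)
print('\n')
# Shape of the target vector
print(housing.target.shape)
print('\n')
# Names of the feature columns
print(housing.feature_names)
print('\n')
# Name of the target
print(housing.target_names)
print('\n')
# Full text description of the dataset
print(housing.DESCR)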
Output
It produces the following output:
(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
:Number of Instances: 20640
:Number of Attributes: 8 numeric, predictive attributes and the target
:Attribute Information:
- MedInc median income in block group
- HouseAge median house age in block group
- AveRooms average number of rooms per household
- AveBedrms average number of bedrooms per household
- Population block group population
- AveOccup average number of household members
- Latitude block group latitude
- Longitude block group longitude
:Missing Attribute Values: None
Omitted due to length of the output…
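Example 3
When the dataset is loaded with as_frame=True, the frame attribute holds a pandas DataFrame that combines the features and the target.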
# Import necessary libraries
import sklearn
import pandas as pd
from sklearn.datasets import fetch_california_housing
# Loading the California housing dataset
housing = fetch_california_housing(as_frame=True)
print(housing.frame.info())
Output
It produces the following output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
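The effect of as_frame=True on the other attributes can be sketched as follows. This assumes pandas is installed and again uses the bundled load_diabetes dataset as a stand-in so that nothing has to be downloaded; the variable names are illustrative, not part of the original examples.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: load_diabetes stands in for the California housing dataset.
bunch = load_diabetes(as_frame=True)

# With as_frame=True, data is a DataFrame and target is a Series.
data_is_frame = isinstance(bunch.data, pd.DataFrame)
target_is_series = isinstance(bunch.target, pd.Series)

# frame holds the feature columns followed by the target column.
feature_columns = list(bunch.frame.columns[:-1])

print(data_is_frame, target_is_series)
print(bunch.frame.shape)
print(feature_columns == list(bunch.feature_names))
```

Working with frame is convenient for quick exploration with pandas, while data and target remain available separately for model fitting.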