Python: How to Get a Dictionary-Like Object from a Dataset Using Scikit-learn

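With the Scikit-learn Python library, we can load a dataset as a dictionary-like object (a Bunch). Some of its most useful attributes are:

  • data - the data to learn from (the feature matrix).

  • target - the target values to predict (regression targets or class labels, depending on the dataset).

  • DESCR - the full text description of the dataset.

  • target_names - the names of the targets in the dataset.

  • feature_names - the names of the features in the dataset.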
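Since the object is dictionary-like, each entry can be read either by key or as an attribute. The following is a minimal sketch of that idea, not part of the original examples. It assumes `load_diabetes`, a small regression dataset bundled with Scikit-learn, so that it runs without downloading anything.

```python
from sklearn.datasets import load_diabetes

# Assumption: load_diabetes stands in for any Scikit-learn dataset loader;
# it ships with the library, so no network access is needed.
diabetes = load_diabetes()

# Key access and attribute access refer to the same underlying object
same_object = diabetes["data"] is diabetes.data

# Basic facts about the loaded arrays
data_shape = diabetes.data.shape
target_shape = diabetes.target.shape
n_features = len(diabetes.feature_names)

print(same_object, data_shape, target_shape, n_features)
```

Attribute access is usually more convenient in interactive work, while key access is handy when the key name is held in a variable.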

Example 1

In the example below, we load the California housing dataset and list the keys of the dictionary-like object it returns.

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset (downloaded on first use)
housing = fetch_california_housing()

# Print the keys of the dictionary-like object
print(housing.keys())

Output

It will produce the following output:

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
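To see what each key holds, one option is to iterate over the object as you would over a dictionary. The sketch below is illustrative only and again assumes the bundled `load_diabetes` dataset rather than the California housing data, so its key list differs slightly. It also shows that `frame` is `None` unless the loader is called with `as_frame=True`.

```python
from sklearn.datasets import load_diabetes

# Assumption: a bundled dataset is used so the sketch runs offline
bunch = load_diabetes()

# Map every key to the type name of the value stored under it
summary = {key: type(value).__name__ for key, value in bunch.items()}
for key, type_name in summary.items():
    print(f"{key}: {type_name}")

# With the default as_frame=False, no DataFrame is built
frame_is_none = bunch.frame is None
print(frame_is_none)
```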

Example 2

We can also inspect the individual entries of the dictionary-like object in more detail:

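# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset
housing = fetch_california_housing()

# Shape of the feature matrix: (n_samples, n_features)
print(housing.data.shape)
# Shape of the target vector
print(housing.target.shape)
# Names of the feature columns
print(housing.feature_names)
# Name of the target
print(housing.target_names)
# Full text description of the dataset
print(housing.DESCR)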

Output

It will produce the following output:

(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
   :Number of Instances: 20640
   :Number of Attributes: 8 numeric, predictive attributes and the target
   :Attribute Information:
      - MedInc median income in block group
      - HouseAge median house age in block group
      - AveRooms average number of rooms per household
      - AveBedrms average number of bedrooms per household
      - Population block group population
      - AveOccup average number of household members
      - Latitude block group latitude
      - Longitude block group longitude
   :Missing Attribute Values: None
Omitted due to length of the output…
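The entries of `feature_names` line up, in order, with the columns of `data`, so the two can be paired directly. The sketch below shows one way to do that by computing a per-feature mean. It is an illustration under the assumption that the bundled `load_diabetes` dataset is an acceptable stand-in, and the variable name `column_means` is hypothetical.

```python
from sklearn.datasets import load_diabetes

# Assumption: a bundled dataset is used so the sketch runs offline
bunch = load_diabetes()

# data has one column per feature, in the same order as feature_names
column_means = dict(zip(bunch.feature_names, bunch.data.mean(axis=0)))

for name, mean in column_means.items():
    print(f"{name}: {mean:.4f}")
```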

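Example 3

Passing as_frame=True loads the dataset as pandas objects. The frame attribute then holds a DataFrame that combines the features and the target.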

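# Import the dataset loader (as_frame=True requires pandas to be installed)
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset as pandas objects
housing = fetch_california_housing(as_frame=True)

# info() prints its summary directly and returns None,
# so it is called without wrapping it in print()
housing.frame.info()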

Output

It will produce the following output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
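With `as_frame=True`, the `data` and `target` entries also change type: they become a pandas DataFrame and Series instead of NumPy arrays. The sketch below checks this on the bundled `load_diabetes` dataset, which is an assumption made so the code runs without a download. It requires pandas to be installed.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: a bundled dataset is used so the sketch runs offline
bunch = load_diabetes(as_frame=True)

# Features become a DataFrame and the target becomes a Series
data_is_frame = isinstance(bunch.data, pd.DataFrame)
target_is_series = isinstance(bunch.target, pd.Series)

# frame holds the feature columns followed by the target column
frame_shape = bunch.frame.shape
last_column = bunch.frame.columns[-1]

print(data_is_frame, target_is_series, frame_shape, last_column)
```

Working with `frame` is convenient for exploration, since column names travel with the values. The plain arrays from the default call are often enough when the data goes straight into an estimator.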
