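CountVectorizer Does Not Print the Vocabulary: How to Fix It

This article looks at a common surprise when encoding text with CountVectorizer (which ships with scikit-learn, not NumPy): printing the encoded result shows the counts but not the vocabulary. It also shows how to print both.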

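Problem Description

When encoding text with CountVectorizer, we usually want to see two things: how often each term occurs in each document, and which term each count belongs to. Printing the encoded matrix on its own shows only the counts.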

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['This is the first document',
          'This is the second second document',
          'And the third one',
          'Is this the first document']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(X.toarray())

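The code above is a minimal CountVectorizer example. fit_transform returns a two-dimensional matrix in which each row is one encoded document and each column corresponds to one vocabulary term. Running the code prints only that matrix of counts: the terms behind the columns are stored on the vectorizer object, not in the matrix, so they do not appear in the output.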
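To make the behaviour concrete, the following minimal sketch inspects what fit_transform returns for the same corpus. It assumes scikit-learn and SciPy are installed; the variable name dense is only illustrative.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['This is the first document',
          'This is the second second document',
          'And the third one',
          'Is this the first document']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# X is a SciPy sparse matrix holding counts only: 4 documents x 9 terms
print(issparse(X))   # True
print(X.shape)       # (4, 9)

# toarray() converts it to a plain NumPy array, which carries no column labels
dense = X.toarray()
print(isinstance(dense, np.ndarray))  # True
```

Neither the sparse matrix nor the dense array has anywhere to store term names, which is why print(X.toarray()) cannot show them.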

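Solution

The cause is simple: X.toarray() returns only the count matrix, while the vocabulary lives on the vectorizer. To print both, pass the feature names to print() as well, that is, change X.toarray() to X.toarray(), vectorizer.get_feature_names_out().tolist(). In scikit-learn releases before 1.0 the method is called get_feature_names(); it was deprecated in 1.0 and removed in 1.2.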

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['This is the first document',
          'This is the second second document',
          'And the third one',
          'Is this the first document']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(X.toarray(), vectorizer.get_feature_names_out().tolist())

Output:

[[0 1 1 1 0 0 1 0 1]
 [0 1 0 1 0 2 1 0 1]
 [1 0 0 0 1 0 1 1 0]
 [0 1 1 1 0 0 1 0 1]] ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
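Printing the matrix and the term list side by side still leaves the reader to match columns to terms by eye. As one possible sketch, the fitted vocabulary_ attribute (a dict mapping each term to its column index) can be used to label the counts directly. The names terms and per_doc are illustrative and not part of the original example.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['This is the first document',
          'This is the second second document',
          'And the third one',
          'Is this the first document']

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus).toarray()

# vocabulary_ maps term -> column index; sorting by index recovers column order
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Print a header row of terms followed by one row of counts per document
print(" ".join(f"{t:>8}" for t in terms))
for row in counts:
    print(" ".join(f"{int(c):>8}" for c in row))

# Alternatively, keep only the non-zero counts of each document as a dict
per_doc = [{t: int(c) for t, c in zip(terms, row) if c} for row in counts]
print(per_doc[1])
# {'document': 1, 'is': 1, 'second': 2, 'the': 1, 'this': 1}
```

Because vocabulary_ is set by fitting, this approach does not depend on which of the two feature-name methods the installed scikit-learn release provides.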

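Summary

When CountVectorizer output shows counts but no vocabulary, the matrix is behaving as designed: the terms are stored on the vectorizer. Print them alongside X.toarray() by adding vectorizer.get_feature_names_out() (get_feature_names() in scikit-learn releases before 1.0).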
