How to Find the Most Repeated Word in a Text File in Python

In this article, we show how to find the most repeated word in a given text file using Python.

Assume we have a text file named ExampleTextFile.txt containing some random text. We will return the word that occurs most often in that file.

ExampleTextFile.txt

Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific
source codes in Python,Seaborn,Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy
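
To reproduce the example locally, the sample file can be created with a short script. This is only a convenience sketch: it writes the seven lines shown above to ExampleTextFile.txt in the current working directory, and the variable name sample_lines is an assumption made for illustration.

```python
from pathlib import Path

# The seven sample lines shown above (sample_lines is a hypothetical name)
sample_lines = [
    "Good Morning TutorialsPoint",
    "This is TutorialsPoint sample File",
    "Consisting of Specific",
    "source codes in Python,Seaborn,Scala",
    "Summary and Explanation",
    "Welcome TutorialsPoint",
    "Learn with a joy",
]

# Write the lines to ExampleTextFile.txt, one per line
Path("ExampleTextFile.txt").write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
```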

Steps

The following steps perform the task:

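Import the Counter class from the collections module. (The collections module provides specialized container data types as alternatives to Python's general-purpose built-ins such as dict, list, and tuple. Counter is a dict subclass for counting hashable objects: given an iterable, it builds a mapping from each distinct element to the number of times it occurs.)
Create a variable to store the path of the text file.
Create a list to store all the words.
Use the open() function (which opens a file and returns a file object) to open the text file in read-only mode by passing the file name and the mode as arguments (here "r" means read-only mode).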

with open(inputFile, 'r') as filedata:

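Use a for loop to traverse each line of the file.
Use the split() function (which splits a string into a list; a separator can be specified, and the default is any whitespace) to split the line into a list of words and store it in a variable.
Use a for loop to traverse the list of words.
Use the append() function (which adds an element to the end of a list) to append each word to the list of all words.
Use Counter (which gives the frequency of each word as key-value pairs) to calculate the frequency of all the words, that is, the number of times each word occurs.
Create a variable to store the maximum frequency.
Use a for loop with the in keyword to traverse the word-frequency dictionary above.
Use an if statement to check whether the frequency of the current word is greater than the maximum frequency.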

The in keyword works in two ways:
It tests whether a value exists in a sequence (list, range, string, etc.).
It is also used to iterate through a sequence in a for loop, which is how the program below uses it.

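If the frequency of the word is greater than the maximum frequency, update the maximum frequency to that value.
Store that word in a variable as the most repeated word found so far.
Print the most repeated word in the text file.
The with statement closes the input file automatically when its block ends, so an explicit call to close() (which closes an opened file) is not required.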
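
Before the full program, it may help to see Counter and the two uses of the in keyword in isolation. The following is a minimal sketch on a made-up word list (the list contents and the names words and frequency are assumptions for illustration, not part of the example file):

```python
from collections import Counter

# Hypothetical word list, standing in for the words read from a file
words = ["red", "blue", "red", "green", "red", "blue"]

# Counter tallies how many times each distinct hashable item occurs
frequency = Counter(words)

print(frequency["red"])      # 3
print(frequency["yellow"])   # 0: a missing key counts as zero instead of raising KeyError

# `in` as a membership test
print("blue" in frequency)   # True

# `in` as part of a for loop: iterates over the keys (the distinct words)
for word in frequency:
    print(word, frequency[word])
```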

Example

The following program reads the text file line by line and uses Counter from the collections module to find and print the most repeated word.

# Importing the Counter class
from collections import Counter

# input text file
inputFile = "ExampleTextFile.txt"

# Storing all the words
newWordsList = []

# Opening the given file in read-only mode
with open(inputFile, 'r') as filedata:

   # Traverse in each line of the file
   for textline in filedata:

      # Splitting the text file content into list of words
      wordsList = textline.split()

      # Traverse in the above list of words
      for word in wordsList:

         # Appending each word to the new list
         newWordsList.append(word)

# Using the Counter() function, calculate the frequency of all the words
wordsFrequency = Counter(newWordsList)

# Taking variables to store the maximum frequency and the corresponding word
# (mostRepeatedWord stays None if the file contains no words)
maxFrequency = 0
mostRepeatedWord = None

# Loop in the above words frequency dictionary
for textword in wordsFrequency:

   # Checking whether the frequency of the word is greater than the maximum frequency
   if wordsFrequency[textword] > maxFrequency:

      # If it is true then set maximum frequency to the corresponding frequency value of the word
      maxFrequency = wordsFrequency[textword]

      # As this is the word with maximum frequency store this word in a variable
      mostRepeatedWord = textword

# Printing the most repeated word in a text file
print("{",mostRepeatedWord,"} is the most repeated word in a text file")

# No explicit close() is needed: the with statement has already closed the file

Output

When executed, the above program generates the following output:

{ TutorialsPoint } is the most repeated word in a text file

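In this program, we read some random text from a text file. We traverse the file line by line, split each line into words, and add every word to a list. We then pass the list to Counter, which returns a dictionary-like object whose keys are the words and whose values are their frequencies. Next, we iterate over the words in that object and check whether each frequency is greater than the current maximum. If it is, that word is the most frequent one seen so far, so we store it in a variable and update the maximum frequency to its frequency. Finally, we display the most repeated word.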
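
As a shorter alternative to the manual loop, Counter also provides a most_common() method that returns (element, count) pairs sorted from most to least frequent. The sketch below is one possible way to use it, not the method of the program above; the function name most_repeated_word and the UTF-8 encoding are assumptions made for illustration.

```python
from collections import Counter


def most_repeated_word(path):
    """Return (word, count) for the most frequent whitespace-separated
    word in the file at `path`, or None if the file contains no words."""
    with open(path, "r", encoding="utf-8") as filedata:
        # read().split() splits on any whitespace, including newlines
        frequency = Counter(filedata.read().split())

    # most_common(1) returns a list with at most one (word, count) pair
    top = frequency.most_common(1)
    return top[0] if top else None
```

Called on the sample file, most_repeated_word("ExampleTextFile.txt") should return the pair ('TutorialsPoint', 3). As in the original program, words are split only on whitespace, so a token such as Python,Seaborn,Scala is counted as a single word, and when several words share the highest count the one encountered first is returned.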