How to Eliminate Repeated Lines in a Python Function

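In this article, we will discuss how to remove duplicate lines from a text file using Python. If the file is small and has only a few lines, you can delete the duplicates by hand. For large files, however, Python can do the job for you.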

Using File Handling Methods

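Python has built-in functions and methods for creating, opening, and closing files, which makes working with files straightforward. While a file is open, Python also lets you perform several operations on it, such as reading, writing, and appending data.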

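To remove duplicate lines from a text file, we use Python's file handling methods. The examples below refer to the files by absolute Windows paths. If you use a relative path such as "File.txt" instead, Python resolves it against the current working directory, so the file must be located there (usually the directory from which you run the .py script).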
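As a quick illustration of the read, write, and append operations mentioned above, here is a minimal sketch that uses a throwaway temporary directory so no real files are touched. The file name demo.txt is hypothetical. Each file is opened in a with statement, which closes it automatically when the block ends.

```python
import os
import tempfile

# Work inside a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # "w" creates the file (or truncates an existing one) and opens it for writing.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # "a" opens the file for appending: new data goes after the existing content.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # "r" opens the file read-only; iterating over the file object yields one line at a time.
    with open(path, "r", encoding="utf-8") as f:
        lines = list(f)

print(lines)  # ['first line\n', 'second line\n']
```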

Algorithm

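The following steps eliminate repeated lines from a file:

1. Open the input file in read-only mode, since we only read its contents.
2. Open the output file in write mode so that the result can be written to it.
3. Read the input file line by line. For each line, check whether an identical line has already been written to the output file, that is, whether its hash value is already in the set described in the next step.
4. If it has not, write the line to the output file and save the line's hash value in a set. Rather than storing and comparing whole lines, we store and compare the hash of each line. Every MD5 hash has the same fixed size, so for files with long lines this uses less memory than keeping the lines themselves.
5. If the hash value is already in the set, skip the line.
6. When all lines have been processed, the output file contains each distinct line of the input file exactly once, in the order in which it first appeared.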
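The steps above can be sketched as a small generator that works on any iterable of lines, such as an open file object. This is a minimal sketch, not the article's own code. The name unique_lines is hypothetical, and the sketch stores the 16-byte binary digest() instead of the 32-character hexdigest() to keep each set entry small. MD5 serves only as a fingerprint here, not for security. Two different lines could in principle share a hash, in which case the second line would be wrongly skipped, but in practice this is extremely unlikely.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time its content is seen, preserving order."""
    seen_hashes = set()
    for line in lines:
        # Hash the line without trailing whitespace/newline, as in the steps above.
        digest = hashlib.md5(line.rstrip().encode("utf-8")).digest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            yield line


sample = ["a\n", "b\n", "a\n", "c\n", "b\n"]
print(list(unique_lines(sample)))  # ['a\n', 'b\n', 'c\n']
```

To apply it to a file, pass the open input file to the generator and write the result with the file object's writelines() method, for example out_file.writelines(unique_lines(in_file)).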

Here, the input file, File.txt, contains the following data −

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.
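To reproduce the examples, you can create File.txt with the sample data above. This is a minimal sketch that assumes the file should be created in the current working directory. The relative path is a hypothetical choice, so adjust it (or the paths in the examples) to match your setup.

```python
# The sample data listed above, one string per line.
SAMPLE_LINES = [
    "Welcome to TutorialsPoint.",
    "Welcome to TutorialsPoint.",
    "Python programming language in this file.",
    "eliminate repeated lines.",
    "eliminate repeated lines.",
    "eliminate repeated lines.",
    "Skip the line.",
]

# Hypothetical relative path: File.txt is created in the current working directory.
with open("File.txt", "w", encoding="utf-8") as f:
    # Join with newlines and add a final newline so every line, including the last, is terminated.
    f.write("\n".join(SAMPLE_LINES) + "\n")
```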

Example

The following example eliminates repeated lines by storing the hash value of each line −

import hashlib

# Paths of the input and output files. Raw strings (r'...') keep the
# backslashes in the Windows paths from being read as escape sequences.
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutFile = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# Hash values of the lines that have already been written
lines_present = set()

# Open the input file in read mode and the output file in write mode;
# the with statement closes both files automatically.
with open(InFile, "r") as in_file, open(OutFile, "w") as The_Output_File:
    for l in in_file:
        # Remove trailing spaces and the newline before hashing, then
        # compute the MD5 hash of the line with the hashlib library.
        hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
        # Write the line only if its hash has not been seen before
        if hash_value not in lines_present:
            The_Output_File.write(l)
            lines_present.add(hash_value)

Output

As the output shows, all the repeated lines from the input file have been eliminated, and the output file contains the following data −

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
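To check a result like this programmatically rather than by eye, you could count how often each line occurs. The sketch below is our own assumption, not part of the original example. The helper name find_repeated_lines is hypothetical, and the helper ignores trailing whitespace when comparing lines, matching the rstrip() call used above. The demonstration runs on a throwaway file.

```python
import os
import tempfile
from collections import Counter


def find_repeated_lines(path):
    """Return the lines (trailing whitespace stripped) that occur more than once in the file."""
    with open(path, "r", encoding="utf-8") as f:
        counts = Counter(line.rstrip() for line in f)
    return [line for line, n in counts.items() if n > 1]


# Demonstration on a temporary file; point find_repeated_lines at your own
# output file to check it instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Welcome\nWelcome\nSkip the line.\n")
    repeated = find_repeated_lines(demo_path)

print(repeated)  # ['Welcome']
```

Calling find_repeated_lines(OutFile) after running the example should return an empty list, which means the output file has no repeated lines.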

Example

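The following is another example of eliminating repeated lines. This time, the set stores the lines themselves instead of their hash values −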

# Paths of the input and output files (raw strings keep the backslashes literal)
InPath = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
OutPath = r'C:\Users\Lenovo\Downloads\Work TP\pre.txt'

# Lines that have already been written (compared exactly, newline included)
lines_present = set()

# Open the input file in read mode and create the output file in write mode;
# both files are closed automatically when the with block ends.
with open(InPath, "r") as InFile, open(OutPath, "w") as OutFile:
    # Iterate over every line in the input file
    for l in InFile:
        # Check whether the line is unique so far
        if l not in lines_present:
            # Write the unique line to the output file
            OutFile.write(l)
            # Remember the line so later copies are skipped
            lines_present.add(l)

Output

As the output shows, all the repeated lines from the input file have been eliminated, and the output file contains only the unique data shown below.

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.
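For files small enough to read into memory at once, a shorter alternative is to use dict.fromkeys(). Since Python 3.7, dictionaries preserve insertion order, so the first occurrence of each line is kept in its original position. This is a minimal sketch with a hypothetical function name, not part of the original article. Unlike the example above, it compares lines after splitlines() has removed their line endings. As a result, a final line with no trailing newline still matches an earlier copy of the same text that had one. Example 2 would treat those two lines as different.

```python
def dedupe_in_memory(text):
    """Return text with duplicate lines removed, keeping the first occurrence of each."""
    # splitlines() drops the line endings ("\n", "\r\n", ...) from each line.
    lines = text.splitlines()
    # dict keys keep insertion order (Python 3.7+), so first occurrences stay in order.
    unique = dict.fromkeys(lines)
    # Rejoin the unique lines, terminating each with a newline.
    return "\n".join(unique) + "\n" if unique else ""


print(dedupe_in_memory("a\nb\na"), end="")  # prints "a" and "b" on separate lines
```

To use it on a file, read the whole input with f.read() and write the returned string to the output file.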
