Grammar and Spell Checker in Python
In this tutorial, we look at a Python package for LanguageTool and see how to build a simple grammar and spell checker in Python.
Let's get started.
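Understanding the LanguageTool Library in Python
LanguageTool is an open-source grammar and spell-checking tool, also known as the spell checker for OpenOffice. The language_tool_python package lets programmers detect grammar and spelling errors from a Python script or from the command-line interface (CLI).
How to Install the LanguageTool Library
To install a Python library we need pip, the package installer for Python, which downloads packages from trusted public repositories. Once pip is available, we can install the LanguageTool library from the Windows Command Prompt (CMD) or a terminal with the following command:
Command: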
$ pip install language-tool-python
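By default, the language_tool_python library downloads a LanguageTool server as a JAR file and runs it in the background to detect errors locally. LanguageTool also offers a public HTTP proofreading API, although the number of calls it accepts is limited.
Verifying the Installation
After installing the library, we can verify it by creating an empty Python file and writing the following import statement in it:
File: verify.py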
import language_tool_python
Now save the file and run it with the following command in a terminal:
Command:
$ python verify.py
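If the program runs without errors, the library is installed correctly. If it raises an exception, reinstall the library and consult the module's official documentation.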
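The same check can also be made without letting a failed import raise. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` is hypothetical and exists only for this illustration.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return find_spec(module_name) is not None


# Report the result instead of letting an ImportError stop the program.
if is_installed("language_tool_python"):
    print("language_tool_python is installed")
else:
    print("language_tool_python is missing; run: pip install language-tool-python")
```

This only confirms that the module can be found; it does not start the LanguageTool server.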
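Using the LanguageTool Library in Python
In this section, we work through a practical example of the LanguageTool library. The following Python scripts detect grammar and spelling errors and then correct them. We will use the following text:
Text: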
LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the ‘Check Text’ button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021
The text above contains several deliberate grammar and spelling errors. Consider the following Python script to see how LanguageTool handles them:
Example:
# importing the package
import language_tool_python
# using the tool
my_tool = language_tool_python.LanguageTool('en-US')
# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021"""
# getting the matches
my_matches = my_tool.check(my_text)
# printing matches
print(my_matches)
Output:
[Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['for'], 'offsetInContext': 43, 'context': "...Text' button. Click the colored phrases for for information on potential errors. or we ...", 'offset': 165, 'errorLength': 7, 'category': 'MISC', 'ruleIssueType': 'duplication', 'sentence': 'Click the colored phrases for for information on potential errors.'}),
Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Or'], 'offsetInContext': 43, 'context': '...or for information on potential errors. or we can use this text too see an some of...', 'offset': 206, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'offsetInContext': 43, 'context': '...tential errors. or we can use this text too see an some of the issues that LanguageTool...', 'offset': 230, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of 'an' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'.', 'replacements': ['a'], 'offsetInContext': 43, 'context': '...errors. or we can use this text too see an some of the issues that LanguageTool ca...', 'offset': 238, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect', 'defect', 'deduct', 'deject'], 'offsetInContext': 43, 'context': '...ome of the issues that LanguageTool can dedect. Whot do someone thinks of grammar chec...', 'offset': 282, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Who', 'What', 'Shot', 'Whom', 'Hot', 'WHO', 'Whet', 'Whit', 'Whoa', 'Whop', 'WHT', 'Wot', 'W hot'], 'offsetInContext': 43, 'context': '...he issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? ...', 'offset': 290, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Whot do someone thinks of grammar checkers?'}),
Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'offsetInContext': 43, 'context': '...eone thinks of grammar checkers? Please not that they are not perfect. Style proble...', 'offset': 341, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Please not that they are not perfect.'}),
Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'offsetInContext': 43, 'context': '...yle problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 Nov...', 'offset': 414, 'errorLength': 19, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Style problems get a blue marker: It is 7 P.M. in the evening.'})]
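Explanation:
In the code above, we imported the library and created a LanguageTool instance for checking grammar and spelling in US English. We then stored the passage we want to check in a string variable, called check() to retrieve the matches, and printed them.
Each match is printed as a dictionary-like record with fields such as ruleId, message, replacements, offsetInContext, context, and offset. A detailed explanation of each rule ID can be found in the LanguageTool community.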
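The raw list is hard to read, so it can help to print one short line per match. The following is a minimal sketch, not part of the library: `describe` is a hypothetical helper, and `sample_match` is a stand-in object built from the first record in the output so that the snippet runs without a LanguageTool server. The attribute names follow the output shown here; other versions of the library may name them differently.

```python
from types import SimpleNamespace

# Stand-in for one item of my_matches, with values copied from the output.
sample_match = SimpleNamespace(
    ruleId="ENGLISH_WORD_REPEAT_RULE",
    message="Possible typo: you repeated a word",
    replacements=["for"],
    offset=165,
    errorLength=7,
)


def describe(match) -> str:
    """Format one match as '[start:end] ruleId: message -> first suggestion'."""
    end = match.offset + match.errorLength
    suggestion = match.replacements[0] if match.replacements else "(no suggestion)"
    return f"[{match.offset}:{end}] {match.ruleId}: {match.message} -> {suggestion}"


print(describe(sample_match))
# [165:172] ENGLISH_WORD_REPEAT_RULE: Possible typo: you repeated a word -> for
```

With a real result, the same helper could be applied in a loop over my_matches.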
Now that we have detected the errors, it is time to correct them. Consider the following Python script:
Example:
# importing the package
import language_tool_python
# using the tool
my_tool = language_tool_python.LanguageTool('en-US')
# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021"""
# getting the matches
my_matches = my_tool.check(my_text)
# defining some variables
myMistakes = []
myCorrections = []
startPositions = []
endPositions = []
# using the for-loop
for rules in my_matches:
    # only keep matches that come with at least one suggestion
    if len(rules.replacements) > 0:
        startPositions.append(rules.offset)
        endPositions.append(rules.errorLength + rules.offset)
        myMistakes.append(my_text[rules.offset : rules.errorLength + rules.offset])
        myCorrections.append(rules.replacements[0])
# creating new object: a list with one item per character
my_NewText = list(my_text)
# rewriting the correct passage
for n in range(len(startPositions)):
    # put the whole correction where the mistake starts
    my_NewText[startPositions[n]] = myCorrections[n]
    # blank out the remaining characters of the mistake
    for i in range(startPositions[n] + 1, endPositions[n]):
        my_NewText[i] = ""
my_NewText = "".join(my_NewText)
# printing the text
print(my_NewText)
Output:
LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for information on potential errors. Or we can use this text to see a some of the issues that LanguageTool can detect. Who do someone thinks of grammar checkers? Please note that they are not perfect. Style problems get a blue marker: It is 7 P.M.. The weather was nice on Monday, 22 November 2021
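Explanation:
In the code above, we added four lists to hold the mistakes, their corrections, and the start and end position of each mistake. The first for loop walks through my_matches and, for every match that has at least one suggestion, records those values, taking the first suggestion as the correction. We then convert the text into a list of characters, write each correction at the start position of its mistake, blank out the remaining characters of that mistake, and join the list back into a single string before printing it.
This corrects the errors found by the previous script. Note that blindly taking the first suggestion is not always right: the output still contains "a some", "Who do someone thinks", and "P.M..".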
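The character-list approach can also be written with string slicing. The sketch below is an alternative under stated assumptions, not the library's own method: `apply_edits` is a hypothetical helper, the edits are plain (offset, length, replacement) tuples, and the spans are assumed not to overlap. Applying the edits from right to left keeps the earlier offsets valid while the text changes length.

```python
def apply_edits(text, edits):
    """Apply (offset, length, replacement) edits with non-overlapping spans."""
    # Work from the last edit to the first so earlier offsets stay correct.
    for offset, length, replacement in sorted(edits, reverse=True):
        text = text[:offset] + replacement + text[offset + length:]
    return text


# Small hand-made example; the offsets were counted by hand.
sample_text = "or we can dedect it."
sample_edits = [(0, 2, "Or"), (10, 6, "detect")]
print(apply_edits(sample_text, sample_edits))
# Or we can detect it.
```

With real matches, the tuples could be built from each match's offset, errorLength, and first replacement.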
Now let's use the following Python script to view the mistakes captured earlier alongside their corrections:
Example:
print(list(zip(myMistakes, myCorrections)))
Output:
[('for for', 'for'), ('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('dedect', 'detect'), ('Whot', 'Who'), ('not', 'note'), ('P.M. in the evening', 'P.M.')]
Explanation:
In the code above, we printed each mistake found in the text paired with its correction.
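Not every suggestion has to be accepted. For instance, replacing "P.M. in the evening" with "P.M." in front of the sentence's full stop produces "P.M..". One possible remedy is to filter the matches before correcting. The sketch below is an assumption-laden illustration: `keep_match` is a hypothetical helper, and `sample_matches` are stand-in objects carrying only the fields used here, with values taken from the output of the first script.

```python
from types import SimpleNamespace

# Stand-ins for two items of my_matches (only the fields needed here).
sample_matches = [
    SimpleNamespace(ruleId="TOO_TO", ruleIssueType="misspelling",
                    replacements=["to see"]),
    SimpleNamespace(ruleId="PM_IN_THE_EVENING", ruleIssueType="style",
                    replacements=["P.M."]),
]


def keep_match(match, skipped_types=("style",)):
    """Keep matches that have a suggestion and are not of a skipped issue type."""
    return bool(match.replacements) and match.ruleIssueType not in skipped_types


kept = [m for m in sample_matches if keep_match(m)]
print([m.ruleId for m in kept])
# ['TOO_TO']
```

Which issue types to skip is a judgment call; style suggestions are only one candidate.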
Applying Suggestions to the Text Automatically
Let's look at a simple example that uses the LanguageTool library to apply suggestions to a text automatically.
Example:
# importing the library
import language_tool_python
# creating the tool
my_tool = language_tool_python.LanguageTool('en-US')
# given text
my_text = 'A quick broun fox jumpps over a a little lazy dog.'
# correction
correct_text = my_tool.correct(my_text)
# printing some texts
print("Original Text:", my_text)
print("Text after correction:", correct_text)
Output:
Original Text: A quick broun fox jumpps over a a little lazy dog.
Text after correction: A quick brown fox jumps over a little lazy dog.
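Explanation:
In the code above, we imported the library and created a LanguageTool instance with US English as the language. We then stored some text in a string variable, called the tool's correct() method to fix the errors automatically, and printed both the original and the corrected text.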
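Because correct() returns only the final string, it can be useful to see which words actually changed. The following sketch compares the two strings from the output with `difflib.SequenceMatcher` from the standard library; `word_changes` is a hypothetical helper, and the comparison is word-based, so it is only an approximation of what the tool did internally.

```python
from difflib import SequenceMatcher


def word_changes(original, corrected):
    """Return (old, new) pairs for every run of words that differs."""
    before, after = original.split(), corrected.split()
    changes = []
    matcher = SequenceMatcher(None, before, after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means a pure insert or delete.
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


original = "A quick broun fox jumpps over a a little lazy dog."
corrected = "A quick brown fox jumps over a little lazy dog."
for old, new in word_changes(original, corrected):
    print(repr(old), "->", repr(new))
```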