How to split paragraphs from a text file and find a specific word in Python

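A user ran into a problem while using a Python program to split a text file into paragraphs and find a specific word. The program only split the last paragraph of the file and ignored the others. Here is the code: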

f = open("text.txt", "r")
listBill = []
for line in f:
    print(line)

splitParagraph = line.split()
print(splitParagraph)
for eachBill in splitParagraph:
    if eachBill == "Bill":
        listBill.append("Bill")

print(f'"Bill" appears {len(listBill)} times in the text')

Running this code produces the following output:

Bill & Ted's Excellent Adventure is a 1989 American science fiction comedy buddy film 
and the first film in the Bill & Ted franchise in which two slackers travel through time
to assemble a menagerie of historical figures for their high school history presentation.

The film was written by Chris Matheson and Ed Solomon and directed by Stephen Herek. It
stars Keanu Reeves as Ted "Theodore" Logan, Alex Winter as Bill S. Preston, Esquire, and
George Carlin as Rufus. Bill & Ted's Excellent Adventure received reviews which were 
mostly positive upon release and was commercially successful. It is now considered a 
cult classic. A sequel, Bill & Ted's Bogus Journey, was released two years later. An 
untitled third film is in development

['The', 'film', 'was', 'written', 'by', 'Chris', 'Matheson', 'and', 'Ed', 'Solomon',
'and', 'directed', 'by', 'Stephen', 'Herek.', 'It', 'stars', 'Keanu', 'Reeves', 'as',
'Ted', '"Theodore"', 'Logan,', 'Alex', 'Winter', 'as', 'Bill', 'S.', 'Preston,',
'Esquire,', 'and', 'George', 'Carlin', 'as', 'Rufus.', 'Bill', '&', "Ted's",
'Excellent', 'Adventure', 'received', 'reviews', 'which', 'were', 'mostly',
'positive', 'upon', 'release', 'and', 'was', 'commercially', 'successful.', 'It', 'is',
'now', 'considered', 'a', 'cult', 'classic.', 'A', 'sequel,', 'Bill', '&', "Ted's",
'Bogus', 'Journey,', 'was', 'released', 'two', 'years', 'later.', 'An', 'untitled',
'third', 'film', 'is', 'in', 'development']

"Bill" appears 3 times in the text

As you can see, the program found only the 3 occurrences of "Bill" in the last paragraph of the file and missed the ones in the first paragraph.
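This output is what one would expect when the splitting code runs after the loop rather than inside it: in Python, a for loop variable keeps its last value once the loop ends. The following is a minimal sketch of that behaviour, using a hypothetical in-memory list named lines in place of the file; the sample strings are made up for illustration.

```python
# Hypothetical stand-in for the lines of text.txt.
lines = ["Bill and Ted", "Bill meets Bill"]

for line in lines:
    pass  # the loop body does nothing with each line

# After the loop, `line` still refers to the last item only,
# so anything placed here sees just the final line.
words_after_loop = line.split()
print(words_after_loop)
```

Only the words of the final string reach the code after the loop, which mirrors the single list printed above.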

2. Solution

The root cause is an indentation error in the code. The correct code is:

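f = open("text.txt", "r")
listBill = []
for line in f:
    print(line)

    # Split and count inside the loop so that every line is processed
    splitParagraph = line.split()
    print(splitParagraph)
    for eachBill in splitParagraph:
        # Exact match: "Bill" counts, "Bill's" or "Bill," does not
        if eachBill == "Bill":
            listBill.append("Bill")

f.close()
print(f'"Bill" appears {len(listBill)} times in the text')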

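In the corrected code, the lines splitParagraph = line.split() and for eachBill in splitParagraph: are indented to the correct level, inside the for line in f: loop. The program now splits every line of the file and counts every word that is exactly "Bill". Running the corrected code produces: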

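Bill & Ted's Excellent Adventure is a 1989 American science fiction comedy buddy film
and the first film in the Bill & Ted franchise in which two slackers travel through time
to assemble a menagerie of historical figures for their high school history presentation.

['Bill', '&', "Ted's", 'Excellent', 'Adventure', 'is', 'a', '1989', 'American',
'science', 'fiction', 'comedy', 'buddy', 'film', 'and', 'the', 'first', 'film', 'in',
'the', 'Bill', '&', 'Ted', 'franchise', 'in', 'which', 'two', 'slackers', 'travel',
'through', 'time', 'to', 'assemble', 'a', 'menagerie', 'of', 'historical', 'figures',
'for', 'their', 'high', 'school', 'history', 'presentation.']

The film was written by Chris Matheson and Ed Solomon and directed by Stephen Herek. It
stars Keanu Reeves as Ted "Theodore" Logan, Alex Winter as Bill S. Preston, Esquire, and
George Carlin as Rufus. Bill & Ted's Excellent Adventure received reviews which were
mostly positive upon release and was commercially successful. It is now considered a
cult classic. A sequel, Bill & Ted's Bogus Journey, was released two years later. An
untitled third film is in development

['The', 'film', 'was', 'written', 'by', 'Chris', 'Matheson', 'and', 'Ed', 'Solomon',
'and', 'directed', 'by', 'Stephen', 'Herek.', 'It', 'stars', 'Keanu', 'Reeves', 'as',
'Ted', '"Theodore"', 'Logan,', 'Alex', 'Winter', 'as', 'Bill', 'S.', 'Preston,',
'Esquire,', 'and', 'George', 'Carlin', 'as', 'Rufus.', 'Bill', '&', "Ted's",
'Excellent', 'Adventure', 'received', 'reviews', 'which', 'were', 'mostly',
'positive', 'upon', 'release', 'and', 'was', 'commercially', 'successful.', 'It', 'is',
'now', 'considered', 'a', 'cult', 'classic.', 'A', 'sequel,', 'Bill', '&', "Ted's",
'Bogus', 'Journey,', 'was', 'released', 'two', 'years', 'later.', 'An', 'untitled',
'third', 'film', 'is', 'in', 'development']

"Bill" appears 5 times in the text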

The program now splits every line of the text file and finds every word that is exactly "Bill".
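If the paragraphs in a file are separated by blank lines instead of each sitting on a single line, the same exact-match count can be done per paragraph. The sketch below is one possible way to do that, not the original poster's method; the helper name count_word_per_paragraph and the sample string are hypothetical, and it assumes paragraphs are separated by one or more blank lines.

```python
import re

def count_word_per_paragraph(text, word):
    """Return one exact-match count of `word` per paragraph of `text`."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # split() breaks on any whitespace, so wrapped lines are handled too.
    return [paragraph.split().count(word) for paragraph in paragraphs]

# Hypothetical sample text with two paragraphs.
sample = "Bill & Ted travel in time.\n\nAlex Winter plays Bill. Bill likes Ted."
counts = count_word_per_paragraph(sample, "Bill")
print(counts)       # one count per paragraph
print(sum(counts))  # total across all paragraphs
```

As with the == comparison in the loop, a token such as "Bill." with trailing punctuation is not counted, because only tokens exactly equal to the word match.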

Another, simpler approach is to use the string method count() to look for a specific word. Here is the code using count():

text = """Bill & Ted's Excellent Adventure is a 1989 American science fiction comedy buddy film and the first film in the Bill & Ted franchise in which two slackers travel through time to assemble a menagerie of historical figures for their high school history presentation.

The film was written by Chris Matheson and Ed Solomon and directed by Stephen Herek. It stars Keanu Reeves as Ted "Theodore" Logan, Alex Winter as Bill S. Preston, Esquire, and George Carlin as Rufus. Bill & Ted's Excellent Adventure received reviews which were mostly positive upon release and was commercially successful. It is now considered a cult classic. A sequel, Bill & Ted's Bogus Journey, was released two years later. An untitled third film is in development"""

print(f'"Bill" appears {text.count("Bill")} times in the text')

Running this code produces:

"Bill" appears 5 times in the text

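The count() method makes it easy to find how many times a given string occurs in a text. Note that it counts substring matches, so "Bill" inside a longer word such as "Billy" would be counted as well.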
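Because str.count() matches substrings, it can overcount when the target appears inside longer words. One possible alternative, shown here as a sketch and not as part of the original answer, is a regular expression with word boundaries; the helper name count_whole_word and the sample sentence are hypothetical.

```python
import re

def count_whole_word(text, word):
    """Count occurrences of `word` that are delimited by word boundaries."""
    # re.escape guards against regex metacharacters in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))

# Hypothetical sample sentence for illustration.
sample = "Bill met Billy. Bill's friend is Ted."
print(sample.count("Bill"))              # substring matches, includes "Billy"
print(count_whole_word(sample, "Bill"))  # whole-word matches only
```

Here "Bill's" still counts as a whole-word match, since the apostrophe is not a word character, while "Billy" does not.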