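When processing large text files, you sometimes need to read only the last part of the file. If the file is very large (for example, over 4 GB), reading the whole file at once can exhaust the available memory.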
find_str = "ERROR"
# readlines() builds a list of every line in the file before it is sliced.
with open(file_directory) as file:
    last_few_lines = file.readlines()[-20:]
error = False
for line in last_few_lines:
    if find_str in line:
        error = True
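The code above tries to read the last 20 lines of the file, but readlines() loads every line first, so on a very large file it can run out of memory.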
2. Solutions
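One solution is to seek relative to the end of the file with file.seek() and read only the last 1 MB. If the file might be smaller than 1 MB, get its size first, for example by seeking to the end and calling file.tell().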
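import os
find_str = b"ERROR"  # bytes, because the file is opened in binary mode
error = False
# Open file with 'b' to specify binary mode
with open(file_directory, 'rb') as file:
    file.seek(0, os.SEEK_END)
    size = file.tell()
    # Start at most 1 MB before the end; never seek before the start of the file
    file.seek(max(size - 1024 * 1024, 0))
    if find_str in file.read():
        error = True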
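file.seek() moves the read position to 1 MB before the end of the file, so only the last 1 MB is read. This avoids exhausting memory while still covering the last part of the file.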
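The original goal was the last 20 lines, not a raw block of bytes, so the chunk still has to be split into lines. The following is a minimal sketch of one way to do that; the function name last_lines_from_chunk, the UTF-8 encoding, and the default chunk size are assumptions made for illustration. It also assumes the last n lines fit inside the chunk.

```python
import os

def last_lines_from_chunk(path, n=20, chunk_size=1024 * 1024):
    """Return up to n trailing lines, reading only the final chunk_size bytes."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        start = max(size - chunk_size, 0)
        f.seek(start)
        chunk = f.read()
    lines = chunk.splitlines()
    # If the chunk did not start at the beginning of the file, its first
    # piece is most likely only the tail end of a line, so discard it.
    if start > 0 and lines:
        lines = lines[1:]
    # Assumption: the file is UTF-8 text; undecodable bytes are replaced.
    return [line.decode('utf-8', errors='replace') for line in lines[-n:]]
```

If fewer than n lines come back for a file that is known to be longer, the chunk was too small for the line lengths involved and chunk_size needs to be increased.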
Another solution is a tail function, modeled on the Unix tail command and built on collections.deque.
from collections import deque
def tail(fn, n):
    with open(fn) as fin:
        # A deque with maxlen n keeps only the n most recent lines
        return list(deque(fin, n))
print(tail('/tmp/lines.txt', 20))
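This tail function returns the last n lines without holding the whole file in memory, because the deque keeps only the n most recent lines as the file is iterated. It still reads the file from start to finish, so on very large files it is memory-safe but slower than seeking to the end.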
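The bounded behavior comes from the deque's maxlen argument. A small illustration with numbers instead of file lines (the variable name window is chosen here for the example):

```python
from collections import deque

# A deque with maxlen=3 drops items from the left as new ones arrive,
# so it never holds more than the 3 most recent items.
window = deque(maxlen=3)
for number in range(10):
    window.append(number)

print(list(window))  # [7, 8, 9]
```

Passing a file object as the first argument, as tail() does, feeds the lines through the same mechanism one at a time.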
The last solution is to iterate over the file object line by line.
# open() replaces the file() builtin, which exists only in Python 2
with open(file_directory) as file:
    for line in file:
        if find_str in line:
            error = True
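Iterating over a file object reads one line at a time without loading the whole file into memory. It scans the entire file rather than only the end, so it fits cases where a match anywhere in the file matters.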
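When only a yes/no answer is needed, the scan can stop at the first match instead of reading to the end. A minimal sketch using the built-in any(); the function name contains_string is hypothetical:

```python
def contains_string(path, needle):
    """Return True as soon as any line of the file contains needle."""
    with open(path) as f:
        # any() consumes the generator lazily and stops at the first match,
        # so lines after the match are never read.
        return any(needle in line for line in f)
```

In the worst case (no match, or a match only near the end) this still reads the whole file, one line at a time.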
3. Code Examples
The following example uses file.seek() to read the last 1 MB of a file:
import os

def read_last_mb(file_directory, find_str):
    """
    Search the last 1MB of a file for a string.
    Args:
        file_directory: The path to the file.
        find_str: The string to search for.
    Returns:
        A boolean indicating whether the string was found.
    """
    with open(file_directory, 'rb') as file:
        # Find the file size, then back up at most 1MB from the end.
        # Seeking before the start of a smaller file would raise an error.
        file.seek(0, os.SEEK_END)
        file.seek(max(file.tell() - 1024 * 1024, 0))
        last_mb = file.read()
    # The file is read as bytes, so encode the search string to match.
    return find_str.encode('utf-8') in last_mb

if __name__ == '__main__':
    file_directory = 'path/to/file.txt'
    find_str = 'ERROR'
    found = read_last_mb(file_directory, find_str)
    if found:
        print('The string was found in the last 1MB of the file.')
    else:
        print('The string was not found in the last 1MB of the file.')
The following example uses a deque-based tail function to read the last n lines of a file:
from collections import deque

def tail(fn, n):
    """
    Read the last n lines of a file.
    Args:
        fn: The path to the file.
        n: The number of lines to read.
    Returns:
        A list of the last n lines of the file.
    """
    with open(fn) as fin:
        # The deque discards older lines, keeping at most n in memory.
        return list(deque(fin, n))

if __name__ == '__main__':
    file_directory = 'path/to/file.txt'
    n = 20
    last_lines = tail(file_directory, n)
    print('The last {} lines of the file are:'.format(n))
    for line in last_lines:
        # Each line keeps its trailing newline; strip it to avoid blank lines.
        print(line.rstrip('\n'))
The following example iterates over the file object to read the file line by line:
def read_file(file_directory, find_str):
    """
    Read a file line by line.
    Args:
        file_directory: The path to the file.
        find_str: The string to search for.
    Returns:
        A boolean indicating whether the string was found.
    """
    found = False
    with open(file_directory) as file:
        for line in file:
            if find_str in line:
                found = True
                break
    return found

if __name__ == '__main__':
    file_directory = 'path/to/file.txt'
    find_str = 'ERROR'
    found = read_file(file_directory, find_str)
    if found:
        print('The string was found in the file.')
    else:
        print('The string was not found in the file.')