Given a folder containing about 1,200 text files, all of which are in this format...
Time range of SELECTION
From 1.133071 to 4.457098 seconds (duration: 3.324027 seconds)
Pitch:
Median pitch: 172.651 Hz
Mean pitch: 167.584 Hz
Standard deviation: 48.839 Hz
Minimum pitch: 59.460 Hz
Maximum pitch: 269.304 Hz
Pulses:
Number of pulses: 216
Number of periods: 141
Mean period: 6.646523E-3 seconds
Standard deviation of period: 2.969047E-3 seconds
Voicing:
Fraction of locally unvoiced frames: 46.348% (368 / 794)
Number of voice breaks: 13
Degree of voice breaks: 50.270% (1.670989 seconds / 3.324027 seconds)
Jitter:
Jitter (local): 5.795%
Jitter (local, absolute): 385.185E-6 seconds
Jitter (rap): 2.361%
Jitter (ppq5): 1.908%
Jitter (ddp): 7.083%
Shimmer:
Shimmer (local): 20.262%
Shimmer (local, dB): 1.841 dB
Shimmer (apq3): 10.382%
Shimmer (apq5): 22.335%
Shimmer (apq11): --undefined--
Shimmer (dda): 31.145%
Harmonicity of the voiced parts only:
Mean autocorrelation: 0.515841
Mean noise-to-harmonics ratio: 1.232685
Mean harmonics-to-noise ratio: 0.331 dB
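...how can I write some kind of program that processes all 1,200 files, extracts only the numeric values that follow the strings "duration:", "Mean pitch:", "Minimum pitch:", "Maximum pitch:", "Jitter (local):", "Jitter (rap):", "Shimmer (local):", "Mean noise-to-harmonics ratio:", and "Mean harmonics-to-noise ratio:", and writes them to one large file (that can be opened in or pasted into Excel) in which each row holds the values from one text file?
I have found similar questions on these forums that were solved with Python, but I am having trouble understanding how all of the code works. I am not very good at this kind of thing. Can anyone help?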
2. Solution
import csv
import os
import re

# Labels to extract, in output-column order.
LABELS = [
    'duration:', 'Mean pitch:', 'Minimum pitch:', 'Maximum pitch:',
    'Jitter (local):', 'Jitter (rap):', 'Shimmer (local):',
    'Mean noise-to-harmonics ratio:', 'Mean harmonics-to-noise ratio:',
]

# A number such as 3.324027, 5.795 (before a % sign) or 6.646523E-3.
NUMBER = re.compile(r'[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')


def parse_file(file_path):
    """Parses a single text file and extracts the relevant values.

    Args:
        file_path: The path to the text file.

    Returns:
        A list with one value per entry in LABELS; '' where a value is
        missing or reported as --undefined--.
    """
    values = [''] * len(LABELS)
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            # Search for the desired strings and extract the numerical values.
            for i, label in enumerate(LABELS):
                if label in line:
                    # Take the first number after the label; this skips units
                    # such as "Hz", "%", "dB" and the trailing "seconds)".
                    match = NUMBER.search(line.split(label, 1)[1])
                    if match:
                        values[i] = float(match.group())
                    break
    return values


def main():
    """The main function."""
    # Get the path to the directory containing the text files.
    directory = input('Enter the path to the directory containing the text files: ')

    # Create a CSV writer object.
    with open('output.csv', 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)

        # Iterate over the files in the directory and extract the values.
        for root, dirs, files in os.walk(directory):
            for file in files:
                if file.endswith('.txt'):
                    file_path = os.path.join(root, file)
                    values = parse_file(file_path)
                    csv_writer.writerow(values)

    print('Values successfully extracted and written to output.csv.')


if __name__ == '__main__':
    main()
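
For use in Excel it may help to label the columns and to record which file each row came from. The following is only a sketch of such a variant, not part of the original answer. The names `FIELDS`, `extract_row`, and `write_table` are hypothetical. It assumes every report follows the layout shown in the question and is saved as UTF-8, and unlike `os.walk` it looks only at the top level of the folder.

```python
import csv
import re
from pathlib import Path

# Hypothetical column list: the labels requested in the question.
FIELDS = [
    'duration:', 'Mean pitch:', 'Minimum pitch:', 'Maximum pitch:',
    'Jitter (local):', 'Jitter (rap):', 'Shimmer (local):',
    'Mean noise-to-harmonics ratio:', 'Mean harmonics-to-noise ratio:',
]

# Matches plain and scientific-notation numbers, e.g. 0.331 or 6.646523E-3.
NUMBER = re.compile(r'[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')


def extract_row(text):
    """Return one value per label in FIELDS; '' if absent or --undefined--."""
    row = []
    for label in FIELDS:
        value = ''
        for line in text.splitlines():
            if label in line:
                # First number after the label, ignoring units such as Hz or %.
                match = NUMBER.search(line.split(label, 1)[1])
                if match:
                    value = float(match.group())
                break
        row.append(value)
    return row


def write_table(directory, out_path):
    """Write a header row, then one row per .txt file found in directory."""
    with open(out_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        # Header: "file" followed by the labels without their trailing colon.
        writer.writerow(['file'] + [label.rstrip(':') for label in FIELDS])
        for path in sorted(Path(directory).glob('*.txt')):
            text = path.read_text(encoding='utf-8')
            writer.writerow([path.name] + extract_row(text))
```

Calling `write_table('path/to/folder', 'output.csv')` (the folder path here is a placeholder) should produce a file whose first row names the columns. A value reported as `--undefined--` is left as an empty cell, so the row keeps its alignment.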