Extracting Values from Text Files and Writing Them to a Single Spreadsheet


Given a folder containing about 1,200 text files, all in this format...

Time range of SELECTION
   From 1.133071 to 4.457098 seconds (duration: 3.324027 seconds)
Pitch:
   Median pitch: 172.651 Hz
   Mean pitch: 167.584 Hz
   Standard deviation: 48.839 Hz
   Minimum pitch: 59.460 Hz
   Maximum pitch: 269.304 Hz
Pulses:
   Number of pulses: 216
   Number of periods: 141
   Mean period: 6.646523E-3 seconds
   Standard deviation of period: 2.969047E-3 seconds
Voicing:
   Fraction of locally unvoiced frames: 46.348%   (368 / 794)
   Number of voice breaks: 13
   Degree of voice breaks: 50.270%   (1.670989 seconds / 3.324027 seconds)
Jitter:
   Jitter (local): 5.795%
   Jitter (local, absolute): 385.185E-6 seconds
   Jitter (rap): 2.361%
   Jitter (ppq5): 1.908%
   Jitter (ddp): 7.083%
Shimmer:
   Shimmer (local): 20.262%
   Shimmer (local, dB): 1.841 dB
   Shimmer (apq3): 10.382%
   Shimmer (apq5): 22.335%
   Shimmer (apq11): --undefined--
   Shimmer (dda): 31.145%
Harmonicity of the voiced parts only:
   Mean autocorrelation: 0.515841
   Mean noise-to-harmonics ratio: 1.232685
   Mean harmonics-to-noise ratio: 0.331 dB

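...how can I write some kind of program that processes all 1,200 files, extracts only the numeric values that follow the strings "duration:", "Mean pitch:", "Minimum pitch:", "Maximum pitch:", "Jitter (local):", "Jitter (rap):", "Shimmer (local):", "Mean noise-to-harmonics ratio:", and "Mean harmonics-to-noise ratio:", and writes them to one large file (that can be opened in or pasted into Excel) in which each row contains the values from one text file?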

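I have found similar questions on these forums that were solved with Python, but I have a hard time figuring out how all the code works. I'm not very good at this kind of thing. Can anyone help?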

2. Solution

import csv
import io
import os

# Labels to extract, in the order they appear in each file (and in each CSV row).
LABELS = (
  'duration:',
  'Mean pitch:',
  'Minimum pitch:',
  'Maximum pitch:',
  'Jitter (local):',
  'Jitter (rap):',
  'Shimmer (local):',
  'Mean noise-to-harmonics ratio:',
  'Mean harmonics-to-noise ratio:',
)


def parse_value(text):
  """Converts the text after a label to a float, or None if it is not numeric."""
  tokens = text.split()
  if not tokens:
    return None
  try:
    # Drop a trailing '%' or ')' so that '5.795%' converts cleanly.
    return float(tokens[0].rstrip('%)'))
  except ValueError:
    # The files use '--undefined--' for values that could not be computed.
    return None


def parse_file(file_path):
  """Parses a single text file and extracts the relevant values.

  Args:
    file_path: The path to the text file.

  Returns:
    A list with one entry per label in LABELS, in that order. An entry is
    None (an empty CSV cell) if the label is missing or its value is not numeric.
  """

  values = {}
  with io.open(file_path, 'r', encoding='utf-8') as f:
    for line in f:
      for label in LABELS:
        if label in line and label not in values:
          # Keep only the text after the label, e.g. ' 3.324027 seconds)'.
          values[label] = parse_value(line.split(label, 1)[1])
          break

  return [values.get(label) for label in LABELS]


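def main():
  """The main function."""

  # Get the path to the directory containing the text files.
  directory = input('Enter the path to the directory containing the text files: ')

  # Column names: the source file, then the values in the order parse_file returns them.
  header = ['file', 'duration (s)', 'mean pitch (Hz)', 'minimum pitch (Hz)',
            'maximum pitch (Hz)', 'jitter local (%)', 'jitter rap (%)',
            'shimmer local (%)', 'mean NHR', 'mean HNR (dB)']

  # Create a CSV writer object and write the header row.
  with open('output.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(header)

    # Walk the directory tree and write one row per text file.
    for root, dirs, files in os.walk(directory):
      for file in sorted(files):
        if file.endswith('.txt'):
          file_path = os.path.join(root, file)
          values = parse_file(file_path)
          # Prefix each row with the file name so it can be traced back.
          csv_writer.writerow([file] + values)

  print('Values successfully extracted and written to output.csv.')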


if __name__ == '__main__':
  main()
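
As an alternative, the per-line string splitting could be replaced with a regular expression that captures the first number after each label. The following is only a minimal sketch under a few assumptions: `WANTED_LABELS`, `NUMBER`, and `extract_values` are illustrative names that do not come from the script above, and the sample text is a shortened copy of the report in the question with one value changed to `--undefined--` to show the fallback.

```python
import re

# Illustrative list of the labels named in the question.
WANTED_LABELS = [
  "duration:",
  "Mean pitch:",
  "Minimum pitch:",
  "Maximum pitch:",
  "Jitter (local):",
  "Jitter (rap):",
  "Shimmer (local):",
  "Mean noise-to-harmonics ratio:",
  "Mean harmonics-to-noise ratio:",
]

# A signed decimal number with an optional exponent, e.g. 6.646523E-3.
NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"


def extract_values(text):
  """Returns {label: float or None} for every label in WANTED_LABELS."""
  result = {}
  for label in WANTED_LABELS:
    # re.escape keeps the parentheses in 'Jitter (local):' literal.
    match = re.search(re.escape(label) + r"\s*(" + NUMBER + r")", text)
    result[label] = float(match.group(1)) if match else None
  return result


sample = """\
Time range of SELECTION
   From 1.133071 to 4.457098 seconds (duration: 3.324027 seconds)
Pitch:
   Mean pitch: 167.584 Hz
Jitter:
   Jitter (local): 5.795%
Shimmer:
   Shimmer (local): --undefined--
"""

values = extract_values(sample)
print(values["duration:"])         # 3.324027
print(values["Jitter (local):"])   # 5.795
print(values["Shimmer (local):"])  # None
```

Because the pattern only accepts a number directly after the label, units such as `Hz`, `%`, and `dB` are ignored, and a missing or `--undefined--` value comes back as `None`, which `csv.writer` writes as an empty cell.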