DSV Parallel Processor: supplying input files and query conditions through a spec file


DSV Parallel Processor

Spec file

The DSV Parallel Processor takes its input files and query specification from a spec file (usually named "spec.toml").

Example spec.toml

[[input]]
# filePaths = [                          # list every input file explicitly
#     "test_data/test_data2.txt"
# ]

directory = "test_data"                  # or specify just the input directory
separator = "|"

[[output]]
outputFile = "output.tsv"                # name the output file
separator = "\t"

# each filter condition is listed below

# example of string filter
[[filters]]
column = 16                              # column to filter (0-indexed)
columnType = "string"                    # available types: string, number, datetime
values = [                               # list of accepted values
    "OPTION2",
    "OPTION1"
]

# valueFile = "filter.txt"               # or read the values from a file, one value per line

[[filters]]
column = 1
columnType = "string"
valueFile = "account_list.txt"

# Example of number filter
# [[filters]]
# column = 6                             # column to filter (0-indexed)
# columnType = "number"                  # available types: string, number, datetime
# condition = "<"                        # available conditions: "<", "<=", ">", ">=", "=="
# value = "250"                          # value to compare against

# Example of datetime filter
# [[filters]]
# column = 3                             # column to filter (0-indexed)
# columnType = "datetime"                # available types: string, number, datetime
# condition = "<"                        # available conditions: "<", "<=", ">", ">=", "=="
# datetimeFormat = "02/01/2006"          # datetime layout in Go notation (here: day/month/year)
# value = "01/01/2015"                   # value to compare against

# Go datetime layouts are described at https://programming.guide/go/format-parse-string-time-date-example.html
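
The filter semantics described in the spec comments can be sketched in Python. This is only an illustration, not the processor's actual implementation. The names `row_passes`, `OPS` and `GO_LAYOUT_TO_STRPTIME` are hypothetical, and the sketch assumes that all filters are combined with AND, which the source does not state. The Go layout `02/01/2006` means day/month/year, so it is mapped to Python's `%d/%m/%Y`.

```python
import operator
from datetime import datetime

# Comparison operators listed in the spec comments.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Hypothetical lookup table: covers only the Go layout used in the example spec.
GO_LAYOUT_TO_STRPTIME = {"02/01/2006": "%d/%m/%Y"}


def row_passes(fields, filters):
    """Return True if a split row satisfies every filter (AND is an assumption)."""
    for f in filters:
        cell = fields[f["column"]]  # columns are 0-indexed
        kind = f["columnType"]
        if kind == "string":
            if cell not in f["values"]:
                return False
        elif kind == "number":
            if not OPS[f["condition"]](float(cell), float(f["value"])):
                return False
        elif kind == "datetime":
            fmt = GO_LAYOUT_TO_STRPTIME[f["datetimeFormat"]]
            left = datetime.strptime(cell, fmt)
            right = datetime.strptime(f["value"], fmt)
            if not OPS[f["condition"]](left, right):
                return False
        else:
            raise ValueError(f"unknown columnType: {kind}")
    return True


# Three filters modelled on the example spec, applied to a made-up three-column row.
filters = [
    {"column": 0, "columnType": "string", "values": ["OPTION2", "OPTION1"]},
    {"column": 1, "columnType": "number", "condition": "<", "value": "250"},
    {"column": 2, "columnType": "datetime", "condition": "<",
     "datetimeFormat": "02/01/2006", "value": "01/01/2015"},
]

line = "OPTION1|100|31/12/2014"
print(row_passes(line.split("|"), filters))
```

The sample row is split on the same "|" separator as the example spec. A row is kept only if its string cell is one of the accepted values and its number and datetime cells satisfy their conditions.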

Running with Docker

The provided filter_csv.sh script runs the program as a Docker container.

./filter_csv.sh spec.toml data_dir

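Note the following detail.

  1. All input and output files are mounted into the container's /data directory. Every _data_dir_ path in the spec file must therefore be replaced with /data/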
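
The path substitution in the note above can be sketched as a small Python helper. `to_container_path` is a hypothetical name and is not part of the project. The sketch assumes that the paths in the spec are written with the host data directory as their prefix.

```python
from pathlib import PurePosixPath


def to_container_path(path, data_dir, mount_point="/data"):
    """Map a host path under data_dir to the matching path under the mount point.

    Raises ValueError if the path does not live under data_dir.
    """
    relative = PurePosixPath(path).relative_to(PurePosixPath(data_dir))
    return str(PurePosixPath(mount_point) / relative)


# Paths as they would appear in spec.toml before and after the substitution.
print(to_container_path("data_dir/account_list.txt", "data_dir"))
print(to_container_path("data_dir/output.tsv", "data_dir"))
```

Any value in the spec that names a file or directory (for example `directory`, `valueFile` or `outputFile`) needs the same treatment before the spec is passed to the script.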

GitHub

github.com/wattanit/ds…