DSV Parallel Processor
Spec File
The DSV Parallel Processor reads its input files and query specification from a spec file (usually named "spec.toml").
Example spec.toml
[[input]]
# filePaths = [ # list the input files explicitly
# "test_data/test_data2.txt"
# ]
directory = "test_data" # or specify only the input directory
separator = "|"
[[output]]
outputFile = "output.tsv" # name the output file
separator = "\t"
# each filter condition is listed below
# example of a string filter
[[filters]]
column = 16 # column to filter on (0-indexed)
columnType = "string" # available types: string, number, datetime
values = [ # list of accepted values
"OPTION2",
"OPTION1"
]
# valueFile = "filter.txt" # or read the values from a file, one value per line
[[filters]]
column = 1
columnType = "string"
valueFile = "account_list.txt"
# example of a number filter
# [[filters]]
# column = 6 # column to filter on (0-indexed)
# columnType = "number" # available types: string, number, datetime
# condition = "<" # available conditions: "<", "<=", ">", ">=", "=="
# value = "250" # value to compare against
# example of a datetime filter
# [[filters]]
# column = 3 # column to filter on (0-indexed)
# columnType = "datetime" # available types: string, number, datetime
# condition = "<" # available conditions: "<", "<=", ">", ">=", "=="
# datetimeFormat = "02/01/2006" # datetime format in Go's layout notation
# value = "01/01/2015" # value to compare against
# Go's datetime layout notation is described at https://programming.guide/go/format-parse-string-time-date-example.html
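Go layouts are written by spelling out the reference time Mon Jan 2 15:04:05 MST 2006, so "02/01/2006" reads as day/month/year. The sketch below shows how a datetime filter like the one in the sample could be evaluated. The helper name matchDatetime is hypothetical and is not taken from the tool's source; only the layout, the condition operators, and the comparison value come from the sample spec.

```go
package main

import (
	"fmt"
	"time"
)

// matchDatetime reports whether "field <condition> value" holds after both
// strings are parsed with the given Go layout.
// Hypothetical helper for illustration; not the tool's actual code.
func matchDatetime(field, condition, value, layout string) (bool, error) {
	f, err := time.Parse(layout, field)
	if err != nil {
		return false, fmt.Errorf("parse field %q: %w", field, err)
	}
	v, err := time.Parse(layout, value)
	if err != nil {
		return false, fmt.Errorf("parse value %q: %w", value, err)
	}
	switch condition {
	case "<":
		return f.Before(v), nil
	case "<=":
		return !f.After(v), nil
	case ">":
		return f.After(v), nil
	case ">=":
		return !f.Before(v), nil
	case "==":
		return f.Equal(v), nil
	}
	return false, fmt.Errorf("unsupported condition %q", condition)
}

func main() {
	// In this layout, 02 is the day, 01 the month, and 2006 the year.
	const layout = "02/01/2006"
	for _, field := range []string{"31/12/2014", "01/01/2015", "15/03/2016"} {
		ok, err := matchDatetime(field, "<", "01/01/2015", layout)
		fmt.Println(field, ok, err)
	}
}
```

With condition "<" and value "01/01/2015", only the first of the three sample fields passes the filter. A field that does not match the layout (for example "2015-01-01") yields a parse error rather than a silent mismatch in this sketch; how the tool itself treats such rows is not stated here.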
Running with Docker
The provided filter_csv.sh script runs the program as a Docker container.
./filter_csv.sh spec.toml data_dir
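Note the following details.
- All input and output files are mounted into the container's /data directory. Therefore, every _data_dir_ path in the spec file must be replaced with /data/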
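The path rewrite described above can be sketched as a small helper. This is an illustration only: it assumes data_dir is the host directory passed as the second argument to filter_csv.sh and that the directory is mounted at /data. The function name containerPath is hypothetical and is not part of the tool.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// containerPath maps a host path located inside dataDir to the path it has
// inside the container, where dataDir is assumed to be mounted at /data.
// Hypothetical helper for illustration; not part of the tool.
func containerPath(dataDir, hostPath string) (string, error) {
	rel, err := filepath.Rel(dataDir, hostPath)
	if err != nil {
		return "", err
	}
	rel = filepath.ToSlash(rel)
	// Paths that escape dataDir are not visible inside the container.
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("%q is outside %q", hostPath, dataDir)
	}
	return path.Join("/data", rel), nil
}

func main() {
	for _, p := range []string{
		"data_dir/test_data",
		"data_dir/account_list.txt",
		"elsewhere/output.tsv",
	} {
		mapped, err := containerPath("data_dir", p)
		fmt.Println(p, "->", mapped, err)
	}
}
```

Under these assumptions, a spec entry such as directory = "data_dir/test_data" would become directory = "/data/test_data", and a path outside data_dir is reported as an error because the container cannot see it.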