[TOC]
Filebeat has risen rapidly in recent years, winning over many developers with its light weight. In my view, however, Filebeat is nowhere near able to fully replace Logstash; many of its features exist as complements to Logstash rather than substitutes.
Official documentation: Logstash Reference [7.16] | Elastic
一、Keeping Logstash running long-term with supervisord
1. Manage Logstash with supervisord. Starting Logstash this way makes it easier to maintain.
- Stop: `service supervisord stop`
- Check status: `service supervisord status`
- Start: `service supervisord start`
2. Add program entries to the /etc/supervisord.conf file
- `directory`: the working directory
- `user`: the user to run the command as
- `command`: the command to execute
```
[program:info]
environment=LS_HEAP_SIZE=5000m
directory=/home/elastic/logstash-7.16.2
user=elastic
command=/home/elastic/logstash-7.16.2/bin/logstash -f /home/elastic/logstash-7.16.2/config/fullinfo.yml --api.http.port 9600 -w 10 -l /home/elastic/logstash-7.16.2/logs/fullinfo

[program:lexicon]
environment=LS_HEAP_SIZE=5000m
directory=/home/elastic/logstash-7.16.2
user=elastic
command=/home/elastic/logstash-7.16.2/bin/logstash -f /home/elastic/logstash-7.16.2/config/smallinfo.yml --api.http.port 9601 -w 10 -l /home/elastic/logstash-7.16.2/logs/smallinfo
```
3. Start or stop processes with supervisorctl
- Start: `supervisorctl start info`
- Stop: `supervisorctl stop info`
- Apply config changes (only one config file changed; reload just that program and restart it): `supervisorctl update`
- Re-read the config files: `supervisorctl reread`
- Restart supervisord: `supervisorctl reload`
- Restart all programs: `supervisorctl restart all`
二、Common Logstash problems
1. Naming the group_id when the input consumes messages from Kafka
Use the machine's IP as the name: from the Kafka consumers view you can then quickly tell which machine's Logstash is doing the consuming. If two machines share the same consumer group, join the two machines' IPs with "_" to form the group name.
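The naming convention above can be sketched as follows (the IPs are hypothetical placeholders for your Logstash hosts):

```python
# Build a Kafka consumer-group name from the consuming machines' IPs,
# joined with "_" so the group name reveals which hosts consume it.
ips = ["172.26.17.91", "172.26.17.92"]  # hypothetical Logstash host IPs
group_id = "_".join(ips)
print(group_id)  # 172.26.17.91_172.26.17.92
```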
2. Modifying data
- Remove a field
- Drop events that do not satisfy a condition
- Display @metadata: `stdout { codec => rubydebug { metadata => true } }`
Note that mutate's `copy` and `rename` options take a hash, not an array:

```
filter {
  mutate {
    copy => { "sourceTime" => "indexTag" }   # add an indexTag field with the value of sourceTime
    remove_field => ["sourceName"]           # remove the sourceName field
    rename => { "title" => "dictionaries" }  # rename title in the source data to dictionaries
  }
  truncate {                                 # truncate indexTag to 7 bytes
    fields => [ "indexTag" ]
    length_bytes => 7
  }
  if [indexTag] <= "2021-11" {               # drop events with indexTag <= "2021-11"
    drop { }
  }
}
```
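The drop condition relies on plain string comparison. Because zero-padded "YYYY-MM" strings sort lexicographically in the same order as the dates they represent, comparing `[indexTag] <= "2021-11"` directly is safe. A quick sketch of the same logic:

```python
# Zero-padded "YYYY-MM" strings compare lexicographically in date order,
# so a plain string comparison can act as a date filter.
tags = ["2021-10", "2021-11", "2021-12", "2022-01"]
kept = [t for t in tags if not t <= "2021-11"]  # mimic the drop condition
print(kept)  # ['2021-12', '2022-01']
```

This only works because every component is zero-padded to a fixed width; "2021-9" would sort after "2021-11" and break the filter.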
3. Syncing to two Elasticsearch indices
Add fields under @metadata to control which index each event ultimately lands in.
```
input {
  kafka {
    bootstrap_servers => ["172.26.17.97:9092,172.26.17.98:9092,172.26.17.99:9092"]
    client_id => "info"
    group_id => "logstash_ip91_to_es_ip91_info"
    auto_offset_reset => "earliest"
    consumer_threads => 6
    decorate_events => "true"
    topics => ["info.news.before2022-01-01","info.news.increment"]
    codec => "json"
  }
}
filter {
  mutate {
    copy => { "sourceTime" => "year_tag" }
  }
  mutate {
    copy => { "sourceTime" => "month_tag" }
  }
  truncate {
    fields => [ "month_tag" ]
    length_bytes => 7
  }
  truncate {
    fields => [ "year_tag" ]
    length_bytes => 4
  }
  mutate {
    add_field => { "[@metadata][fullinfo_tag]" => "fullinfo-%{year_tag}" }
    add_field => { "[@metadata][smallinfo_tag]" => "smallinfo-%{month_tag}" }
    remove_field => ["month_tag"]
    remove_field => ["year_tag"]
  }
}
output {
  elasticsearch {
    hosts => ["http://172.26.17.91:9200"]
    index => "%{[@metadata][fullinfo_tag]}"
    template => "/home/elastic/logstash-7.16.2/config/info-template.json"
    template_name => "info-template"
    template_overwrite => false
    timeout => 300
    user => "elastic"
    password => "a#123456"
    document_id => "%{id}"
  }
  if [@metadata][smallinfo_tag] >= "smallinfo-2021-12" {
    elasticsearch {
      hosts => ["http://172.26.17.91:9200"]
      index => "%{[@metadata][smallinfo_tag]}"
      template => "/home/elastic/logstash-7.16.2/config/info-template.json"
      template_name => "info-template"
      template_overwrite => false
      timeout => 300
      user => "elastic"
      password => "a#123456"
      document_id => "%{id}"
    }
  }
}
```
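The tag-building steps above can be traced end to end: `copy` duplicates sourceTime, `truncate` keeps the first 7 and 4 bytes, and the index names are interpolated from the results. A sketch of the same transformation, assuming sourceTime uses a "YYYY-MM-DD ..." format:

```python
# Trace the filter's tag derivation (sourceTime format is an assumption).
source_time = "2021-12-25 08:30:00"    # example sourceTime value
month_tag = source_time[:7]            # truncate to 7 bytes -> "2021-12"
year_tag = source_time[:4]             # truncate to 4 bytes -> "2021"
fullinfo_index = f"fullinfo-{year_tag}"
smallinfo_index = f"smallinfo-{month_tag}"
print(fullinfo_index, smallinfo_index)  # fullinfo-2021 smallinfo-2021-12
```

Keeping the tags under @metadata means they drive index routing without being written into the stored document.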
三、Using pipelines.yml
Running ./bin/logstash with no arguments creates pipelines from pipelines.yml; you cannot point to a pipelines.yml file with the -f flag. This tripped me up once.
From the official docs, "Multiple Pipelines":
If you need to run more than one pipeline in the same process, Logstash provides a way to do this through a configuration file called pipelines.yml. This file must be placed in the path.settings folder and follows this structure:
```
- pipeline.id: my-pipeline_1
  path.config: "/etc/path/to/p1.config"
  pipeline.workers: 3
- pipeline.id: my-other-pipeline
  path.config: "/etc/different/path/p2.cfg"
  queue.type: persisted
```
This file is formatted in YAML and contains a list of dictionaries, where each dictionary describes a pipeline, and each key/value pair specifies a setting for that pipeline. The example shows two different pipelines described by their IDs and configuration paths. For the first pipeline, the value of pipeline.workers is set to 3, while in the other, the persistent queue feature is enabled. The value of a setting that is not explicitly set in the pipelines.yml file will fall back to the default specified in the logstash.yml settings file.
When you start Logstash without arguments, it will read the pipelines.yml file and instantiate all pipelines specified in the file. On the other hand, when you use -e or -f, Logstash ignores the pipelines.yml file and logs a warning about it.
-f only points at config files, and using it makes Logstash ignore pipelines.yml, whereas pipelines.yml is what lets you configure multiple pipelines.
Note the behavior of `./bin/logstash -f <file,file_dir>`:
```
-f, --path.config CONFIG_PATH Load the logstash config from a specific file
                              or directory. If a directory is given, all
                              files in that directory will be concatenated
                              in lexicographical order and then parsed as a
                              single config file. You can also specify
                              wildcards (globs) and any matched files will
                              be loaded in the order described above.
```
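The "lexicographical order" point matters when splitting a config across a directory. A sketch with hypothetical filenames, using `sorted()` to mirror the order Logstash would concatenate them in:

```python
# Numeric prefixes on config files control their concatenation order,
# because files in a -f directory are read in lexicographical order.
files = ["10-filter.conf", "02-input.conf", "30-output.conf"]  # hypothetical names
order = sorted(files)
print(order)  # ['02-input.conf', '10-filter.conf', '30-output.conf']
```

This is why the common convention of zero-padded numeric prefixes (02-input, 10-filter, 30-output) keeps input, filter, and output sections in the intended order.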