Logstash Installation and Usage

I. Introduction

Centralize, transform, and stash your data.

Logstash is implemented in JRuby and is quite resource-hungry.

www.elastic.co/cn/products…

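Logstash is an open source, server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to your favorite "stash". (Our stash is, of course, Elasticsearch.)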

II. Installation

1. Import the public key

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

2. Add the yum repository

Add the following content to a file in the /etc/yum.repos.d/ directory. The file name must end in .repo, for example logstash.repo:

[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

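This is the official repository, which can be slow. You can replace it with the address of a mirror in China, such as the Tsinghua University open source mirror site: mirrors.tuna.tsinghua.edu.cn/elasticstac…

3. Install Logstash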

~]# yum install -y logstash

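4. Configuration files

JVM options: /etc/logstash/jvm.options

Logging configuration: /etc/logstash/log4j2.properties

Put the pipeline configuration files you write together in the /etc/logstash/conf.d/ directory. Every pipeline file there should end in .conf, and stray text files should be kept out of that directory, because Logstash loads the directory's configuration files together when it starts. Then run systemctl start logstash.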

III. Configuration Explained

1. Configuration sections

A Logstash configuration has three sections: input, filter, and output.

Example:

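input {                             # Input plugins: define where events are read from. Many plugins are supported, such as stdin, redis, kafka, file, and beats.
	PluginName {

	}
}

filter {                            # Filter plugins: process the raw data received from the inputs.

}

output {                            # Output plugins: define where the processed logs are sent.
	stdout {
		codec => rubydebug          # the codec option sets the output format, here rubydebug
	}
}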
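
The three sections behave like stages of a pipeline: each event produced by an input passes through the filters in order and is then handed to every output. The Python sketch below only models that flow; it is not Logstash code, and run_pipeline, add_length, and the sample event are hypothetical names made up for illustration.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]  # an event is modeled as a plain dict of fields


def run_pipeline(
    inputs: Iterable[Event],
    filters: List[Callable[[Event], Event]],
    outputs: List[Callable[[Event], None]],
) -> None:
    """Pass every input event through all filters, then hand it to all outputs."""
    for event in inputs:
        for apply_filter in filters:
            event = apply_filter(event)
        for emit in outputs:
            emit(event)


def add_length(event: Event) -> Event:
    """Hypothetical filter: add a field derived from the message."""
    return {**event, "length": len(str(event["message"]))}


collected: List[Event] = []
run_pipeline(
    inputs=[{"message": "hello elk"}],
    filters=[add_length],
    outputs=[collected.append, print],
)
```

An empty filter list corresponds to a configuration with no filter section: events go straight from input to output.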

2. Configuration examples:

2.1 Collecting logs from standard input:

input {
        stdin {
        
        }
}
output {
        stdout {
                codec => rubydebug
        }
}

# Output
{
    "@timestamp" => 2021-05-29T09:32:29.271Z,
          "host" => "logstash",
      "@version" => "1",
       "message" => "hello elk"
}

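2.2 The file plugin

path: the path(s) of the files to read. The data type is array, so the value is normally written in square brackets.

start_position: where to begin reading a file that Logstash has not seen before, either beginning or end. For a file it has already read, Logstash resumes from the position where it stopped last time.

delimiter: the line delimiter; the default is a newline.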

input {
        file {
              path => ["/var/log/httpd/access_log"]
              start_position => "beginning"
        }
}
output {
        stdout {
                codec => rubydebug
        }
}

Output:

{
          "path" => "/var/log/httpd/access_log",
          "host" => "logstash",
      "@version" => "1",
       "message" => "192.168.100.1 - - [29/May/2021:17:55:47 +0800] \"GET / HTTP/1.1\" 304 - \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36\"",
    "@timestamp" => 2021-05-29T09:55:48.444Z
}
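
The resume behavior of the file input comes from remembering how far each file has been read. The Python sketch below imitates that idea with an in-memory offset table. It is a simplified model, not how Logstash actually stores its read positions, and read_new_lines and offsets are hypothetical names.

```python
import os
import tempfile

offsets = {}  # path -> byte offset already consumed (stand-in for Logstash's bookkeeping)


def read_new_lines(path, start_position="end"):
    """Return the lines appended to path since the previous call."""
    if path not in offsets:
        # start_position only matters the first time a file is seen.
        offsets[path] = 0 if start_position == "beginning" else os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(offsets[path])
        data = handle.read()
        offsets[path] = handle.tell()
    return data.decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access_log")
    with open(log_path, "w") as log:
        log.write("first\nsecond\n")
    first_read = read_new_lines(log_path, start_position="beginning")
    with open(log_path, "a") as log:
        log.write("third\n")
    # Same setting, but the file is already known, so reading resumes at the saved offset.
    second_read = read_new_lines(log_path, start_position="beginning")

print(first_read)   # ['first', 'second']
print(second_read)  # ['third']
```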

2.3 grok

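Parses text and turns it into structured data.

Log entries share the same format, but each entry arrives as a single line of text. grok analyzes every line with regular expressions, makes specific matches, and stores each matched piece under its own key.

In other words, after reading the text, grok adds the extracted pieces as separate fields alongside the original message.

2.3.1 Example 1

Log line: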

55.3.244.1 GET /index.html 15824 0.043

Pattern:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

Logstash configuration:

input {
	file {
		path => "/tmp/tmpfile.log"
		}
}
filter {
	grok {
		match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
	}
}
output {
        stdout {
                codec => rubydebug
        }
}

After the filter has processed the line, the output looks like this:

{
      "@version" => "1",
        "client" => "55.3.244.1",
    "@timestamp" => 2021-05-29T10:17:32.599Z,
        "method" => "GET",
       "request" => "/index.html",
         "bytes" => "15824",
          "path" => "/tmp/tmpfile.log",
      "duration" => "0.043",
          "host" => "logstash",
       "message" => "55.3.244.1 GET /index.html 15824 0.043"
}
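
A grok reference such as %{IP:client} is shorthand for a named regular-expression capture. The Python sketch below writes out a rough hand-made equivalent of the pattern above. The sub-expressions are simplified stand-ins chosen for this example, not the definitions that grok ships with.

```python
import re

LINE_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "  # rough stand-in for %{IP:client}
    r"(?P<method>\w+) "                       # rough stand-in for %{WORD:method}
    r"(?P<request>/\S*) "                     # rough stand-in for %{URIPATHPARAM:request}
    r"(?P<bytes>\d+(?:\.\d+)?) "              # rough stand-in for %{NUMBER:bytes}
    r"(?P<duration>\d+(?:\.\d+)?)"            # rough stand-in for %{NUMBER:duration}
)

line = "55.3.244.1 GET /index.html 15824 0.043"
match = LINE_PATTERN.search(line)
fields = match.groupdict() if match else {}
print(fields)
```

As in the Logstash output above, every captured value is a string. In grok, a type suffix such as %{NUMBER:bytes:int} or %{NUMBER:duration:float} converts the value while it is captured.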

grok patterns: grokdebug.herokuapp.com/patterns#

Regular expression library: github.com/kkos/onigur…

2.3.2 Example 2: custom patterns

Format (inline named capture):

(?<field_name>the pattern here)

Custom pattern (in /etc/logstash/conf.d/patterns):

REQUESTMETHOD \b[A-Z]{,5}\b

Logstash configuration:

input {
	file {
		path => "/tmp/tmpfile.log"
		}
}
filter {
	grok {
		patterns_dir => "/etc/logstash/conf.d/patterns"
		match => { "message" => "%{IPV4:clientip} %{REQUESTMETHOD:request_method} %{URIPATH:request_uri} %{INT:bytes}" }
	}
}
output {
        stdout {
                codec => rubydebug
        }
}
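
Custom patterns work by name substitution: each %{NAME:field} reference is looked up in the pattern definitions and expanded into a named capture. The sketch below imitates that expansion in Python for the example above. Only the REQUESTMETHOD definition comes from the pattern file; the IPV4, URIPATH, and INT definitions are simplified assumptions, and expand is a hypothetical helper.

```python
import re

PATTERNS = {
    "REQUESTMETHOD": r"\b[A-Z]{,5}\b",   # taken from the custom pattern file
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",  # simplified assumption
    "URIPATH": r"/[^\s?]*",              # simplified assumption
    "INT": r"[+-]?\d+",                  # simplified assumption
}

REFERENCE = re.compile(r"%\{(\w+):(\w+)\}")


def expand(grok_expression):
    """Replace each %{NAME:field} with a named group built from PATTERNS."""
    def to_group(ref):
        name, field = ref.group(1), ref.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])
    return REFERENCE.sub(to_group, grok_expression)


expression = "%{IPV4:clientip} %{REQUESTMETHOD:request_method} %{URIPATH:request_uri} %{INT:bytes}"
regex = re.compile(expand(expression))
result = regex.search("55.3.244.1 GET /index.html 15824 0.043")
parsed = result.groupdict() if result else {}
print(parsed)
```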
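
2.3.3 Built-in patterns

Some log formats are very hard to match with hand-written regular expressions, so grok ships with ready-made patterns for them, such as COMBINEDAPACHELOG (Apache access logs).

Example: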

input {
        file {
              path => ["/var/log/httpd/access_log"]
              start_position => "beginning"
        }
}
filter {
        grok {
               match => {
               "message" => "%{COMBINEDAPACHELOG}"
              }
         }
}
output {
        stdout {
                codec => rubydebug
        }
}

Output:

{
           "path" => "/var/log/httpd/access_log",
           "host" => "logstash",
       "@version" => "1",
       "clientip" => "192.168.100.1",
    "httpversion" => "1.1",
           "auth" => "-",
       "response" => "304",
        "request" => "/",
          "ident" => "-",
      "timestamp" => "29/May/2021:17:49:43 +0800",
          "agent" => "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36\"",
     "@timestamp" => 2021-05-29T09:49:43.993Z,
       "referrer" => "\"-\"",
        "message" => "192.168.100.1 - - [29/May/2021:17:49:43 +0800] \"GET / HTTP/1.1\" 304 - \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36\"",
           "verb" => "GET"
}

2.5 Removing redundant fields

Example:

input {
        file {
              path => ["/var/log/httpd/access_log"]
              start_position => "beginning"
        }
}
filter {
         grok {
               match => {
               "message" => "%{HTTPD_COMBINEDLOG}"
               }
               remove_field => "message"
         }
}
output {
        stdout {
                codec => rubydebug
        }
}

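2.6 The date filter plugin

Converts the timestamp field to the @timestamp format and stores it in @timestamp.

"@timestamp" => 2018-06-20T14:50:05.061Z,    # time at which Logstash picked up the log entry

"timestamp" => "20/Jun/2018:22:50:04 +0800"    # the timestamp field from the log: when the entry was generated

The date plugin converts the time recorded in the log to the @timestamp format, stores it in @timestamp, and deletes the timestamp field.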

input {
        file {
              path => ["/var/log/httpd/access_log"]
              start_position => "beginning"
        }
}
filter {
         grok {
               match => {
               "message" => "%{HTTPD_COMBINEDLOG}"
               }
         }
         date {
                match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
                remove_field => "timestamp"
         }
}
output {
        stdout {
                codec => rubydebug
        }
}
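
The conversion that the date filter performs can be checked by hand. The Python sketch below parses the sample timestamp value shown earlier in this section and converts it to UTC, which is how @timestamp is displayed. The strptime format string is a Python counterpart of the pattern in the config, not something Logstash itself uses, and %b assumes English month names (the default C locale).

```python
from datetime import datetime, timezone

raw = "20/Jun/2018:22:50:04 +0800"  # value of the timestamp field

# day/month-name/year:hour:minute:second followed by a UTC offset
local_time = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
utc_time = local_time.astimezone(timezone.utc)

# Render with millisecond precision and a trailing Z, like @timestamp.
as_timestamp = utc_time.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (utc_time.microsecond // 1000)
print(as_timestamp)  # 2018-06-20T14:50:04.000Z
```

The result is about one second earlier than the @timestamp in the sample, which was the time the entry was picked up rather than the time it was written.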

2.7 Collecting logs from beats and sending them to Elasticsearch:

input {
        beats {
                port => "5044"
        }
}
output {
        elasticsearch {
                hosts => "192.168.100.43"
                index => "logstash-%{+YYYY.MM.dd}"
        }
}

List the generated indices:

~]# curl -XGET http://192.168.100.43:9200/_cat/indices
yellow open logstash-2021.05.29 s_DLkrkNQUGH1Xh54gfQbw 1 1 15 0 71.8kb 71.8kb
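
The %{+YYYY.MM.dd} part of the index option is filled in from each event's @timestamp (in UTC), which is why the index above is named logstash-2021.05.29. Below is a small Python sketch of the same naming rule; index_name is a hypothetical helper and the sample time is made up for illustration.

```python
from datetime import datetime, timezone


def index_name(event_time, prefix="logstash-"):
    """Build a daily index name from an event's timestamp, taken as UTC."""
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")


event_time = datetime(2021, 5, 29, 9, 55, 48, tzinfo=timezone.utc)
print(index_name(event_time))  # logstash-2021.05.29
```

Because the date is part of the name, events from different days end up in different indices.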