Logstash-JDBC Synchronization
Logstash is made up of three components: input, filter, and output. Their workflow can be understood as: input collects the data, filter processes it, and output ships it out.
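As a quick illustration of that three-part flow, here is a minimal sketch of a pipeline configuration (stdin to stdout only; the added field name is just an example and is not part of the setup below):
$ cat > /tmp/minimal.conf << 'EOF'
input  { stdin { } }                                           # input: read events from the console
filter { mutate { add_field => { "stage" => "filtered" } } }   # filter: transform/enrich each event
output { stdout { codec => rubydebug } }                       # output: print the processed event
EOF
Running bin/logstash -f /tmp/minimal.conf and typing a line shows one event passing through all three stages.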
Environment Preparation
Docker image
# Pull the Docker image
$ docker pull logstash:7.11.2
Creating the mapped files and directories
# Method 1
# Run the logstash image, copy its config, data, logs, and pipeline directories
# into a local folder, and then modify them there
$ mkdir logstash-7.11.2
$ cd logstash-7.11.2
# Run a logstash instance
$ docker run --name logstash -d logstash:7.11.2
$ docker exec -it logstash /bin/bash
$ ls
bin CONTRIBUTORS Gemfile jdk logs logstash-core-plugin-api pipeline vendor
config data Gemfile.lock lib logstash-core modules tools x-pack
$ pwd
/usr/share/logstash
$ exit
# Note: docker cp does not expand *, so copy directory contents with the /. suffix
# (and make sure the target directories exist first)
$ mkdir config data logs pipeline
# logstash configuration files
$ docker cp logstash:/usr/share/logstash/config/. ./config/
# logstash data files
$ docker cp logstash:/usr/share/logstash/data/. ./data/
# logstash log files
$ docker cp logstash:/usr/share/logstash/logs/. ./logs/
# pipeline directory, which holds the logstash.conf to run
$ docker cp logstash:/usr/share/logstash/pipeline/. ./pipeline/
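# Optionally make the copied directories writable by the container user
# (assumption: the official image runs Logstash as uid 1000; adjust if yours differs)
$ chown -R 1000:1000 ./config ./data ./logs ./pipeline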
# Method 2: if you already know the logstash directory layout well, you can create the files directly as follows
$ cd logstash-7.11.2
$ mkdir config
$ cd config
$ touch logstash.conf
$ touch logstash.yml
$ touch pipelines.yml
$ cd ..
$ mkdir pipeline
$ mkdir data
$ mkdir logs
$ chmod -R 777 ../logstash-7.11.2
Building the mapped files
###### 1. logstash.yml
node.name: logstash-192
# Log file directory
path.logs: /usr/share/logstash/logs
# Check that the configuration is valid, then exit
config.test_and_exit: false
# Automatically reload the configuration when it changes
config.reload.automatic: false
# How often to check for configuration changes
config.reload.interval: 60s
# Debug mode: when enabled, the parsed configuration is printed, including passwords; use with caution
# The log level must also be set to debug for this output to appear
config.debug: true
log.level: info
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.hosts: [ "http://10.132.5.192:9200" ]
#xpack.monitoring.elasticsearch.username: "elastic"
#xpack.monitoring.elasticsearch.password: "ZcPg1CyTW46hbPR2MXgq"
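Before mounting everything into Docker, it can help to syntax-check the pipeline file. A minimal sketch, assuming the config directory lives under /elk/logstash-7.11.2 (as in the startup command further below) and relying on the official image's entrypoint forwarding flag arguments to logstash:
$ docker run --rm \
  -v /elk/logstash-7.11.2/config:/usr/share/logstash/config \
  logstash:7.11.2 \
  -t -f /usr/share/logstash/config/logstash.conf
-t is shorthand for --config.test_and_exit: the configuration is parsed and validated, then the process exits.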
###### 2. pipelines.yml
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
# https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
- pipeline.id: main
path.config: "/usr/share/logstash/config/logstash.conf"
- This file tells Logstash which pipeline configuration files to load; multiple pipelines can be declared here.
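For example, extra pipelines are simply additional entries in the same file (the second pipeline id and path below are made-up placeholders):
$ cat >> config/pipelines.yml << 'EOF'
- pipeline.id: another_pipeline
  path.config: "/usr/share/logstash/config/another.conf"
EOF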
###### 3. logstash.conf
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
stdin {
}
jdbc {
#JDBC driver jar (Oracle here); each database has its own driver, which you need to download yourself
jdbc_driver_library => "/usr/share/logstash/config/jdbc-sql/ojdbc14-10.2.0.4.0.jar"
#JDBC driver class; this differs per database
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
#Database connection string: host, port, and SID
jdbc_connection_string => "jdbc:oracle:thin:@39.98.194.176:1521:orcl"
#Database username
jdbc_user => "fc_t1"
#Database password
jdbc_password => "Bjfc888888"
#The options above are the easy part (if even these are confusing, your boss may want to reconsider). The important part is what follows.
#Database time zone
jdbc_default_timezone => "Asia/Shanghai"
# Scheduler: how often the SQL runs; the default is once per minute
# schedule => minute hour day-of-month month day-of-week (standard cron syntax)
# e.g. schedule => "0 22 * * *" runs once a day at 22:00
schedule => "* * * * *"
#Path of your SQL file; the SQL can also be written inline here instead, e.g.
#statement => "SELECT * FROM tableName t WHERE t.create_time > :sql_last_value"
statement_filepath => "/usr/share/logstash/config/jdbc-sql/bjfc_datasource.sql"
#Whether to record the last run. If true, the last value of the tracking_column is saved to the file given by last_run_metadata_path
record_last_run => true
#Must be true so that the tracked value comes from the column below instead of the default, which is the timestamp of the last run
use_column_value => true
#Required when use_column_value is true: the name of the database column whose value is recorded (see record_last_run above).
#The column must be increasing, e.g. an auto-increment id or a record timestamp
tracking_column => "userid"
codec => plain { charset => "UTF-8"}
#Type of the tracking column: numeric or timestamp
tracking_column_type => numeric
#In the SQL statement we only need WHERE MY_ID > :sql_last_value; :sql_last_value takes its value from the file below
last_run_metadata_path => "/usr/share/logstash/logs/bjfc_datasource_id"
#Whether to clear the last_run_metadata_path record; if true, every run starts from scratch and queries all records again
clean_run => false
jdbc_paging_enabled => false
#jdbc_page_size => 10000
#Whether to lowercase column names.
#A small tip: if you have already indexed data once and have matching queries in Kibana, keep this true,
#because true is the default and Kibana, or more precisely ES, is case-sensitive
lowercase_column_names => true
#A type tag marking which source these events belong to, so they can be told apart on the ES side
type => "bjfc_datasource"
}
}
filter {
# Shift @timestamp forward by 8 hours so it matches local (UTC+8) time
ruby {
code => "event.set('timestamp', event.get('@timestamp').time.utc+8*60*60)"
}
ruby {
code => "event.set('@timestamp',event.get('timestamp'))"
}
# Drop the temporary field
mutate {
remove_field => ["timestamp"]
}
# Parse the message field as JSON and then remove it
json {
source => "message"
remove_field => ["message"]
}
}
output {
elasticsearch {
hosts => ["http://10.132.5.192:9200","http://10.132.5.192:9201","http://10.132.5.192:9202"]
index => "bjfc-datasource-%{+YYYY.MM.dd}"
document_id => "%{id}"
user => "elastic"
password => "ZcPg1CyTW46hbPR2MXgq"
}
stdout {
#Output as JSON lines
codec => json_lines
}
}
@Note:
- Pay particular attention to the date and time of the @timestamp field; the filter configuration above is what corrects it.
- statement_filepath: the path of the SQL file to execute.
- SQL example:
select to_number(t.id) as userid, t.* from secu_user t where t.id > :sql_last_value order by userid asc
- tracking_column: the column used as the incremental marker; it decides which records count as new and get written into ES. It must be a numeric or date column, and a numeric column must be increasing.
- jdbc_driver_library: use the driver that matches your database, and remember to mount the folder containing it when starting the Docker container.
- See the official Logstash documentation for the full reference on input, filter, and output.
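To see what last_run_metadata_path actually stores, you can look at the file on the host once the logs directory is mounted (the value below is only an illustration; it matches the sql_last_value shown in the verification logs later):
$ cat /elk/logstash-7.11.2/logs/bjfc_datasource_id
--- 114104
The plugin writes the last tracked value as a small YAML document, and that value is what gets substituted for :sql_last_value on the next run.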
###### 4. jvm.options
## JVM configuration
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms1g
-Xmx1g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## Locale
# Set the locale language
#-Duser.language=en
# Set the locale country
#-Duser.country=US
# Set the locale variant, if any
#-Duser.variant=
## basic
# set the I/O temp directory
#-Djava.io.tmpdir=$HOME
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# Note: without this setting, log timestamps will be 8 hours off from the system time
-Duser.timezone=Asia/Shanghai
# use our provided JNA always versus the system one
#-Djna.nosys=true
# Turn on JRuby invokedynamic
-Djruby.compile.invokedynamic=true
# Force Compilation
-Djruby.jit.threshold=0
# Make sure joni regexp interruptability is enabled
-Djruby.regexp.interruptible=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${LOGSTASH_HOME}/heapdump.hprof
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${LS_GC_LOG_FILE}
# Entropy source for randomness
-Djava.security.egd=file:/dev/urandom
# Copy the logging context from parent threads to children
-Dlog4j2.isThreadContextMapInheritable=true
###### 5. startup.options
################################################################################
# These settings are ONLY used by $LS_HOME/bin/system-install to create a custom
# startup script for Logstash and is not used by Logstash itself. It should
# automagically use the init system (systemd, upstart, sysv, etc.) that your
# Linux distribution uses.
#
# After changing anything here, you need to re-run $LS_HOME/bin/system-install
# as root to push the changes to the init script.
################################################################################
# Override Java location
#JAVACMD=/usr/bin/java
# Set a home directory
LS_HOME=/usr/share/logstash
# logstash settings directory, the path which contains logstash.yml
LS_SETTINGS_DIR=/etc/logstash
# Arguments to pass to logstash
LS_OPTS="--path.settings ${LS_SETTINGS_DIR}"
# Arguments to pass to java
LS_JAVA_OPTS=""
# pidfiles aren't used the same way for upstart and systemd; this is for sysv users.
LS_PIDFILE=/var/run/logstash.pid
# user and group id to be invoked as
LS_USER=logstash
LS_GROUP=logstash
# Enable GC logging by uncommenting the appropriate lines in the GC logging
# section in jvm.options
LS_GC_LOG_FILE=/var/log/logstash/gc.log
# Open file limit
LS_OPEN_FILES=16384
# Nice level
LS_NICE=19
# Change these to have the init script named and described differently
# This is useful when running multiple instances of Logstash on the same
# physical box or vm
SERVICE_NAME="logstash"
SERVICE_DESCRIPTION="logstash"
# If you need to run a command or script before launching Logstash, put it
# between the lines beginning with `read` and `EOM`, and uncomment those lines.
###
## read -r -d '' PRESTART << EOM
## EOM
###### 6. log4j2.properties
status = error
name = LogstashPropertiesConfig
appender.console.type = Console
appender.console.name = plain_console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c]%notEmpty{[%X{pipeline.id}]}%notEmpty{[%X{plugin.id}]} %m%n
appender.json_console.type = Console
appender.json_console.name = json_console
appender.json_console.layout.type = JSONLayout
appender.json_console.layout.compact = true
appender.json_console.layout.eventEol = true
rootLogger.level = ${sys:ls.log.level}
rootLogger.appenderRef.console.ref = ${sys:ls.log.format}_console
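The rootLogger.appenderRef.console.ref line picks an appender based on the ls.log.format system property, which Logstash derives from the log.format setting in logstash.yml. A small sketch for switching the console output to the JSON appender defined above:
$ echo 'log.format: json' >> config/logstash.yml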
Starting Logstash
$ docker run --name logstash \
-v /etc/localtime:/etc/localtime \
-v /elk/logstash-7.11.2/config:/usr/share/logstash/config \
-v /elk/logstash-7.11.2/data:/usr/share/logstash/data \
-v /elk/logstash-7.11.2/logs:/usr/share/logstash/logs \
-v /elk/logstash-7.11.2/pipeline:/usr/share/logstash/pipeline \
-d logstash:7.11.2
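Since logstash.conf references the JDBC driver and the SQL file under /usr/share/logstash/config/jdbc-sql/, that folder must exist inside the mounted config directory. A quick sanity check once the container is up (the two file names are the ones used in the configuration above):
$ docker exec logstash ls /usr/share/logstash/config/jdbc-sql/
bjfc_datasource.sql  ojdbc14-10.2.0.4.0.jar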
Verification
$ docker logs -f logstash
[2021-07-16T15:40:00,407][DEBUG][org.logstash.execution.PeriodicFlush][main] Pushing flush onto pipeline.
[2021-07-16T15:40:00,728][INFO ][logstash.inputs.jdbc ][main][5d53df41e3f9932ffcbeea89d04525b197a0e4b64fad6727c2c9a515d41e7100] (0.038733s) select version from PRODUCT_COMPONENT_VERSION where lower(product) like 'oracle%'
[2021-07-16T15:40:00,861][INFO ][logstash.inputs.jdbc ][main][5d53df41e3f9932ffcbeea89d04525b197a0e4b64fad6727c2c9a515d41e7100] (0.025380s) SELECT * FROM (SELECT count(*) "COUNT" FROM (select to_number(t.id) as userid ,t.* from secu_user t where t.id > 114104 order by userid asc
) "T1") "T1" WHERE (ROWNUM <= 1)
[2021-07-16T15:40:00,887][INFO ][logstash.inputs.jdbc ][main][5d53df41e3f9932ffcbeea89d04525b197a0e4b64fad6727c2c9a515d41e7100] (0.018192s) SELECT * FROM (SELECT count(*) "COUNT" FROM (select to_number(t.id) as userid ,t.* from secu_user t where t.id > 114104 order by userid asc
) "T1") "T1" WHERE (ROWNUM <= 1)
[2021-07-16T15:40:00,888][DEBUG][logstash.inputs.jdbc ][main][5d53df41e3f9932ffcbeea89d04525b197a0e4b64fad6727c2c9a515d41e7100] Executing JDBC query {:statement=>"select to_number(t.id) as userid ,t.* from secu_user t where t.id > :sql_last_value order by userid asc\n", :parameters=>{:sql_last_value=>114104}, :count=>0}
[2021-07-16T15:40:00,913][INFO ][logstash.inputs.jdbc ][main][5d53df41e3f9932ffcbeea89d04525b197a0e4b64fad6727c2c9a515d41e7100] (0.021819s) select to_number(t.id) as userid ,t.* from secu_user t where t.id > 114104 order by userid asc
[2021-07-16T15:40:02,872][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2021-07-16T15:40:02,872][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
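Besides watching the container logs, you can confirm that documents are actually landing in Elasticsearch by counting them in the target index, using the hosts and credentials from the output section:
$ curl -u elastic:ZcPg1CyTW46hbPR2MXgq "http://10.132.5.192:9200/bjfc-datasource-*/_count?pretty"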