1. Background
Services in the k8s cluster need log collection. Services are deployed in the cluster as pods, so this means collecting logs from inside containers.
2. Solution
Container logs fall into two categories:
- Console logs: these live under the docker directory on the host. Mount that directory into fluentd; fluentd collects the logs and ships them through kafka into ELK.
- Logs written by the application itself: mount the log directory inside the container onto third-party storage, then have fluentd ship the logs through kafka into ELK.
In ELK, users can then configure log search, service monitoring, alerting, and so on as needed.
3. Configuration
This solution mainly consists of a yaml file containing the fluentd configuration, provided as a ConfigMap, plus the fluentd deployment.
For the fluentd configuration file itself, see the official documentation. It mainly covers:
- Prometheus metrics configuration
- `@type tail` log-collection settings
- custom Prometheus metrics
- kafka output configuration
- ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config-pro  # must match the name referenced in the deployment below
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
data:
  fluent.conf: |
    # Expose Prometheus metrics
    # input plugin that exports metrics
    <source>
      @type prometheus
      bind 0.0.0.0
      port 24231            # port on which Prometheus metrics are exposed
      metrics_path /metrics
      aggregated_metrics_path /aggregated_metrics
    </source>
    <source>
      @type monitor_agent
      bind 0.0.0.0
      port 24220
    </source>
    <source>
      @type forward
      bind 0.0.0.0
      port 24224
    </source>
    # input plugin that collects metrics for output plugin
    <source>
      @type prometheus_output_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>
    # input plugin that collects metrics for in_tail plugin
    <source>
      @type prometheus_tail_monitor
      <labels>
        host ${hostname}
      </labels>
    </source>
    ## File log collection
    <source>
      @type tail
      path /home/work/face/logs/hadoop_nm_test/*  # log paths to collect; * wildcards are allowed
      tag logpath.*           # tag used later for filtering and routing
      rotate_wait 120
      refresh_interval 10
      read_from_head true
      #time_key event_time
      keep_time_key true
      #path_key sourceid
      #limit_recently_modified 86400
      #timekey_zone Asia/Shanghai
      pos_file_compaction_interval 1h
      # Position file recording how far fluentd has read each file. Keep it outside the
      # container, otherwise a pod restart re-collects every log from the beginning.
      pos_file /home/work/log-pos/fluentd-log-1.pos
      <parse>
        @type none
      </parse>
    </source>
    <filter **>
      @type record_transformer
      <record>
        nodeip "#{ENV['MY_NODE_IP']}"
      </record>
    </filter>
    <filter **>
      @type record_transformer
      <record>
        message ${record["nodeip"]} ${record["message"]}
      </record>
    </filter>
    <filter logpath.home.work.face.logs.hadoop_nm_test.**>
      @type prometheus
      # You can use counter type without specifying a key
      # This just increments counter by 1
      <metric>
        name de_input_num_records_total  # total number of lines read, for Prometheus monitoring
        type counter
        desc The total number of input records
        <labels>
          tag ${tag}
          host ${hostname}
          nodeip $.nodeip
        </labels>
      </metric>
    </filter>
    # kafka output
    <match logpath.home.work.face.logs.hadoop_nm_test.**>
      @type copy
      # for MonitorAgent sample
      <store>
        @type kafka2
        brokers [kafka brokers]  # the kafka cluster brokers
        <buffer topic>
          @type file
          path /fluentd/buffer/td/api
          flush_interval 3s
        </buffer>
        <format>
          @type single_value
        </format>
        default_topic [kafka topic]  # the kafka topic to write to
      </store>
      <store>
        @id caelus_prometheus
        @type prometheus
        <metric>
          name de_output_num_records_total  # total number of lines emitted, for Prometheus monitoring
          type counter
          desc The total number of outgoing records
          <labels>
            tag ${tag}
            host ${hostname}
            nodeip $.nodeip
          </labels>
        </metric>
      </store>
    </match>
```
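Since the `@type prometheus` source listens on port 24231 and the DaemonSet exposes it as a hostPort on every node, a Prometheus server can scrape all fluentd instances via node discovery. A minimal `scrape_configs` sketch, assuming Prometheus runs in-cluster with node-listing RBAC; the job name and the port rewrite are assumptions, not part of this deployment:

```yaml
scrape_configs:
- job_name: fluentd
  kubernetes_sd_configs:
  - role: node               # discovers one target per node (kubelet address, port 10250)
  relabel_configs:
  # rewrite the discovered kubelet port to the fluentd hostPort 24231
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:24231'
    target_label: __address__
```

With this in place the `de_input_num_records_total` and `de_output_num_records_total` counters defined above can be compared per node to spot collection lag.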
- deployment:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-pro
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
  annotations:
    configmap.reloader.stakater.com/reload: "fluentd-config-pro"  # name of the ConfigMap to watch
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        env:
        - name: MY_NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        # image: fluentd:v1.14.0-1.0
        image: registry.ke.com/docker-virtual/common/fluentd:v1.10.2-1.0
        args:
        #- -v
        # - -c /fluentd/etc/fluent.conf
        #command: [ "sleep", "3600" ]
        securityContext:
          runAsUser: 0
          allowPrivilegeEscalation: false
        imagePullPolicy: IfNotPresent
        ports:
        - name: fluentd
          containerPort: 24231  # metrics are exposed on each node via hostPort
          hostPort: 24231
        resources:
          limits:
            cpu: 500m
            memory: 1000Mi
          requests:
            cpu: 500m
            memory: 1000Mi
        volumeMounts:
        - name: api-logs
          mountPath: /home/work/face/logs
          readOnly: true
        - name: tz-config
          mountPath: /etc/localtime
          readOnly: true
        - name: config-source
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
        - name: fluentd-log-pos
          mountPath: /home/work/log-pos
      terminationGracePeriodSeconds: 30
      volumes:
      - name: api-logs
        hostPath:
          path: /home/work/face/logs  # log file location on the host
      - name: tz-config
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
      - name: fluentd-log-pos
        hostPath:
          path: /home/work/face/log-ops
      - name: config-source
        configMap:
          name: fluentd-config-pro  # must match the ConfigMap above
          items:
          - key: fluent.conf
            path: fluent.conf
```
4. Hot-reloading the fluentd configuration
While running fluentd you will frequently add new services that also need log collection; in that case you only need to extend the fluentd configuration in the ConfigMap. However, modifying the ConfigMap alone does not hot-reload the configuration inside the pod. To get that, deploy the Reloader plugin:

```
kubectl apply -f https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
```

and add the following to the deployment (the value is the name of the ConfigMap to watch):

```
annotations:
  configmap.reloader.stakater.com/reload: "fluentd-config-name"
```

Note: this kind of hot reload restarts the pod, so the fluentd pos_file must be mounted outside the container, otherwise logs are re-collected from scratch every time.
5. Pitfalls
- Subdirectories under the log path: if the directory configured in the source contains subdirectories, fluentd does not collect files inside them, so each subdirectory must be added explicitly. A single source supports multiple directories at once, separated by commas.
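For example, a tail source covering two sibling subdirectories could look like the sketch below (`app1` and `app2` are hypothetical names); fluentd splits the comma-separated `path` into separate watch globs:

```
<source>
  @type tail
  # list each subdirectory explicitly; * does not recurse into subdirectories
  path /home/work/face/logs/app1/*,/home/work/face/logs/app2/*
  tag logpath.*
  pos_file /home/work/log-pos/fluentd-log-2.pos
  <parse>
    @type none
  </parse>
</source>
```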
- Errors when collecting docker console logs:

```
2018-08-03 06:36:53 +0000 [warn]: /var/log/containers/samplelog-79bd66868b-t7xn9_logging1_fluentd-70e85c5d6328e7d.log unreadable. It is excluded and would be examined next time.
2018-08-03 06:37:53 +0000 [warn]: /var/log/containers/samplelog-79bd66868b-t7xn9_logging1_fluentd-70e85c5bc89ab24.log unreadable. It is excluded and would be examined next time.
```

This happens because /var/log/containers holds symlinks into /var/lib/containers or /home/work/docker/containers, so the actual log files live at the symlink targets. The target directory must also be mounted into the pod.
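A sketch of the extra mounts, assuming the symlinks point at /home/work/docker/containers (adjust to your actual docker data root). The mountPath must equal the host path, because the symlinks are absolute and must resolve to the same location inside the container:

```yaml
# container spec fragment
volumeMounts:
- name: varlog-containers
  mountPath: /var/log/containers
  readOnly: true
- name: docker-containers
  mountPath: /home/work/docker/containers  # same path as on the host, so absolute symlinks resolve
  readOnly: true
# pod spec fragment
volumes:
- name: varlog-containers
  hostPath:
    path: /var/log/containers
- name: docker-containers
  hostPath:
    path: /home/work/docker/containers
```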