1. Background
- Goal: build a local observability (stability) stack
- Level: advanced
- Audience: developers / operations
- Scenario: an operational SOP
- Overall flow:
  - The application produces data (logs, traces, metrics)
  - The OpenTelemetry agent collects that data
  - Storage and display:
    - The otel-agent sends data to the otel-collector, which forwards traces to Jaeger for storage; the Jaeger UI displays them
    - Prometheus stores the metrics pushed from the collector (remote write); Grafana displays them
    - Filebeat collects logs and forwards them to Elasticsearch for storage; Kibana displays them
2. Steps
2.1 Preparation
Preparation 1: Docker environment
- Install Docker Desktop
- Configure a domestic registry mirror and the related environment variables
- Get familiar with basic Docker commands:

| container | images | run | exec |
|---|---|---|---|
| docker ps -a | docker images -a | docker run -d --name <name> --network=<net> -p 1010:1010 -v <host path>:<container path> <image> | docker exec -it <container_id> bash |
| docker rm <container_id> | docker rmi <image_id> | | |
| docker stop <container_id> | | | |
Preparation 2: pull the Docker images
- otel/opentelemetry-collector-contrib (latest)
- jaegertracing/all-in-one (1.72.0)
- elasticsearch (9.1.2)
- elastic/filebeat (9.1.2)
- prom/prometheus (3.4.1)
- grafana/grafana (latest)
docker pull otel/opentelemetry-collector-contrib:latest
docker pull jaegertracing/all-in-one:1.72.0
docker pull elasticsearch:9.1.2
docker pull elastic/filebeat:9.1.2
docker pull prom/prometheus:3.4.1
docker pull grafana/grafana:latest
2.2 Setup
Docker Desktop on macOS does not let containers reach the host via localhost,
so this guide uses the machine's real IP, 10.12.179.210.
Replace it with your own machine's IP before running anything below.
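Rather than hunting down every occurrence by hand, a sed one-liner can rewrite the guide's sample IP across your config files. A minimal sketch; the `/tmp/ip-demo` path and the `192.168.1.100` value are illustrative stand-ins for your real config directory and real IP:

```shell
MY_IP="192.168.1.100"                      # stand-in: replace with your machine's LAN IP
mkdir -p /tmp/ip-demo                      # stand-in: point this at your config directory
printf 'endpoint: http://10.12.179.210:4318\n' > /tmp/ip-demo/config.yaml
# Rewrite the guide's sample IP in every yaml file (-i.bak works on both macOS and Linux sed)
find /tmp/ip-demo -name '*.yaml' -exec sed -i.bak "s/10\.12\.179\.210/${MY_IP}/g" {} +
cat /tmp/ip-demo/config.yaml
```

The `.bak` backups let you diff or roll back if the substitution catches something unintended.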
Jaeger
Jaeger's all-in-one image speaks OTLP directly, so it can receive data from the otel-collector.
To avoid clashing with the collector's own 4317/4318, Jaeger's OTLP ports are remapped to 14317 and 14318.
docker stop jaeger
docker rm jaeger
docker run --rm --name jaeger -d \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-p 16686:16686 \
-p 14317:4317 \
-p 14318:4318 \
-p 14250:14250 \
-p 14268:14268 \
-p 14269:14269 \
-p 9411:9411 \
-e JAEGER_TLS_INSECURE=false \
jaegertracing/all-in-one:1.72.0
Open localhost:16686/search in a browser and confirm the Jaeger UI loads.
Prometheus
Config file:
# prometheus/prometheus.yml
global:
  scrape_interval: 15s        # scrape interval (default 15s)
  evaluation_interval: 15s
scrape_configs:
  - job_name: "otel-agent"
    scrape_interval: 30s      # per-job override (optional)
    static_configs:
      - targets: ["10.12.179.210:9090"]   # Prometheus's own address (swap in your IP)
  # Optional: scrape a /metrics endpoint exposed by the OTel components directly
  # - job_name: "otel-agent-metrics"
  #   static_configs:
  #     - targets: ["otel-agent:4317"]
# Note: the collector pushes metrics to /api/v1/write, which is enabled by the
# --web.enable-remote-write-receiver flag below; Prometheus itself does not need
# a remote_write section for that (pointing it at itself would loop samples).
# remote_write:
#   - url: "http://10.12.179.210:9090/api/v1/write"
Docker start script:
docker stop prometheus
docker rm prometheus
mkdir -p $(pwd)/data
docker run -d --name=prometheus \
-p 9090:9090 \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
-v $(pwd)/data:/prometheus \
prom/prometheus:3.4.1 \
--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --web.enable-remote-write-receiver
# The image's built-in CMD must be overridden here: keep --config.file, --storage.tsdb.path,
# and --web.enable-remote-write-receiver (the last one enables the /api/v1/write receiver).
Check: open localhost:9090 and confirm the Prometheus UI loads (the otel-agent target should appear under Status → Targets).
ELK
Filebeat
Config file:
# filebeat/filebeat.yml
filebeat.inputs:
  - type: filestream            # filestream is recommended (handles rotation, dedups events)
    id: test-service
    paths:
      - /var/log/test-service.log   # path inside the container (must match the -v mount)
    fields:
      service.name: test-service    # custom fields (for filtering in ES)
      environment: dev
    fields_under_root: true         # promote the fields to the root of the event
    parsers:
      - ndjson:                     # if the log lines are NDJSON (recommended)
          target: "log"             # field the parsed object is stored under
output.elasticsearch:
  hosts: ["http://localhost:9200"]  # plain HTTP here, since this setup disables security/TLS
  # username: elastic               # default username
  # password: "your-elastic-password"
  # api_key: ""                     # API-key auth (optional; username/password is simpler)
  index: "logs"                     # index name
  index_pattern: "logs-%Y-%m-%d"    # explicit date-rolled pattern (optional)
  ssl.verification_mode: none       # TLS verification disabled (do verify in production)
  timeout: 30                       # connection timeout in seconds
setup.template.name: "desktop"
setup.template.pattern: "desktop-*"
docker_run.sh
docker stop filebeat
docker rm filebeat
docker run -it \
-d \
--name filebeat \
-e TZ=Asia/Shanghai \
--network=host \
-v ${LOG_DIR}:/var/log \
-v $(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml \
-v $(pwd)/data:/usr/share/filebeat/data \
-v $(pwd)/logs:/usr/share/filebeat/logs \
elastic/filebeat:9.1.2 \
filebeat -e -c /usr/share/filebeat/filebeat.yml
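The filestream input above tails /var/log/test-service.log and parses each line as NDJSON. Before wiring the real application in, you can hand-feed it a record to confirm the pipeline end to end. A sketch; the `/tmp/demo-logs` path and the field names are illustrative (write into whatever directory you mount as /var/log):

```shell
# Append one NDJSON record to the demo log file
LOG_DIR=/tmp/demo-logs                  # stand-in for the directory mounted at /var/log
mkdir -p "$LOG_DIR"
printf '{"ts":"2024-01-01T00:00:00Z","level":"INFO","service":"test-service","msg":"hello filebeat"}\n' \
  >> "$LOG_DIR/test-service.log"
cat "$LOG_DIR/test-service.log"
```

If filebeat is running against that directory, the record should show up in Elasticsearch (and Kibana) within a few seconds.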
Elasticsearch
docker stop elasticsearch
docker rm elasticsearch
docker network create elastic 2>/dev/null || true   # the run below joins this network
mkdir -p $(pwd)/logs
mkdir -p $(pwd)/data
mkdir -p $(pwd)/plugins
docker run -d --name elasticsearch --network=elastic -p 9200:9200 -p 9300:9300 -it -m 2GB \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-e "xpack.security.http.ssl.enabled=false" \
-e "cluster.routing.allocation.disk.threshold_enabled=false" \
-v $(pwd)/data:/usr/share/elasticsearch/data \
-v $(pwd)/logs:/usr/share/elasticsearch/logs \
elasticsearch:9.1.2
Otel-collector-contrib
We use otel-collector-contrib rather than the core otel-collector because the contrib build bundles many more receivers, processors, and exporters.
The collector needs more configuration than environment variables can comfortably carry, so we create a config file and mount it into the container.
In config.yaml:
- receivers are the data sources (inputs)
- processors define the processing steps
- exporters define the outputs (downstream systems)
- service wires them together through pipelines
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    send_batch_size: 50
    send_batch_max_size: 256
    timeout: 10s
exporters:
  otlphttp:
    endpoint: http://10.12.179.210:14318   # Jaeger's OTLP/HTTP port (the 14318 -> 4318 mapping)
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://10.12.179.210:9090/api/v1/write
    tls:
      insecure: true
service:
  # extensions:
  #   - file_storage
  pipelines:
    # trace pipeline
    traces:
      receivers: [ otlp ]                  # receive OTLP data
      processors: [ ]                      # empty; add the batch processor above if desired
      exporters: [ otlphttp ]              # export to Jaeger over OTLP/HTTP
    metrics:
      receivers: [ otlp ]
      processors: [ ]
      exporters: [ prometheusremotewrite ] # push to Prometheus's remote-write receiver
    # logs:
    #   receivers: [filelog]
    #   processors: [batch]
    #   exporters: [elasticsearch]
Keep the run script and config.yaml in the same directory.
mkdir -p ~/data/docker_data/otel/data
docker stop otel-collector-contrib
docker rm otel-collector-contrib
docker run --name=otel-collector-contrib -d \
-p 4317:4317 -p 4318:4318 \
-v $(pwd)/contrib_config.yaml:/etc/otelcol-contrib/config.yaml \
otel/opentelemetry-collector-contrib
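Once the collector is up, a hand-rolled OTLP/HTTP span makes a quick smoke test. The sketch below only builds the payload and prints the curl command (run it yourself against the live collector); the service name, span name, and IDs are arbitrary, but the trace and span IDs must be 32- and 16-character hex strings:

```shell
# Minimal OTLP/JSON trace payload: one resource, one scope, one span
PAYLOAD='{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeSpans":[{"spans":[{"traceId":"5b8efff798038103d269b633813fc60c","spanId":"eee19b7ec3c1b174","name":"smoke","kind":1,"startTimeUnixNano":"1700000000000000000","endTimeUnixNano":"1700000001000000000"}]}]}]}'
# Print the command instead of executing it, so this works even before the stack is running
echo "curl -s -X POST http://localhost:4318/v1/traces -H 'Content-Type: application/json' -d '$PAYLOAD'"
```

After sending it, the "smoke-test" service should appear in the Jaeger UI search dropdown.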
Application
Create a Spring Boot project and attach the OpenTelemetry Java agent so it sends telemetry to the otel-collector.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.0</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.example</groupId>
<artifactId>demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>project-start</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>21</java.version>
<spring-ai.version>1.0.0-M1</spring-ai.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- <dependency>-->
<!-- <groupId>org.springframework.ai</groupId>-->
<!-- <artifactId>spring-ai-openai-spring-boot-starter</artifactId>-->
<!-- </dependency>-->
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<!-- <dependency>-->
<!-- <groupId>org.mybatis.spring.boot</groupId>-->
<!-- <artifactId>mybatis-spring-boot-starter</artifactId>-->
<!-- <version>3.0.3</version>-->
<!-- </dependency>-->
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-spring-boot3-starter</artifactId>
<version>3.5.5</version>
</dependency>
<dependency>
<groupId>jakarta.persistence</groupId>
<artifactId>jakarta.persistence-api</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>easyexcel</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>cn.afterturn</groupId>
<artifactId>easypoi-base</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>cn.afterturn</groupId>
<artifactId>easypoi-web</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>cn.afterturn</groupId>
<artifactId>easypoi-annotation</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-sdk</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-jaeger</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-logging-otlp</artifactId>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-bom</artifactId>
<version>1.20.1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>spring-milestones</id>
<name>Spring Milestones</name>
<url>https://repo.spring.io/milestone</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
</project>
Agent config file
The agent config file supports properties only, not YAML.
# service name
otel.service.name=test-service
otel.logs.exporter=none
otel.traces.exporter=otlp
otel.metrics.exporter=otlp
otel.exporter.otlp.endpoint=http://10.12.179.210:4318
java \
  -javaagent:${AGENT_DIR}/opentelemetry-javaagent.jar \
  -Dotel.javaagent.configuration-file=${CONFIG_DIR}/file.properties \
  -jar app.jar
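A misspelled -javaagent path fails silently into an uninstrumented app, so a wrapper that fails fast when the agent jar or properties file is missing saves confusion. A sketch; the `/tmp/otel-launch-demo` paths are stand-ins (the `touch` line simulates the real files), and the final echo would become the real launch once you substitute your directories:

```shell
AGENT_DIR=/tmp/otel-launch-demo    # stand-in: directory holding opentelemetry-javaagent.jar
CONFIG_DIR=/tmp/otel-launch-demo   # stand-in: directory holding file.properties
mkdir -p "$AGENT_DIR"
touch "$AGENT_DIR/opentelemetry-javaagent.jar" "$CONFIG_DIR/file.properties"  # demo stand-ins
# Fail fast if either file is missing
for f in "$AGENT_DIR/opentelemetry-javaagent.jar" "$CONFIG_DIR/file.properties"; do
  [ -f "$f" ] || { echo "missing: $f" >&2; exit 1; }
done
# Print the launch command (swap echo for exec once the paths point at the real files)
echo java -javaagent:"$AGENT_DIR"/opentelemetry-javaagent.jar \
  -Dotel.javaagent.configuration-file="$CONFIG_DIR"/file.properties -jar app.jar
```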
application.properties
I run a MySQL instance locally; if you don't have one, remove the datasource and JPA settings and start a plain web project instead.
spring.application.name=project-start
spring.datasource.url=jdbc:mysql://10.12.179.210:3306/srm_v5_xinyacb?allowPublicKeyRetrieval=true
spring.datasource.driverClassName=com.mysql.cj.jdbc.Driver
spring.datasource.username=root
spring.datasource.password=123456
spring.jpa.database=MySQL
#spring.jpa.database-platform=org.hibernate.dialect.MySQLDialect
spring.jpa.show-sql=true
log.dir.base=./logs/
logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<springProperty scope="context" name="logDirBase" source="log.dir.base"/>
<!-- custom properties -->
<property name="log.maxHistory" value="1"/>
<property name="log.baseDir" value="${logDirBase}"/>
<property name="log.appName" value="testService"/>
<property name="log.project" value="test-service"/>
<property name="log.charset" value="UTF-8"/>
<property name="log.level" value="INFO"/>
<property name="log.pattern" value="%date{yyyy-MM-dd HH:mm:ss.SSS Z} [%thread] %-5p [%c] [%F:%L] - %X{trace_id} %msg%n"/>
<!-- local profile -->
<!-- <springProfile name="local">-->
<!-- <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">-->
<!-- <encoder>-->
<!-- <pattern>${log.pattern}</pattern>-->
<!-- </encoder>-->
<!-- </appender>-->
<!-- </springProfile>-->
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>${log.pattern}</pattern>
</encoder>
</appender>
<root level="${log.level}">
<appender-ref ref="CONSOLE"/>
</root>
<!-- non-local profiles -->
<springProfile name="!local">
<!-- info log -->
<appender name="INFO_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${log.baseDir}/${log.project}/${log.appName}_info.log</file>
<encoder>
<pattern>${log.pattern}</pattern>
<charset>${log.charset}</charset>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<FileNamePattern>${log.baseDir}/${log.project}/${log.appName}_info.%d{yyyyMMdd}.%i.log
</FileNamePattern>
<maxFileSize>500MB</maxFileSize>
<MaxHistory>${log.maxHistory}</MaxHistory>
<totalSizeCap>10GB</totalSizeCap>
</rollingPolicy>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>INFO</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
</appender>
<!-- warn/error log -->
<appender name="ERROR_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${log.baseDir}/${log.project}/${log.appName}_error.log</file>
<encoder>
<pattern>${log.pattern}</pattern>
<charset>${log.charset}</charset>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<FileNamePattern>${log.baseDir}/${log.project}/${log.appName}_error.%d{yyyyMMdd}.%i.log
</FileNamePattern>
<maxFileSize>500MB</maxFileSize>
<MaxHistory>10</MaxHistory>
<totalSizeCap>10GB</totalSizeCap>
</rollingPolicy>
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>WARN</level>
</filter>
</appender>
<logger name="RocketmqClient" level = "OFF"/>
<logger name="org.apache.rocketmq" level = "OFF"/>
<logger name="org.apache.catalina" level = "OFF"/>
<logger name="org.apache.tomcat" level = "OFF"/>
<logger name="org.apache.coyote" level = "OFF"/>
<root level="${log.level}">
<appender-ref ref="INFO_LOG"/>
<appender-ref ref="ERROR_LOG"/>
</root>
</springProfile>
</configuration>
3. Summary
Test materials: gcnv83pcpspw.feishu.cn/wiki/AHZWwr…
Docker compose
version: "3.8"
name: "otel_in_action"
networks:
elastic:
host:
services:
jaeger:
image: jaegertracing/all-in-one:1.72.0
ports:
- "14318:4318"
- "14317:4317"
- "9411:9411"
- "16686:16686"
- "14250:14250"
- "14268:14268"
- "14269:14269"
container_name: jaeger
environment:
COLLECTOR_ZIPKIN_HOST_PORT: 9411
JAEGER_TLS_INSECURE: true
prometheus:
image: prom/prometheus:3.4.1
ports:
- "9090:9090"
container_name: prometheus
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
- ./prometheus/data:/prometheus
command: --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --web.enable-remote-write-receiver
elasticsearch:
image: elasticsearch:9.1.2
ports:
- "9200:9200"
- "9300:9300"
container_name: elasticsearch
volumes:
- ./elk/elasticsearch/data:/usr/share/elasticsearch/data
- ./elk/elasticsearch/logs:/usr/share/elasticsearch/logs
networks:
- elastic
environment:
- "discovery.type=single-node"
- "xpack.security.enabled=false"
- "xpack.security.http.ssl.enabled=false"
- "cluster.routing.allocation.disk.threshold_enabled=false"
filebeat:
image: elastic/filebeat:9.1.2
container_name: filebeat
environment:
- TZ=Asia/Shanghai
networks:
- host
volumes:
- ../logs:/var/log
- ./elk/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml
- ./elk/filebeat/data:/usr/share/filebeat/data
- ./elk/filebeat/logs:/usr/share/filebeat/logs
command: ["filebeat", "-e", "-c", "/usr/share/filebeat/filebeat.yml"]
otel-collector-contrib:
image: otel/opentelemetry-collector-contrib:latest
ports:
- "4317:4317"
- "4318:4318"
container_name: otel-collector-contrib
volumes:
- ./otel/contrib_config.yaml:/etc/otelcol-contrib/config.yaml
docker_compose_run.sh
docker compose -f docker-compose.yml down
./docker_init.sh
rc=$?
# check the exit status (captured immediately, before another command overwrites $?)
if [ $rc -eq 0 ]; then
echo "init succeeded"
else
echo "init failed, exit code: $rc"
exit 1
fi
docker compose up -d
docker_init.sh
echo "running init"
mkdir -p $(pwd)/prometheus/data
mkdir -p $(pwd)/elk/elasticsearch/data
mkdir -p $(pwd)/elk/elasticsearch/logs
mkdir -p $(pwd)/elk/filebeat/data
mkdir -p $(pwd)/elk/filebeat/logs
if [ ! -f "$(pwd)/prometheus/prometheus.yml" ]; then
echo "prometheus yaml not exist ,create"
# >> 追加模式 > 覆盖模式
cat > $(pwd)/prometheus/prometheus.yml <<EOF
# prometheus/prometheus.yml
global:
  scrape_interval: 15s        # scrape interval (default 15s)
  evaluation_interval: 15s
scrape_configs:
  - job_name: "otel-agent"
    scrape_interval: 30s      # per-job override (optional)
    static_configs:
      - targets: ["10.12.179.210:9090"]   # Prometheus's own address (swap in your IP)
  # Optional: scrape a /metrics endpoint exposed by the OTel components directly
  # - job_name: "otel-agent-metrics"
  #   static_configs:
  #     - targets: ["otel-agent:4317"]
# Note: the collector pushes to /api/v1/write (enabled by --web.enable-remote-write-receiver);
# Prometheus itself does not need a remote_write section for that.
# remote_write:
#   - url: "http://10.12.179.210:9090/api/v1/write"
#     remote_timeout: 30s
#     queue_config:
#       max_samples_per_send: 5000
#       max_shards: 1
EOF
fi
if [ ! -f "$(pwd)/elk/filebeat/filebeat.yml" ]; then
echo "filebeat yaml not exist ,create"
# >> 追加模式 > 覆盖模式
cat > $(pwd)/elk/filebeat/filebeat.yml <<EOF
# filebeat/filebeat.yml
filebeat.inputs:
  - type: filestream            # filestream is recommended (handles rotation, dedups events)
    id: test-service
    paths:
      - /var/log/test-service.log   # path inside the container (must match the mount)
    fields:
      service.name: test-service    # custom fields (for filtering in ES)
      environment: dev
    fields_under_root: true         # promote the fields to the root of the event
    parsers:
      - ndjson:                     # if the log lines are NDJSON (recommended)
          target: "log"             # field the parsed object is stored under
output.elasticsearch:
  hosts: ["http://localhost:9200"]  # plain HTTP here, since this setup disables security/TLS
  # username: elastic               # default username
  # password: "your-elastic-password"
  # api_key: ""                     # API-key auth (optional; username/password is simpler)
  index: "logs"                     # index name
  index_pattern: "logs-%Y-%m-%d"    # explicit date-rolled pattern (optional)
  ssl.verification_mode: none       # TLS verification disabled (do verify in production)
  timeout: 30                       # connection timeout in seconds
setup.template.name: "desktop"
setup.template.pattern: "desktop-*"
EOF
fi
if [ ! -f "$(pwd)/otel/contrib_config.yaml" ]; then
echo "contrib_config yaml not exist ,create"
# >> 追加模式 > 覆盖模式
cat > $(pwd)/otel/contrib_config.yaml <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    send_batch_size: 50
    send_batch_max_size: 256
    timeout: 10s
exporters:
  otlphttp:
    endpoint: http://10.12.179.210:14318   # Jaeger's OTLP/HTTP port (the 14318 -> 4318 mapping)
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://10.12.179.210:9090/api/v1/write
    tls:
      insecure: true
service:
  pipelines:
    # trace pipeline
    traces:
      receivers: [ otlp ]                  # receive OTLP data
      processors: [ ]                      # empty; add the batch processor above if desired
      exporters: [ otlphttp ]              # export to Jaeger over OTLP/HTTP
    metrics:
      receivers: [ otlp ]
      processors: [ ]
      exporters: [ prometheusremotewrite ] # push to Prometheus's remote-write receiver
EOF
fi
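The guard pattern used throughout docker_init.sh (write a config only when the file is missing, so reruns never clobber your edits) can be seen in isolation. A minimal standalone demo; the `/tmp/init-demo` path and `key: value` content are throwaway:

```shell
create_if_missing() {
  # Write the heredoc only when the target file does not exist yet
  local path="$1"
  if [ ! -f "$path" ]; then
    cat > "$path" <<'EOF'
key: value
EOF
    echo "created $path"
  else
    echo "kept $path"
  fi
}
mkdir -p /tmp/init-demo
create_if_missing /tmp/init-demo/app.yaml   # first call writes the file
create_if_missing /tmp/init-demo/app.yaml   # second call leaves the existing copy alone
```

Because the second call is a no-op, docker_compose_run.sh can invoke docker_init.sh on every start without losing hand-tuned configs.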
docker_run_app.sh
Dockerfile:
# base image (pick one that matches your project, e.g. openjdk:21-jdk)
FROM openjdk:21-jdk
# set the working directory
WORKDIR /app
# copy the application JAR (assuming the build produces demo-0.0.1-SNAPSHOT.jar)
COPY target/demo-0.0.1-SNAPSHOT.jar app.jar
# copy the OTel agent and its config into the image (destination path is up to you)
# ${AGENT_DIR} is a placeholder: use the real path relative to the build context
COPY ${AGENT_DIR}/opentelemetry-javaagent.jar /opt/opentelemetry/
COPY ${AGENT_DIR}/file.properties /opt/opentelemetry/
# start command: load the OTel agent via -javaagent, then start the app
CMD ["java", \
"-javaagent:/opt/opentelemetry/opentelemetry-javaagent.jar", \
"-Dotel.javaagent.configuration-file=/opt/opentelemetry/file.properties", \
"-jar", "app.jar"]
docker stop test-service
docker rm test-service
docker image rm test-service
# mind the Dockerfile location: docker can only see files inside the build context
docker build -t test-service ../../.
docker run -d --network=host --name=test-service test-service   # -p is ignored with --network=host