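Background
In large distributed systems, calls between services are complex, and distributed tracing helps map how a request flows through the system. Modern systems also need real-time monitoring so that teams can respond quickly to incidents and failures, identify bottlenecks and high-load paths, and optimize accordingly.
Ingress-Nginx is an Ingress controller for Kubernetes that manages external traffic entering the cluster. It is built on Nginx and uses its reverse-proxy and load-balancing capabilities, configured and tuned for the Kubernetes architecture. The controller's main job is to route external requests to specific services inside the cluster according to predefined rules, which are set through Kubernetes Ingress resources.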
Prerequisites
- Ingress-Nginx version >= 1.10.0
- Application services are already instrumented with OpenTelemetry to collect trace data
- Kubernetes cluster version:
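1. Deploy the sample services
Here we deploy two Spring Boot services, where service A calls service B. This example uses Java 17 and Maven 3.9.10.
Because Ingress-Nginx traces must join the backend traces, the OTEL agent has to be packaged into the application image when that image is built.
The following Dockerfile packages the agent into the application container image, which gives the service the basic ability to collect trace data.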
# Stage 1: download the OpenTelemetry Java agent
FROM curlimages/curl:latest AS agent-download
USER root
RUN curl -Lo /opentelemetry-javaagent.jar \
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Stage 2: application image with the agent included
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=agent-download /opentelemetry-javaagent.jar /app/opentelemetry-javaagent.jar
COPY target/serviceb-1.0-SNAPSHOT.jar /app/service-b.jar
ENV OTEL_SERVICE_NAME="service-b" \
    OTEL_EXPORTER_OTLP_ENDPOINT="http://datakit-endpoint:4317" \
    OTEL_TRACES_SAMPLER="parentbased_always_on" \
    OTEL_PROPAGATORS="tracecontext,baggage" \
    OTEL_METRICS_EXPORTER="none" \
    OTEL_LOGS_EXPORTER="none"
# Change the start command to attach the Java agent
CMD ["java", "-javaagent:/app/opentelemetry-javaagent.jar", "-jar", "/app/service-b.jar"]
Create k8s-java-app.yaml to deploy the services:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: <your-repo>/service-a:otel-1.0
          ports:
            - containerPort: 9090
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: SPRING_MAIN_ALLOW_CIRCULAR_REFERENCES
              value: "true"
            - name: OTEL_SERVICE_NAME
              value: "service-a"
            - name: OTEL_EXPORTER
              value: "otlp"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(HOST_IP):4317"
            - name: OTEL_PROPAGATORS
              value: "tracecontext,baggage"
---
apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: service-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-b
  template:
    metadata:
      labels:
        app: service-b
    spec:
      containers:
        - name: service-b
          image: <your-repo>/service-b:otel-1.0
          ports:
            - containerPort: 8090
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_SERVICE_NAME
              value: "service-b"
            - name: OTEL_EXPORTER
              value: "otlp"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "grpc"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(HOST_IP):4317"
            - name: OTEL_PROPAGATORS
              value: "tracecontext,baggage"
---
apiVersion: v1
kind: Service
metadata:
  name: service-b
spec:
  ports:
    - port: 8090
      targetPort: 8090
  selector:
    app: service-b
2. Install Ingress Nginx
Create an ingress-nginx.yaml file:
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - services
    verbs:
      - list
      - watch
      - get
  - apiGroups:
      - "discovery.k8s.io"
    resources:
      - endpointslices
    verbs:
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - get
      - watch
      - list
      - create
      - update
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      hostNetwork: true
      serviceAccountName: ingress-nginx
      containers:
        - name: controller
          image: k8s.gcr.io/ingress-nginx/controller:v1.10.0
          args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-controller-leader
            - --controller-class=k8s.io/ingress-nginx
            - --ingress-class=nginx
            - --configmap=ingress-nginx/ingress-nginx-controller
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(HOST_IP):4317"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
  selector:
    app: ingress-nginx
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  enable-opentelemetry: "true"
  otel-sampler: AlwaysOn
  opentelemetry-operation-name: "HTTP $request_method $service_name $uri $opentelemetry_trace_id"
  opentelemetry-trust-incoming-span: "true"
  # Defaults
  # otel-service-name: "nginx"
  # otel-sampler-ratio: 0.01
Apply the configuration:
kubectl apply -f ingress-nginx.yaml
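The controller above is started with --ingress-class=nginx and --controller-class=k8s.io/ingress-nginx, but the manifest defines neither an IngressClass nor an Ingress rule, so nothing is routed to service-a yet. The following is a minimal sketch of how that gap might be filled. The file name demo-ingress.yaml and the host demo.example.com are hypothetical placeholders, and the Ingress is assumed to live in the same namespace as service-a from step 1.

```shell
# Write a minimal IngressClass and Ingress to a manifest file.
# demo-ingress.yaml and demo.example.com are hypothetical placeholders;
# service-a and port 9090 come from the Service defined in step 1.
cat > demo-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  # Must match the controller's --controller-class argument
  controller: k8s.io/ingress-nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-a
spec:
  ingressClassName: nginx
  rules:
    - host: demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-a
                port:
                  number: 9090
EOF
```

Running kubectl apply -f demo-ingress.yaml afterwards should expose service-a through the controller under that host name.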
3. Configure Ingress-Nginx trace collection
3.1 Enable the OTEL collector in DataKit
In datakit.yaml, enable the cluster's OTEL collector by mounting its configuration from a ConfigMap (CM).
Add the following under volumeMounts:
- mountPath: /usr/local/datakit/conf.d/opentelemetry/opentelemetry.conf
  name: datakit-conf
  subPath: opentelemetry.conf
Add the collector configuration to the ConfigMap:
opentelemetry.conf: |-
  [[inputs.opentelemetry]]
    [inputs.opentelemetry.http]
      enable = true
      http_status_ok = 200
      trace_api = "/otel/v1/traces"
    [inputs.opentelemetry.grpc]
      trace_enable = true
      metric_enable = true
      addr = "0.0.0.0:4317"
Restart DataKit by re-applying the manifest:
kubectl apply -f datakit.yaml
3.2 Collect trace data with the OTEL agent
Package the agent into the application container image in the service's Dockerfile, which gives the service the basic ability to collect trace data.
# Stage 1: download the OpenTelemetry Java agent
FROM curlimages/curl:latest AS agent-download
USER root
RUN curl -Lo /opentelemetry-javaagent.jar \
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Stage 2: application image with the agent included
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=agent-download /opentelemetry-javaagent.jar /app/opentelemetry-javaagent.jar
COPY target/serviceb-1.0-SNAPSHOT.jar /app/service-b.jar
ENV OTEL_SERVICE_NAME="service-b" \
    OTEL_EXPORTER_OTLP_ENDPOINT="http://datakit-endpoint:4317" \
    OTEL_TRACES_SAMPLER="parentbased_always_on" \
    OTEL_PROPAGATORS="tracecontext,baggage" \
    OTEL_METRICS_EXPORTER="none" \
    OTEL_LOGS_EXPORTER="none"
# Change the start command to attach the Java agent
CMD ["java", "-javaagent:/app/opentelemetry-javaagent.jar", "-jar", "/app/service-b.jar"]
Set the following environment variables in the service's deployment YAML:
- name: OTEL_EXPORTER
  value: "otlp"
- name: OTEL_EXPORTER_OTLP_PROTOCOL
  value: "grpc"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://$(HOST_IP):4317"
- name: OTEL_PROPAGATORS
  value: "tracecontext,baggage"
3.3 Edit the ingress-controller ConfigMap
If the ingress-controller already has a ConfigMap, add the following four lines to its data section:
  enable-opentelemetry: "true"
  otel-sampler: AlwaysOn
  opentelemetry-operation-name: "HTTP $request_method $service_name $uri $opentelemetry_trace_id"
  opentelemetry-trust-incoming-span: "true"
Apply the corresponding ingress YAML and restart the ingress-controller.
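With opentelemetry-trust-incoming-span set to "true", the controller uses a trace context sent by the client as the parent of its own span instead of always starting a new trace. One way to check this is to send a request carrying a W3C traceparent header (the format handled by the tracecontext propagator) and then look for the same trace ID in the reported data. The following is a minimal sketch. INGRESS_URL is a hypothetical environment variable that you would set to a URL served by the controller, and the request is skipped when it is unset.

```shell
# Print N random bytes as lowercase hex.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Build a W3C Trace Context header value:
# version "00", 16-byte trace ID, 8-byte parent span ID, flags "01" (sampled).
new_traceparent() {
  printf '00-%s-%s-01\n' "$(rand_hex 16)" "$(rand_hex 8)"
}

TRACEPARENT="$(new_traceparent)"
echo "traceparent: ${TRACEPARENT}"

# INGRESS_URL is a hypothetical variable; nothing is sent when it is unset.
if [ -n "${INGRESS_URL:-}" ]; then
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -H "traceparent: ${TRACEPARENT}" "${INGRESS_URL}"
fi
```

The second field of the printed value is the trace ID. If the incoming context is trusted, the spans from nginx, service-a, and service-b should all share it.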
3.4 Add environment variables to the ingress-controller
In the Deployment section of the ingress-controller manifest ingress-nginx.yaml, add the OTEL settings under spec.template.spec.containers.env, and make sure the OTLP port (4317 in this example) is open.
- name: OTEL_EXPORTER
  value: "otlp"
- name: OTEL_EXPORTER_OTLP_PROTOCOL
  value: "grpc"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://$(HOST_IP):4317"
- name: OTEL_SERVICE_NAME
  value: "nginx"
- name: OTEL_TRACES_SAMPLER
  value: "always_on"
- name: OTEL_PROPAGATORS
  value: "tracecontext,baggage"
Re-apply ingress-nginx.yaml and restart the ingress-controller container.
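Besides the environment variables above, the ingress-nginx documentation also describes the ConfigMap keys otlp-collector-host and otlp-collector-port for setting the collector address. If no spans arrive with the environment variables alone, setting these keys may be worth trying. The following is a minimal sketch. The file name otel-collector-patch.yaml is hypothetical, and so is the Service address datakit-otlp.datakit.svc, which stands in for whatever address exposes DataKit's port 4317 in your cluster.

```shell
# Write a merge patch that points the controller at an OTLP gRPC collector.
# datakit-otlp.datakit.svc is a hypothetical placeholder address.
cat > otel-collector-patch.yaml <<'EOF'
data:
  otlp-collector-host: "datakit-otlp.datakit.svc"
  otlp-collector-port: "4317"
EOF
```

The patch could then be applied with kubectl -n ingress-nginx patch configmap ingress-nginx-controller --type merge --patch-file otel-collector-patch.yaml, followed by a restart of the controller.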
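Guance
Access the ingress domain again to generate data.
In the Guance console, open "Application Performance Monitoring" to confirm that Ingress-Nginx trace data is being reported.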
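The traffic-generation step above can be sketched as a small loop around curl. The names here are assumptions rather than part of the original setup: INGRESS_URL is the URL to request (for example the node address, since the controller runs with hostNetwork), and INGRESS_HOST is the host name configured in your Ingress rule, shown with a hypothetical default.

```shell
# Send COUNT requests to URL and print one HTTP status code per request.
# INGRESS_HOST and its default demo.example.com are hypothetical.
generate_traffic() {
  url="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -sS -o /dev/null -w '%{http_code}\n' \
      -H "Host: ${INGRESS_HOST:-demo.example.com}" "$url"
    i=$((i + 1))
  done
}

# Only send requests when INGRESS_URL has been set by the user.
if [ -n "${INGRESS_URL:-}" ]; then
  generate_traffic "${INGRESS_URL}" 20
fi
```

Because the ConfigMap uses the AlwaysOn sampler, each request should appear as a trace shortly after it is sent.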