Cloud-Native Gateway Ingress-Nginx Tracing in Practice: OpenTelemetry Collection and Guance Integration

Background

In a large distributed system, calls between services are complex. Distributed tracing helps map out how requests flow, and real-time monitoring lets us respond quickly to incidents and failures. Together they reveal system bottlenecks and hot paths, pointing the way to optimization.

Ingress-Nginx runs in a Kubernetes environment and is dedicated to managing external traffic entering the cluster. It is built on Nginx, leveraging its capabilities as a reverse proxy and load balancer, but is configured and optimized specifically for Kubernetes. The Ingress Controller's main job is to route external requests to specific services inside the cluster according to predefined rules (set via Kubernetes Ingress resources).

Prerequisites

  • Ingress-Nginx version >= 1.10.0
  • Application services are already instrumented with OpenTelemetry to collect trace data
  • K8s cluster version:

1. Deploy the sample services

Here we deploy two Spring Boot services, where service A calls service B. This example uses Java 17 and Maven 3.9.10.
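As a reference for the shape of the call chain (the actual Spring Boot controllers are not shown in this article), the A -> B hop can be sketched with plain JDK HTTP primitives; the endpoint path /hello and the response body below are hypothetical stand-ins:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical stand-in for the two services: "service-b" exposes an HTTP
// endpoint and "service-a" calls it, mirroring the A -> B hop that the
// OTel Java agent will trace in the real deployment.
public class CallChainDemo {
    public static String callServiceB() throws Exception {
        // "service-b": a tiny HTTP server on an ephemeral port
        HttpServer serviceB = HttpServer.create(new InetSocketAddress(0), 0);
        serviceB.createContext("/hello", exchange -> {
            byte[] body = "hello from service-b".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        serviceB.start();
        int port = serviceB.getAddress().getPort();
        try {
            // "service-a": calls service-b over HTTP
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("http://localhost:" + port + "/hello")).build();
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            serviceB.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callServiceB());
    }
}
```

In the real services the same hop is made by a Spring HTTP client call, which the OTel Java agent instruments automatically.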

Because the Ingress-Nginx traces must be stitched to the backend traces, the OTel agent needs to be bundled into the business image when it is built.

The following Dockerfile configuration bundles the agent into the business service container image, giving the service the baseline ability to collect trace data.

FROM curlimages/curl:latest AS agent-download
USER root
RUN curl -Lo /opentelemetry-javaagent.jar \
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

FROM openjdk:17-jdk-slim

WORKDIR /app

COPY --from=agent-download /opentelemetry-javaagent.jar /app/opentelemetry-javaagent.jar

COPY target/serviceb-1.0-SNAPSHOT.jar /app/service-b.jar

ENV OTEL_SERVICE_NAME="service-b" \
    OTEL_EXPORTER_OTLP_ENDPOINT="http://datakit-endpoint:4317" \
    OTEL_TRACES_SAMPLER="parentbased_always_on" \
    OTEL_PROPAGATORS="tracecontext,baggage" \
    OTEL_METRICS_EXPORTER="none" \
    OTEL_LOGS_EXPORTER="none"

# Start command with the Java Agent attached
CMD ["java", "-javaagent:/app/opentelemetry-javaagent.jar", "-jar", "/app/service-b.jar"]
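For completeness, the image referenced in the Deployment below can be built and pushed along these lines (`<your-repo>` is a placeholder, as in the rest of this example):

```shell
# Build the Spring Boot jar, then bake it (plus the OTel agent) into the image.
mvn -q -DskipTests package
docker build -t <your-repo>/service-b:otel-1.0 .
docker push <your-repo>/service-b:otel-1.0
```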

Create k8s-java-app.yaml to deploy the services:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: <your-repo>/service-a:otel-1.0
          ports:
            - containerPort: 9090
          env:
          - name: HOST_IP
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: SPRING_MAIN_ALLOW_CIRCULAR_REFERENCES
            value: "true"
          - name: OTEL_SERVICE_NAME
            value: "service-a"
          - name: OTEL_TRACES_EXPORTER
            value: "otlp"
          - name: OTEL_EXPORTER_OTLP_PROTOCOL
            value: "grpc"
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: "http://$(HOST_IP):4317"
          - name: OTEL_PROPAGATORS
            value: "tracecontext,baggage"

---
apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: service-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-b
  template:
    metadata:
      labels:
        app: service-b
    spec:
      containers:
        - name: service-b
          image: <your-repo>/service-b:otel-1.0
          ports:
            - containerPort: 8090
          env:
          - name: HOST_IP
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: OTEL_SERVICE_NAME
            value: "service-b"
          - name: OTEL_TRACES_EXPORTER
            value: "otlp"
          - name: OTEL_EXPORTER_OTLP_PROTOCOL
            value: "grpc"
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: "http://$(HOST_IP):4317"
          - name: OTEL_PROPAGATORS
            value: "tracecontext,baggage"
---
apiVersion: v1
kind: Service
metadata:
  name: service-b
spec:
  ports:
    - port: 8090
      targetPort: 8090
  selector:
    app: service-b

2. Install Ingress Nginx

Create an ingress-nginx.yaml file:

apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - services
    verbs:
      - list
      - watch
      - get
  - apiGroups:
      - "discovery.k8s.io"
    resources:
      - endpointslices
    verbs:
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - get
      - watch
      - list
      - create
      - update
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "networking.k8s.io"
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-nginx
subjects:
  - kind: ServiceAccount
    name: ingress-nginx
    namespace: ingress-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      hostNetwork: true
      serviceAccountName: ingress-nginx
      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.10.0
        args:
        - /nginx-ingress-controller
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
        - --election-id=ingress-controller-leader
        - --controller-class=k8s.io/ingress-nginx
        - --ingress-class=nginx
        - --configmap=ingress-nginx/ingress-nginx-controller
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(HOST_IP):4317"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
  selector:
    app: ingress-nginx
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  enable-opentelemetry: "true"
  otel-sampler: AlwaysOn
  opentelemetry-operation-name: "HTTP $request_method $service_name $uri $opentelemetry_trace_id"
  opentelemetry-trust-incoming-span: "true"
  # Defaults
  # otel-service-name: "nginx"
  # otel-sampler-ratio: 0.01

Apply the configuration:

kubectl apply -f ingress-nginx.yaml
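The article does not show the Ingress rule itself. A minimal example that routes a hypothetical host to service-a (the host name and path are assumptions, not taken from the original) could look like:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: java-app
spec:
  ingressClassName: nginx
  rules:
    - host: java-app.example.com   # hypothetical domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-a
                port:
                  number: 9090
```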

3. Configure Ingress-Nginx trace collection

3.1 Enable the OTEL collector in DataKit

In datakit.yaml, enable the cluster's OTEL collector by mounting its configuration through a ConfigMap (CM).

Add under volumeMounts:

        - mountPath: /usr/local/datakit/conf.d/opentelemetry/opentelemetry.conf
          name: datakit-conf
          subPath: opentelemetry.conf

Add the collector in the ConfigMap:

    opentelemetry.conf: |-
        [[inputs.opentelemetry]]
          [inputs.opentelemetry.http]
           enable = true
           http_status_ok = 200
           trace_api = "/otel/v1/traces"
          [inputs.opentelemetry.grpc]
           trace_enable = true
           metric_enable = true
           addr = "0.0.0.0:4317"

Restart DataKit:

kubectl apply -f datakit.yaml
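Note that kubectl apply alone may not recreate running pods when only the ConfigMap changed; forcing a rollout is safer. The namespace and DaemonSet name below assume a default DataKit install:

```shell
kubectl -n datakit rollout restart daemonset datakit
```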

3.2 Collect trace data with the OTEL agent

The Dockerfile in Step 1 already bundles the agent into the business service container image, giving each service the baseline ability to collect trace data.

Then configure the environment variables in the service deployment YAML.

          - name: OTEL_TRACES_EXPORTER
            value: "otlp"
          - name: OTEL_EXPORTER_OTLP_PROTOCOL
            value: "grpc"
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: "http://$(HOST_IP):4317"
          - name: OTEL_PROPAGATORS
            value: "tracecontext,baggage"

3.3 Edit the ingress-controller ConfigMap

If the ingress-controller service has a ConfigMap, add the following four lines to it:

enable-opentelemetry: "true"
otel-sampler: AlwaysOn
opentelemetry-operation-name: "HTTP $request_method $service_name $uri $opentelemetry_trace_id"
opentelemetry-trust-incoming-span: "true"

Apply the corresponding Ingress YAML and restart the ingress-controller.
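opentelemetry-trust-incoming-span tells the controller to honor an incoming W3C traceparent header, the same header that OTEL_PROPAGATORS="tracecontext" makes the services emit. As an illustration (not part of any library API), its version-traceid-spanid-flags layout can be parsed like this:

```java
// Minimal sketch of the W3C traceparent header carried between
// Ingress-Nginx and the backend services when "tracecontext" propagation
// is enabled. Layout: version-traceid-spanid-flags.
public class TraceParentDemo {
    static String[] parse(String traceparent) {
        String[] parts = traceparent.split("-");
        if (parts.length != 4) {
            throw new IllegalArgumentException("invalid traceparent: " + traceparent);
        }
        return parts; // [version, trace-id, parent span-id, trace-flags]
    }

    public static void main(String[] args) {
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        String[] p = parse(header);
        System.out.println("trace-id=" + p[1]
                + " span-id=" + p[2]
                + " sampled=" + p[3].equals("01"));
    }
}
```

Because every hop (nginx, service-a, service-b) reuses the trace-id from this header, the spans are stitched into one trace on the backend.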

3.4 Add ingress-controller environment variables

In the Deployment section of the ingress-controller manifest ingress-nginx.yaml, add the OTEL settings under spec.template.spec.containers.env, and make sure the corresponding port is open.

        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: "grpc"

        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(HOST_IP):4317"

        - name: OTEL_SERVICE_NAME
          value: "nginx"
        - name: OTEL_TRACES_SAMPLER
          value: "always_on"
        - name: OTEL_PROPAGATORS
          value: "tracecontext,baggage"

Re-apply ingress-nginx.yaml and restart the ingress-controller container.
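The re-apply and restart can be done with a rollout, using the namespace and Deployment name defined earlier in ingress-nginx.yaml:

```shell
kubectl apply -f ingress-nginx.yaml
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller
kubectl -n ingress-nginx rollout status deployment ingress-nginx-controller
```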

Guance

Visit the ingress domain again to generate some traffic.
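Assuming a host such as java-app.example.com is routed by an Ingress rule (both the host and the node IP below are placeholders), traffic can be generated with curl against a cluster node:

```shell
for i in $(seq 1 20); do
  curl -s -H "Host: java-app.example.com" "http://<node-ip>/" > /dev/null
done
```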

In the Guance console, under Application Performance Monitoring, you can see the Ingress-Nginx trace data being reported normally.