Apisix(五)自服务监控-prometheus

1,201 阅读10分钟

一、前言

为什么要监控apisix(apisix的重要性不能黑盒扔在哪里,作为流量入口需要保证主动观测性; 有些参数需要进行主动观测,例如:服务发现中的shared_size等;)

apisix自带prometheus插件,通过简单配置即可完成对于apisix监控指标数据的暴露;

不要将prometheus指标通过public-api对外暴露

如果 Prometheus 插件收集的指标数量过多,在通过 URI 获取指标时,会占用 CPU 资源来计算指标数据,可能会影响 APISIX 处理正常请求。为解决此问题,APISIX 在 privileged agent 中暴露 URI 并且计算指标。 如果使用 public-api 插件暴露该 URI,那么 APISIX 将在普通的 worker 进程中计算指标数据,这仍可能会影响 APISIX 处理正常请求。

二、Apisix Prometheus插件介绍

Apache APISIX 集成了 Prometheus 监控功能,提供实时的 API 性能指标和健康状况监控。通过 Prometheus 的抓取机制,APISIX 能够自动暴露关键指标,如请求数、响应时间、错误率等,帮助开发者实时了解 API 的运行状态。

该功能支持自定义指标配置,用户可以根据需求选择监控特定的 API 或插件。同时,APISIX 提供丰富的标签功能,使得监控数据更加灵活和可视化。用户可以利用 Prometheus 的强大查询语言,对 API 性能进行深入分析,并结合 Grafana 等可视化工具进行数据展示,从而及时发现并解决潜在问题。这种监控能力使得 APISIX 成为微服务架构中重要的监控解决方案,提升了系统的可观测性和可靠性。

三、Apisix Prometheus插件使用

Apisix config中启用prometheus

在apisix config中添加插件属性,暴露prometheus的访问地址,端口以及uri等信息。下面代码中标红的部分

  config.yaml: |-
    #
    
    apisix:    # universal configurations
      node_listen:    # APISIX listening port
        - 9080
      enable_heartbeat: true
      enable_admin: true
      enable_admin_cors: true
      enable_debug: false

      enable_dev_mode: false                       # Sets nginx worker_processes to 1 if set to true
      enable_reuseport: true                       # Enable nginx SO_REUSEPORT switch if set to true.
      enable_ipv6: true # Enable nginx IPv6 resolver
      enable_server_tokens: true # Whether the APISIX version number should be shown in Server header
      proxy_cache:                         # Proxy Caching configuration
        cache_ttl: 10s                     # The default caching time if the upstream does not specify the cache time
        zones:                             # The parameters of a cache
        - name: disk_cache_one             # The name of the cache, administrator can be specify
                                           # which cache to use by name in the admin api
          memory_size: 50m                 # The size of shared memory, it's used to store the cache index
          disk_size: 1G                    # The size of disk, it's used to store the cache data
          disk_path: "/tmp/disk_cache_one" # The path to store the cache data
          cache_levels: "1:2"              # The hierarchy levels of a cache

      router:
        http: radixtree_host_uri  # radixtree_uri: match route by uri(base on radixtree)
                                    # radixtree_host_uri: match route by host + uri(base on radixtree)
                                    # radixtree_uri_with_parameter: match route by uri with parameters
        ssl: 'radixtree_sni'        # radixtree_sni: match route by SNI(base on radixtree)

      proxy_mode: http
      dns_resolver_valid: 30
      resolver_timeout: 5
      ssl:
        enable: false
        listen:
          - port: 9443
            enable_http2: true
        ssl_protocols: "TLSv1.2 TLSv1.3"
        ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA"

    nginx_config:    # config for render the template to genarate nginx.conf
      error_log: "/dev/stderr"
      error_log_level: "warn"    # warn,error
      worker_processes: "auto"
      enable_cpu_affinity: true
      worker_rlimit_nofile: 20480  # the number of files a worker process can open, should be larger than worker_connections
      event:
        worker_connections: 10620
      http:
        enable_access_log: true
        access_log: "/dev/stdout"
        access_log_format: '$remote_addr - $remote_user [$time_local] $http_host "$request" $status $body_bytes_sent $request_time "$http_referer" "$http_user_agent" $upstream_addr $upstream_status $upstream_response_time "$upstream_scheme://$upstream_host$upstream_uri"'
        access_log_format_escape: default
        keepalive_timeout: "60s"
        client_header_timeout: 60s     # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
        client_body_timeout: 60s       # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
        send_timeout: 10s              # timeout for transmitting a response to the client.then the connection is closed
        underscores_in_headers: "on"   # default enables the use of underscores in client request header fields
        real_ip_header: "X-Real-IP"    # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
        real_ip_from:                  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
          - 127.0.0.1
          - 'unix:'


    discovery:                      # Service Discovery
     kubernetes:                     # Kubernetes service discovery
       service:
         schema: https                     # apiserver schema, options [http, https], default https
         host: "kubernetes.default.svc.cluster.local" #${KUBERNETES_SERVICE_HOST}  # apiserver host, options [ipv4, ipv6, domain, environment variable], default ${KUBERNETES_SERVICE_HOST}
         port: "443" #${KUBERNETES_SERVICE_PORT}  # apiserver port, options [port number, environment variable], default ${KUBERNETES_SERVICE_PORT}
       client:
         # serviceaccount token or path of serviceaccount token_file
         token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
       namespace_selector:
         match: 
         - apisix
       # label_selector: |-
       #   app=cmp,
       # shared_size: 1m #default 1m

    plugin_attr:          # Plugin attributes            # export opentelemetry variables to nginx variables
      prometheus:                               # Plugin: prometheus
        export_uri: /apisix/prometheus/metrics  # Set the URI for the Prometheus metrics endpoint.
        metric_prefix: apisix_                  # Set the prefix for Prometheus metrics generated by APISIX.
        enable_export_server: true              # Enable the Prometheus export server.
        export_addr:                            # Set the address for the Prometheus export server.
          ip: 0.0.0.0                         # Set the IP.
          port: 9091        

    deployment:
      role: traditional
      role_traditional:
        config_provider: etcd
      admin:
        allow_admin:    # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
          - 127.0.0.1/24
        admin_listen:
          ip: 0.0.0.0
          port: 9180
        admin_key:
          # admin: can everything for configuration data
          - name: "admin"
            key: edd1c9f034335f136f87ad84b625c8f1
            role: admin
          # viewer: only can view configuration data
          - name: "viewer"
            key: 4054f7cf07e344346cd3f287985e76a2
            role: viewer
      etcd:
        host:                          # it's possible to define multiple etcd hosts addresses of the same etcd cluster.
          - "http://apisix-etcd.apisix.svc.cluster.local:2379"
        prefix: "/apisix"    # configuration prefix in etcd
        timeout: 30    # 30 seconds

暴露Apisix prometheus监控指标

为了方便外部访问apisix prometheus metric url,创建一个svc暴露apisix pod中的prometheus对应的端口到外部

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: apisix
  creationTimestamp: "2024-03-07T05:35:32Z"
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: apisix
    app.kubernetes.io/service: apisix-prometheus
    app.kubernetes.io/version: 3.8.0
    helm.sh/chart: apisix-2.6.0
  name: apisix-prometheus
  namespace: apisix
spec:
  ports:
  - name: apisix-prometheus
    nodePort: 30291
    port: 80
    protocol: TCP
    targetPort: 9091
  selector:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/name: apisix
  type: NodePort

查看apisix prometheus指标信息

通过访问apisix-prometheus服务,查看apisix prometheus监控指标暴露情况

  apisix-helm-chart git:(0da1bd0)  curl -v http://192.168.101.23:30291/apisix/prometheus/metrics
*   Trying 192.168.101.23:30291...
* Connected to 192.168.101.23 (192.168.101.23) port 30291 (#0)
> GET /apisix/prometheus/metrics HTTP/1.1
> Host: 192.168.101.23:30291
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: openresty
< Date: Thu, 18 Apr 2024 07:22:30 GMT
< Content-Type: text/plain; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys
# TYPE apisix_etcd_modify_indexes gauge
apisix_etcd_modify_indexes{key="consumers"} 547
apisix_etcd_modify_indexes{key="global_rules"} 555
apisix_etcd_modify_indexes{key="max_modify_index"} 555
apisix_etcd_modify_indexes{key="prev_index"} 584
apisix_etcd_modify_indexes{key="protos"} 0
apisix_etcd_modify_indexes{key="routes"} 554
apisix_etcd_modify_indexes{key="services"} 0
apisix_etcd_modify_indexes{key="ssls"} 0
apisix_etcd_modify_indexes{key="stream_routes"} 0
apisix_etcd_modify_indexes{key="upstreams"} 0
apisix_etcd_modify_indexes{key="x_etcd_index"} 585
# HELP apisix_etcd_reachable Config server etcd reachable from APISIX, 0 is unreachable
# TYPE apisix_etcd_reachable gauge
apisix_etcd_reachable 1
# HELP apisix_http_requests_total The total number of client requests since APISIX started
# TYPE apisix_http_requests_total gauge
apisix_http_requests_total 11
# HELP apisix_nginx_http_current_connections Number of HTTP connections
# TYPE apisix_nginx_http_current_connections gauge
apisix_nginx_http_current_connections{state="accepted"} 32
apisix_nginx_http_current_connections{state="active"} 10
apisix_nginx_http_current_connections{state="handled"} 32
apisix_nginx_http_current_connections{state="reading"} 0
apisix_nginx_http_current_connections{state="waiting"} 0
apisix_nginx_http_current_connections{state="writing"} 10
# HELP apisix_nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE apisix_nginx_metric_errors_total counter
apisix_nginx_metric_errors_total 0
# HELP apisix_node_info Info of APISIX node
# TYPE apisix_node_info gauge
apisix_node_info{hostname="apisix-68d87594b5-8s6p8"} 1
# HELP apisix_shared_dict_capacity_bytes The capacity of each nginx shared DICT since APISIX start
# TYPE apisix_shared_dict_capacity_bytes gauge
apisix_shared_dict_capacity_bytes{name="access-tokens"} 1048576
apisix_shared_dict_capacity_bytes{name="balancer-ewma"} 10485760
apisix_shared_dict_capacity_bytes{name="balancer-ewma-last-touched-at"} 10485760
apisix_shared_dict_capacity_bytes{name="balancer-ewma-locks"} 10485760
apisix_shared_dict_capacity_bytes{name="cas_sessions"} 10485760
apisix_shared_dict_capacity_bytes{name="discovery"} 1048576
apisix_shared_dict_capacity_bytes{name="etcd-cluster-health-check"} 10485760
apisix_shared_dict_capacity_bytes{name="ext-plugin"} 1048576
apisix_shared_dict_capacity_bytes{name="internal-status"} 10485760
apisix_shared_dict_capacity_bytes{name="introspection"} 10485760
apisix_shared_dict_capacity_bytes{name="jwks"} 1048576
apisix_shared_dict_capacity_bytes{name="kubernetes"} 1048576
apisix_shared_dict_capacity_bytes{name="lrucache-lock"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-api-breaker"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-conn"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-count"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-count-redis-cluster-slot-lock"} 1048576
apisix_shared_dict_capacity_bytes{name="plugin-limit-count-reset-header"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-req"} 10485760
apisix_shared_dict_capacity_bytes{name="prometheus-metrics"} 10485760
apisix_shared_dict_capacity_bytes{name="upstream-healthcheck"} 10485760
apisix_shared_dict_capacity_bytes{name="worker-events"} 10485760
# HELP apisix_shared_dict_free_space_bytes The free space of each nginx shared DICT since APISIX start
# TYPE apisix_shared_dict_free_space_bytes gauge
apisix_shared_dict_free_space_bytes{name="access-tokens"} 1032192
apisix_shared_dict_free_space_bytes{name="balancer-ewma"} 10412032
apisix_shared_dict_free_space_bytes{name="balancer-ewma-last-touched-at"} 10412032
apisix_shared_dict_free_space_bytes{name="balancer-ewma-locks"} 10412032
apisix_shared_dict_free_space_bytes{name="cas_sessions"} 10412032
apisix_shared_dict_free_space_bytes{name="discovery"} 1032192
apisix_shared_dict_free_space_bytes{name="etcd-cluster-health-check"} 10412032
apisix_shared_dict_free_space_bytes{name="ext-plugin"} 1032192
apisix_shared_dict_free_space_bytes{name="internal-status"} 10407936
apisix_shared_dict_free_space_bytes{name="introspection"} 10412032
apisix_shared_dict_free_space_bytes{name="jwks"} 1032192
apisix_shared_dict_free_space_bytes{name="kubernetes"} 1024000
apisix_shared_dict_free_space_bytes{name="lrucache-lock"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-api-breaker"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-conn"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-count"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-count-redis-cluster-slot-lock"} 1036288
apisix_shared_dict_free_space_bytes{name="plugin-limit-count-reset-header"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-req"} 10412032
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 10387456
apisix_shared_dict_free_space_bytes{name="upstream-healthcheck"} 10412032
apisix_shared_dict_free_space_bytes{name="worker-events"} 10412032
* Connection #0 to host 192.168.101.23 left intact
  apisix-helm-chart git:(0da1bd0) 
  1. prometheus中接入apisix prometheus监控

将apisix prometheus暴露的指标项,接入prometheus server中

scrape_configs:
  - job_name: "apisix"
    scrape_interval: 15s # 该值会跟 Prometheus QL 中 rate 函数的时间范围有关系,rate 函数中的时间范围应该至少两倍于该值。
    metrics_path: "/apisix/prometheus/metrics"
    static_configs:
      - targets: ["127.0.0.1:9091"]

四、总结

通过本文我们能够配置对于apisix服务本身进行prometheus监控,同时在使用prometheus进行监控的时候,需要避免使用apisix public-api进行暴露prometheus指标,以免造成不必要的性能损失或者性能风险。