1. Preface
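Why monitor APISIX? APISIX is the traffic entry point, so it is too important to leave as a black box and needs proactive observability. Some parameters also need to be watched actively, for example the shared_size of the shared memory used by service discovery.
APISIX ships with a prometheus plugin that exposes APISIX monitoring metrics with only a small amount of configuration.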
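Do not expose the Prometheus metrics through the public-api plugin
If the Prometheus plugin collects a large number of metrics, computing the metric data each time the URI is requested consumes CPU and can affect how APISIX handles normal requests. To avoid this, APISIX exposes the URI and computes the metrics in the privileged agent process. If the URI is exposed with the public-api plugin instead, APISIX computes the metrics in a regular worker process, which can still affect the handling of normal requests.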
2. Introduction to the APISIX Prometheus Plugin
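Apache APISIX integrates Prometheus monitoring and provides real-time API performance and health metrics. APISIX exposes the metrics over HTTP and Prometheus pulls (scrapes) them periodically. They include key indicators such as request counts, latency and HTTP status codes (from which error rates can be derived), helping developers follow the state of their APIs in real time.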
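Metric collection can be configured as needed: the plugin can be enabled for specific routes or services, or globally, so that only the APIs of interest are monitored. The metrics also carry labels (for example route, service and status code), which make the data flexible to filter and aggregate. With PromQL, users can analyze API performance in depth and visualize the results with tools such as Grafana, so that potential problems are found and fixed early. This makes the prometheus plugin a key part of observing APISIX as the traffic entry point of a microservice architecture, improving the observability and reliability of the system.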
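3. Using the APISIX Prometheus Plugin
Enabling prometheus in the APISIX config
In the APISIX config, add the plugin attributes that define the address, port and URI of the Prometheus endpoint. The relevant part is the plugin_attr.prometheus section in the configuration below.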
config.yaml: |-
  #
  apisix:                          # universal configurations
    node_listen:                   # APISIX listening port
      - 9080
    enable_heartbeat: true
    enable_admin: true
    enable_admin_cors: true
    enable_debug: false
    enable_dev_mode: false         # Sets nginx worker_processes to 1 if set to true
    enable_reuseport: true         # Enable nginx SO_REUSEPORT switch if set to true.
    enable_ipv6: true              # Enable nginx IPv6 resolver
    enable_server_tokens: true     # Whether the APISIX version number should be shown in Server header
    proxy_cache:                   # Proxy Caching configuration
      cache_ttl: 10s               # The default caching time if the upstream does not specify the cache time
      zones:                       # The parameters of a cache
        - name: disk_cache_one     # The name of the cache; administrators can specify
                                   # which cache to use by name in the admin api
          memory_size: 50m         # The size of shared memory, used to store the cache index
          disk_size: 1G            # The size of disk, used to store the cache data
          disk_path: "/tmp/disk_cache_one"  # The path to store the cache data
          cache_levels: "1:2"      # The hierarchy levels of a cache
    router:
      http: radixtree_host_uri     # radixtree_uri: match route by uri (based on radixtree)
                                   # radixtree_host_uri: match route by host + uri (based on radixtree)
                                   # radixtree_uri_with_parameter: match route by uri with parameters
      ssl: 'radixtree_sni'         # radixtree_sni: match route by SNI (based on radixtree)
    proxy_mode: http
    dns_resolver_valid: 30
    resolver_timeout: 5
    ssl:
      enable: false
      listen:
        - port: 9443
          enable_http2: true
      ssl_protocols: "TLSv1.2 TLSv1.3"
      ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA"
  nginx_config:                    # config for rendering the template to generate nginx.conf
    error_log: "/dev/stderr"
    error_log_level: "warn"        # warn,error
    worker_processes: "auto"
    enable_cpu_affinity: true
    worker_rlimit_nofile: 20480    # the number of files a worker process can open, should be larger than worker_connections
    event:
      worker_connections: 10620
    http:
      enable_access_log: true
      access_log: "/dev/stdout"
      access_log_format: '$remote_addr - $remote_user [$time_local] $http_host "$request" $status $body_bytes_sent $request_time "$http_referer" "$http_user_agent" $upstream_addr $upstream_status $upstream_response_time "$upstream_scheme://$upstream_host$upstream_uri"'
      access_log_format_escape: default
      keepalive_timeout: "60s"
      client_header_timeout: 60s   # timeout for reading the client request header, after which 408 (Request Time-out) is returned to the client
      client_body_timeout: 60s     # timeout for reading the client request body, after which 408 (Request Time-out) is returned to the client
      send_timeout: 10s            # timeout for transmitting a response to the client, after which the connection is closed
      underscores_in_headers: "on" # enables the use of underscores in client request header fields
      real_ip_header: "X-Real-IP"  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
      real_ip_from:                # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
        - 127.0.0.1
        - 'unix:'
  discovery:                       # Service Discovery
    kubernetes:                    # Kubernetes service discovery
      service:
        schema: https              # apiserver schema, options [http, https], default https
        host: "kubernetes.default.svc.cluster.local"  # apiserver host, options [ipv4, ipv6, domain, environment variable], default ${KUBERNETES_SERVICE_HOST}
        port: "443"                # apiserver port, options [port number, environment variable], default ${KUBERNETES_SERVICE_PORT}
      client:
        # serviceaccount token or path of serviceaccount token_file
        token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      namespace_selector:
        match:
          - apisix
      # label_selector: |-
      #   app=cmp,
      # shared_size: 1m            # default 1m
  plugin_attr:                     # Plugin attributes
    prometheus:                    # Plugin: prometheus
      export_uri: /apisix/prometheus/metrics   # Set the URI for the Prometheus metrics endpoint.
      metric_prefix: apisix_                   # Set the prefix for Prometheus metrics generated by APISIX.
      enable_export_server: true               # Enable the Prometheus export server.
      export_addr:                             # Set the address for the Prometheus export server.
        ip: 0.0.0.0                            # Set the IP.
        port: 9091
  deployment:
    role: traditional
    role_traditional:
      config_provider: etcd
    admin:
      allow_admin:                 # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
        - 127.0.0.1/24
      admin_listen:
        ip: 0.0.0.0
        port: 9180
      admin_key:
        # admin: can do everything with configuration data
        - name: "admin"
          key: edd1c9f034335f136f87ad84b625c8f1
          role: admin
        # viewer: can only view configuration data
        - name: "viewer"
          key: 4054f7cf07e344346cd3f287985e76a2
          role: viewer
    etcd:
      host:                        # multiple host addresses of the same etcd cluster can be defined.
        - "http://apisix-etcd.apisix.svc.cluster.local:2379"
      prefix: "/apisix"            # configuration prefix in etcd
      timeout: 30                  # 30 seconds
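The plugin_attr.prometheus values decide where Prometheus has to scrape. As a rough illustration, the Python sketch below copies those values (plus node_listen and the Admin API port) into plain dicts by hand, builds the scrape URL and checks that the export server uses its own port rather than the data-plane or Admin API port. The helper names `metrics_url` and `check_export_port` and the `reachable_host` parameter are hypothetical. Because `0.0.0.0` is only a bind address, a concrete host such as the pod IP has to be supplied; `10.0.0.5` below is a placeholder.

```python
from urllib.parse import urlunsplit

# Values copied by hand from the config above (not loaded from the file).
PROMETHEUS_ATTR = {
    "export_uri": "/apisix/prometheus/metrics",
    "metric_prefix": "apisix_",
    "enable_export_server": True,
    "export_addr": {"ip": "0.0.0.0", "port": 9091},
}
NODE_LISTEN = [9080]  # apisix.node_listen
ADMIN_PORT = 9180     # deployment.admin.admin_listen.port


def check_export_port(attr: dict, node_listen: list[int], admin_port: int) -> None:
    """Raise if the export server is off or shares a port with another listener."""
    if not attr.get("enable_export_server"):
        raise ValueError("this sketch only handles the dedicated export server")
    port = attr["export_addr"]["port"]
    if port in node_listen or port == admin_port:
        raise ValueError(f"export port {port} collides with another listener")


def metrics_url(attr: dict, reachable_host: str) -> str:
    """Build the scrape URL; the 0.0.0.0 bind address is replaced by a reachable host."""
    ip = attr["export_addr"]["ip"]
    host = reachable_host if ip == "0.0.0.0" else ip  # IPv4 only in this sketch
    port = attr["export_addr"]["port"]
    return urlunsplit(("http", f"{host}:{port}", attr["export_uri"], "", ""))


if __name__ == "__main__":
    check_export_port(PROMETHEUS_ATTR, NODE_LISTEN, ADMIN_PORT)
    print(metrics_url(PROMETHEUS_ATTR, "10.0.0.5"))  # 10.0.0.5 is a placeholder pod IP
```

Run as a script, it prints `http://10.0.0.5:9091/apisix/prometheus/metrics`. Keeping the export port separate from 9080 and 9180 is what lets metric scraping stay off the gateway's request path.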
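Exposing the APISIX Prometheus metrics
To make the APISIX Prometheus metrics URL reachable from outside the cluster, create a Service that exposes the Prometheus port of the APISIX pods (9091, the export_addr port configured above):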
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: apisix
    meta.helm.sh/release-namespace: apisix
  creationTimestamp: "2024-03-07T05:35:32Z"
  labels:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: apisix
    app.kubernetes.io/service: apisix-prometheus
    app.kubernetes.io/version: 3.8.0
    helm.sh/chart: apisix-2.6.0
  name: apisix-prometheus
  namespace: apisix
spec:
  ports:
    - name: apisix-prometheus
      nodePort: 30291
      port: 80
      protocol: TCP
      targetPort: 9091
  selector:
    app.kubernetes.io/instance: apisix
    app.kubernetes.io/name: apisix
  type: NodePort
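A NodePort Service involves three port numbers that are easy to mix up. The Python sketch below, using the hypothetical helpers `access_paths` and `target_port_matches`, lists the URL each port is used with and checks that `targetPort` matches the export server port 9091 from the config. The in-cluster DNS name follows the standard `<service>.<namespace>.svc.cluster.local` pattern, assuming the default `cluster.local` cluster domain that the etcd address in the config also uses; `<pod-ip>` is a placeholder.

```python
# Fields copied from the apisix-prometheus Service above.
SERVICE = {
    "name": "apisix-prometheus",
    "namespace": "apisix",
    "port": {"nodePort": 30291, "port": 80, "targetPort": 9091},
}
EXPORT_PORT = 9091  # plugin_attr.prometheus.export_addr.port


def access_paths(svc: dict, node_ip: str, uri: str) -> dict[str, str]:
    """Map each way of reaching the export server to its URL."""
    p = svc["port"]
    dns = f"{svc['name']}.{svc['namespace']}.svc.cluster.local"
    return {
        # From outside the cluster: a node IP plus the nodePort.
        "external": f"http://{node_ip}:{p['nodePort']}{uri}",
        # From inside the cluster: the Service DNS name plus the Service port.
        "in_cluster": f"http://{dns}:{p['port']}{uri}",
        # Inside the pod network: the port the export server listens on.
        "pod": f"http://<pod-ip>:{p['targetPort']}{uri}",
    }


def target_port_matches(svc: dict, export_port: int) -> bool:
    """targetPort must equal the export server port, or traffic never reaches it."""
    return svc["port"]["targetPort"] == export_port
```

For example, `access_paths(SERVICE, "192.168.101.23", "/apisix/prometheus/metrics")["external"]` gives the same URL that the curl command below requests.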
Checking the APISIX Prometheus metrics
Request the apisix-prometheus Service to check that the APISIX Prometheus metrics are exposed:
$ curl -v http://192.168.101.23:30291/apisix/prometheus/metrics
* Trying 192.168.101.23:30291...
* Connected to 192.168.101.23 (192.168.101.23) port 30291 (#0)
> GET /apisix/prometheus/metrics HTTP/1.1
> Host: 192.168.101.23:30291
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: openresty
< Date: Thu, 18 Apr 2024 07:22:30 GMT
< Content-Type: text/plain; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys
# TYPE apisix_etcd_modify_indexes gauge
apisix_etcd_modify_indexes{key="consumers"} 547
apisix_etcd_modify_indexes{key="global_rules"} 555
apisix_etcd_modify_indexes{key="max_modify_index"} 555
apisix_etcd_modify_indexes{key="prev_index"} 584
apisix_etcd_modify_indexes{key="protos"} 0
apisix_etcd_modify_indexes{key="routes"} 554
apisix_etcd_modify_indexes{key="services"} 0
apisix_etcd_modify_indexes{key="ssls"} 0
apisix_etcd_modify_indexes{key="stream_routes"} 0
apisix_etcd_modify_indexes{key="upstreams"} 0
apisix_etcd_modify_indexes{key="x_etcd_index"} 585
# HELP apisix_etcd_reachable Config server etcd reachable from APISIX, 0 is unreachable
# TYPE apisix_etcd_reachable gauge
apisix_etcd_reachable 1
# HELP apisix_http_requests_total The total number of client requests since APISIX started
# TYPE apisix_http_requests_total gauge
apisix_http_requests_total 11
# HELP apisix_nginx_http_current_connections Number of HTTP connections
# TYPE apisix_nginx_http_current_connections gauge
apisix_nginx_http_current_connections{state="accepted"} 32
apisix_nginx_http_current_connections{state="active"} 10
apisix_nginx_http_current_connections{state="handled"} 32
apisix_nginx_http_current_connections{state="reading"} 0
apisix_nginx_http_current_connections{state="waiting"} 0
apisix_nginx_http_current_connections{state="writing"} 10
# HELP apisix_nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE apisix_nginx_metric_errors_total counter
apisix_nginx_metric_errors_total 0
# HELP apisix_node_info Info of APISIX node
# TYPE apisix_node_info gauge
apisix_node_info{hostname="apisix-68d87594b5-8s6p8"} 1
# HELP apisix_shared_dict_capacity_bytes The capacity of each nginx shared DICT since APISIX start
# TYPE apisix_shared_dict_capacity_bytes gauge
apisix_shared_dict_capacity_bytes{name="access-tokens"} 1048576
apisix_shared_dict_capacity_bytes{name="balancer-ewma"} 10485760
apisix_shared_dict_capacity_bytes{name="balancer-ewma-last-touched-at"} 10485760
apisix_shared_dict_capacity_bytes{name="balancer-ewma-locks"} 10485760
apisix_shared_dict_capacity_bytes{name="cas_sessions"} 10485760
apisix_shared_dict_capacity_bytes{name="discovery"} 1048576
apisix_shared_dict_capacity_bytes{name="etcd-cluster-health-check"} 10485760
apisix_shared_dict_capacity_bytes{name="ext-plugin"} 1048576
apisix_shared_dict_capacity_bytes{name="internal-status"} 10485760
apisix_shared_dict_capacity_bytes{name="introspection"} 10485760
apisix_shared_dict_capacity_bytes{name="jwks"} 1048576
apisix_shared_dict_capacity_bytes{name="kubernetes"} 1048576
apisix_shared_dict_capacity_bytes{name="lrucache-lock"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-api-breaker"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-conn"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-count"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-count-redis-cluster-slot-lock"} 1048576
apisix_shared_dict_capacity_bytes{name="plugin-limit-count-reset-header"} 10485760
apisix_shared_dict_capacity_bytes{name="plugin-limit-req"} 10485760
apisix_shared_dict_capacity_bytes{name="prometheus-metrics"} 10485760
apisix_shared_dict_capacity_bytes{name="upstream-healthcheck"} 10485760
apisix_shared_dict_capacity_bytes{name="worker-events"} 10485760
# HELP apisix_shared_dict_free_space_bytes The free space of each nginx shared DICT since APISIX start
# TYPE apisix_shared_dict_free_space_bytes gauge
apisix_shared_dict_free_space_bytes{name="access-tokens"} 1032192
apisix_shared_dict_free_space_bytes{name="balancer-ewma"} 10412032
apisix_shared_dict_free_space_bytes{name="balancer-ewma-last-touched-at"} 10412032
apisix_shared_dict_free_space_bytes{name="balancer-ewma-locks"} 10412032
apisix_shared_dict_free_space_bytes{name="cas_sessions"} 10412032
apisix_shared_dict_free_space_bytes{name="discovery"} 1032192
apisix_shared_dict_free_space_bytes{name="etcd-cluster-health-check"} 10412032
apisix_shared_dict_free_space_bytes{name="ext-plugin"} 1032192
apisix_shared_dict_free_space_bytes{name="internal-status"} 10407936
apisix_shared_dict_free_space_bytes{name="introspection"} 10412032
apisix_shared_dict_free_space_bytes{name="jwks"} 1032192
apisix_shared_dict_free_space_bytes{name="kubernetes"} 1024000
apisix_shared_dict_free_space_bytes{name="lrucache-lock"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-api-breaker"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-conn"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-count"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-count-redis-cluster-slot-lock"} 1036288
apisix_shared_dict_free_space_bytes{name="plugin-limit-count-reset-header"} 10412032
apisix_shared_dict_free_space_bytes{name="plugin-limit-req"} 10412032
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 10387456
apisix_shared_dict_free_space_bytes{name="upstream-healthcheck"} 10412032
apisix_shared_dict_free_space_bytes{name="worker-events"} 10412032
* Connection #0 to host 192.168.101.23 left intact
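The preface mentions watching parameters such as the service discovery shared_size, and the output above already contains the needed data: `apisix_shared_dict_capacity_bytes` and `apisix_shared_dict_free_space_bytes` for each shared dict. The Python sketch below parses the Prometheus text format with a deliberately simple regex (no timestamps, label escapes left as-is) and computes the used fraction of each dict. The `kubernetes` dict's capacity of 1048576 bytes matches the default `shared_size: 1m`. The helpers `fetch_metrics`, `parse_samples` and `shared_dict_usage` are hypothetical, and the sample lines are copied from the output above.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, value (timestamps not handled).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair; escaped characters inside values are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# A few lines copied from the curl output above.
SAMPLE = """\
# TYPE apisix_shared_dict_capacity_bytes gauge
apisix_shared_dict_capacity_bytes{name="kubernetes"} 1048576
apisix_shared_dict_capacity_bytes{name="prometheus-metrics"} 10485760
# TYPE apisix_shared_dict_free_space_bytes gauge
apisix_shared_dict_free_space_bytes{name="kubernetes"} 1024000
apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 10387456
"""


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the metrics text, e.g. from the NodePort URL used above."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for every sample line; skip comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # a line this simple parser does not understand
        name, raw_labels, value = m.groups()
        samples.append((name, dict(LABEL_RE.findall(raw_labels or "")), float(value)))
    return samples


def shared_dict_usage(samples) -> dict[str, float]:
    """Used fraction (0..1) of each nginx shared dict."""
    cap = {lab["name"]: v for n, lab, v in samples if n == "apisix_shared_dict_capacity_bytes"}
    free = {lab["name"]: v for n, lab, v in samples if n == "apisix_shared_dict_free_space_bytes"}
    return {k: (cap[k] - free[k]) / cap[k] for k in cap if k in free and cap[k] > 0}


if __name__ == "__main__":
    for name, used in sorted(shared_dict_usage(parse_samples(SAMPLE)).items()):
        print(f"{name}: {used:.2%} used")
```

With live data, `parse_samples(fetch_metrics(url))` replaces `SAMPLE`. A `kubernetes` dict that approaches full usage would be a signal to raise `shared_size`.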
Adding APISIX to Prometheus
Add the metrics exposed by APISIX to the scrape configuration of the Prometheus server:
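scrape_configs:
  - job_name: "apisix"
    scrape_interval: 15s  # Related to the range used by the PromQL rate() function: the rate() range should be at least twice this value.
    metrics_path: "/apisix/prometheus/metrics"
    static_configs:
      - targets: ["127.0.0.1:9091"]  # must be reachable from Prometheus, e.g. 192.168.101.23:30291 via the NodePort Service above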
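The comment on scrape_interval ties it to the range used in rate(). A minimal Python sketch of that rule, with hypothetical helpers `parse_duration`, `rate_query` and `query_url`: it parses a single-unit duration such as `15s`, applies the "at least twice" factor from the comment and builds a PromQL query plus a Prometheus HTTP API `/api/v1/query` URL. The metric `apisix_http_status` (a counter with a `code` label, among others) is an assumption here: it is only populated for requests on routes, services or global rules where the prometheus plugin is enabled, and it does not yet appear in the sample output above. `http://prometheus:9090` is a placeholder address.

```python
import re
from urllib.parse import urlencode

# Seconds per unit for the duration suffixes handled by this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse a single-unit duration such as '15s' or '1m' into whole seconds."""
    m = re.fullmatch(r"(\d+)(s|m|h)", text.strip())
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2)]


def rate_query(metric: str, scrape_interval: str, by: str = "code", factor: int = 2) -> str:
    """Per-label request rate whose range is factor x scrape_interval."""
    window = parse_duration(scrape_interval) * factor
    return f"sum by ({by}) (rate({metric}[{window}s]))"


def query_url(prometheus_base: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{prometheus_base}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    q = rate_query("apisix_http_status", "15s")
    print(q)  # sum by (code) (rate(apisix_http_status[30s]))
    print(query_url("http://prometheus:9090", q))
```

Raising the factor (for example to 4) makes rate() more tolerant of a missed scrape at the cost of a smoother, slower-reacting curve.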
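4. Summary
This article showed how to configure Prometheus monitoring of the APISIX service itself. When monitoring with Prometheus, avoid exposing the metrics through the APISIX public-api plugin: the metrics would then be computed in regular worker processes, which costs performance and puts normal request handling at risk.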