Kubernetes autoscaling based on custom monitoring metrics


1. Deploy the Prometheus adapter

The Prometheus adapter implements the custom metrics API, which lets us turn metrics scraped by Prometheus into metrics that Kubernetes can consume.

helm pull prometheus-community/prometheus-adapter --version 4.8.3 --untar

cd prometheus-adapter

helm install prometheus-adapter . -f values.yaml -n monitoring

Remember to replace the Prometheus address in values.yaml with your actual one; in my cluster it is prometheus-k8s.monitoring:9090.

2. Configure the adapter rules

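Our requirement this time: the service should scale out when its usage crosses a threshold expressed as a percentage of its resource limit, and the scaling rate should be limited.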

First we need to compute what percentage of its limit the service actually uses at runtime, for both CPU and memory.

    rules:
      # CPU usage as a percentage of the CPU limit
      - seriesQuery: '{__name__=~"^container_cpu_usage_seconds_total.*",namespace!="",pod!="",container !=""}'
        seriesFilters: []
        resources:
          overrides:
            namespace:
             resource: namespace
            pod: 
             resource: pod
        name:
          matches: container_cpu_usage_seconds_total
          as: "podCPUUtilization"
        metricsQuery: sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,pod!="",container !="",container!="POD"}[2m])) by (<<.GroupBy>>) / (sum(container_spec_cpu_quota{<<.LabelMatchers>>,pod!="",container !="",container!="POD"}/100000) by (<<.GroupBy>>)) * 100
      
      # Memory usage as a percentage of the memory limit
      - seriesQuery: '{__name__=~"^container_memory_working_set_bytes.*",namespace!="",pod!="",container !=""}'
        seriesFilters: []
        resources:
          overrides:
            namespace:
             resource: namespace
            pod: 
             resource: pod
        name:
          matches: container_memory_working_set_bytes
          as: "podMemoryUtilization"
        metricsQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,pod!="",container !="",container!="POD"}) by (<<.GroupBy>>) / (sum(container_spec_memory_limit_bytes{<<.LabelMatchers>>,pod!="",container !="",container!="POD"}) by (<<.GroupBy>>)) * 100
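To make the arithmetic behind the two metricsQuery expressions concrete, here is a minimal Python sketch of the same ratios. The function names and the sample numbers are mine, purely for illustration, and the constant assumes the default CFS period of 100000 microseconds that the rule above hardcodes.

```python
# Illustrative helpers only; they are not part of prometheus-adapter.

CFS_PERIOD_US = 100_000  # assumed default CFS period, matching the /100000 in the rule


def cpu_utilization_percent(usage_cores: float, quota_us: float) -> float:
    """CPU usage as a percentage of the CPU limit.

    usage_cores: irate() of container_cpu_usage_seconds_total, summed per pod
    quota_us:    container_spec_cpu_quota summed per pod (microseconds per period)
    """
    limit_cores = quota_us / CFS_PERIOD_US
    return usage_cores / limit_cores * 100


def memory_utilization_percent(working_set_bytes: float, limit_bytes: float) -> float:
    """Working-set memory as a percentage of the memory limit."""
    return working_set_bytes / limit_bytes * 100


if __name__ == "__main__":
    # A pod using 0.5 cores with a 2-core limit (quota 200000) sits at 25%.
    print(cpu_utilization_percent(0.5, 200_000))
    # A pod using 512 MiB of a 1 GiB limit sits at 50%.
    print(memory_utilization_percent(512 * 1024**2, 1024**3))
```

Because both sides of each division are summed per pod, a pod with several containers is measured against the sum of its containers' limits.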

3. Test custom.metrics.k8s.io

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" |jq .

The metric endpoints we configured are now listed:

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/podCPUUtilization",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/podCPUUtilization",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/podMemoryUtilization",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/podMemoryUtilization",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

Taking the pod kafka-0 in the kafka namespace as an example:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/kafka/pods/kafka-0/podCPUUtilization" |jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "kafka",
        "name": "kafka-0",
        "apiVersion": "/v1"
      },
      "metricName": "podCPUUtilization",
      "timestamp": "2023-12-08T09:19:54Z",
      "value": "2800m",  
      "selector": null
    }
  ]
}

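Note that the value carries the m suffix, meaning thousandths, so the result above is 2.8%. Comparing it with the Grafana dashboard confirms that the result is correct.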
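If you post-process these values in a script, the milli suffix has to be converted first. A minimal sketch, assuming only the plain and m-suffixed forms appear (the full Kubernetes quantity format has more suffixes, which this hypothetical helper does not handle):

```python
from decimal import Decimal


def quantity_to_decimal(quantity: str) -> Decimal:
    """Convert a metric value such as "2800m" or "80" to a plain number.

    Only the "m" (milli, 1/1000) suffix is handled; other suffixes are
    out of scope for this sketch.
    """
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


if __name__ == "__main__":
    print(quantity_to_decimal("2800m"))  # the kafka-0 value from the response above
```

Decimal is used instead of float so that the conversion stays exact.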

4. Integrate into the base chart

You can define the HPA rules as a standalone manifest; I integrated them into the chart template. To limit the scaling rate, I added the behavior parameter.

An example of the configuration in values:

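autoscaling:
  enabled: true
  minReplicas: 1  # minimum number of replicas
  maxReplicas: 5  # maximum number of replicas
  podCPUUtilization: 80  # CPU scale-out threshold (% of limit)
  podMemoryUtilization: 80  # memory scale-out threshold (% of limit)
  behavior:
    scaleDown:
      # Scale-down stabilization window of 300s: the autoscaler looks at the
      # desired states it computed earlier and uses the highest value within
      # the window, so here all desired states from the past 5 minutes count.
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60  # scale down by at most 1 pod every 60s
    scaleUp:
      stabilizationWindowSeconds: 60  # stabilization window of 60s
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60  # scale up by at most 2 pods every 60s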
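How the thresholds above turn into a replica count can be sketched with the formula from the Kubernetes HPA documentation: desired = ceil(current * currentValue / targetValue), skipped when the ratio is within the tolerance (0.1 by default). The function below is my own illustration rather than the controller's code, and its default bounds simply mirror minReplicas and maxReplicas from the values above.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     tolerance: float = 0.1, min_replicas: int = 1,
                     max_replicas: int = 5) -> int:
    """Replica count the HPA formula asks for, before any behavior rate limits."""
    ratio = current_value / target_value
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods averaging 160% against a target of 80 -> 4 pods
    print(desired_replicas(2, 160, 80))
    # 2 pods averaging 84%: ratio 1.05 is within tolerance -> stays at 2
    print(desired_replicas(2, 84, 80))
```

The behavior section then limits how quickly the workload is allowed to move towards this number.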

The rate limit in a policy can also be expressed as a percentage, for example:

    - type: Percent
      value: 30
      periodSeconds: 60
      # scale up (or down) by at most 30% every 60s
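To compare the two policy types, here is a small sketch of the replica bound each one allows within a single period. The function names are hypothetical, and the rounding (round up when scaling up, truncate when scaling down) follows my reading of the upstream controller, so treat it as an assumption and verify against your Kubernetes version.

```python
def scale_up_limit(period_start_replicas: int, policy_type: str, value: int) -> int:
    """Highest replica count one scaleUp policy allows within its period."""
    if policy_type == "Pods":
        return period_start_replicas + value
    if policy_type == "Percent":
        # Integer ceiling of replicas * (1 + value/100); assumed rounding.
        return -(-period_start_replicas * (100 + value) // 100)
    raise ValueError(f"unknown policy type: {policy_type}")


def scale_down_limit(period_start_replicas: int, policy_type: str, value: int) -> int:
    """Lowest replica count one scaleDown policy allows within its period."""
    if policy_type == "Pods":
        return period_start_replicas - value
    if policy_type == "Percent":
        # Truncation of replicas * (1 - value/100); assumed rounding.
        return period_start_replicas * (100 - value) // 100
    raise ValueError(f"unknown policy type: {policy_type}")


if __name__ == "__main__":
    print(scale_up_limit(4, "Pods", 2))        # 4 + 2
    print(scale_up_limit(4, "Percent", 30))    # ceil(5.2)
    print(scale_down_limit(4, "Pods", 1))      # 4 - 1
    print(scale_down_limit(4, "Percent", 30))  # trunc(2.8)
```

With small replica counts a Percent policy can be coarser than it looks: 30% of 4 pods already permits a change of 2.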

5. Load test

We test with the example configuration above.

Scale-down is limited to a 5-minute stabilization window and at most 1 pod removed every 60 seconds.

Scale-up is limited to a 1-minute stabilization window and at most 2 pods added every 60 seconds.

ab -n 100000000 -c 200 http://10.23.130.111/index.html

The result is shown in the figure below:

The scale-up and scale-down rates match our expectations.
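For reference, the scale-down pace we expect from this configuration can be sketched as a small simulation. It is deliberately simplified: it assumes one evaluation per 60-second policy period (the real controller syncs more often) and that the raw recommendation drops to 1 as soon as the load stops. simulate_scale_down is a hypothetical helper, not a Kubernetes API.

```python
def simulate_scale_down(start_replicas, raw_recommendations, window_steps,
                        max_pods_per_step, min_replicas=1):
    """Replica count after each step; one step equals one 60 s policy period."""
    replicas = start_replicas
    history = []
    timeline = []
    for recommendation in raw_recommendations:
        history.append(recommendation)
        # Stabilization: use the highest recommendation seen inside the window.
        stabilized = max(history[-window_steps:])
        if stabilized < replicas:
            # Rate limit: remove at most max_pods_per_step pods per period.
            replicas = max(stabilized, replicas - max_pods_per_step, min_replicas)
        timeline.append(replicas)
    return timeline


if __name__ == "__main__":
    # Load stops after the first step: the recommendation falls from 5 to 1.
    recommendations = [5] + [1] * 9
    # 300 s window / 60 s period = 5 steps; at most 1 pod removed per step.
    print(simulate_scale_down(5, recommendations, window_steps=5, max_pods_per_step=1))
    # -> [5, 5, 5, 5, 5, 4, 3, 2, 1, 1]
```

In this model the replica count holds at 5 until the old recommendation leaves the 5-minute window, then drops by one pod per minute.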