1. Deploy the Prometheus adapter
The Prometheus adapter exposes the custom metrics API, which lets us turn metrics scraped by Prometheus into metrics that Kubernetes (for example, the HPA) can consume.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm pull prometheus-community/prometheus-adapter --version 4.8.3 --untar
cd prometheus-adapter
helm install prometheus-adapter . -f values.yaml -n monitoring
Remember to replace the Prometheus address in values.yaml with your actual one, here prometheus-k8s.monitoring:9090.
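For reference, a minimal values.yaml excerpt, assuming the field layout of the prometheus-community chart:

# values.yaml (excerpt) -- point the adapter at the in-cluster Prometheus
prometheus:
  url: http://prometheus-k8s.monitoring.svc
  port: 9090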
2. Configure the adapter rules
Our requirement is that scaling is triggered by a threshold expressed as a percentage of the resource limit, and that the scale-up and scale-down rates are capped.
The first step is to compute what percentage of its limit the service actually uses, for both CPU and memory.
rules:
  # CPU usage as a percentage of the CPU limit
  - seriesQuery: '{__name__=~"^container_cpu_usage_seconds_total.*",namespace!="",pod!="",container!=""}'
    seriesFilters: []
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: container_cpu_usage_seconds_total
      as: "podCPUUtilization"
    metricsQuery: sum(irate(container_cpu_usage_seconds_total{<<.LabelMatchers>>,pod!="",container!="",container!="POD"}[2m])) by (<<.GroupBy>>) / (sum(container_spec_cpu_quota{<<.LabelMatchers>>,pod!="",container!="",container!="POD"}/100000) by (<<.GroupBy>>)) * 100
  # Memory usage as a percentage of the memory limit
  - seriesQuery: '{__name__=~"^container_memory_working_set_bytes.*",namespace!="",pod!="",container!=""}'
    seriesFilters: []
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: container_memory_working_set_bytes
      as: "podMemoryUtilization"
    metricsQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,pod!="",container!="",container!="POD"}) by (<<.GroupBy>>) / (sum(container_spec_memory_limit_bytes{<<.LabelMatchers>>,pod!="",container!="",container!="POD"}) by (<<.GroupBy>>)) * 100
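To see what the template produces: for pod kafka-0 in namespace kafka, the CPU metricsQuery roughly expands to the PromQL below (<<.LabelMatchers>> becomes the namespace/pod matchers, <<.GroupBy>> becomes pod):

sum(irate(container_cpu_usage_seconds_total{namespace="kafka",pod="kafka-0",container!="",container!="POD"}[2m])) by (pod)
  / (sum(container_spec_cpu_quota{namespace="kafka",pod="kafka-0",container!="",container!="POD"}/100000) by (pod)) * 100

container_spec_cpu_quota is the CFS quota in microseconds per period, so dividing by 100000 (the default 100 ms period) converts it to a core count, making the ratio a true percentage of the limit.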
3. Test custom.metrics.k8s.io
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
If the rules are loaded, the metrics we defined show up in the resource list:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/podCPUUtilization",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/podCPUUtilization",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/podMemoryUtilization",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/podMemoryUtilization",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
Take kafka-0 in the kafka namespace as an example:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/kafka/pods/kafka-0/podCPUUtilization" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "kafka",
        "name": "kafka-0",
        "apiVersion": "/v1"
      },
      "metricName": "podCPUUtilization",
      "timestamp": "2023-12-08T09:19:54Z",
      "value": "2800m",
      "selector": null
    }
  ]
}
Note the unit is m (milli, one-thousandth), so the value 2800m above means 2.8%. Cross-checking against Grafana confirms the result is correct.
4. Integrate into the base chart
You could configure the HPA rules standalone; here they are built into our base chart template (a sketch of the template follows the values examples below). To cap the scaling rate, a behavior block is added.
Example configuration in values.yaml:
autoscaling:
  enabled: true
  minReplicas: 1                      # minimum replica count
  maxReplicas: 5                      # maximum replica count
  podCPUUtilization: 80               # CPU scale-up threshold (% of limit)
  podMemoryUtilization: 80            # memory scale-up threshold (% of limit)
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # scale-down stabilization window: the autoscaler looks at
                                      # previously computed desired states and uses the highest one
                                      # within this window; here the last 5 minutes are considered
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60           # remove at most 1 pod per 60s
    scaleUp:
      stabilizationWindowSeconds: 60  # 60s stabilization window
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60           # add at most 2 pods per 60s
You can also rate-limit by percentage, for example:
        - type: Percent
          value: 30
          periodSeconds: 60           # scale up (or down) by at most 30% per 60s
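As mentioned above, here is a minimal sketch of the chart's HPA template, assuming autoscaling/v2 and a hypothetical app.fullname helper; adapt the names to your chart's conventions:

# templates/hpa.yaml (sketch; "app.fullname" is an assumed helper)
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "app.fullname" . }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    # custom metrics served by the adapter; targets are % of limit
    - type: Pods
      pods:
        metric:
          name: podCPUUtilization
        target:
          type: AverageValue
          averageValue: {{ .Values.autoscaling.podCPUUtilization | quote }}
    - type: Pods
      pods:
        metric:
          name: podMemoryUtilization
        target:
          type: AverageValue
          averageValue: {{ .Values.autoscaling.podMemoryUtilization | quote }}
  {{- with .Values.autoscaling.behavior }}
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}

With type: Pods the HPA compares the per-pod average of the metric against averageValue, so a target of 80 here means 80% of the limit on average across pods.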
5. Load test
Testing with the example above:
Scale-down rate limit: 5-minute stabilization window, at most 1 pod removed per 60 seconds.
Scale-up rate limit: 1-minute stabilization window, at most 2 pods added per 60 seconds.
ab -n 100000000 -c 200 http://10.23.130.111/index.html
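While ab is running, you can watch the HPA react in real time (the HPA name and namespace below are placeholders):

kubectl get hpa -n <namespace> -w
kubectl describe hpa <hpa-name> -n <namespace>

The describe output also shows the HPA's scaling events, which makes it easy to verify the rate limits.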
The results are shown in the figure below:
The scale-up and scale-down rates match our expectations.