k8s series 24: Calico basics (monitoring)


Introduction

Calico officially supports Prometheus monitoring, and the documentation describes the available metrics in detail.

Because we installed Calico with the defaults and did not enable the Typha component, only two components are running in the Kubernetes cluster: Felix and kube-controllers.

Detailed Felix metrics: docs.projectcalico.org/reference/f…

Detailed kube-controllers metrics: docs.projectcalico.org/reference/k…

Calico component configuration

Although Calico exposes Prometheus metrics, they are disabled by default. You need to enable them manually, and you also need to provide endpoints for Prometheus to scrape.

Felix configuration

Enable Felix's Prometheus metrics:

calicoctl patch felixConfiguration default  --patch '{"spec":{"prometheusMetricsEnabled": true}}'
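To confirm the patch took effect, the setting can be read back (this assumes calicoctl is already configured to reach the cluster datastore):

```shell
# Read the Felix configuration back; prometheusMetricsEnabled should now be true
calicoctl get felixConfiguration default -o yaml | grep prometheusMetricsEnabled
```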

Create a Service for the Felix metrics endpoint:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: felix-metrics-svc
  namespace: kube-system
spec:
  selector:
    k8s-app: calico-node
  ports:
  - port: 9091
    targetPort: 9091
EOF

kube-controllers configuration

The kube-controllers Prometheus metrics are enabled by default, so no change is required. To change the metrics port, use the command below; setting the port to 0 disables the metrics.

# The default metrics port is 9094
calicoctl patch kubecontrollersconfiguration default  --patch '{"spec":{"prometheusMetricsPort": 9094}}'
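For example, to disable the kube-controllers metrics entirely, the same patch can be applied with the port set to 0:

```shell
# Setting the metrics port to 0 disables the kube-controllers metrics
calicoctl patch kubecontrollersconfiguration default --patch '{"spec":{"prometheusMetricsPort": 0}}'
```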

Create a Service for the kube-controllers metrics endpoint:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-controllers-metrics-svc
  namespace: kube-system
spec:
  selector:
    k8s-app: calico-kube-controllers
  ports:
  - port: 9094
    targetPort: 9094
EOF

After the Services for both components are created successfully, check them.

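They can be listed like this; both Services should also have endpoints backing them (names as created above):

```shell
# Confirm both metrics Services exist and have backing endpoints
kubectl get svc felix-metrics-svc kube-controllers-metrics-svc -n kube-system
kubectl get endpoints felix-metrics-svc kube-controllers-metrics-svc -n kube-system
```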

Prometheus installation and configuration

Before installing Prometheus, first create the required service account and permissions.

Create a namespace

Create a dedicated namespace for monitoring:

kubectl apply -f -<<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: calico-monitoring
  labels:
    app:  ns-calico-monitoring
    role: monitoring
EOF

Create a service account

Create an account that can scrape data from Calico, then grant it the relevant permissions.

The configuration below has three parts: creating the role, creating the account, and binding the account to the role.

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: calico-prometheus-user
rules:
- apiGroups: [""]
  resources:
  - endpoints
  - services
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-prometheus-user
  namespace: calico-monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-prometheus-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-prometheus-user
subjects:
- kind: ServiceAccount
  name: calico-prometheus-user
  namespace: calico-monitoring
EOF
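As a quick sanity check, kubectl auth can-i can impersonate the new ServiceAccount to verify the binding works:

```shell
# Should print "yes" if the ClusterRoleBinding is in place
kubectl auth can-i list endpoints \
  --as=system:serviceaccount:calico-monitoring:calico-prometheus-user
```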

Prometheus configuration file

Create the Prometheus configuration file. If you have ever installed Prometheus from a binary, you will notice the configuration below is almost identical. To adjust it later, just edit this ConfigMap.

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: calico-monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval:   15s
      external_labels:
        monitor: 'tutorial-monitor'
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'felix_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: felix-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'typha_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: typha-metrics-svc
        replacement: $1
        action: keep
    - job_name: 'kube_controllers_metrics'
      scrape_interval: 5s
      scheme: http
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: kube-controllers-metrics-svc
        replacement: $1
        action: keep
EOF
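If promtool (shipped with Prometheus) is available locally, the rendered configuration can be linted before the Pod starts; the /tmp path here is just an example:

```shell
# Dump the prometheus.yml key from the ConfigMap and validate it with promtool
kubectl get configmap prometheus-config -n calico-monitoring \
  -o jsonpath='{.data.prometheus\.yml}' > /tmp/prometheus.yml
promtool check config /tmp/prometheus.yml
```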

Install Prometheus

Once the steps above have succeeded, run the following to install:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-pod
  namespace: calico-monitoring
  labels:
    app: prometheus-pod
    role: monitoring
spec:
  serviceAccountName: calico-prometheus-user
  containers:
  - name: prometheus-pod
    image: prom/prometheus
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus/prometheus.yml
      subPath: prometheus.yml
    ports:
    - containerPort: 9090
  volumes:
  - name: config-volume
    configMap:
      name: prometheus-config
EOF

Check the installation progress; if the returned status is Running, the installation is complete:

kubectl get pods prometheus-pod -n calico-monitoring

Access Prometheus

Because we have not yet created a Service for Prometheus, use port forwarding for now to quickly verify that Prometheus is collecting Calico's data:

kubectl port-forward --address 0.0.0.0 pod/prometheus-pod 9090:9090 -n calico-monitoring

Then open http://ip:9090 in a browser.
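With the port-forward running, a quick way to verify data is flowing is to query one Felix metric through the HTTP API (felix_active_local_endpoints is one of the metrics Felix exports):

```shell
# Query a single Felix metric; a non-empty "result" array means scraping works
curl -s 'http://localhost:9090/api/v1/query?query=felix_active_local_endpoints'
```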

Grafana installation and configuration

Before configuring Grafana, declare how Prometheus is reached, so that Grafana can query the data and render its dashboards.

Create a Prometheus Service

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: prometheus-dashboard-svc
  namespace: calico-monitoring
spec:
  selector:
      app:  prometheus-pod
      role: monitoring
  ports:
  - port: 9090
    targetPort: 9090
EOF

Create the Grafana configuration

Define the data source Grafana connects to: its type, address, port, and access mode.

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: calico-monitoring
data:
  prometheus.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
               "access":"proxy",
                "editable": true,
                "name": "calico-demo-prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus-dashboard-svc.calico-monitoring.svc:9090",
                "version": 1
            }
        ]
    }
EOF

Felix dashboard configuration

kubectl apply -f https://docs.projectcalico.org/manifests/grafana-dashboards.yaml

Install Grafana

Apply the following configuration directly; it pulls the latest image from the official Grafana registry:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: grafana-pod
  namespace: calico-monitoring
  labels:
    app:  grafana-pod
    role: monitoring
spec:
  containers:
  - name: grafana-pod
    image: grafana/grafana:latest
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
    volumeMounts:
    - name: grafana-config-volume
      mountPath: /etc/grafana/provisioning/datasources
    - name: grafana-dashboards-volume
      mountPath: /etc/grafana/provisioning/dashboards
    - name: grafana-storage-volume
      mountPath: /var/lib/grafana
    ports:
    - containerPort: 3000
  volumes:
  - name: grafana-storage-volume
    emptyDir: {}
  - name: grafana-config-volume
    configMap:
      name: grafana-config
  - name: grafana-dashboards-volume
    configMap:
      name: grafana-dashboards-config
EOF

Access Grafana

Since there is no Service configuration yet, forward the port to access Grafana and verify that monitoring works:

kubectl port-forward --address 0.0.0.0 pod/grafana-pod 3000:3000 -n calico-monitoring

Open http://IP:3000 to reach the Grafana login page. The default username and password are admin/admin.

After logging in, you will be prompted to change the password or skip; it can also be changed later in Settings.


Right after logging in there is nothing to see; you need to open this URL: http://ip:3000/d/calico-felix-dashboard/felix-dashboard-calico?orgId=1

It opens the dashboard Calico provides for us. Star it here, so that the panel can be found on the home page later.


Create a Service

Use the expose command to create a NodePort-type Service directly:

# Create the Service
kubectl expose pod grafana-pod --port=3000 --target-port=3000 --type=NodePort -n calico-monitoring
# Check the exposed port
kubectl get svc -n calico-monitoring

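The NodePort is assigned randomly, so instead of hard-coding it, it can be looked up with jsonpath (the Service is named grafana-pod because kubectl expose reuses the Pod's name):

```shell
# Print the NodePort assigned to the Grafana Service
kubectl get svc grafana-pod -n calico-monitoring \
  -o jsonpath='{.spec.ports[0].nodePort}'
```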
Open any cluster node's IP plus the NodePort (30538 in this example) to reach Grafana.

Uninstalling

If this monitoring stack takes up too many cluster resources, or you only wanted to see what it looks like, run the following commands to remove it:

kubectl delete service felix-metrics-svc -n kube-system
kubectl delete service typha-metrics-svc -n kube-system
kubectl delete service kube-controllers-metrics-svc -n kube-system
kubectl delete namespace calico-monitoring
kubectl delete ClusterRole calico-prometheus-user
kubectl delete clusterrolebinding calico-prometheus-user
