Deploying the Observability Tool Pixie on Kubernetes in Self-Hosted Mode


Introduction

Pixie is an open-source observability tool for Kubernetes applications. It uses eBPF to automatically capture telemetry data, with no manual instrumentation required. Developers can use Pixie to view the high-level state of a cluster (service maps, cluster resources, application traffic) and drill down into more detailed views (pod state, flame graphs, individual full-body application requests).

Pixie was contributed to the Cloud Native Computing Foundation as a sandbox project by New Relic, Inc. in June 2021.

Highlights

  • Auto-telemetry: Pixie uses eBPF to automatically collect telemetry data such as full-body requests, resource and network metrics, and application profiles.
  • In-cluster edge compute: Pixie collects, stores, and queries all telemetry data locally in the cluster. Pixie uses less than 5% of cluster CPU, and in most cases less than 2%.
  • Scriptability: PxL, a flexible Pythonic query language developed by Pixie, can be used across Pixie's UI, CLI, and client APIs. Pixie provides a set of community scripts for common use cases (a quick CLI example follows this list).
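As a taste of that scriptability, once Pixie is deployed the community scripts can be listed and run straight from the CLI. A minimal sketch, assuming a recent px CLI (the CLI itself is installed later in this guide):

px scripts list          # list the bundled community scripts
px run px/cluster        # high-level overview of the cluster
px live px/http_data     # live view of HTTP traffic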

Architecture

The Pixie platform consists of several components:

  • Pixie Edge Module (PEM): Pixie's agent, installed on each node. PEMs use eBPF to collect data, which is stored locally on the node.
  • Vizier: Pixie's collector, installed per cluster. Responsible for query execution and for managing PEMs.
  • Pixie Cloud: Used for user management, authentication, and data proxying. Can be hosted or self-hosted.
  • Pixie CLI: Used to deploy Pixie. Can also be used to run queries and manage resources such as API keys (see the sketch after this list).
  • Pixie Client API: Used for programmatic access to Pixie (e.g. integrations, Slackbots, and custom user logic that needs Pixie data as input).
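As an example of the CLI's resource-management role, key creation looks roughly like the following. This is a sketch assuming a recent px CLI; check px help for the exact subcommands in your version:

px deploy-key create    # key used when deploying Vizier to a cluster
px api-key create       # key used by client API integrations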

Data Sources

Pixie automatically collects the following data:

  • Protocol tracing: full-body messages between your application pods. Tracing currently supports the protocols listed in the next section. For details, see the request tracing, service performance, and database query profiling tutorials.
  • Resource metrics: CPU, memory, and I/O metrics for your pods. For details, see the infra health tutorial.
  • Network metrics: network-layer and connection-level RX/TX statistics. For details, see the network monitoring tutorial.
  • JVM metrics: JVM memory management metrics for Java applications.
  • Application CPU profiles: sampled stack traces from your applications. Pixie's continuous profiler is always running, to help identify application performance bottlenecks when you need to. Compiled languages (Go, Rust, C/C++) are currently supported. For details, see the continuous application profiling tutorial.
  • Pixie can also be configured to collect dynamic logs from Go application code and to run custom BPFTrace scripts (a brief CLI example follows this list).
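For instance, the profiler's output can be pulled with a community script, and custom PxL files can be run from disk. A sketch, with my_script.pxl as a placeholder for your own script and the script name assumed from the community set (verify with px scripts list):

px run px/perf_flamegraph    # sampled CPU flame graph data
px run -f my_script.pxl      # run a custom PxL script from a local file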

Supported Protocols

Pixie automatically traces the following protocol data:

| Protocol   | Support   | Notes |
| ---------- | --------- | ----- |
| HTTP       | Supported |       |
| HTTP2      | Supported | Golang gRPC (with and without TLS). Golang applications must have debug information. |
| DNS        | Supported |       |
| NATS       | Supported | Requires a NATS build with debug information. |
| MySQL      | Supported |       |
| PostgreSQL | Supported |       |
| Cassandra  | Supported |       |
| Redis      | Supported |       |
| Kafka      | Supported |       |
| AMQP       | Supported |       |

Support for additional protocols is in progress.

Encryption Libraries

Pixie supports tracing traffic encrypted with the following libraries (a quick way to check your binaries is sketched after the table):

| Library | Notes |
| ------- | ----- |
| OpenSSL | Version 1.1.0 or 1.1.1, dynamically linked. |
| Go TLS  | Requires a build with debug information. |
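A rough way to check whether a workload meets these conditions, assuming you can reach the binary inside the container (the paths are placeholders):

ldd /path/to/app | grep ssl    # expect a dynamically linked libssl.so.1.1 for OpenSSL 1.1.x
file /path/to/go-binary        # "not stripped" suggests debug information is present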

Deployment

Requirements

  • Kubernetes v1.21+

  • CentOS 7.3+, Debian 10+, or Ubuntu 18.04+

  • Kernel v4.14+ (a quick pre-flight check is sketched below)

  • CPU: x86-64 only; ARM is not yet supported

  • Memory: 1 GiB+ per node

  • Pod security context

    • Pixie interacts with the Linux kernel to install BPF programs that collect telemetry data. In order to install BPF programs, the vizier-pem pods require privileged access.
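A minimal pre-flight check against the kernel, CPU, and Kubernetes requirements above (flag support varies slightly across kubectl versions):

uname -r                   # kernel version, expect 4.14+
uname -m                   # architecture, expect x86_64
kubectl version --short    # server version, expect v1.21+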

The project provides deployment options for a number of environments; this article uses self-hosted mode with nginx-ingress for access. For other modes, refer to the official documentation.

Installing nginx-ingress

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.6.4/deploy/static/provider/baremetal/deploy.yaml

Use hostPort to map ports 80 and 443 for later access; one way to do this is sketched below.
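A sketch of one approach: patch hostPort onto the controller's ports. This assumes the default container port order in the v1.6.4 baremetal manifest (http first, https second); alternatively, edit the Deployment by hand.

kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type=json -p='[
  {"op":"add","path":"/spec/template/spec/containers/0/ports/0/hostPort","value":80},
  {"op":"add","path":"/spec/template/spec/containers/0/ports/1/hostPort","value":443}
]'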

You also need network-backed storage, with a default StorageClass configured.

This article uses NFS as the provisioner; setting up the NFS server itself is not covered here.

Below is the NFS provisioner deployment YAML; adjust NFS_SERVER and NFS_PATH as needed. A verification step follows the manifest.

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: mynfs
            - name: NFS_SERVER
              value: 10.0.0.1
            - name: NFS_PATH
              value: /data/nfs
      volumes:
        - name: nfs-client-root
          nfs:
            server: 10.0.0.1
            path: /data/nfs
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: kube-system
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: nfs
provisioner: mynfs
reclaimPolicy: Delete
volumeBindingMode: Immediate
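To verify, apply the manifest (the filename below is arbitrary) and confirm the provisioner pod is running and nfs is the default StorageClass:

kubectl apply -f nfs-provisioner.yaml
kubectl -n kube-system get pods -l app=nfs-client-provisioner
kubectl get storageclass    # 'nfs' should be marked "(default)"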

Add hosts entries

Replace dev.withpixie.dev with your own domain as needed; a one-liner to append these entries is sketched below.

10.0.0.1 dev.withpixie.dev
10.0.0.1 work.dev.withpixie.dev
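One way to append these entries, assuming sudo access (replace 10.0.0.1 with your ingress node's address):

cat <<'EOF' | sudo tee -a /etc/hosts
10.0.0.1 dev.withpixie.dev
10.0.0.1 work.dev.withpixie.dev
EOF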

Deploying Pixie Cloud in self-hosted mode

  1. Clone Pixie repo

    git clone https://github.com/pixie-io/pixie.git
    cd pixie
    
  2. Export the latest Pixie Cloud release version as an environment variable

    export LATEST_CLOUD_RELEASE=$(git tag | grep 'release/cloud'  | sort -r | head -n 1 | awk -F/ '{print $NF}')
    
  3. Check out the release tag

    git checkout "release/cloud/prod/${LATEST_CLOUD_RELEASE}"
    
  4. Update the image tag in the project's kustomization file

    perl -pi -e "s|newTag: latest|newTag: \"${LATEST_CLOUD_RELEASE}\"|g" k8s/cloud/public/kustomization.yaml
    
  5. (Optional) Replace dev.withpixie.dev with your custom domain in the following files:

    k8s/cloud/public/proxy_envoy.yaml
    k8s/cloud/public/domain_config.yaml
    scripts/create_cloud_secrets.sh
    
  6. Create the namespace

    kubectl create namespace plc
    
  7. Create the certificate files and secrets

    Download the mkcert binary, then run:

    mkcert -install && ./scripts/create_cloud_secrets.sh
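    To sanity-check this step, list the secrets the script created:

    kubectl get secrets -n plc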
    
  8. Install kustomize

  9. Create the Pixie Cloud dependencies. The pods start in the plc namespace; wait until all components are healthy before moving on.

    Add the storage patches. In k8s/cloud_deps/public/elastic/elastic_storage_patch.yaml:

    # Master node
    - op: replace
      path: /spec/nodeSets/0/volumeClaimTemplates/0/spec/storageClassName
      value: nfs
    # Data node
    - op: replace
      path: /spec/nodeSets/1/volumeClaimTemplates/0/spec/storageClassName
      value: nfs

    In k8s/cloud_deps/public/nats/storage_patch.yaml, add storageClassName: nfs under spec.volumeClaimTemplates.spec.

    In k8s/cloud_deps/public/postgres/postgres_persistent_volume.yaml, add storageClassName: nfs under spec.

    Then build and apply the dependencies:

    kustomize build k8s/cloud_deps/base/elastic/operator | kubectl apply -f -
    kustomize build k8s/cloud_deps/public | kubectl apply -f -
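    Then watch the dependency pods (Elastic, NATS, Postgres) until all of them are Running and Ready:

    kubectl get pods -n plc -w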
    
  10. Deploy Pixie Cloud

    Edit k8s/cloud/public/domain_config.yaml and set PASSTHROUGH_PROXY_PORT to an empty string:

    PASSTHROUGH_PROXY_PORT: ""
    PL_DOMAIN_NAME: dev.withpixie.dev
    
    kustomize build k8s/cloud/public/ | kubectl apply -f -
    
  11. Check that all pods are up

    kubectl get po -n plc
    NAME                                       READY   STATUS    RESTARTS   AGE
    api-server-74bd7fb65b-nv7v4                1/1     Running   0          144m
    artifact-tracker-server-6cf7dc66cd-bkct9   1/1     Running   0          23h
    auth-server-6474d66b7-gxkz2                1/1     Running   0          144m
    cloud-proxy-5c458f9b99-jwkrv               2/2     Running   0          145m
    config-manager-server-6f94d96687-p6h2l     1/1     Running   0          23h
    cron-script-server-744fb79449-44784        1/1     Running   0          23h
    hydra-6496d8d76-lxlfr                      2/2     Running   0          23h
    indexer-server-5cc4685b86-vkbnn            1/1     Running   0          23h
    kratos-589bb4f659-r2qsw                    2/2     Running   0          23h
    metrics-server-76cc598bc9-7jt6x            1/1     Running   0          23h
    pl-elastic-es-data-0                       1/1     Running   0          23h
    pl-elastic-es-master-0                     1/1     Running   0          23h
    pl-elastic-es-master-1                     1/1     Running   0          23h
    pl-nats-0                                  1/1     Running   0          23h
    pl-nats-1                                  1/1     Running   0          23h
    pl-nats-2                                  1/1     Running   0          23h
    plugin-server-df75f76cf-vtnzx              1/1     Running   0          23h
    postgres-6f75677777-5jc4d                  1/1     Running   0          23h
    profile-server-bcb7bb496-2mzkl             1/1     Running   0          23h
    project-manager-server-57575ff8b7-28hzt    1/1     Running   0          23h
    scriptmgr-server-6f7f5968d4-vbnjd          1/1     Running   0          23h
    vzconn-server-7685cb68b4-jjwnk             1/1     Running   0          23h
    vzmgr-server-7bc9d5d46c-fblns              1/1     Running   0          23h
    
  12. Create the Ingress rules

    kubectl apply -f k8s/cloud/overlays/exposed_services_nginx/cloud_ingress_grpcs.yaml
    kubectl apply -f k8s/cloud/overlays/exposed_services_nginx/cloud_ingress_https.yaml
    
    

    Heads up: there is a pitfall here (verified first-hand). In cloud_ingress_grpcs.yaml, for every path except /px.api.vizierpb.VizierService/, change service: cloud-proxy-service to service: api-service and change the port from 5555 to 51200. This article uses release/cloud/prod/1676065759; if a later release has fixed this issue, the step can be skipped. The corrected manifest follows:

    ## Replace all occurrences of work.dev.withpixie.dev with the custom domain name you wish to use
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: cloud-ingress-grpcs
      namespace: plc
      annotations:
        nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
    spec:
      ingressClassName: nginx
      tls:
      - hosts:
        - work.dev.withpixie.dev
        - work.work.dev.withpixie.dev
        secretName: cloud-proxy-tls-certs
      rules:
      - host: work.dev.withpixie.dev
        http:
          paths:
          - path: /pl.cloudapi.ArtifactTracker/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.services.VZConnService/
            pathType: Prefix
            backend:
              service:
                name: vzconn-service
                port:
                  number: 51600
          - path: /px.cloudapi.ArtifactTracker/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.APIKeyManager/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.AuthService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.ConfigService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.OrganizationService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.PluginService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.UserService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierClusterInfo/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierDeploymentKeyManager/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierImageAuthorization/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.api.vizierpb.VizierService/
            pathType: Prefix
            backend:
              service:
                name: cloud-proxy-service
                port:
                  number: 4444
      - host: work.work.dev.withpixie.dev
        http:
          paths:
          - path: /pl.cloudapi.ArtifactTracker/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.services.VZConnService/
            pathType: Prefix
            backend:
              service:
                name: vzconn-service
                port:
                  number: 51600
          - path: /px.cloudapi.ArtifactTracker/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.APIKeyManager/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.AuthService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.ConfigService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.OrganizationService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.PluginService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.UserService/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierClusterInfo/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierDeploymentKeyManager/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.cloudapi.VizierImageAuthorization/
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 51200
          - path: /px.api.vizierpb.VizierService/
            pathType: Prefix
            backend:
              service:
                name: cloud-proxy-service
                port:
                  number: 4444
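    After applying both files, confirm that the Ingress objects exist and were admitted by the nginx controller:

    kubectl get ingress -n plc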
    
  13. Install Pixie (the in-cluster data collection components)

    Set the cloud address according to your own domain:

    export PL_CLOUD_ADDR=dev.withpixie.dev
    

Install the Pixie CLI: run the script below and follow the prompts to download, log in, and authenticate, then continue with the installation.

    # Copy and run command to install the Pixie CLI.
    bash -c "$(curl -fsSL https://work.dev.withpixie.dev/install.sh)"

During installation you will be asked to open a browser and log in. Use admin@default.com as the identity and admin as the password.

Note that the machine you log in from also needs the hosts entries configured.

Open the page, log in, and obtain the token.


Copy the token and paste it into the terminal.

Then run:

px deploy --dev_cloud_namespace plc
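To verify the deployment, check the Vizier pods (deployed into the pl namespace by default) and the cluster status as seen by the CLI (px get viziers assumed available in recent CLI releases):

kubectl get pods -n pl
px get viziers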

Once the installation completes, open the UI to take a look.
