Introduction
Pixie is an open-source observability tool for Kubernetes applications. Pixie uses eBPF to automatically capture telemetry data, with no manual instrumentation required. Developers can use Pixie to view the high-level state of their cluster (service maps, cluster resources, application traffic) and also drill down into more detailed views (pod state, flame graphs, individual full-body application requests).
Pixie was contributed to the Cloud Native Computing Foundation as a sandbox project by New Relic, Inc. in June 2021.
Highlights
- Auto-telemetry: Pixie uses eBPF to automatically collect telemetry data such as full-body requests, resource and network metrics, application profiles, and more.
- In-cluster edge compute: Pixie collects, stores, and queries all telemetry data locally in the cluster. Pixie uses less than 5% of cluster CPU, and in most cases less than 2%.
- Scriptability: PxL, the flexible Pythonic query language developed by Pixie, can be used across Pixie's UI, CLI, and client APIs. Pixie provides a set of community scripts for common use cases; a short PxL example follows below.
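To give a feel for PxL, here is a minimal script (a sketch only; http_events is one of Pixie's standard data tables, but the exact column names can vary slightly between Pixie versions):
import px

# Load the last 5 minutes of HTTP traffic traced by Pixie.
df = px.DataFrame(table='http_events', start_time='-5m')

# Attach the pod name from the connection context and keep a few columns.
df.pod = df.ctx['pod']
df = df[['pod', 'req_path', 'resp_status', 'latency']]

# Render the result as a table in the Live UI / CLI.
px.display(df)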
Architecture
The Pixie platform consists of several components:
- Pixie Edge Module (PEM): Pixie's agent, installed on every node. PEMs use eBPF to collect data, which is stored locally on the node.
- Vizier: Pixie's collector, installed per cluster. Responsible for query execution and managing PEMs.
- Pixie Cloud: used for user management, authentication, and data proxying. Can be hosted or self-hosted.
- Pixie CLI: used to deploy Pixie. Can also be used to run queries and manage resources such as API keys.
- Pixie Client APIs: used for programmatic access to Pixie (e.g. integrations, Slackbots, and custom user logic that needs Pixie data as input); a minimal example follows below.
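For a sense of how the client API is used, below is a rough sketch with the pxapi Python package; the method names follow the published client examples, but treat them, along with the placeholder API key, cluster ID, and script, as assumptions to verify against the current client docs:
import pxapi

# A trivial PxL script; px.display() names the output table "http".
PXL_SCRIPT = """
import px
df = px.DataFrame(table='http_events', start_time='-30s')
px.display(df, 'http')
"""

# Placeholders: create an API key in the Pixie UI/CLI and look up your cluster ID.
client = pxapi.Client(token="YOUR_PIXIE_API_KEY")
conn = client.connect_to_cluster("YOUR_CLUSTER_ID")

# Execute the script and stream rows from the "http" table.
script = conn.prepare_script(PXL_SCRIPT)
for row in script.results("http"):
    print(row["req_path"], row["resp_status"])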
Data Sources
Pixie automatically collects the following data:
- Protocol tracing: full-body messages between your application pods. Tracing currently supports the protocols listed in the table below. For more details, see the request tracing, service performance, and database query profiling tutorials.
- Resource metrics: CPU, memory, and I/O metrics for your pods. For more details, see the infra health tutorial.
- Network metrics: network-layer and connection-level RX/TX statistics. For more details, see the network monitoring tutorial.
- JVM metrics: JVM memory management metrics for Java applications.
- Application CPU profiles: sampled stack traces from your applications. Pixie's continuous profiler is always running, to help you identify application performance bottlenecks when you need to. Compiled languages (Go, Rust, C/C++) are currently supported. For more details, see the continuous application profiling tutorial.
- Pixie can also be configured by the user to collect dynamic logs from Go application code and to run custom bpftrace scripts.
Supported Protocols
Pixie automatically traces data for the following protocols:
Protocol | Support | Notes |
---|---|---|
HTTP | Supported | |
HTTP2 | Supported for Golang gRPC (with and without TLS). | The Golang application must have debug information. |
DNS | Supported | |
NATS | Supported | Requires a NATS build with debug information. |
MySQL | Supported | |
PostgreSQL | Supported | |
Cassandra | Supported | |
Redis | Supported | |
Kafka | Supported | |
AMQP | Supported | |
Support for additional protocols is in progress.
Encryption Libraries
Pixie supports tracing traffic encrypted with the following libraries:
Library | Notes |
---|---|
OpenSSL | Version 1.1.0 or 1.1.1, dynamically linked. |
Go TLS | Requires a build with debug information. |
Deployment
Requirements
- Kubernetes: v1.21+
- OS: CentOS 7.3+, Debian 10+, Ubuntu 18.04+
- Linux kernel: v4.14+
- CPU: x86-64 (ARM is not supported yet)
- Memory: 1 GiB+ per node
- Pod security context: Pixie interacts with the Linux kernel to install BPF programs that collect telemetry data. To install BPF programs, the Pixie vizier-pem pods require privileged access.
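A quick way to confirm a node meets these requirements before installing:
# Linux kernel must be v4.14 or newer on every node.
uname -r
# CPU architecture must be x86-64.
uname -m
# Kubernetes server version must be v1.21 or newer (check the Server Version line).
kubectl version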
The official docs offer deployment options for several environments; this article uses the self-hosted Pixie Cloud model, accessed through nginx-ingress. For other deployment models, refer to the official documentation.
Installing nginx-ingress
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.6.4/deploy/static/provider/baremetal/deploy.yaml
Use hostPort to map ports 80 and 443 for access later on.
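The baremetal manifest exposes the controller through a NodePort Service by default. One way to get the 80/443 mapping is to add hostPort to the controller container's ports; the JSON patch below is a sketch of that approach (it assumes the http and https entries are the first and second containerPort in deploy.yaml, so double-check the indexes before applying):
# Sketch: map the controller's http/https containerPorts to host ports 80/443.
kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/ports/0/hostPort", "value": 80},
       {"op": "add", "path": "/spec/template/spec/containers/0/ports/1/hostPort", "value": 443}]'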
A default network storage backend needs to be installed, with a default StorageClass configured.
This article uses NFS as the provisioner; deploying the NFS server itself is not covered here.
Below is the deployment YAML for the NFS provisioner; adjust NFS_SERVER and NFS_PATH to match your environment.
---
kind: Deployment
apiVersion: apps/v1
metadata:
name: nfs-client-provisioner
namespace: kube-system
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: mynfs
- name: NFS_SERVER
value: 10.0.0.1
- name: NFS_PATH
value: /data/nfs
volumes:
- name: nfs-client-root
nfs:
server: 10.0.0.1
path: /data/nfs
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: kube-system
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: kube-system
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: kube-system
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: kube-system
roleRef:
kind: Role
  name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
storageclass.kubernetes.io/is-default-class: "true"
name: nfs
provisioner: mynfs
reclaimPolicy: Delete
volumeBindingMode: Immediate
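After applying the provisioner and StorageClass above, confirm that nfs is registered and marked as the default class:
kubectl get storageclass
# The nfs class should be listed with "(default)" next to its name and provisioner mynfs.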
Add hosts entries on your machine
Replace dev.withpixie.dev with your own domain as needed:
10.0.0.1 dev.withpixie.dev
10.0.0.1 work.dev.withpixie.dev
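If you manage /etc/hosts by hand, the entries above can be appended like this (replace the IP and domain with your own):
echo "10.0.0.1 dev.withpixie.dev" | sudo tee -a /etc/hosts
echo "10.0.0.1 work.dev.withpixie.dev" | sudo tee -a /etc/hosts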
Deploying Pixie Cloud in self-hosted mode
- Clone the Pixie repo
git clone https://github.com/pixie-io/pixie.git
cd pixie
- Select the latest Pixie Cloud release and export it as an environment variable
export LATEST_CLOUD_RELEASE=$(git tag | grep 'release/cloud' | sort -r | head -n 1 | awk -F/ '{print $NF}')
- Check out the release
git checkout "release/cloud/prod/${LATEST_CLOUD_RELEASE}"
- Update the image tag in the project's kustomization file
perl -pi -e "s|newTag: latest|newTag: \"${LATEST_CLOUD_RELEASE}\"|g" k8s/cloud/public/kustomization.yaml
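To confirm the substitution took effect:
grep newTag k8s/cloud/public/kustomization.yaml
# Every entry should now show the release tag instead of "latest".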
- (Optional) Replace dev.withpixie.dev with your custom domain in the following files:
k8s/cloud/public/proxy_envoy.yaml
k8s/cloud/public/domain_config.yaml
scripts/create_cloud_secrets.sh
- Create the namespace
kubectl create namespace plc
- Create the certificate files and secrets
Download the mkcert binary, then run:
mkcert -install && ./scripts/create_cloud_secrets.sh
- Install kustomize (see the SIG CLI Kustomize documentation)
- Create the Pixie Cloud dependency components. The related pods start in the plc namespace; wait until all of them are healthy before moving on to the next step.
Add the storage patches first.
k8s/cloud_deps/public/elastic/elastic_storage_patch.yaml:
# Master node
- op: replace
  path: /spec/nodeSets/0/volumeClaimTemplates/0/spec/storageClassName
  value: nfs
# Data node.
- op: replace
  path: /spec/nodeSets/1/volumeClaimTemplates/0/spec/storageClassName
  value: nfs
k8s/cloud_deps/public/nats/storage_patch.yaml: add storageClassName: nfs under spec.volumeClaimTemplates.spec.
k8s/cloud_deps/public/postgres/postgres_persistent_volume.yaml: add storageClassName: nfs under spec.
Then build and apply the dependencies:
kustomize build k8s/cloud_deps/base/elastic/operator | kubectl apply -f -
kustomize build k8s/cloud_deps/public | kubectl apply -f -
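To wait for the dependency components, watch the plc namespace until every pod is Running and Ready:
kubectl get pods -n plc -w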
- Deploy Pixie Cloud
Edit k8s/cloud/public/domain_config.yaml and set PASSTHROUGH_PROXY_PORT to an empty string:
PASSTHROUGH_PROXY_PORT: ""
PL_DOMAIN_NAME: dev.withpixie.dev
kustomize build k8s/cloud/public/ | kubectl apply -f -
- Check all pods
kubectl get po -n plc
NAME                                       READY   STATUS    RESTARTS   AGE
api-server-74bd7fb65b-nv7v4                1/1     Running   0          144m
artifact-tracker-server-6cf7dc66cd-bkct9   1/1     Running   0          23h
auth-server-6474d66b7-gxkz2                1/1     Running   0          144m
cloud-proxy-5c458f9b99-jwkrv               2/2     Running   0          145m
config-manager-server-6f94d96687-p6h2l     1/1     Running   0          23h
cron-script-server-744fb79449-44784        1/1     Running   0          23h
hydra-6496d8d76-lxlfr                      2/2     Running   0          23h
indexer-server-5cc4685b86-vkbnn            1/1     Running   0          23h
kratos-589bb4f659-r2qsw                    2/2     Running   0          23h
metrics-server-76cc598bc9-7jt6x            1/1     Running   0          23h
pl-elastic-es-data-0                       1/1     Running   0          23h
pl-elastic-es-master-0                     1/1     Running   0          23h
pl-elastic-es-master-1                     1/1     Running   0          23h
pl-nats-0                                  1/1     Running   0          23h
pl-nats-1                                  1/1     Running   0          23h
pl-nats-2                                  1/1     Running   0          23h
plugin-server-df75f76cf-vtnzx              1/1     Running   0          23h
postgres-6f75677777-5jc4d                  1/1     Running   0          23h
profile-server-bcb7bb496-2mzkl             1/1     Running   0          23h
project-manager-server-57575ff8b7-28hzt    1/1     Running   0          23h
scriptmgr-server-6f7f5968d4-vbnjd          1/1     Running   0          23h
vzconn-server-7685cb68b4-jjwnk             1/1     Running   0          23h
vzmgr-server-7bc9d5d46c-fblns              1/1     Running   0          23h
- Create the Ingress rules
kubectl apply -f k8s/cloud/overlays/exposed_services_nginx/cloud_ingress_grpcs.yaml
kubectl apply -f k8s/cloud/overlays/exposed_services_nginx/cloud_ingress_https.yaml
Heads up: there is a gotcha here, verified firsthand. In cloud_ingress_grpcs.yaml, every backend except the ones for the /px.api.vizierpb.VizierService/ path must be changed from service: cloud-proxy-service to service: api-service, and the corresponding port from 5555 to 51200. This article uses release/cloud/prod/1676065759; if a later release has fixed this, you can skip this change.
## Replace all occurrences of work.dev.withpixie.dev with the custom domain name you wish to use
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cloud-ingress-grpcs
  namespace: plc
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - work.dev.withpixie.dev
    - work.work.dev.withpixie.dev
    secretName: cloud-proxy-tls-certs
  rules:
  - host: work.dev.withpixie.dev
    http:
      paths:
      - path: /pl.cloudapi.ArtifactTracker/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.services.VZConnService/
        pathType: Prefix
        backend:
          service:
            name: vzconn-service
            port:
              number: 51600
      - path: /px.cloudapi.ArtifactTracker/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.APIKeyManager/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.AuthService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.ConfigService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.OrganizationService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.PluginService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.UserService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierClusterInfo/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierDeploymentKeyManager/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierImageAuthorization/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.api.vizierpb.VizierService/
        pathType: Prefix
        backend:
          service:
            name: cloud-proxy-service
            port:
              number: 4444
  - host: work.work.dev.withpixie.dev
    http:
      paths:
      - path: /pl.cloudapi.ArtifactTracker/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.services.VZConnService/
        pathType: Prefix
        backend:
          service:
            name: vzconn-service
            port:
              number: 51600
      - path: /px.cloudapi.ArtifactTracker/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.APIKeyManager/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.AuthService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.ConfigService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.OrganizationService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.PluginService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.UserService/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierClusterInfo/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierDeploymentKeyManager/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.cloudapi.VizierImageAuthorization/
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 51200
      - path: /px.api.vizierpb.VizierService/
        pathType: Prefix
        backend:
          service:
            name: cloud-proxy-service
            port:
              number: 4444
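After applying both files, a quick sanity check that the Ingress objects were created and picked up by nginx:
kubectl get ingress -n plc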
- Install Pixie (the in-cluster data collection components)
Set this according to your own domain:
export PL_CLOUD_ADDR=dev.withpixie.dev
Install Pixie: run the script below and follow the prompts to download the CLI, log in, and verify, then continue with the rest of the installation.
# Copy and run command to install the Pixie CLI.
bash -c "$(curl -fsSL https://work.dev.withpixie.dev/install.sh)"
During the install you need to open a browser and log in, using admin@default.com as the identity and admin as the password.
Note that the machine you log in from also needs the hosts entries configured.
Open the page, log in, and get the token.
Copy the token and paste it into the terminal.
Then run:
px deploy --dev_cloud_namespace plc
The installation is complete; open the UI to take a look.
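To double-check the data collection side after deployment, the Pixie CLI can list connected Viziers, and the Vizier pods themselves run in the pl namespace by default:
# This cluster should show up as healthy.
px get viziers
# vizier-pem runs as a DaemonSet, one pod per node.
kubectl get pods -n pl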