Installation
Environment
centos: 7.9
docker: 20.10.9
kubernetes: v1.20.4
helm: v3.2.1
Install with Helm
Step 1: Add the Chaos Mesh repository
Add the Chaos Mesh repository to your Helm repositories:
helm repo add chaos-mesh https://charts.chaos-mesh.org
Step 2: Create the namespace for Chaos Mesh
Installing Chaos Mesh in the chaos-testing namespace is recommended, but any namespace can be used:
kubectl create ns chaos-testing
Step 3: Install Chaos Mesh (Docker container runtime)
helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing
Verify the installation
Check the running status
To check the status of Chaos Mesh, run the following command:
kubectl get po -n chaos-testing
Expected output:
NAME READY STATUS RESTARTS AGE
chaos-controller-manager-5cd8dc646c-qvrwd 1/1 Running 0 103s
chaos-daemon-75p56 1/1 Running 0 103s
chaos-daemon-gglmj 1/1 Running 0 103s
chaos-daemon-pm6nq 1/1 Running 0 103s
chaos-daemon-z6cfk 1/1 Running 0 104s
chaos-dashboard-649585686-5rshc 1/1 Running 0 103s
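If your output matches the expected output, Chaos Mesh has been installed successfully.
If the STATUS of any Pod is not Running, run kubectl describe on that Pod in the chaos-testing namespace to view its details, then troubleshoot according to the error messages.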
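The troubleshooting step above can be scripted. The following is a minimal sketch, not part of the original walkthrough: the helper name not_running_pods is made up for illustration, and it only filters the text that kubectl get po prints.

```shell
#!/bin/sh
# Print the names of Pods whose STATUS column is not "Running".
# Reads the output of `kubectl get po` on stdin (header line included).
# The function name is hypothetical, chosen for this sketch.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage against a live cluster (requires kubectl access to the namespace):
#   kubectl get po -n chaos-testing | not_running_pods |
#     xargs -r -n1 kubectl describe po -n chaos-testing
```

If the function prints nothing, every Pod in the listing is Running.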
Access the dashboard
[root@m1 ~]# kubectl get svc -n chaos-testing chaos-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chaos-dashboard NodePort 10.233.40.47 <none> 2333:31519/TCP 5m15s
Access the NodePort
The NodePort is 31519. Open masterip:31519 in a browser, where masterip is the IP address of the master node.
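The NodePort can also be read programmatically rather than copied by eye. This sketch assumes the PORT(S) column has the form shown above (port:nodePort/protocol); the helper name node_port is invented for the example.

```shell
#!/bin/sh
# Extract the NodePort from a PORT(S) value such as "2333:31519/TCP".
# The function name is hypothetical, chosen for this sketch.
node_port() {
  port=${1#*:}                  # drop everything up to the first ":"
  printf '%s\n' "${port%%/*}"   # drop the "/TCP" suffix
}

# Usage:
#   node_port '2333:31519/TCP'
# Alternatively, ask kubectl for the field directly:
#   kubectl get svc -n chaos-testing chaos-dashboard \
#     -o jsonpath='{.spec.ports[0].nodePort}'
```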
Generate a token
Click "Click here to generate".
Check Cluster scoped, set Role to Manager, then click COPY to copy the generated YAML and save it as rbac.yaml.
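Apply the YAML. The resource names and kinds in the output follow the scope and role chosen in the form; the sample output below carries the names default and viewer and namespaced role kinds, so it appears to come from a namespace-scoped Viewer account rather than a Cluster scoped Manager one.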
kubectl apply -f rbac.yaml
serviceaccount/account-default-viewer-dqscy created
role.rbac.authorization.k8s.io/role-default-viewer-dqscy created
rolebinding.rbac.authorization.k8s.io/bind-default-viewer-dqscy created
Get the token
kubectl describe -n default secrets account-default-viewer-dqscy
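To paste the token into the dashboard, only the value on the token: line of the describe output is needed. A small sketch for pulling it out follows; the helper name extract_token is an assumption for illustration, and it relies on the token: field layout that kubectl describe prints for service account secrets.

```shell
#!/bin/sh
# Print the value of the "token:" field from `kubectl describe secrets`
# output read on stdin. The function name is hypothetical.
extract_token() {
  awk '$1 == "token:" { print $2 }'
}

# Usage:
#   kubectl describe -n default secrets account-default-viewer-dqscy | extract_token
```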
Enter the token
Enter the name and token, then click SUBMIT.
Page after submission
Experiments
Prepare a test Pod
kubectl create deployment tomcat --image=tomcat:7
kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-5f7b97cd7-8xx6v 1/1 Running 0 6m18s
Pod faults
POD FAILURE
Create a new experiment
Select Pod Fault and Pod Failure.
Select the test namespace, enter a Name, select Run continuously, then click Submit.
After submission
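The same experiment can be expressed as a PodChaos manifest instead of dashboard clicks. The sketch below is an approximation of what the form submits, not a copy of it: the experiment name pod-failure-demo and the helper name are made up, mode: one is an assumption, and the selector relies on the app=tomcat label that kubectl create deployment tomcat sets by default. Omitting a duration is intended to mirror Run continuously.

```shell
#!/bin/sh
# Write a PodChaos manifest for the tomcat test Pod to the given file.
# Assumptions: the name "pod-failure-demo" and mode "one" are chosen for
# this sketch; the Pod is selected by the label app=tomcat.
write_pod_failure_yaml() {
  cat > "$1" <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-demo
  namespace: default
spec:
  action: pod-failure
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: tomcat
EOF
}

# Usage:
#   write_pod_failure_yaml pod-failure.yaml
#   kubectl apply -f pod-failure.yaml
```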
Verify the fault
View the fault events
Check the Pod
The Pod status is CrashLoopBackOff:
kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-5f7b97cd7-8xx6v 0/1 CrashLoopBackOff 0 9m43s
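The Pod events show that the container definition was changed and that the kubelet then repeatedly failed to pull gcr.io/google-containers/pause:latest, the image Chaos Mesh substitutes for the original one during a Pod Failure experiment: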
kubectl describe pod tomcat-5f7b97cd7-8xx6v
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/tomcat-5f7b97cd7-8xx6v to fn01
Normal Pulled 12m kubelet Container image "tomcat:7" already present on machine
Normal Created 12m kubelet Created container tomcat
Normal Started 12m kubelet Started container tomcat
Normal Killing 5m20s kubelet Container tomcat definition changed, will be restarted
Normal BackOff 3m38s (x3 over 5m4s) kubelet Back-off pulling image "gcr.io/google-containers/pause:latest"
Warning Failed 3m38s (x3 over 5m4s) kubelet Error: ImagePullBackOff
Normal Pulling 2m35s (x4 over 5m20s) kubelet Pulling image "gcr.io/google-containers/pause:latest"
Warning Failed 2m10s (x4 over 5m5s) kubelet Failed to pull image "gcr.io/google-containers/pause:latest": rpc error: code = Unknown desc = Error response from daemon: Get "https://gcr.io/v2/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Failed 2m10s (x4 over 5m5s) kubelet Error: ErrImagePull
Warning BackOff 115s (x5 over 3m23s) kubelet Back-off restarting failed container
Recover from the fault
Check the Pod
The Pod status is back to normal and RESTARTS has increased by 1:
[root@m1 chaos]# kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-5f7b97cd7-8xx6v 1/1 Running 1 21m
POD KILL
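Create a new experiment
Select Pod Fault and Pod Kill.
Select the test namespace, enter a Name, select Run continuously, then click Submit.
After submission
Verify the fault
Listing the Pods shows that the original Pod has been killed and a new Pod has been created: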
kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-5f7b97cd7-74fj6 1/1 Running 0 2m43s
The ReplicaSet events show that a new Pod was started:
kubectl describe replicasets.apps tomcat-5f7b97cd7
Name: tomcat-5f7b97cd7
Namespace: default
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 36m replicaset-controller Created pod: tomcat-5f7b97cd7-8xx6v
Normal SuccessfulCreate 2m47s replicaset-controller Created pod: tomcat-5f7b97cd7-74fj6
Container Kill
Create a new experiment
Select Pod Fault and Container Kill, and fill in the container names; here the value is tomcat.
Select the test namespace, enter a Name, set Duration to 30s, then click Submit.
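For reference, a manifest roughly equivalent to the form above might look like the following. This is a sketch under assumptions: the experiment name container-kill-demo and the helper name are invented, mode: one is assumed, and the 30s Duration from the form is left out because killing a container is a one-time action; add a duration field if your Chaos Mesh version expects one.

```shell
#!/bin/sh
# Write a container-kill PodChaos manifest to the given file.
# Assumptions: the name "container-kill-demo" and mode "one" are chosen
# for this sketch; the target container is named "tomcat" as in the text.
write_container_kill_yaml() {
  cat > "$1" <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: container-kill-demo
  namespace: default
spec:
  action: container-kill
  mode: one
  containerNames:
    - tomcat
  selector:
    namespaces:
      - default
    labelSelectors:
      app: tomcat
EOF
}

# Usage:
#   write_container_kill_yaml container-kill.yaml
#   kubectl apply -f container-kill.yaml
```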
Verify the fault
Check the Pod status; RESTARTS has increased by 1:
kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-5f7b97cd7-74fj6 1/1 Running 1 18m
The Pod events show that after the tomcat container exited, the Pod started a new container:
kubectl describe pod tomcat-5f7b97cd7-74fj6
Name: tomcat-5f7b97cd7-74fj6
Namespace: default
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned default/tomcat-5f7b97cd7-74fj6 to fn01
Normal Pulled 31s (x2 over 19m) kubelet Container image "tomcat:7" already present on machine
Normal Created 30s (x2 over 19m) kubelet Created container tomcat
Normal Started 30s (x2 over 19m) kubelet Started container tomcat
Network
Bandwidth limit
Environment preparation
Deploy the test application
kubectl create deployment networktest --image=zhfangk8s/nginx-test
Get the Pod IP
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
networktest-6ccdcf677f-757ph 1/1 Running 0 40m 10.233.105.12 fn03 <none> <none>
tomcat-5f7b97cd7-74fj6 1/1 Running 1 4h51m 10.233.99.16 fn01 <none> <none>
Generate test traffic
while true;do curl -O 10.233.105.12/test ;done
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1000M 100 1000M 0 0 278M 0 0:00:03 0:00:03 --:--:-- 278M
Create the bandwidth limit from YAML
Set the limit to 100mbps
kind: NetworkChaos
apiVersion: chaos-mesh.org/v1alpha1
metadata:
  name: bandwith
  namespace: default
  annotations:
    # This annotation creates the experiment in the paused state;
    # remove it or set it to 'false' for the limit to take effect.
    experiment.chaos-mesh.org/pause: 'true'
spec:
  selector:
    namespaces:
      - default
    labelSelectors:
      app: networktest
  mode: one
  action: bandwidth
  bandwidth:
    rate: 100mbps
    limit: 10000000
    buffer: 100000000
  direction: to
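Verify the bandwidth limit
The download speed drops from 278M to 11.9M. curl reports bytes per second, so 11.9M corresponds to roughly 100 Mbit/s.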
[root@m1 chaos]# while true;do curl -O 10.233.105.12/test ;done
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1000M 100 1000M 0 0 12.2M 0 0:01:21 0:01:21 --:--:-- 11.9M
Grafana monitoring shows the traffic dropping from 2.47Gb to 100Mb, which is consistent with the 100mbps limit.
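The curl and Grafana figures use different units, so a quick conversion helps when comparing them. The sketch below assumes that curl's M suffix means 1024 x 1024 bytes per second and converts it to megabits per second; the helper name is made up for the example.

```shell
#!/bin/sh
# Convert a curl speed given in M (assumed to be MiB per second) to Mbit/s.
# The function name is hypothetical, chosen for this sketch.
curl_m_to_mbit() {
  awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 1048576 * 8 / 1000000 }'
}

# Usage:
#   curl_m_to_mbit 11.9    # speed measured after the limit
#   curl_m_to_mbit 278     # speed measured before the limit
```

Under that assumption the limited speed works out to just under 100 Mbit/s, in line with the Grafana reading.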
Partition
Apply via YAML
Write the experiment configuration to the file network-partition.yaml, for example:
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: partition
spec:
  action: partition
  mode: all
  selector:
    namespaces:
      - default
    labelSelectors:
      'app': 'tomcat'
  direction: to
  target:
    mode: all
    selector:
      namespaces:
        - default
      labelSelectors:
        'app': 'networktest'
This configuration blocks connections initiated from tomcat to networktest. The direction field accepts to, from, or both.
Create the experiment with kubectl:
kubectl apply -f ./network-partition.yaml
Verify the experiment
Open a shell in the tomcat container and ping networktest; the target is unreachable:
kubectl exec -it tomcat-5f7b97cd7-74fj6 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# ping 10.233.105.12
PING 10.233.105.12 (10.233.105.12) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
^C
--- 10.233.105.12 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 1003ms
#
command terminated with exit code 1
Pinging from the host succeeds:
ping 10.233.105.12
PING 10.233.105.12 (10.233.105.12) 56(84) bytes of data.
64 bytes from 10.233.105.12: icmp_seq=1 ttl=63 time=0.339 ms
64 bytes from 10.233.105.12: icmp_seq=2 ttl=63 time=0.335 ms
64 bytes from 10.233.105.12: icmp_seq=3 ttl=63 time=0.265 ms
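To script the two ping checks above, the packet loss percentage can be pulled from ping's summary line. This is a minimal sketch that assumes the summary format shown in the tomcat container output; the helper name packet_loss is invented for illustration.

```shell
#!/bin/sh
# Print the packet loss percentage from ping output read on stdin.
# Relies on a summary line such as:
#   5 packets transmitted, 0 received, 100% packet loss, time 1003ms
# The function name is hypothetical.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage from the host (a fixed count makes ping terminate on its own):
#   ping -c 3 10.233.105.12 | packet_loss
```

A value of 100 from inside the tomcat Pod together with 0 from the host would match the behavior described for the partition experiment.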