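Background
Troubleshooting
- Check whether the services in the pods start normally
- They crash continuously
- The only way to probe was from another pod in the same cluster that has tools such as curl. The results were OK one moment and failing the next. We later confirmed that the repeated restarts were what made the curl results unstable.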
sh-4.2# curl http://172.19.139.254:8084/healthz
sh-4.2# curl http://172.19.166.100:8082/healthz
- They crash continuously
- Remove the health check logic
- This was not fully verified, because the root cause was found first.
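The flapping curl results described above are easier to interpret when the probe is repeated with timestamps, so that failures can be lined up against pod restarts. A minimal sketch, assuming curl is available in the debug pod; the helper name `probe_healthz` and the default count and interval are illustrative and not part of the original investigation.

```shell
# Probe a health endpoint repeatedly and print one timestamped line per attempt.
# probe_healthz is a hypothetical helper; the URL in the example comes from the notes.
probe_healthz() {
  local url="$1"
  local count="${2:-10}"
  local interval="${3:-1}"
  local i=1
  local code
  while [ "$i" -le "$count" ]; do
    # -s: silent, -o /dev/null: discard the body, -m 2: give up after 2 seconds.
    # On a failed connection curl reports 000 as the status code.
    code="$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$url")" || true
    echo "$(date -u +%H:%M:%S) attempt=$i status=${code:-000}"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example: probe_healthz http://172.19.139.254:8084/healthz 30 1
```

A run that alternates between 200 and 000 at roughly the restart interval would be consistent with the pod being restarted rather than with a routing problem.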
Root cause
Resolution
Option 1: add an ipBlock (proved ineffective)
➜ helm git:(master) kubectl exec -ti argocd-redis-d486999b7-w7h96 --namespace=argocd -- /bin/sh
➜ helm git:(master) kubectl exec -ti centos-c68f668d8-lpldd -- /bin/sh
kubectl get networkpolicy argocd-repo-server-network-policy -nargocd -oyaml
➜ helm git:(master) kubectl logs -p argocd-application-controller-0 -n argocd
time="2022-05-13T15:04:58Z" level=info msg="Processing all cluster shards"
time="2022-05-13T15:04:58Z" level=info msg="appResyncPeriod=3m0s"
time="2022-05-13T15:04:58Z" level=info msg="Application Controller (version: v2.2.8+93d588c, built: 2022-03-23T00:27:32Z) starting (namespace: argocd)"
time="2022-05-13T15:04:58Z" level=info msg="Starting configmap/secret informers"
time="2022-05-13T15:04:58Z" level=info msg="Configmap/secret informer synced"
time="2022-05-13T15:04:58Z" level=info msg="Ignore status for CustomResourceDefinitions"
time="2022-05-13T15:04:58Z" level=info msg="Ignore '/spec/preserveUnknownFields' for CustomResourceDefinitions"
time="2022-05-13T15:04:58Z" level=info msg="0xc000bb7a40 subscribed to settings updates"
time="2022-05-13T15:04:58Z" level=info msg="Starting secretInformer forcluster"
➜ helm git:(master)
➜ helm git:(master) kubectl logs -p argocd-repo-server-7f944f76bf-47mc8 -n argocd
time="2022-05-13T15:02:57Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
time="2022-05-13T15:02:57Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2022-05-13T15:02:57Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe826707274" dir= execID=6ca8f
time="2022-05-13T15:02:58Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe826707274]" dir= operation_name="exec gpg" time_ms=231.27132699999999
time="2022-05-13T15:02:58Z" level=info msg="Populating GnuPG keyring with keys from /app/config/gpg/source"
time="2022-05-13T15:02:58Z" level=info msg="gpg --no-permission-warning --list-public-keys" dir= execID=2bf8f
time="2022-05-13T15:02:58Z" level=info msg=Trace args="[gpg --no-permission-warning --list-public-keys]" dir= operation_name="exec gpg" time_ms=3.854837
time="2022-05-13T15:02:58Z" level=info msg="gpg --no-permission-warning -a --export 479D0430FE5CC66C" dir= execID=04aa7
time="2022-05-13T15:02:58Z" level=info msg=Trace args="[gpg --no-permission-warning -a --export 479D0430FE5CC66C]" dir= operation_name="exec gpg" time_ms=2.7001579999999996
time="2022-05-13T15:02:58Z" level=info msg="gpg-wrapper.sh --no-permission-warning --list-secret-keys 479D0430FE5CC66C" dir= execID=61d57
time="2022-05-13T15:02:58Z" level=info msg=Trace args="[gpg-wrapper.sh --no-permission-warning --list-secret-keys 479D0430FE5CC66C]" dir= operation_name="exec gpg-wrapper.sh" time_ms=4.097624000000001
time="2022-05-13T15:02:58Z" level=info msg="Loaded 0 (and removed 0) keys from keyring"
time="2022-05-13T15:02:58Z" level=info msg="argocd-repo-server v2.2.8+93d588c serving on [::]:8081"
time="2022-05-13T15:02:58Z" level=info msg="Starting GPG sync watcher on directory '/app/config/gpg/source'"
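The previous-container logs above contain only info-level startup messages, so on their own they do not explain the restarts. One way to check whether the kubelet is restarting the containers because of failed probes is to look at restart counts and at recent `Unhealthy` events. A sketch, assuming the `argocd` namespace from the transcript; `show_probe_failures` is a hypothetical helper name.

```shell
# Show restart counts and probe-failure events for a namespace.
show_probe_failures() {
  local ns="${1:-argocd}"
  # The RESTARTS column shows how often each pod's containers were restarted.
  kubectl get pods -n "$ns"
  # The kubelet records failed liveness/readiness probes as events with reason=Unhealthy.
  kubectl get events -n "$ns" --field-selector reason=Unhealthy --sort-by=.lastTimestamp
}

# Example: show_probe_failures argocd
```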
kubectl edit networkpolicy argocd-repo-server-network-policy -nargocd
Edit the policy, add the following entry here, then try again:
- ipBlock:
    cidr: 172.19.0.0/16
This option did not work.
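The manual `kubectl edit` step above can also be expressed as a one-off JSON patch, which is easier to repeat and to revert. This is only a sketch: it assumes the `ipBlock` entry belongs in the `from` list of the first ingress rule (index 0), which the notes do not state, and `add_ipblock` is a hypothetical helper name. As the notes record, this change did not fix the problem.

```shell
# Append an ipBlock peer to the first ingress rule of the repo-server policy.
add_ipblock() {
  local cidr="${1:-172.19.0.0/16}"
  local patch
  # JSON Patch "add" with a trailing "-" appends to the existing array.
  # Assumption: /spec/ingress/0/from is the list that should receive the entry.
  patch="$(printf '[{"op":"add","path":"/spec/ingress/0/from/-","value":{"ipBlock":{"cidr":"%s"}}}]' "$cidr")"
  kubectl patch networkpolicy argocd-repo-server-network-policy \
    -n argocd --type=json -p "$patch"
}

# Example: add_ipblock 172.19.0.0/16
```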
Disable NetworkPolicy outright.
Option 2: to use ipvlan, NetworkPolicy must be disabled (effective; this was the root cause)
- Change the setting to true.
- Then restart the terway-eniip pods:
kubectl delete -n kube-system pod -l app=terway-eniip
➜ helm git:(master) kubectl delete -n kube-system pod -l app=terway-eniip
pod "terway-eniip-62zzt" deleted
pod "terway-eniip-kjtt7" deleted
pod "terway-eniip-m9gfn" deleted
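The notes record only that a setting was changed to true before the Terway pods were restarted; they do not say which one. The sketch below assumes it is the `disable_network_policy` key in the `eni-config` ConfigMap in `kube-system`, and that the pods belong to a DaemonSet named `terway-eniip` (inferred from the pod name prefix). Both are assumptions to verify against the cluster, and `restart_terway` is a hypothetical helper name.

```shell
# Verify the setting, restart the Terway pods, and wait for them to come back.
restart_terway() {
  local value
  # Assumption: Terway reads disable_network_policy from the eni-config ConfigMap.
  value="$(kubectl get configmap eni-config -n kube-system \
    -o jsonpath='{.data.disable_network_policy}')"
  echo "disable_network_policy=${value}"
  if [ "$value" != "true" ]; then
    echo "setting is not true; edit the ConfigMap before restarting" >&2
    return 1
  fi
  # Delete the pods so that they are recreated with the new setting.
  kubectl delete -n kube-system pod -l app=terway-eniip
  # Assumption: the owning DaemonSet is named terway-eniip.
  kubectl rollout status daemonset/terway-eniip -n kube-system --timeout=180s
}
```

Checking the value first avoids restarting the network plugin on every node when the change has not actually been saved.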
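About NetworkPolicy
This has both advantages and drawbacks. Few people use NetworkPolicy in practice, and we will also pass this feedback along to see whether the health check scenario under ipvlan can be improved.
If you select ipvlan, leave NetworkPolicy unchecked. Without ipvlan, NetworkPolicy can be used.
NetworkPolicy can be enabled or disabled later; ipvlan cannot.
If you select ipvlan, either disable NetworkPolicy or do not create any NetworkPolicy resources.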
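Since the advice above is to avoid NetworkPolicy resources when ipvlan is selected, it can help to see which ones already exist in the cluster. A minimal sketch using standard kubectl commands; `list_network_policies` is a hypothetical helper name, and the delete command in the comment uses the policy name from the transcript.

```shell
# List NetworkPolicy resources, cluster-wide or for one namespace.
list_network_policies() {
  if [ -n "${1:-}" ]; then
    # A namespace was given: restrict the listing to it.
    kubectl get networkpolicy -n "$1"
  else
    kubectl get networkpolicy --all-namespaces
  fi
}

# To remove a single policy (name taken from the notes):
#   kubectl delete networkpolicy argocd-repo-server-network-policy -n argocd
```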