How to Do Canary Releases in Kubernetes


1. What is a canary release

A canary release is also known as a gray release. The name comes from coal mining: miners found that canaries are extremely sensitive to firedamp. Even a trace of the gas in the air will make a canary stop singing, and once the concentration passes a certain level the canary is poisoned and dies long before humans notice anything. With the crude mining equipment of the era, workers carried a canary down the shaft on every descent as a living gas detector, so that they could evacuate in time when conditions turned dangerous.

  • When a canary release begins, a new version of the application is started, but traffic is not switched over right away. Instead, testers exercise the new version in production; this newly started instance is the canary. If no problems are found, a small share of user traffic can be directed to the new version, and its runtime state is observed and all kinds of runtime data are collected. Comparing data between the old and new versions at this stage is what is called A/B testing.

  • Once the new version is confirmed to run well, more and more traffic is gradually shifted to it. During this period the replica counts of the old and new versions can be adjusted continuously, so that the new version can handle the growing traffic load. This continues until 100% of the traffic is switched to the new version, at which point the remaining old-version instances are shut down and the canary release is complete.

  • If a problem is found in the new version during the canary (gray) phase, traffic should immediately be switched back to the old version, keeping the negative impact contained to the smallest possible scope.

In short, a canary release means rolling out a new backend version to production in stages (which requires traffic-control capability): validate it on a small scale first, then promote it to the whole production environment once it meets expectations.
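The staged flow described above can be sketched as a simple control loop. This is a toy illustration only: healthy() stands in for whatever real monitoring and alerting you use, and the stage percentages are made-up assumptions, not values from this walkthrough.

```python
# A toy sketch of the staged canary flow described above.

def healthy() -> bool:
    """Placeholder for checking metrics/alerts on the new version."""
    return True

def canary_rollout(stages=(5, 25, 50, 100)) -> str:
    for weight in stages:
        # Shift `weight`% of traffic to the new version, then observe it.
        if not healthy():
            return "rolled back"  # switch all traffic back to the old version
    return "promoted"  # 100% on the new version; retire the old one

print(canary_rollout())
```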

2. Canary release with a Kubernetes Deployment

2.1 Preparing the YAML files

  • deployment-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: myapp-deploy
spec:
    replicas: 2
    selector:
        matchLabels:
            app: myapp
            release: canary
    template:
        metadata:
            labels: 
                app: myapp
                release: canary
        spec:
            containers:
            - name: myapp
              image: ikubernetes/myapp:v2
              ports:
              - name: http
                containerPort: 80
  • service-myapp.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc 
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: myapp 
    release: canary

2.2 Deploy

[root@t34 deploy]# kubectl apply -f ../deploy/
deployment.apps/myapp-deploy created
service/myapp-svc created

2.3 Test

  • Inspect
kubectl get pod -l app=myapp
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-675558bfc5-nwzp4   1/1     Running   0          76s
myapp-deploy-675558bfc5-z6zf7   1/1     Running   0          76s
  • Upgrade
kubectl set image deployment myapp-deploy myapp=ikubernetes/myapp:v3 && kubectl rollout pause deployment myapp-deploy
[root@t34 deploy]# kubectl get pod -l app=myapp
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-675558bfc5-nwzp4   1/1     Running   0          2m25s
myapp-deploy-675558bfc5-z6zf7   1/1     Running   0          2m25s
myapp-deploy-7f577979c8-jzzsj   1/1     Running   0          22s
[root@t34 deploy]# kubectl get deploy -l app=myapp
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
myapp-deploy   3/2     1            3           2m38s
[root@t34 deploy]# 

At this point myapp-deploy shows 3 ready pods: two running myapp:v2 and one running myapp:v3. Because the rollout is paused right after the first new pod comes up, READY temporarily reads 3/2.
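With two v2 pods and one v3 pod behind the same Service, version-unaware load balancing sends roughly one third of requests to v3. A minimal Python sketch of that endpoint selection (the pod names are hypothetical):

```python
import random
from collections import Counter

# Hypothetical endpoint list mirroring the paused rollout above:
# two v2 pods and one v3 canary pod behind the same Service.
endpoints = ["v2-pod-a", "v2-pod-b", "v3-pod-a"]

random.seed(0)
# The Service balances without version awareness, roughly like
# picking a random ready endpoint for each request.
hits = Counter(random.choice(endpoints) for _ in range(30_000))
v3_share = hits["v3-pod-a"] / 30_000
print(f"v3 share of traffic: {v3_share:.2f}")  # roughly one third
```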

  • Traffic test
[root@t34 deploy]# kubectl get svc myapp-svc
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
myapp-svc   ClusterIP   10.43.223.204   <none>        80/TCP    7m59s

[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>

At this point traffic is distributed randomly between the v2 and v3 pods. Once v3 proves stable under test, resume the paused rollout with rollout resume to complete the upgrade to v3:

[root@t34 deploy]# kubectl rollout resume deploy myapp-deploy 
deployment.extensions/myapp-deploy resumed

[root@t34 deploy]# kubectl get pod -l app=myapp
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-7f577979c8-jzzsj   1/1     Running   0          15m
myapp-deploy-7f577979c8-rbwwx   1/1     Running   0          2m12s

[root@t34 deploy]# kubectl get deploy myapp-deploy 
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
myapp-deploy   2/2     2            2           16m

[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>

Note: when the rollout to v3 resumes, only one additional pod is created; the v3 pod launched during the paused phase stays in place.

  • Roll back
[root@t34 deploy]# kubectl get rs  -l app=myapp
NAME                      DESIRED   CURRENT   READY   AGE
myapp-deploy-675558bfc5   0         0         0       26m
myapp-deploy-7f577979c8   2         2         2       24m

[root@t34 deploy]# kubectl rollout history deploy myapp-deploy
deployment.extensions/myapp-deploy 
REVISION  CHANGE-CAUSE
1         <none>
2         <none>

[root@t34 deploy]# kubectl rollout undo deploy myapp-deploy --to-revision=1
deployment.extensions/myapp-deploy rolled back

[root@t34 deploy]# kubectl get pod -l app=myapp
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-675558bfc5-nzn85   1/1     Running   0          18s
myapp-deploy-675558bfc5-tnhjd   1/1     Running   0          20s
[root@t34 deploy]# kubectl get deploy myapp-deploy 
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
myapp-deploy   2/2     2            2           29m
[root@t34 deploy]# kubectl rollout history deploy myapp-deploy
deployment.extensions/myapp-deploy 
REVISION  CHANGE-CAUSE
2         <none>
3         <none>

[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>

Throughout the steps above, traffic alternates randomly between the old and new versions, with no way to control the split precisely. The next section uses Kubernetes together with Istio to steer traffic much more accurately.

3. Canary release with Kubernetes and Istio

For Istio installation, see my earlier article on Istio installation, usage, and diagnostics.

3.1 Preparing the YAML files

  • deploy-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: myapp-deploy-v2
spec:
    replicas: 2
    selector:
        matchLabels:
            app: myapp
            version: v2 
    template:
        metadata:
            labels: 
                app: myapp
                version: v2 
        spec:
            containers:
            - name: myapp
              image: ikubernetes/myapp:v2
              ports:
              - name: http
                containerPort: 80
  • deploy-v3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: myapp-deploy-v3
spec:
    replicas: 2
    selector:
        matchLabels:
            app: myapp
            version: v3 
    template:
        metadata:
            labels: 
                app: myapp
                version: v3 
        spec:
            containers:
            - name: myapp
              image: ikubernetes/myapp:v3
              ports:
              - name: http
                containerPort: 80
  • service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc 
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: myapp 
  • gateway.yaml (an Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: kubeflow-gateway
  namespace: kubeflow
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
  • vs.yaml (an Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp.com
  gateways:
  - kubeflow/kubeflow-gateway
  http:
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2
      weight: 90
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3
      weight: 10
  • dr.yaml (an Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-dr 
spec:
  host: myapp-svc 
  subsets:
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
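The VirtualService above splits traffic 90/10 between the v2 and v3 subsets defined in the DestinationRule. Unlike the earlier Deployment-only setup, the split is independent of pod counts. A quick Python sketch of per-request weighted selection shows what such a split looks like over many requests:

```python
import random
from collections import Counter

random.seed(1)
# Weighted subset selection mimicking the 90/10 split in the
# VirtualService above; the proxy makes a choice like this per request.
subsets, weights = ["v2", "v3"], [90, 10]
hits = Counter(random.choices(subsets, weights=weights, k=20_000))
print(f"v2 share: {hits['v2'] / 20_000:.2f}")  # close to 0.90
```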

3.2 Deploy

[root@t34 canary]# kubectl apply -f ../canary/
deployment.apps/myapp-deploy-v2 created
deployment.apps/myapp-deploy-v3 created
destinationrule.networking.istio.io/myapp-dr created
service/myapp-svc created
virtualservice.networking.istio.io/myapp-vs created

3.3 Test

  • Inspect
[root@t34 canary]# kubectl get deploy  | grep myapp
myapp-deploy-v2                                    2/2     2            2           119s
myapp-deploy-v3                                    2/2     2            2           119s

[root@t34 canary]# kubectl get pod -l app=myapp
NAME                               READY   STATUS    RESTARTS   AGE
myapp-deploy-v2-7fbfdfbb7b-2rm6r   1/1     Running   0          2m9s
myapp-deploy-v2-7fbfdfbb7b-qpkwd   1/1     Running   0          2m9s
myapp-deploy-v3-6c67c5b878-m9mzm   1/1     Running   0          2m9s
myapp-deploy-v3-6c67c5b878-s2t58   1/1     Running   0          2m9s

[root@t34 canary]# kubectl get svc myapp-svc
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
myapp-svc   ClusterIP   10.43.245.81   <none>        80/TCP    2m21s

[root@t34 canary]# kubectl get vs myapp-vs
NAME       GATEWAYS                      HOSTS         AGE
myapp-vs   [kubeflow/kubeflow-gateway]   [myapp.com]   2m46s

[root@t34 canary]# kubectl get dr
NAME       HOST        AGE
myapp-dr   myapp-svc   2m49s
  • Traffic test
[root@t34 canary]# kubectl get svc myapp-svc
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
myapp-svc   ClusterIP   10.43.245.81   <none>        80/TCP    7m46s

[root@t34 canary]# for i in {1..10}; do curl 10.43.245.81:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>

Testing through the Service ClusterIP, as we did in the Deployment-based approach, v3 may even receive more traffic. Requests sent straight to the ClusterIP from outside the mesh bypass the VirtualService and are spread randomly across all four pods, so the 90%/10% split we configured never takes effect. To exercise Istio's weighted routing, traffic has to enter through the Istio ingress gateway.

[root@t34 canary]# kubectl get svc -n istio-system | grep ingress
istio-ingressgateway       NodePort       10.43.176.205   <none>        15020:30541/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:32727/TCP,15030:32111/TCP,15031:32031/TCP,15032:31013/TCP,15443:30928/TCP                                                199d

Port 80 of Istio's ingress gateway is exposed as NodePort 31380.

[root@t34 canary]# for i in {1..10}; do curl -H "Host: myapp.com" 192.168.4.34:31380; done
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>

Now most of the traffic goes to v2 and only a small share reaches v3, matching the 90/10 weights.

Changing the v2/v3 weights to 10% and 90% and testing again:
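For reference, the change only touches the two weight fields in vs.yaml; everything else in the VirtualService stays as defined in section 3.1:

```yaml
  http:
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2
      weight: 10
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3
      weight: 90
```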

[root@t34 canary]# for i in {1..10}; do curl -H "Host: myapp.com" 192.168.4.34:31380; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>

Now most of the traffic flows to v3.

Istio can also steer traffic toward different user groups; see the official Istio documentation for more.
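As a taste of that, here is a hypothetical VirtualService fragment that sends requests carrying a particular HTTP header to v3 while everyone else stays on v2; the header name and value are made up for illustration:

```yaml
  http:
  - match:
    - headers:
        end-user:
          exact: tester
    route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2
```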

4. Conclusion

In the microservices era, services are intricately interconnected, and deploying or upgrading a single service can bring down the whole system. Choosing an appropriate deployment strategy keeps that risk to a minimum. The canary (gray) release is just one of several strategies; others include blue-green deployment and rolling deployment (a Deployment's default update strategy is a rolling update). Pick the release form that fits your business scenario.