1. What is a Canary Release
A canary release is also known as a gray release (灰度发布). The name comes from an old practice of British coal miners, who found that canaries are extremely sensitive to firedamp: even a trace of the gas in the air will make a canary stop singing, and once the concentration passes a certain level the bird is poisoned to death long before humans notice anything. With the crude mining equipment of the day, miners took a canary down the shaft on every trip as a living "gas detector", so that they could evacuate at the first sign of danger.
When a canary release begins, an instance of the new version is started first, but no traffic is switched to it yet; instead, testers verify the new version directly in production. This newly started instance is our canary. If everything looks fine, a small share of user traffic is routed to the new version, which is then observed while it runs and all kinds of runtime data are collected. Comparing these data between the new and old versions is what is commonly called A/B testing.
Once the new version is confirmed to be running well, more and more traffic is gradually shifted to it. During this phase the replica counts of the old and new versions can be adjusted continuously so that the new version can absorb the growing traffic. When 100% of the traffic has been switched over, the remaining old-version instances are shut down and the canary release is complete.
If a problem with the new version is discovered during the canary (gray) phase, traffic should be switched back to the old version immediately, which keeps the negative impact as small as possible.
In short, a canary release means updating a backend application in production in stages (which requires traffic-control capability), validating it on a small scale first, and only then rolling it out to the whole production environment.
2. Canary Release with a k8s Deployment
2.1 YAML files
- deployment-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
- service-myapp.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: myapp
    release: canary
2.2 Deploy
[root@t34 deploy]# kubectl apply -f ../deploy/
deployment.apps/myapp-deploy created
service/myapp-svc created
2.3 Testing
- Check
kubectl get pod -l app=myapp
NAME READY STATUS RESTARTS AGE
myapp-deploy-675558bfc5-nwzp4 1/1 Running 0 76s
myapp-deploy-675558bfc5-z6zf7 1/1 Running 0 76s
- Upgrade
kubectl set image deployment myapp-deploy myapp=ikubernetes/myapp:v3 && kubectl rollout pause deployment myapp-deploy
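Before any traffic is routed to the canary, it is worth confirming that the Deployment really is paused. A quick check (just a sketch; .spec.paused is a standard field of the Deployment spec):
kubectl get deploy myapp-deploy -o jsonpath='{.spec.paused}'   # prints "true" while the rollout is paused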
[root@t34 deploy]# kubectl get pod -l app=myapp
NAME READY STATUS RESTARTS AGE
myapp-deploy-675558bfc5-nwzp4 1/1 Running 0 2m25s
myapp-deploy-675558bfc5-z6zf7 1/1 Running 0 2m25s
myapp-deploy-7f577979c8-jzzsj 1/1 Running 0 22s
[root@t34 deploy]# kubectl get deploy -l app=myapp
NAME READY UP-TO-DATE AVAILABLE AGE
myapp-deploy 3/2 1 3 2m38s
At this point myapp-deploy reports 3 ready pods: two are still running myapp:v2 and one is running the new myapp:v3.
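To see exactly which image each pod is running, kubectl's custom-columns output can be used (a sketch; only the label selector comes from the manifests above):
kubectl get pod -l app=myapp -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[*].image
# expected here: two pods with ikubernetes/myapp:v2 and one with ikubernetes/myapp:v3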
- Traffic test
[root@t34 deploy]# kubectl get svc myapp-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp-svc ClusterIP 10.43.223.204 <none> 80/TCP 7m59s
[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Traffic is now spread roughly at random between the v2 and v3 pods, since the Service selects on app=myapp and release=canary and both versions carry those labels. Once v3 is judged to be stable, the paused rollout can be resumed with rollout resume to complete the upgrade to v3.
[root@t34 deploy]# kubectl rollout resume deploy myapp-deploy
deployment.extensions/myapp-deploy resumed
[root@t34 deploy]# kubectl get pod -l app=myapp
NAME READY STATUS RESTARTS AGE
myapp-deploy-7f577979c8-jzzsj 1/1 Running 0 15m
myapp-deploy-7f577979c8-rbwwx 1/1 Running 0 2m12s
[root@t34 deploy]# kubectl get deploy myapp-deploy
NAME READY UP-TO-DATE AVAILABLE AGE
myapp-deploy 2/2 2 2 16m
[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Note: when the rollout is resumed to finish the upgrade to v3, only one new pod is created; the canary pod started earlier during the pause is kept running.
- Rollback
[root@t34 deploy]# kubectl get rs -l app=myapp
NAME DESIRED CURRENT READY AGE
myapp-deploy-675558bfc5 0 0 0 26m
myapp-deploy-7f577979c8 2 2 2 24m
[root@t34 deploy]# kubectl rollout history deploy myapp-deploy
deployment.extensions/myapp-deploy
REVISION CHANGE-CAUSE
1 <none>
2 <none>
[root@t34 deploy]# kubectl rollout undo deploy myapp-deploy --to-revision=1
deployment.extensions/myapp-deploy rolled back
[root@t34 deploy]# kubectl get pod -l app=myapp
NAME READY STATUS RESTARTS AGE
myapp-deploy-675558bfc5-nzn85 1/1 Running 0 18s
myapp-deploy-675558bfc5-tnhjd 1/1 Running 0 20s
[root@t34 deploy]# kubectl get deploy myapp-deploy
NAME READY UP-TO-DATE AVAILABLE AGE
myapp-deploy 2/2 2 2 29m
[root@t34 deploy]# kubectl rollout history deploy myapp-deploy
deployment.extensions/myapp-deploy
REVISION CHANGE-CAUSE
2 <none>
3 <none>
[root@t34 deploy]# for i in {1..10}; do curl 10.43.223.204:80; done
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
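The CHANGE-CAUSE column in the rollout history above is <none> because no change cause was ever recorded. One way to fill it in is the standard kubernetes.io/change-cause annotation (a sketch; the message text is made up for illustration):
kubectl annotate deployment myapp-deploy kubernetes.io/change-cause="update image to ikubernetes/myapp:v3"
# the annotation is attached to the current revision, so kubectl rollout history then shows it under CHANGE-CAUSE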
Throughout the steps above, traffic alternates randomly between the old and new versions, and there is no way to control the split more precisely. The next section shows how to combine k8s with Istio to steer traffic far more accurately.
3. Canary Release with k8s + Istio
For installing Istio, see my earlier article on installing, using, and troubleshooting Istio.
3.1 YAML files
- deploy-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy-v2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: v2
  template:
    metadata:
      labels:
        app: myapp
        version: v2
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
- deploy-v3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy-v3
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: v3
  template:
    metadata:
      labels:
        app: myapp
        version: v3
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v3
        ports:
        - name: http
          containerPort: 80
- service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: myapp
- gateway.yaml (Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: kubeflow-gateway
  namespace: kubeflow
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
- vs.yaml (Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp.com
  gateways:
  - kubeflow/kubeflow-gateway
  http:
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2
      weight: 90
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3
      weight: 10
- dr.yaml (Istio resource)
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp-dr
spec:
  host: myapp-svc
  subsets:
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
3.2 Deploy
[root@t34 canary]# kubectl apply -f ../canary/
deployment.apps/myapp-deploy-v2 created
deployment.apps/myapp-deploy-v3 created
destinationrule.networking.istio.io/myapp-dr created
service/myapp-svc created
virtualservice.networking.istio.io/myapp-vs created
3.3 Testing
- Check
[root@t34 canary]# kubectl get deploy | grep myapp
myapp-deploy-v2 2/2 2 2 119s
myapp-deploy-v3 2/2 2 2 119s
[root@t34 canary]# kubectl get pod -l app=myapp
NAME READY STATUS RESTARTS AGE
myapp-deploy-v2-7fbfdfbb7b-2rm6r 1/1 Running 0 2m9s
myapp-deploy-v2-7fbfdfbb7b-qpkwd 1/1 Running 0 2m9s
myapp-deploy-v3-6c67c5b878-m9mzm 1/1 Running 0 2m9s
myapp-deploy-v3-6c67c5b878-s2t58 1/1 Running 0 2m9s
[root@t34 canary]# kubectl get svc myapp-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp-svc ClusterIP 10.43.245.81 <none> 80/TCP 2m21s
[root@t34 canary]# kubectl get vs myapp-vs
NAME GATEWAYS HOSTS AGE
myapp-vs [kubeflow/kubeflow-gateway] [myapp.com] 2m46s
[root@t34 canary]# kubectl get dr
NAME HOST AGE
myapp-dr myapp-svc 2m49s
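The Gateway lives in the kubeflow namespace, so it does not appear in the default-namespace listings above. A quick way to confirm it exists (a sketch, assuming the Istio CRDs are installed):
kubectl get gateway -n kubeflow kubeflow-gateway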
- Traffic test
[root@t34 canary]# kubectl get svc myapp-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp-svc ClusterIP 10.43.245.81 <none> 80/TCP 7m46s
[root@t34 canary]# for i in {1..10}; do curl 10.43.245.81:80; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
If you test it the same way as with the plain Deployment above, you may well see more traffic going to v3, because requests sent straight to the Service's ClusterIP are load-balanced by kube-proxy across all pods behind the Service; the 90%/10% split defined in the VirtualService never comes into play. To get the weighted routing we have to go through Istio's ingress gateway.
[root@t34 canary]# kubectl get svc -n istio-system | grep ingress
istio-ingressgateway NodePort 10.43.176.205 <none> 15020:30541/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:32727/TCP,15030:32111/TCP,15031:32031/TCP,15032:31013/TCP,15443:30928/TCP 199d
In this cluster, port 80 of the Istio ingress gateway is exposed as NodePort 31380.
[root@t34 canary]# for i in {1..10}; do curl -H "Host: myapp.com" 192.168.4.34:31380; done
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Now most of the traffic goes to v2 and only a small share reaches v3.
If we flip the split so that v2 gets 10% and v3 gets 90% (by re-applying vs.yaml with the two weights swapped, as sketched below) and run the test again:
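The only change is the two weight values in the VirtualService; everything else stays exactly as in vs.yaml above (a sketch of the edited manifest, re-applied with kubectl apply -f vs.yaml):
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp.com
  gateways:
  - kubeflow/kubeflow-gateway
  http:
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2
      weight: 10   # was 90
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3
      weight: 90   # was 10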
[root@t34 canary]# for i in {1..10}; do curl -H "Host: myapp.com" 192.168.4.34:31380; done
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
Hello MyApp | Version: v3 | <a href="hostname.html">Pod Name</a>
This time most of the traffic goes to v3.
Istio can also steer traffic by user group instead of by percentage, for example by matching on request attributes; see the official Istio documentation for more. A small sketch of header-based routing follows.
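As an illustration only (the header name and value below are made up, not part of the original setup), a VirtualService can send requests carrying a particular header to v3 while everything else keeps going to v2:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp.com
  gateways:
  - kubeflow/kubeflow-gateway
  http:
  - match:
    - headers:
        canary-user:          # hypothetical header used to pick out canary users
          exact: "true"
    route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v3            # matched requests go to v3
  - route:
    - destination:
        host: myapp-svc.default.svc.cluster.local
        subset: v2            # all other requests stay on v2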
4. Conclusion
In the age of microservices, services are interconnected and their relationships are intricate, so deploying or upgrading a single service can bring down the whole system. It is therefore important to choose a deployment strategy that keeps the risk as low as possible. The canary (gray) release is only one of several strategies; there are also blue-green deployment, rolling deployment (a Deployment's default upgrade strategy is a rolling update), and others, and the right choice depends on the business scenario.