Experimenting with Service Discovery on an etcd Cluster
Why Service Discovery?
In a microservice architecture, the number and location of service instances can change at any time, for example when nodes are scaled out or replaced. Service discovery tracks these changes in real time so that a service can always find the currently available instances of the services it depends on.
It also reduces reliance on hard-coded addresses: the calling side holds no service addresses or ports and obtains them solely from the service registry.
Combined with node health monitoring, requests are no longer sent to failed or unavailable nodes, and load balancing becomes possible.
etcd
etcd is a distributed key-value store built on the Raft consensus algorithm. (It is said to have originated as the course project of MIT 6.824 Distributed Systems.)
Since I don't have a server cluster, I deployed etcd on my Windows laptop with Docker Compose. The docker-compose.yml is shown below:
services:
  node1:
    image: quay.io/coreos/etcd:v3.5.9-amd64
    volumes:
      - type: bind
        source: /d/Program Files/DockerImage/etcd/node1-data
        target: /etcd-data
    ports:
      - "2379:2379"
      - "2380:2380"
    networks:
      cluster_net:
        ipv4_address: 172.16.238.100
    environment:
      - ETCDCTL_API=3
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node1
      - --initial-advertise-peer-urls
      - http://172.16.238.100:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.100:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd
  node2:
    image: quay.io/coreos/etcd:v3.5.9-amd64
    volumes:
      - type: bind
        source: /d/Program Files/DockerImage/etcd/node2-data
        target: /etcd-data
    networks:
      cluster_net:
        ipv4_address: 172.16.238.101
    environment:
      - ETCDCTL_API=3
    ports:
      - "2369:2379"
      - "2370:2380"
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node2
      - --initial-advertise-peer-urls
      - http://172.16.238.101:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.101:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd
  node3:
    image: quay.io/coreos/etcd:v3.5.9-amd64
    volumes:
      - type: bind
        source: /d/Program Files/DockerImage/etcd/node3-data
        target: /etcd-data
    networks:
      cluster_net:
        ipv4_address: 172.16.238.102
    environment:
      - ETCDCTL_API=3
    ports:
      - "2359:2379"
      - "2360:2380"
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node3
      - --initial-advertise-peer-urls
      - http://172.16.238.102:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.102:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd
networks:
  cluster_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.16.238.0/24
          gateway: 172.16.238.1
Start the cluster: docker compose up -d (-d runs the containers detached, in the background)
Stop the cluster: docker compose down
Container status can be checked in Docker Desktop.
Use etcd's official v3 Go client API to test connectivity to the cluster from the host:
import (
	"context"
	"fmt"
	"log"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func TestEtcdConnection(t *testing.T) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379", "localhost:2369", "localhost:2359"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Panicln(err)
	}
	defer cli.Close()

	log.Println("putting")
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	_, err = cli.Put(ctx, "test", "test")
	cancel() // release the timeout's resources instead of discarding the cancel func
	if err != nil {
		log.Panic(err)
	}

	log.Println("getting")
	ctx, cancel = context.WithTimeout(context.Background(), 2*time.Second)
	resp, err := cli.Get(ctx, "test")
	cancel()
	if err != nil {
		log.Panic(err)
	}
	for _, ev := range resp.Kvs {
		fmt.Println(string(ev.Key), string(ev.Value))
	}
}
Writing the IDL
We use the Thrift protocol here, so thriftgo needs to be installed first.
go install github.com/cloudwego/thriftgo@latest
Verify the installation:
thriftgo --version
Following CloudWeGo's advanced tutorial, write base.thrift, item.thrift, and stock.thrift.
// base.thrift
namespace go example.base

struct BaseResp {
    1: string code
    2: string msg
}

// item.thrift
namespace go example.item

include "base.thrift"

struct Item {
    1: i64 id
    2: string title
    3: string description
    4: i64 stock
}

struct GetItemReq {
    1: required i64 id
}

struct GetItemResp {
    1: Item item
    255: base.BaseResp baseResp
}

service ItemService {
    GetItemResp GetItem(1: GetItemReq req)
}

// stock.thrift
namespace go example.stock

include "base.thrift"

struct GetItemStockReq {
    1: required i64 item_id
}

struct GetItemStockResp {
    1: i64 stock
    255: base.BaseResp BaseResp
}

service StockService {
    GetItemStockResp GetItemStock(1: GetItemStockReq req)
}
Generating Code with Kitex
Install kitex:
go install github.com/cloudwego/kitex/tool/cmd/kitex@latest
Verify the installation:
kitex --version
Then generate the corresponding RPC code with kitex:
kitex -module example ./idl/item.thrift
kitex -module example ./idl/stock.thrift
The code is generated under kitex_gen/example. Next, generate the client and server scaffolding:
# after creating rpc/item, run inside rpc/item
kitex -module example -service example.item -use example/kitex_gen ../../idl/item.thrift
The RPC business logic goes in func (s *ItemServiceImpl) GetItem(ctx context.Context, req *item.GetItemReq) (resp *item.GetItemResp, err error); the method name corresponds to GetItemResp GetItem(1: GetItemReq req) in item.thrift.
stock is handled the same way.
Registering the Service
Set the server.WithRegistry() option in the NewServer call. The service name on the server side (example.item here) must match the one used by the client.
etcdRegistry, err := etcd.NewEtcdRegistry([]string{
	"localhost:2379", "localhost:2369", "localhost:2359"})
if err != nil {
	log.Fatal(err)
}
addr, _ := net.ResolveTCPAddr("tcp", "127.0.0.1:8888")
itemServiceImpl := new(ItemServiceImpl)
svr := item.NewServer(itemServiceImpl,
	server.WithServiceAddr(addr),
	server.WithRegistry(etcdRegistry),
	server.WithServerBasicInfo(
		&rpcinfo.EndpointBasicInfo{
			ServiceName: "example.item",
		}),
)
Inspect the result with etcdctl on one of the etcd nodes:
docker exec -it etcd-docker-compose-node1-1 etcdctl get --prefix "kitex"
# output
kitex/registry-etcd/example.item/127.0.0.1:8888
{"network":"tcp","address":"127.0.0.1:8888","weight":10,"tags":null}
One problem I ran into along the way:
After commenting out server.WithServiceAddr(addr), or setting addr to 0.0.0.0:8888, the address recorded in etcd changed from 127.0.0.1:8888 to 169.254.64.52:8888, and the client then failed with: [happened in biz handler, method=ItemService.GetItem, please check the panic at the server side] service discovery error: no instance remains for exmaple.item.
From what I found online, when a machine tries and fails to obtain an IP address from a DHCP server, the operating system automatically assigns itself an address in the 169.254.x.x range (link-local, a.k.a. APIPA). This ensures the machine can still communicate on the local network even without an administrator-assigned IP address.
I then ran etcd directly on the host and retried, but the problem persisted; I don't know why. If anyone can explain it, I'd be very grateful.
Client Side
r, err := etcd.NewEtcdResolver([]string{"localhost:2379", "localhost:2369", "localhost:2359"})
if err != nil {
	log.Fatal(err)
}
c, err := itemservice.NewClient("example.item", client.WithResolver(r))
if err != nil {
	log.Fatal(err)
}

req := item.NewGetItemReq()
// populate the request
// ...
resp, err := c.GetItem(context.Background(), req, callopt.WithRPCTimeout(3*time.Second))
// handle the response
// ...
stock is handled the same way.