golibs: Protocol & Registry Technical Documentation


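This document describes the design, architecture, and full code of three parts of the golibs project: the Makefile (Protobuf compilation), protocol (the gRPC protocol layer), and registry (etcd-based service registration and discovery).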




1. Overall Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                            golibs project                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────┐    protoc build    ┌──────────────────────────────────┐  │
│  │ Makefile   │ ───────────────▶  │        protocol/                 │  │
│  └───────────┘                    │  ┌──────────┐ ┌──────────────┐  │  │
│                                   │  │  types/   │ │     ip/      │  │  │
│                                   │  │ .proto    │ │ .proto + Go  │  │  │
│                                   │  │ .pb.go    │ │ .pb.go       │  │  │
│                                   │  └──────────┘ │ _grpc.pb.go  │  │  │
│                                   │               │ client.go    │  │  │
│                                   │               └──────────────┘  │  │
│                                   │  ┌──────────────────────────┐   │  │
│                                   │  │     interceptor/         │   │  │
│                                   │  │  logger / metadata /     │   │  │
│                                   │  │  recovery                │   │  │
│                                   │  └──────────────────────────┘   │  │
│                                   └──────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                     registry/                                    │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐  │   │
│  │  │ Registry │  │ Watcher  │  │ Service  │  │   discover/    │  │   │
│  │  │ (etcd)   │  │ (watch)  │  │ (unmarshal)│ │   resolver     │  │   │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────────┘  │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│                         ┌───────────────┐                               │
│                         │ etcd cluster  │                               │
│                         └───────────────┘                               │
└─────────────────────────────────────────────────────────────────────────┘

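Module responsibilities:

| Module | Responsibility |
| --- | --- |
| Makefile | Compiles .proto files into Go code with protoc |
| protocol/types | Defines Protobuf types shared across services (Error, Wrappers, Timestamp) |
| protocol/ip | gRPC service definition and client implementation for IP geolocation |
| protocol/interceptor | gRPC unary interceptors (logging, metadata propagation, panic recovery) |
| registry | etcd-based service registration, deregistration, heartbeat keep-alive, and service discovery |
| registry/discover | Implements gRPC resolver.Builder and resolver.Resolver on top of etcd for client-side load balancing |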

2. Makefile: Protobuf compilation

The Makefile provides three make targets for compiling the Protobuf files and tidying dependencies in one step:

GOPATH:=$(shell go env GOPATH)
API_PROTO_FILES=$(shell find src -name '*.proto')

.PHONY: types
types:
	@protoc --proto_path=. \
				--proto_path=./protocol/types \
				--go_out=paths=source_relative:. \
				--go-errors_out=paths=source_relative:. \
				./protocol/types/error.proto
	@protoc --proto_path=. \
				--proto_path=./protocol/types \
				--go_out=paths=source_relative:. \
				--go-errors_out=paths=source_relative:. \
				./protocol/types/wrappers.proto
	@protoc --proto_path=. \
				--proto_path=./protocol/types \
				--go_out=paths=source_relative:. \
				--go-errors_out=paths=source_relative:. \
				./protocol/types/timestamp.proto

.PHONY: ip
ip:
	@protoc --proto_path=. \
		   --go_out=paths=source_relative:. \
		   ./protocol/ip/ip_message.proto
	@protoc --proto_path=. \
		   --proto_path=./protocol/ip \
		   --go-grpc_out=. \
		   ./protocol/ip/ip_service.proto

.PHONY: tidy
tidy:
	@go mod tidy

Compilation flow

    make types                              make ip
        │                                       │
        ▼                                       ▼
  ┌───────────────┐                    ┌─────────────────┐
  │ error.proto   │──▶ error.pb.go     │ ip_message.proto│──▶ ip_message.pb.go
  │ wrappers.proto│──▶ wrappers.pb.go  │ ip_service.proto│──▶ ip_service_grpc.pb.go
  │ timestamp.proto──▶ timestamp.pb.go └─────────────────┘
  └───────────────┘
        │                                       │
        │         protoc plugins                 │
        ├── --go_out          (messages)         ├── --go_out      (messages)
        └── --go-errors_out   (error codes)      └── --go-grpc_out (gRPC service stubs)

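Key parameters:

| Parameter | Description |
| --- | --- |
| --proto_path=. | Uses the project root as a proto search path |
| --proto_path=./protocol/types | Lets files inside the types package import one another |
| --go_out=paths=source_relative:. | Writes each generated .pb.go next to its .proto file |
| --go-errors_out=paths=source_relative:. | Generates custom error codes (types only) |
| --go-grpc_out=. | Generates gRPC server and client stub code |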

3. protocol module

3.1 types: shared Protobuf type definitions

protocol/types/ defines three general-purpose .proto files shared by all gRPC services.

3.1.1 Error (unified error type)

syntax = "proto3";

package types;
option go_package = "gitee.com/ha666/golibs/protocol/types";

message Error {
  // Error code, used to identify the kind of failure
  string code = 1;
  // Detailed error message
  string message = 2;
}

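📌 Design rationale: business error codes (such as "USER_NOT_FOUND") and their descriptions are carried inside the gRPC response body rather than in the gRPC status code, so clients can handle business failures in a uniform way.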
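As a rough illustration of what "uniform handling" could look like on the client, the sketch below turns an in-body error into a Go error. The Error struct is a hand-written stand-in for the generated type, and BusinessError and AsGoError are hypothetical names that do not appear in golibs; treating an empty code as success is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Error mirrors the fields of the types.Error message above. It is a
// hand-written stand-in for the generated struct, used only for illustration.
type Error struct {
	Code    string
	Message string
}

// BusinessError is a hypothetical Go error type wrapping an in-body Error.
type BusinessError struct {
	Code    string
	Message string
}

func (e *BusinessError) Error() string {
	return fmt.Sprintf("%s: %s", e.Code, e.Message)
}

// AsGoError converts an in-body Error into a Go error.
// Assumption: a nil Error or an empty code means the call succeeded.
func AsGoError(e *Error) error {
	if e == nil || e.Code == "" {
		return nil
	}
	return &BusinessError{Code: e.Code, Message: e.Message}
}

func main() {
	err := AsGoError(&Error{Code: "USER_NOT_FOUND", Message: "no such user"})
	var be *BusinessError
	if errors.As(err, &be) {
		// Callers can branch on the business code instead of the gRPC status.
		fmt.Println("business failure:", be.Code)
	}
}
```

With a helper of this kind, transport failures (the error returned by the RPC call) and business failures (the error field of the reply) stay on two separate paths.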

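3.1.2 Wrappers (wrappers for primitive types)

Provides wrapper messages for the primitive types (double, float, int64, uint64, int32, uint32, bool, string, bytes) so that "field not set" can be distinguished from "field set to its zero value":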

message Int64Value {
  int64 value = 1;
}

message StringValue {
  string value = 1;
}

message BoolValue {
  bool value = 1;
}
// ... plus DoubleValue, FloatValue, UInt64Value, Int32Value, UInt32Value, BytesValue
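On the Go side, a wrapper-typed field becomes a pointer, and the nil pointer is what carries the "not set" state. The sketch below shows the idea with a hand-written Int64Value; UpdateReq and describeAge are hypothetical names used only for this example.

```go
package main

import "fmt"

// Int64Value mirrors the wrapper message above (hand-written stand-in for
// the generated struct).
type Int64Value struct {
	Value int64
}

// UpdateReq is a hypothetical request with one optional field.
type UpdateReq struct {
	Age *Int64Value // nil means "not set"
}

// describeAge reports whether the field was left unset or explicitly set,
// possibly to zero.
func describeAge(req UpdateReq) string {
	if req.Age == nil {
		return "age not set"
	}
	return fmt.Sprintf("age set to %d", req.Age.Value)
}

func main() {
	fmt.Println(describeAge(UpdateReq{}))
	fmt.Println(describeAge(UpdateReq{Age: &Int64Value{Value: 0}}))
}
```

A plain int64 field could not tell these two requests apart, since both would arrive as 0.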

3.1.3 Timestamp

message Timestamp {
  // Seconds since the Unix epoch (1970-01-01T00:00:00Z)
  int64 seconds = 1;
  // Non-negative fraction of a second, at nanosecond resolution
  int32 nanos = 2;
}

This is similar to Google's google.protobuf.Timestamp, but it lives in the project's own types package so it can be paired with custom serialization logic.
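A conversion between this message and time.Time might look like the sketch below. The Timestamp struct is a hand-written stand-in for the generated type, and FromTime and ToTime are hypothetical helpers rather than functions taken from golibs.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamp mirrors the message above (hand-written stand-in for the
// generated struct).
type Timestamp struct {
	Seconds int64
	Nanos   int32
}

// FromTime splits a time.Time into whole seconds since the Unix epoch and a
// non-negative nanosecond remainder.
func FromTime(t time.Time) Timestamp {
	return Timestamp{Seconds: t.Unix(), Nanos: int32(t.Nanosecond())}
}

// ToTime rebuilds a UTC time.Time from the two fields.
func (ts Timestamp) ToTime() time.Time {
	return time.Unix(ts.Seconds, int64(ts.Nanos)).UTC()
}

func main() {
	t := time.Date(2024, 1, 2, 3, 4, 5, 600, time.UTC)
	ts := FromTime(t)
	fmt.Println(ts.Seconds, ts.Nanos, ts.ToTime().Equal(t))
}
```

Because time.Time.Nanosecond always returns a value in [0, 999999999], the nanos field stays non-negative even for instants before 1970.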


3.2 ip: IP geolocation service (gRPC)

3.2.1 Message definitions

syntax = "proto3";

package ip;
import "protocol/types/error.proto";
option go_package = "./protocol/ip";

message GetLocateByIPReq {
  string ip = 1;
}

message GetLocateByIPReply {
  types.Error error = 1;
  Locate locate = 2;
}

message Locate {
  string full_address = 1;
  string country = 2;
  string province = 3;
  string city = 4;
  string district = 5;
  string street = 6;
}

3.2.2 Service definition

syntax = "proto3";

package ip;
import "protocol/ip/ip_message.proto";
option go_package = "./protocol/ip";

service IPService {
  rpc GetLocateByIP(GetLocateByIPReq) returns (GetLocateByIPReply);
}

Request/response relationship:

  Client                                    Server
    │                                         │
    │  GetLocateByIPReq { ip: "1.2.3.4" }     │
    │ ──────────────────────────────────────▶  │
    │                                         │
    │  GetLocateByIPReply {                   │
    │    error: nil,                          │
    │    locate: {                            │
    │      full_address: "Pudong New Area, Shanghai, China", │
    │      country: "China",                  │
    │      province: "Shanghai",              │
    │      city: "Shanghai",                  │
    │      district: "Pudong New Area",       │
    │      street: "..."                     │
    │    }                                   │
    │  }                                     │
    │ ◀──────────────────────────────────────  │

3.2.3 Client implementation (client.go)

Two connection modes are provided: direct connection, and service discovery through etcd.

package ip

import (
	"fmt"
	"time"

	"gitee.com/ha666/golibs/protocol/interceptor"
	"gitee.com/ha666/golibs/registry/discover"
	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// Client is the IP service client.
type Client struct {
	conn *grpc.ClientConn
	IPServiceClient
}

// NewClient creates an IP client in direct-connection mode.
func NewClient(addr string) (*Client, error) {
	conn, err := grpc.NewClient(addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithChainUnaryInterceptor(
			interceptor.ClientLogger,
			interceptor.ClientMetadata,
		),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(1024*1024*1), // 1MB
			grpc.MaxCallSendMsgSize(1024*1024*1),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("failed to connect: %w", err)
	}
	return &Client{
		conn:            conn,
		IPServiceClient: NewIPServiceClient(conn),
	}, nil
}

// NewClientWithEtcd creates an IP client in etcd service discovery mode.
func NewClientWithEtcd() (*Client, error) {
	// 1. Create the etcd client.
	etcdClient, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 1 * time.Second,
	})
	if err != nil {
		return nil, err
	}

	// 2. Build the custom resolver. It is attached to this connection only,
	// through grpc.WithResolvers below. The global resolver.Register is meant
	// for init time and is not safe for concurrent use, so it is not called here.
	builder := discover.NewBuilder(etcdClient)

	// 3. Connect to the gRPC service through the etcd scheme.
	// Target format: etcd:///service-name
	conn, err := grpc.NewClient("etcd:///ip-service",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithChainUnaryInterceptor(
			interceptor.ClientLogger,
			interceptor.ClientMetadata,
		),
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(1024*1024*1), // 1MB
			grpc.MaxCallSendMsgSize(1024*1024*1),
		),
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
		grpc.WithResolvers(builder),
	)
	if err != nil {
		return nil, err
	}

	return &Client{
		conn:            conn,
		IPServiceClient: NewIPServiceClient(conn),
	}, nil
}

// Close closes the underlying gRPC connection.
func (c *Client) Close() error {
	return c.conn.Close()
}

Comparison of the two connection modes:

  ┌─────────────────────────────────────────────┐
  │            Direct mode (NewClient)           │
  │                                              │
  │  Client ──────────────────▶ Server           │
  │          grpc.NewClient("ip:port")           │
  └─────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────┐
  │         etcd service discovery mode (NewClientWithEtcd)      │
  │                                                              │
  │  Client ──▶ etcd resolver ──▶ etcd ──▶ [Server1, Server2]   │
  │               │                                              │
  │               └── round_robin load balancing                 │
  │                                                              │
  │  Target: etcd:///ip-service                                  │
  └─────────────────────────────────────────────────────────────┘
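To make the round_robin step concrete, here is a simplified model of rotating through the addresses a resolver reports. It is only a sketch of the idea: roundRobin, pick, and update are hypothetical names, and gRPC's real balancer works on connections and readiness states rather than on a plain string slice.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin hands out addresses in rotation. It is a simplified model of
// the idea behind the round_robin policy, not gRPC's actual balancer.
type roundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// update replaces the address list, as a resolver update would.
func (r *roundRobin) update(addrs []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.addrs = append([]string(nil), addrs...)
	r.next = 0
}

// pick returns the next address, or false when no address is known.
func (r *roundRobin) pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.addrs) == 0 {
		return "", false
	}
	addr := r.addrs[r.next%len(r.addrs)]
	r.next++
	return addr, true
}

func main() {
	rr := &roundRobin{}
	rr.update([]string{"10.0.0.1:9000", "10.0.0.2:9000"})
	for i := 0; i < 4; i++ {
		addr, _ := rr.pick()
		fmt.Println("call", i, "goes to", addr)
	}
}
```

In direct mode there is a single address, so every call goes to the same server; in discovery mode the list changes whenever instances register or disappear.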

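3.3 interceptor: gRPC interceptors

protocol/interceptor/ contains three interceptors that handle logging, metadata propagation, and panic recovery.

3.3.1 Logger (logging interceptor)

There is one interceptor for the server and one for the client; each records the method name, the elapsed time, and the request/response content.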

package interceptor

import (
	"context"
	"time"

	"gitee.com/ha666/golibs"
	"gitee.com/ha666/golibs/logs"
	"google.golang.org/grpc"
	"google.golang.org/grpc/peer"
)

// ServerLogger is the server-side logging interceptor.
func ServerLogger(ctx context.Context, req any, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (any, error) {
	var clientIPPort string
	if p, ok := peer.FromContext(ctx); ok {
		clientIPPort = p.Addr.String()
	} else {
		clientIPPort = "unknown"
	}

	start := time.Now()
	logs.Info(ctx, "[server] method=%s, client=%s, req=%+v",
		info.FullMethod, clientIPPort, req)
	reply, err := handler(ctx, req)
	consume := golibs.Since(start)
	if err != nil {
		logs.Error(ctx, "[server] method=%s, client=%s, req=%+v, consume:%dms, err:%+v",
			info.FullMethod, clientIPPort, req, consume, err)
	} else {
		logs.Info(ctx, "[server] method=%s, client=%s, req=%+v, consume:%dms, reply=%+v",
			info.FullMethod, clientIPPort, req, consume, reply)
	}
	return reply, err
}

// ClientLogger is the client-side logging interceptor.
func ClientLogger(ctx context.Context, method string, req, reply any,
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	start := time.Now()
	var p peer.Peer
	logs.Info(ctx, "[client] method=%s, req=%+v", method, req)
	err := invoker(ctx, method, req, reply, cc, append(opts, grpc.Peer(&p))...)
	serverAddr := cc.Target()
	if p.Addr != nil {
		serverAddr = p.Addr.String()
	}
	consume := golibs.Since(start)
	if err != nil {
		logs.Error(ctx, "[client] method=%s, server=%s, req=%+v, consume:%dms, err:%+v",
			method, serverAddr, req, consume, err)
	} else {
		logs.Info(ctx, "[client] method=%s, server=%s, req=%+v, consume:%dms, reply=%+v",
			method, serverAddr, req, consume, reply)
	}
	return err
}

Execution sequence of the logging interceptors:

  Client                        Server
    │                              │
    │  ┌─ ClientLogger ─┐         │
    │  │ log req         │         │
    │  │                 │         │
    │  │   invoker() ────────────▶ │  ┌─ ServerLogger ─┐
    │  │                 │         │  │ log client IP   │
    │  │                 │         │  │ log req         │
    │  │                 │         │  │                 │
    │  │                 │         │  │  handler()      │
    │  │                 │         │  │                 │
    │  │                 │         │  │ log elapsed time│
    │  │   ◀──── reply ──────────  │  └─────────────────┘
    │  │ log elapsed time│         │
    │  └─────────────────┘         │

3.3.2 Metadata (metadata propagation interceptor)

Propagates context values (such as trace_id) transparently between the gRPC client and server.

package interceptor

import (
	"context"
	"fmt"

	"gitee.com/ha666/golibs"
	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

// grpcTransmitKeys holds every key that should be propagated over gRPC.
var grpcTransmitKeys = map[string]string{
	golibs.CtxTraceId: golibs.CtxTraceId,
}

// GetTransmitKeys returns every key that should be propagated.
func GetTransmitKeys() map[string]string {
	return grpcTransmitKeys
}

// ClientMetadata is the client interceptor: it attaches ctx.Value entries to the gRPC metadata.
func ClientMetadata(ctx context.Context, method string, req, reply any,
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	outgoingCtx := attachContextValuesToMetadata(ctx)
	return invoker(outgoingCtx, method, req, reply, cc, opts...)
}

// attachContextValuesToMetadata copies the selected ctx.Value entries into the outgoing metadata.
func attachContextValuesToMetadata(ctx context.Context) context.Context {
	md, ok := metadata.FromOutgoingContext(ctx)
	if !ok {
		md = metadata.MD{}
	}
	for _, mdKey := range GetTransmitKeys() {
		if val := ctx.Value(mdKey); val != nil {
			var strValue string
			switch v := val.(type) {
			case string:
				strValue = v
			case fmt.Stringer:
				strValue = v.String()
			default:
				strValue = fmt.Sprintf("%v", v)
			}
			md.Set(mdKey, strValue)
		}
	}
	return metadata.NewOutgoingContext(ctx, md)
}

// ServerMetadata is the server interceptor: it extracts values from the gRPC metadata into the context.
func ServerMetadata(ctx context.Context, req any, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (any, error) {
	ctx = extractMetadataToContext(ctx)
	return handler(ctx, req)
}

// extractMetadataToContext copies the selected incoming metadata values into the context.
func extractMetadataToContext(ctx context.Context) context.Context {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return ctx
	}
	mdToContextKey := make(map[string]struct{})
	for _, mdKey := range GetTransmitKeys() {
		mdToContextKey[mdKey] = struct{}{}
	}
	for mdKey, values := range md {
		if _, exists := mdToContextKey[mdKey]; exists && len(values) > 0 {
			ctx = context.WithValue(ctx, mdKey, values[0])
		}
	}
	return ctx
}

// GetStringValue returns the string stored under key, or "" if it is absent or not a string.
func GetStringValue(ctx context.Context, key string) string {
	if value := ctx.Value(key); value != nil {
		if str, ok := value.(string); ok {
			return str
		}
	}
	return ""
}

Metadata propagation flow:

 ┌──────────── Client process ────────────┐    ┌──────────── Server process ────────────┐
 │                                         │    │                                         │
 │  ctx.Value("trace_id") = "abc-123"      │    │                                         │
 │           │                             │    │                                         │
 │           ▼                             │    │                                         │
 │  ClientMetadata interceptor             │    │                                         │
 │   ┌─────────────────────────────────┐   │    │                                         │
 │   │ iterate grpcTransmitKeys        │   │    │                                         │
 │   │ ctx.Value("trace_id") → "abc"   │   │    │                                         │
 │   │ md.Set("trace_id", "abc-123")   │   │    │                                         │
 │   └─────────────────────────────────┘   │    │                                         │
 │           │                             │    │                                         │
 │    gRPC metadata (HTTP/2 Headers)       │    │                                         │
 │    ═══════════════════════════════════▶  │    │  ServerMetadata interceptor             │
 │                                         │    │   ┌──────────────────────────────────┐  │
 │                                         │    │   │ md["trace_id"] → "abc-123"       │  │
 │                                         │    │   │ ctx = WithValue(ctx, "trace_id", │  │
 │                                         │    │   │                "abc-123")         │  │
 │                                         │    │   └──────────────────────────────────┘  │
 │                                         │    │           │                              │
 │                                         │    │           ▼                              │
 │                                         │    │   handler(ctx, req): business code sees │
 │                                         │    │   ctx.Value("trace_id") == "abc-123"     │
 └─────────────────────────────────────────┘    └─────────────────────────────────────────┘
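The round trip in the diagram can be modelled with the standard library alone. In the sketch below a plain map[string][]string stands in for gRPC metadata, and toHeaders, fromHeaders, and propagate are hypothetical names; the key value "trace_id" is taken from the diagram and is an assumption about the real golibs.CtxTraceId constant.

```go
package main

import (
	"context"
	"fmt"
)

// traceIDKey stands in for golibs.CtxTraceId. The value "trace_id" comes
// from the diagram above and is an assumption about the real constant.
const traceIDKey = "trace_id"

// toHeaders plays the client side: it copies the listed context values into
// a header map, converting each value to a string the same way the
// interceptor's type switch does.
func toHeaders(ctx context.Context, keys []string) map[string][]string {
	headers := make(map[string][]string)
	for _, k := range keys {
		val := ctx.Value(k)
		if val == nil {
			continue
		}
		switch v := val.(type) {
		case string:
			headers[k] = []string{v}
		case fmt.Stringer:
			headers[k] = []string{v.String()}
		default:
			headers[k] = []string{fmt.Sprintf("%v", v)}
		}
	}
	return headers
}

// fromHeaders plays the server side: only whitelisted keys are copied back
// into the context, and only the first value of each is used.
func fromHeaders(ctx context.Context, headers map[string][]string, keys []string) context.Context {
	for _, k := range keys {
		if vals := headers[k]; len(vals) > 0 {
			// Plain string keys mirror the source code shown above.
			ctx = context.WithValue(ctx, k, vals[0])
		}
	}
	return ctx
}

// propagate runs one full client-to-server round trip for a trace ID.
func propagate(traceID any) string {
	keys := []string{traceIDKey}
	clientCtx := context.WithValue(context.Background(), traceIDKey, traceID)
	serverCtx := fromHeaders(context.Background(), toHeaders(clientCtx, keys), keys)
	got, _ := serverCtx.Value(traceIDKey).(string)
	return got
}

func main() {
	fmt.Println(propagate("abc-123"))
	fmt.Println(propagate(42)) // non-string values arrive as their string form
}
```

One detail the model makes visible: whatever type the value had on the client, the server always receives a string.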

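3.3.3 Recovery (panic recovery interceptor)

Prevents a panic in a server-side handler from crashing the whole gRPC server: the panic is converted into a codes.Internal error and the stack trace is logged.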

package interceptor

import (
	"context"
	"runtime/debug"

	"gitee.com/ha666/golibs/logs"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Recovery is a recovery interceptor for server-side unary gRPC calls.
// It catches panics in the handler, logs the stack trace, and returns an Internal error.
func Recovery(ctx context.Context, req any, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (resp any, err error) {
	defer func() {
		if r := recover(); r != nil {
			stack := debug.Stack()
			logs.Error(ctx, "panic: %v,%s", r, stack)
			err = status.Errorf(codes.Internal, "internal server error: %v", r)
		}
	}()
	return handler(ctx, req)
}

Unit tests for the Recovery interceptor:

func TestRecovery_NoPanic(t *testing.T) {
	handler := func(ctx context.Context, req any) (any, error) {
		return "ok", nil
	}
	resp, err := Recovery(context.Background(), nil, &grpc.UnaryServerInfo{}, handler)
	if err != nil {
		t.Fatalf("expected no error, got: %v", err)
	}
	if resp != "ok" {
		t.Fatalf("expected resp 'ok', got: %v", resp)
	}
}

func TestRecovery_WithPanic(t *testing.T) {
	handler := func(ctx context.Context, req any) (any, error) {
		panic("something went wrong")
	}
	_, err := Recovery(context.Background(), nil, &grpc.UnaryServerInfo{}, handler)
	if err == nil {
		t.Fatal("expected error after panic, got nil")
	}
	st, ok := status.FromError(err)
	if !ok {
		t.Fatal("expected grpc status error")
	}
	if st.Code() != codes.Internal {
		t.Fatalf("expected codes.Internal, got: %v", st.Code())
	}
}

Interceptor chain wiring

When a gRPC client or server is created, the interceptors are registered as a chain:

┌────────────────── Client interceptor chain ────────────────┐
│                                                            │
│  Request ──▶ ClientLogger ──▶ ClientMetadata ──▶ invoker() │
│                                                            │
└────────────────────────────────────────────────────────────┘

┌────────────────── Server interceptor chain ────────────────┐
│                                                            │
│  Request ──▶ Recovery ──▶ ServerLogger ──▶ ServerMetadata  │
│                                         ──▶ handler()     │
│                                                            │
└────────────────────────────────────────────────────────────┘
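The nesting implied by these chains can be reproduced with a small stand-alone model. The handler and interceptor types below are simplified local stand-ins for the gRPC ones, and chain, named, and runChain are hypothetical helpers; the sketch only records the order in which each layer is entered and left.

```go
package main

import (
	"context"
	"fmt"
)

// handler and interceptor are simplified stand-ins for grpc.UnaryHandler and
// grpc.UnaryServerInterceptor (the info argument is omitted).
type handler func(ctx context.Context, req any) (any, error)
type interceptor func(ctx context.Context, req any, next handler) (any, error)

// chain composes interceptors so that the first one listed is the outermost.
func chain(final handler, interceptors ...interceptor) handler {
	h := final
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(ctx context.Context, req any) (any, error) {
			return ic(ctx, req, next)
		}
	}
	return h
}

// named returns an interceptor that records when it is entered and left.
func named(name string, trace *[]string) interceptor {
	return func(ctx context.Context, req any, next handler) (any, error) {
		*trace = append(*trace, name+" in")
		resp, err := next(ctx, req)
		*trace = append(*trace, name+" out")
		return resp, err
	}
}

// runChain wires the server order from the diagram and returns the trace.
func runChain() []string {
	var trace []string
	final := func(ctx context.Context, req any) (any, error) {
		trace = append(trace, "handler")
		return "ok", nil
	}
	h := chain(final,
		named("Recovery", &trace),
		named("ServerLogger", &trace),
		named("ServerMetadata", &trace),
	)
	_, _ = h(context.Background(), nil)
	return trace
}

func main() {
	for _, step := range runChain() {
		fmt.Println(step)
	}
}
```

Placing Recovery first makes it the outermost layer, so it also covers panics raised inside the other interceptors. With the server order drawn above, ServerLogger runs before ServerMetadata has copied trace_id into the context; if the logger is expected to see that value, listing ServerMetadata ahead of ServerLogger may be worth considering.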

4. registry module

4.1 Core interfaces and data structures

Watcher interface

package registry

// Watcher is service watcher.
type Watcher interface {
	// Next returns services in the following two cases:
	// 1.the first time to watch and the service instance list is not empty.
	// 2.any service instance changes found.
	// if the above two conditions are not met, it will block until context deadline exceeded or canceled
	Next() ([]string, error)
	// Stop close the watcher.
	Stop() error
}

Helper function (service.go)

package registry

import "fmt"

func unmarshal(data []byte) (string, error) {
	if len(data) == 0 {
		return "", fmt.Errorf("not found data")
	}
	return string(data), nil
}

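4.2 etcd registry implementation

registry/etcd.go is the core of service registration and discovery. It provides registration, deregistration, lookup, and heartbeat keep-alive.

etcd key/value layout

etcd key format: /{env}/{serviceName}/{endpoint}
etcd value:      the endpoint string (e.g. "127.0.0.1:1234")

Examples: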
  /local/helloworld/127.0.0.1:1234  →  "127.0.0.1:1234"
  /local/helloworld/127.0.0.1:5678  →  "127.0.0.1:5678"
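The key layout can be exercised with a pair of small helpers. nodeKey follows the format shown above; parseNodeKey and servicePrefix are hypothetical additions for illustration. The sketch also shows why a prefix used for listing instances is safer with a trailing slash.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeKey builds a key in the /{env}/{serviceName}/{endpoint} layout.
func nodeKey(env, service, endpoint string) string {
	return fmt.Sprintf("/%s/%s/%s", env, service, endpoint)
}

// servicePrefix is a hypothetical helper returning the prefix that covers
// exactly one service; the trailing slash keeps sibling names out.
func servicePrefix(env, service string) string {
	return fmt.Sprintf("/%s/%s/", env, service)
}

// parseNodeKey is a hypothetical inverse of nodeKey.
func parseNodeKey(key string) (env, service, endpoint string, err error) {
	parts := strings.SplitN(strings.TrimPrefix(key, "/"), "/", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed node key %q", key)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	key := nodeKey("local", "helloworld", "127.0.0.1:1234")
	fmt.Println(key)

	other := nodeKey("local", "helloworld2", "127.0.0.1:9999")
	// Without the trailing slash, a prefix query also matches "helloworld2".
	fmt.Println(strings.HasPrefix(other, "/local/helloworld"))
	fmt.Println(strings.HasPrefix(other, servicePrefix("local", "helloworld")))
}
```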

Full code

package registry

import (
	"context"
	"fmt"
	"math/rand/v2"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Option is etcd registry option.
type Option func(o *options)

type options struct {
	ctx      context.Context
	env      string
	ttl      time.Duration
	interval time.Duration
	maxRetry int
}

// Context with registry context.
func Context(ctx context.Context) Option {
	return func(o *options) { o.ctx = ctx }
}

func WithEnv(env string) Option {
	return func(o *options) { o.env = env }
}

// WithTTL with register ttl.
func WithTTL(ttl time.Duration) Option {
	return func(o *options) { o.ttl = ttl }
}

func WithInterval(interval time.Duration) Option {
	return func(o *options) { o.interval = interval }
}

func MaxRetry(num int) Option {
	return func(o *options) { o.maxRetry = num }
}

// Registry is etcd registry.
type Registry struct {
	name   string
	opts   *options
	client *clientv3.Client
	kv     clientv3.KV
	lease  clientv3.Lease
	ctxMap map[string]context.CancelFunc
}

// New creates etcd registry
func New(client *clientv3.Client, name string, opts ...Option) (r *Registry) {
	if name == "" {
		panic("registry: name is required")
	}
	op := &options{
		ctx:      context.Background(),
		env:      "abc",
		ttl:      time.Second * 15,
		interval: time.Second * 5,
		maxRetry: 5,
	}
	for _, o := range opts {
		o(op)
	}
	return &Registry{
		name:   name,
		opts:   op,
		client: client,
		kv:     clientv3.NewKV(client),
		ctxMap: make(map[string]context.CancelFunc),
	}
}

// Register the registration.
func (r *Registry) Register(ctx context.Context, endpoint string) error {
	key := r.getNodeKey(ctx, endpoint)
	if r.lease != nil {
		r.lease.Close()
	}
	r.lease = clientv3.NewLease(r.client)
	leaseID, err := r.registerWithKV(ctx, key, endpoint)
	if err != nil {
		return err
	}

	hctx, cancel := context.WithCancel(r.opts.ctx)
	r.ctxMap[endpoint] = cancel
	go r.heartBeat(hctx, leaseID, key, endpoint)
	return nil
}

// Deregister the registration.
func (r *Registry) Deregister(ctx context.Context, endpoint string) error {
	defer func() {
		if r.lease != nil {
			r.lease.Close()
		}
	}()
	if cancel, ok := r.ctxMap[endpoint]; ok {
		cancel()
		delete(r.ctxMap, endpoint)
	}
	key := r.getNodeKey(ctx, endpoint)
	_, err := r.client.Delete(ctx, key)
	return err
}

// GetService queries etcd for the instances currently registered under the service name.
func (r *Registry) GetService(ctx context.Context) ([]string, error) {
	key := r.getServiceKey(ctx)
	resp, err := r.kv.Get(ctx, key, clientv3.WithPrefix())
	if err != nil {
		return nil, err
	}
	items := make([]string, 0, len(resp.Kvs))
	for _, kv := range resp.Kvs {
		si, err := unmarshal(kv.Value)
		if err != nil {
			return nil, err
		}
		if si == "" {
			continue
		}
		items = append(items, si)
	}
	return items, nil
}

// Watch creates a watcher according to the service name.
func (r *Registry) Watch(ctx context.Context) (Watcher, error) {
	key := r.getServiceKey(ctx)
	return newWatcher(ctx, key, r.name, r.client)
}

// registerWithKV create a new lease, return current leaseID
func (r *Registry) registerWithKV(ctx context.Context, key string,
	value string) (clientv3.LeaseID, error) {
	grant, err := r.lease.Grant(ctx, int64(r.opts.ttl.Seconds()))
	if err != nil {
		return 0, err
	}
	_, err = r.client.Put(ctx, key, value, clientv3.WithLease(grant.ID))
	if err != nil {
		return 0, err
	}
	return grant.ID, nil
}

func (r *Registry) heartBeat(ctx context.Context, leaseID clientv3.LeaseID,
	key string, value string) {
	curLeaseID := leaseID
	kac, err := r.client.KeepAlive(ctx, leaseID)
	if err != nil {
		curLeaseID = 0
	}

	for {
		if curLeaseID == 0 {
			var retreat []int
			for retryCnt := 0; retryCnt < r.opts.maxRetry; retryCnt++ {
				if ctx.Err() != nil {
					return
				}
				idChan := make(chan clientv3.LeaseID, 1)
				errChan := make(chan error, 1)
				cancelCtx, cancel := context.WithCancel(ctx)
				go func() {
					defer cancel()
					id, registerErr := r.registerWithKV(cancelCtx, key, value)
					if registerErr != nil {
						errChan <- registerErr
					} else {
						idChan <- id
					}
				}()

				select {
				case <-time.After(3 * time.Second):
					cancel()
					continue
				case <-errChan:
					continue
				case curLeaseID = <-idChan:
				}

				kac, err = r.client.KeepAlive(ctx, curLeaseID)
				if err == nil {
					break
				}
				retreat = append(retreat, 1<<retryCnt)
				time.Sleep(time.Duration(retreat[rand.IntN(len(retreat))]) * time.Second)
			}
			if _, ok := <-kac; !ok {
				return
			}
		}

		select {
		case _, ok := <-kac:
			if !ok {
				if ctx.Err() != nil {
					return
				}
				curLeaseID = 0
				continue
			}
		case <-r.opts.ctx.Done():
			return
		}
	}
}

func (r *Registry) getNodeKey(ctx context.Context, endpoint string) string {
	return fmt.Sprintf("/%s/%s/%s", r.opts.env, r.name, endpoint)
}

func (r *Registry) getServiceKey(ctx context.Context) string {
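	// The trailing slash keeps prefix queries from matching sibling services
	// whose names merely start with r.name.
	return fmt.Sprintf("/%s/%s/", r.opts.env, r.name)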
}

Registration and heartbeat keep-alive flow

   Registry.Register()
       │
       ▼
  ┌─────────────────────┐
  │ 1. Create Lease     │
  │ 2. Grant(TTL=15s)   │
  │ 3. Put(key, value,  │
  │       WithLease)    │
  └────────┬────────────┘
           │
           ▼
  ┌──────────────────────────────────────────────────────────────┐
  │           heartBeat goroutine (runs in background)           │
  │                                                              │
  │   KeepAlive(leaseID) → continuous renewal                    │
  │       │                                                      │
  │       ├── renewal OK → wait for the next KeepAlive response  │
  │       │                                                      │
  │       └── renewal failed (channel closed) → re-register      │
  │               │                                              │
  │               ├── registerWithKV (up to maxRetry attempts)   │
  │               │       ├── 3s timeout per attempt             │
  │               │       └── randomized exponential backoff     │
  │               │                                              │
  │               └── all retries failed → goroutine exits       │
  └──────────────────────────────────────────────────────────────┘
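The backoff step in heartBeat appends 1<<retryCnt to a list and then sleeps for a randomly chosen entry. The sketch below isolates that arithmetic. backoffCandidates and pickDelay are hypothetical names, the list assumes every attempt reaches the backoff step, and math/rand is used here in place of the math/rand/v2 import in the source.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCandidates returns the delays, in seconds, accumulated after the
// given number of attempts: 1, 2, 4, ... This mirrors
// retreat = append(retreat, 1<<retryCnt) under the assumption that every
// attempt reaches the backoff step.
func backoffCandidates(attempts int) []int {
	out := make([]int, 0, attempts)
	for i := 0; i < attempts; i++ {
		out = append(out, 1<<i)
	}
	return out
}

// pickDelay chooses one candidate at random, so later attempts may wait
// longer but can also draw one of the earlier, shorter delays.
func pickDelay(candidates []int) time.Duration {
	if len(candidates) == 0 {
		return 0
	}
	return time.Duration(candidates[rand.Intn(len(candidates))]) * time.Second
}

func main() {
	candidates := backoffCandidates(5) // default maxRetry is 5
	fmt.Println("candidate delays (s):", candidates)
	fmt.Println("this attempt would sleep for", pickDelay(candidates))
}
```

With the default maxRetry of 5, the longest single sleep under this model is 16 seconds.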

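Configuration options:

| Option | Default | Description |
| --- | --- | --- |
| WithEnv(env) | "abc" | Environment identifier used as the key prefix for isolation (e.g. local/dev/prod) |
| WithTTL(ttl) | 15s | etcd lease time-to-live |
| WithInterval(interval) | 5s | Reserved heartbeat interval (renewal currently happens automatically through KeepAlive) |
| MaxRetry(num) | 5 | Maximum number of re-registration attempts after the heartbeat is lost |
| Context(ctx) | context.Background() | Global lifecycle context |

The resolver in section 4.4 builds its key prefix from golibs.Env, so the value passed to WithEnv has to match it for discovery to find the registered instances.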

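4.3 Watcher: watching for service changes

registry/watcher.go implements the Watcher interface and uses the etcd Watch mechanism to detect changes to service instances in real time.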

package registry

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

var _ Watcher = (*watcher)(nil)

type watcher struct {
	key         string
	ctx         context.Context
	cancel      context.CancelFunc
	client      *clientv3.Client
	watchChan   clientv3.WatchChan
	watcher     clientv3.Watcher
	kv          clientv3.KV
	first       bool
	serviceName string
}

func newWatcher(ctx context.Context, key, name string,
	client *clientv3.Client) (*watcher, error) {
	w := &watcher{
		key:         key,
		client:      client,
		watcher:     clientv3.NewWatcher(client),
		kv:          clientv3.NewKV(client),
		first:       true,
		serviceName: name,
	}
	w.ctx, w.cancel = context.WithCancel(ctx)
	w.watchChan = w.watcher.Watch(w.ctx, key,
		clientv3.WithPrefix(), clientv3.WithRev(0), clientv3.WithKeysOnly())
	err := w.watcher.RequestProgress(w.ctx)
	if err != nil {
		return nil, err
	}
	return w, nil
}

func (w *watcher) Next() ([]string, error) {
	if w.first {
		item, err := w.getInstance()
		w.first = false
		return item, err
	}

	select {
	case <-w.ctx.Done():
		return nil, w.ctx.Err()
	case watchResp, ok := <-w.watchChan:
		if !ok || watchResp.Err() != nil {
			time.Sleep(time.Second)
			err := w.reWatch()
			if err != nil {
				return nil, err
			}
		}
		return w.getInstance()
	}
}

func (w *watcher) Stop() error {
	w.cancel()
	return w.watcher.Close()
}

func (w *watcher) getInstance() ([]string, error) {
	resp, err := w.kv.Get(w.ctx, w.key, clientv3.WithPrefix())
	if err != nil {
		return nil, err
	}
	items := make([]string, 0, len(resp.Kvs))
	for _, kv := range resp.Kvs {
		si, err := unmarshal(kv.Value)
		if err != nil {
			return nil, err
		}
		if si == "" {
			continue
		}
		items = append(items, si)
	}
	return items, nil
}

func (w *watcher) reWatch() error {
	w.watcher.Close()
	w.watcher = clientv3.NewWatcher(w.client)
	w.watchChan = w.watcher.Watch(w.ctx, w.key,
		clientv3.WithPrefix(), clientv3.WithRev(0), clientv3.WithKeysOnly())
	return w.watcher.RequestProgress(w.ctx)
}

Watcher workflow:

  w.Next()
     │
     ├── first call?
     │      │
     │      YES ──▶ getInstance() ──▶ return all current service instances
     │      │
     │      NO ──▶ block on watchChan
     │                  │
     │                  ├── event received ──▶ getInstance() ──▶ return the latest instance list
     │                  │
     │                  ├── channel closed / error ──▶ reWatch() ──▶ re-establish the watch, then getInstance()
     │                  │
     │                  └── ctx.Done() ──▶ return error
     │
  w.Stop()
     │
     └── cancel() + watcher.Close()
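A caller typically drives a Watcher from a loop: call Next, apply the snapshot, repeat until an error is returned, and stop the watcher on the way out. The sketch below shows that loop against a fake implementation so it runs without etcd. The Watcher interface is repeated from section 4.1; fakeWatcher, errDone, and consume are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Watcher repeats the interface from section 4.1.
type Watcher interface {
	Next() ([]string, error)
	Stop() error
}

// errDone is returned by the fake once its snapshots are used up.
var errDone = errors.New("watcher: no more snapshots")

// fakeWatcher replays canned snapshots; it is a test double, not the etcd
// implementation.
type fakeWatcher struct {
	snapshots [][]string
	stopped   bool
}

func (f *fakeWatcher) Next() ([]string, error) {
	if len(f.snapshots) == 0 {
		return nil, errDone
	}
	s := f.snapshots[0]
	f.snapshots = f.snapshots[1:]
	return s, nil
}

func (f *fakeWatcher) Stop() error {
	f.stopped = true
	return nil
}

// consume calls Next until it fails, hands every snapshot to onUpdate, and
// always stops the watcher before returning.
func consume(w Watcher, onUpdate func(instances []string)) error {
	defer w.Stop()
	for {
		instances, err := w.Next()
		if err != nil {
			return err
		}
		onUpdate(instances)
	}
}

func main() {
	w := &fakeWatcher{snapshots: [][]string{
		{"127.0.0.1:1234"},
		{"127.0.0.1:1234", "127.0.0.1:5678"},
	}}
	err := consume(w, func(instances []string) {
		fmt.Println("instances:", instances)
	})
	fmt.Println("stopped with:", err)
}
```

Each call to Next returns the full instance list rather than a delta, so the consumer can simply replace its previous view.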

4.4 discover: gRPC Resolver

registry/discover/resolver.go implements gRPC's resolver.Builder and resolver.Resolver interfaces, so a gRPC client can discover a service automatically through a target of the form etcd:///service-name.

package discover

import (
	"context"
	"fmt"
	"strings"
	"sync"
	"time"

	"gitee.com/ha666/golibs"
	"gitee.com/ha666/golibs/logs"
	"go.etcd.io/etcd/api/v3/mvccpb"
	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc/resolver"
)

// etcdBuilder implements resolver.Builder.
type etcdBuilder struct {
	client     *clientv3.Client
	serviceTTL int64
}

// NewBuilder creates an etcd resolver builder.
func NewBuilder(client *clientv3.Client) resolver.Builder {
	return &etcdBuilder{client: client}
}

// Build creates a new resolver for the given target.
func (b *etcdBuilder) Build(target resolver.Target, cc resolver.ClientConn,
	opts resolver.BuildOptions) (resolver.Resolver, error) {
	serviceName := strings.TrimPrefix(target.URL.Path, "/")
	if serviceName == "" {
		return nil, fmt.Errorf("etcd resolver: missing service name in target URL")
	}

	ctx, cancel := context.WithCancel(context.Background())
	r := &etcdResolver{
		client:      b.client,
		serviceName: serviceName,
		ctx:         ctx,
		cancel:      cancel,
		cc:          cc,
		addrs:       make(map[string]bool),
		rn:          make(chan struct{}, 1),
	}

	go r.watchService()

	return r, nil
}

// Scheme returns the scheme handled by this resolver.
func (b *etcdBuilder) Scheme() string {
	return "etcd"
}

// etcdResolver implements resolver.Resolver.
type etcdResolver struct {
	client      *clientv3.Client
	serviceName string
	ctx         context.Context
	cancel      context.CancelFunc
	cc          resolver.ClientConn
	addrs       map[string]bool
	mu          sync.Mutex
	rn          chan struct{}
	env         string
}

// ResolveNow is called by gRPC as a hint that the resolver may re-resolve.
func (r *etcdResolver) ResolveNow(o resolver.ResolveNowOptions) {
	select {
	case r.rn <- struct{}{}:
	default:
	}
}

// Close shuts down the resolver.
func (r *etcdResolver) Close() {
	r.cancel()
}

// watchService watches etcd for changes to the service's addresses.
func (r *etcdResolver) watchService() {
	if err := r.sync(); err != nil {
		logs.Error(r.ctx, "etcd resolver: initial sync failed: %v", err)
	}
	keyPrefix := fmt.Sprintf("/%s/%s/", golibs.Env, r.serviceName)
	r.watchWithRetry(keyPrefix)
}

// sync fetches all current service addresses from etcd and pushes them to gRPC.
func (r *etcdResolver) sync() error {
	ctx, cancel := context.WithTimeout(r.ctx, 5*time.Second)
	defer cancel()

	keyPrefix := fmt.Sprintf("/%s/%s/", golibs.Env, r.serviceName)
	resp, err := r.client.Get(ctx, keyPrefix, clientv3.WithPrefix())
	if err != nil {
		return err
	}

	newAddrs := make(map[string]bool)
	for _, kv := range resp.Kvs {
		addr := string(kv.Value)
		if addr != "" {
			newAddrs[addr] = true
		}
	}

	return r.updateState(newAddrs)
}

// updateState converts the address set into a resolver.State and applies it.
func (r *etcdResolver) updateState(newAddrs map[string]bool) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	if mapsEqual(r.addrs, newAddrs) {
		return nil
	}

	var addresses []resolver.Address
	for addr := range newAddrs {
		addresses = append(addresses, resolver.Address{Addr: addr})
	}

	state := resolver.State{Addresses: addresses}
	if err := r.cc.UpdateState(state); err != nil {
		return err
	}

	r.addrs = newAddrs
	logs.Info(nil, "etcd resolver: updated addresses for %s: %v",
		r.serviceName, addresses)
	return nil
}

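// watchWithRetry starts the watch and restarts it on error with exponential
// backoff. After a failure it re-syncs before watching again, because events
// may have been missed while the watch was down.
func (r *etcdResolver) watchWithRetry(keyPrefix string) {
	retryDelay := time.Second
	maxRetryDelay := 30 * time.Second

	for {
		select {
		case <-r.ctx.Done():
			return
		default:
		}

		watchChan := r.client.Watch(r.ctx, keyPrefix, clientv3.WithPrefix())
		if err := r.handleWatch(watchChan); err != nil {
			logs.Warn(nil, "etcd resolver: watch error: %v, retrying in %v",
				err, retryDelay)
			select {
			case <-r.ctx.Done():
				return
			case <-time.After(retryDelay):
			}
			retryDelay *= 2
			if retryDelay > maxRetryDelay {
				retryDelay = maxRetryDelay
			}
			// Catch up on changes that happened while the watch was down.
			if err := r.sync(); err != nil {
				logs.Error(nil, "etcd resolver: re-sync before re-watch failed: %v", err)
			}
		} else {
			retryDelay = time.Second
		}
	}
}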

// handleWatch consumes watch events until the context is cancelled or the watch fails.
func (r *etcdResolver) handleWatch(watchChan clientv3.WatchChan) error {
	for {
		select {
		case <-r.ctx.Done():
			return nil
		case wresp, ok := <-watchChan:
			if !ok {
				return fmt.Errorf("watch channel closed")
			}
			if wresp.Err() != nil {
				return wresp.Err()
			}
			if err := r.processWatchResponse(wresp); err != nil {
				logs.Error(nil,
					"etcd resolver: process watch response error: %v", err)
			}
		}
	}
}

// processWatchResponse applies a single watch response to the address set.
func (r *etcdResolver) processWatchResponse(wresp clientv3.WatchResponse) error {
	needFullSync := false
	for _, ev := range wresp.Events {
		if ev.Type == mvccpb.DELETE {
			needFullSync = true
			break
		}
	}

	// A DELETE event carries no value, so fall back to a full sync.
	if needFullSync {
		if err := r.sync(); err != nil {
			logs.Error(nil,
				"etcd resolver: full sync after delete failed: %v", err)
		}
		return nil
	}

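	// PUT events: incremental update. Addresses are tracked by value, so a key
	// overwritten with a new address leaves the old one in the set until the
	// next full sync.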
	r.mu.Lock()
	defer r.mu.Unlock()

	newAddrs := make(map[string]bool)
	for k := range r.addrs {
		newAddrs[k] = true
	}

	for _, ev := range wresp.Events {
		if ev.Type == mvccpb.PUT {
			addr := string(ev.Kv.Value)
			if addr != "" {
				newAddrs[addr] = true
			}
		}
	}

	return r.updateStateNoLock(newAddrs)
}

// updateStateNoLock pushes newAddrs to gRPC if they differ from the current set.
// Internal use only: the caller must hold r.mu.
func (r *etcdResolver) updateStateNoLock(newAddrs map[string]bool) error {
	if mapsEqual(r.addrs, newAddrs) {
		return nil
	}

	var addresses []resolver.Address
	for addr := range newAddrs {
		addresses = append(addresses, resolver.Address{Addr: addr})
	}

	state := resolver.State{Addresses: addresses}
	if err := r.cc.UpdateState(state); err != nil {
		return err
	}

	r.addrs = newAddrs
	logs.Info(nil, "etcd resolver: updated addresses for %s: %v",
		r.serviceName, addresses)
	return nil
}

// mapsEqual reports whether two address sets contain the same keys.
func mapsEqual(a, b map[string]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}
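
The resolver above tracks addresses by value, which is why a DELETE event (which carries only the key) forces a full sync. As a rough sketch of an alternative, assuming each instance is registered under its own key, the state could be indexed by key so that deletes are applied incrementally as well. The `event` type, `applyEvents`, `addresses`, and the key suffixes `a`, `b`, `c` are hypothetical stand-ins for illustration; they are not part of golibs or the etcd client.

```go
package main

import (
	"fmt"
	"sort"
)

// eventType and event are hypothetical stand-ins for etcd watch events.
type eventType int

const (
	put eventType = iota
	del
)

type event struct {
	typ   eventType
	key   string
	value string // empty for delete events
}

// applyEvents returns a new key -> address map with the events applied in order.
// The input map is left unmodified.
func applyEvents(current map[string]string, events []event) map[string]string {
	next := make(map[string]string, len(current))
	for k, v := range current {
		next[k] = v
	}
	for _, ev := range events {
		switch ev.typ {
		case put:
			if ev.value != "" {
				next[ev.key] = ev.value // also covers a key overwritten with a new address
			}
		case del:
			delete(next, ev.key) // only the key is needed
		}
	}
	return next
}

// addresses returns the distinct addresses in sorted order.
func addresses(m map[string]string) []string {
	seen := make(map[string]bool, len(m))
	out := make([]string, 0, len(m))
	for _, addr := range m {
		if !seen[addr] {
			seen[addr] = true
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	state := map[string]string{
		"/prod/ip-service/a": "192.168.1.10:8080",
		"/prod/ip-service/b": "192.168.1.11:8080",
	}
	state = applyEvents(state, []event{
		{typ: del, key: "/prod/ip-service/a"},
		{typ: put, key: "/prod/ip-service/c", value: "192.168.1.12:8080"},
	})
	fmt.Println(addresses(state))
}
```

The trade-off is a slightly larger state (one entry per key instead of one per address) in exchange for avoiding a round trip to etcd on every delete.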

gRPC resolver flow:

  gRPC client dials "etcd:///ip-service"
         │
         ▼
  ┌─────────────────────────────────┐
  │ etcdBuilder.Build()             │
  │   serviceName = "ip-service"    │
  │   starts watchService goroutine │
  └────────────────┬────────────────┘
                   │
                   ▼
  ┌──────────────────────────────────────────────────────┐
  │              watchService()                          │
  │                                                      │
  │  1. sync(): full fetch                               │
  │     GET /{env}/ip-service/ (prefix)                  │
  │     ┌──────────────────────────────────┐             │
  │     │ kv1: 192.168.1.10:8080           │             │
  │     │ kv2: 192.168.1.11:8080           │             │
  │     └──────────────────────────────────┘             │
  │     updateState → cc.UpdateState([addr1, addr2])     │
  │                                                      │
  │  2. watchWithRetry(keyPrefix): incremental watch     │
  │     ┌──────────────────────────────────┐             │
  │     │ PUT    → incremental add         │             │
  │     │ DELETE → full sync()             │             │
  │     └──────────────────────────────────┘             │
  │                                                      │
  │  3. On error, retry with exponential backoff         │
  │     (1s → 2s → 4s → ... → 30s)                       │
  └──────────────────────────────────────────────────────┘
         │
         ▼
  The gRPC ClientConn receives the latest address list;
  with the round_robin policy this provides load balancing
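
To make the retry schedule in step 3 concrete, here is a small sketch that isolates the doubling-with-cap rule used by `watchWithRetry`. The helper names `nextDelay` and `schedule` are introduced only for this illustration and do not exist in golibs.

```go
package main

import (
	"fmt"
	"time"
)

// nextDelay doubles the current delay and caps it at max,
// mirroring the arithmetic in watchWithRetry.
func nextDelay(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// schedule returns the first n wait times, starting from initial.
func schedule(initial, max time.Duration, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		out = append(out, d)
		d = nextDelay(d, max)
	}
	return out
}

func main() {
	for i, d := range schedule(time.Second, 30*time.Second, 7) {
		fmt.Printf("retry %d: wait %v\n", i+1, d)
	}
}
```

With a 1s start and a 30s cap, the sixth wait is already clamped: doubling 16s gives 32s, which exceeds the cap, so every wait from that point on is 30s.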

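5. End-to-End Call Flow

Taking a call to the IP location service through etcd service discovery as an example, the diagram below shows the complete call chain: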

 ┌─────────────── Caller ──────────────────────────────────────────────────┐
 │                                                                         │
 │  1. ctx = context.WithValue(ctx, "trace_id", "test-123456")             │
 │                                                                         │
 │  2. client, _ := ip.NewClientWithEtcd()                                 │
 │     ├── create etcd client → connect to etcd://localhost:2379           │
 │     ├── register etcdBuilder (scheme="etcd")                            │
 │     ├── grpc.NewClient("etcd:///ip-service")                            │
 │     │     ├── etcdBuilder.Build() → etcdResolver                        │
 │     │     │     └── sync() → GET /prod/ip-service/ (prefix)             │
 │     │     │         └── yields [192.168.1.10:8080, 192.168.1.11:8080]   │
 │     │     ├── round_robin load balancing                                │
 │     │     └── interceptor chain: [ClientLogger, ClientMetadata]         │
 │     └── returns the Client object                                       │
 │                                                                         │
 │  3. reply, _ := client.GetLocateByIP(ctx, &GetLocateByIPReq{            │
 │         Ip: "223.161.208.123",                                          │
 │     })                                                                  │
 │     ├── ClientLogger: logs [client] method=..., req=...                 │
 │     ├── ClientMetadata: md.Set("trace_id", "test-123456")               │
 │     ├── invoker() → gRPC call (round_robin picks a backend)             │
 │     │                                                                   │
 │     │   ┌───────── Server ──────────────────────────────────┐           │
 │     │   │ Recovery: defer recover()                         │           │
 │     │   │ ServerLogger: logs client IP, req                 │           │
 │     │   │ ServerMetadata:                                   │           │
 │     │   │   md["trace_id"] → ctx = WithValue("trace_id",    │           │
 │     │   │                     "test-123456")                │           │
 │     │   │ handler(ctx, req) → business logic                │           │
 │     │   │   └── look up IP → return Locate{...}             │           │
 │     │   │ ServerLogger: logs consume, reply                 │           │
 │     │   └───────────────────────────────────────────────────┘           │
 │     │                                                                   │
 │     ├── ClientLogger: logs consume, reply                               │
 │     └── returns reply                                                   │
 │                                                                         │
 │  4. reply.GetLocate().GetCity() → "上海市"                              │
 │                                                                         │
 │  5. client.Close()                                                      │
 └─────────────────────────────────────────────────────────────────────────┘
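
The `trace_id` hand-off in steps 1 and 3 can be sketched without any gRPC dependency. This is only an illustration of the idea: the `header` map stands in for gRPC metadata, and `inject`, `extract`, and `roundTrip` are hypothetical names, not the actual golibs interceptors. The string context key follows the flow above; code outside this document would more commonly use an unexported key type.

```go
package main

import (
	"context"
	"fmt"
)

// header is a hypothetical stand-in for gRPC metadata (key -> values).
type header map[string][]string

// traceKey matches the string key used in the call flow above.
const traceKey = "trace_id"

// inject copies trace_id from the caller's context into the outgoing header,
// which is the role of the ClientMetadata step.
func inject(ctx context.Context, h header) {
	if id, ok := ctx.Value(traceKey).(string); ok && id != "" {
		h[traceKey] = []string{id}
	}
}

// extract copies trace_id from the incoming header into the handler's context,
// which is the role of the ServerMetadata step.
func extract(ctx context.Context, h header) context.Context {
	if vals := h[traceKey]; len(vals) > 0 {
		return context.WithValue(ctx, traceKey, vals[0])
	}
	return ctx
}

// roundTrip simulates one call: caller context -> header -> server context.
// It returns the trace_id visible to the server-side handler.
func roundTrip(id string) string {
	clientCtx := context.WithValue(context.Background(), traceKey, id)
	h := header{}
	inject(clientCtx, h)

	serverCtx := extract(context.Background(), h)
	got, _ := serverCtx.Value(traceKey).(string)
	return got
}

func main() {
	fmt.Println(roundTrip("test-123456"))
}
```

The point of the sketch is that a context value never crosses the wire by itself; it has to be copied into the request headers on the client and copied back out on the server.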

6. Quick Start

6.1 Compiling Protobuf

# Compile the shared types
make types

# Compile the IP service
make ip

# Tidy dependencies
make tidy

6.2 Service Registration

import (
    "context"
    "time"
    clientv3 "go.etcd.io/etcd/client/v3"
    "gitee.com/ha666/golibs/registry"
)

// Create the etcd client (errors are ignored here for brevity)
client, _ := clientv3.New(clientv3.Config{
    Endpoints:   []string{"127.0.0.1:2379"},
    DialTimeout: time.Second,
})

// Create the registry
r := registry.New(client, "ip-service",
    registry.WithEnv("prod"),
    registry.WithTTL(15*time.Second),
    registry.MaxRetry(5),
)

// Register the service
_ = r.Register(context.Background(), "192.168.1.10:8080")

// Deregister when the program exits
defer r.Deregister(context.Background(), "192.168.1.10:8080")
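
A deferred call does not run when the process is terminated by a signal such as SIGTERM, so the `defer` above only covers a normal return. One possible way to deregister on shutdown is sketched below using `signal.NotifyContext` from the standard library. The `register` and `deregister` function parameters are hypothetical placeholders for `r.Register` and `r.Deregister`, and `serve` and `runDemo` are names introduced only for this example.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serve registers addr, blocks until ctx is cancelled, then deregisters using
// a fresh context with a timeout so that cleanup still runs after cancellation.
func serve(ctx context.Context, addr string,
	register, deregister func(context.Context, string) error) error {
	if err := register(ctx, addr); err != nil {
		return fmt.Errorf("register %s: %w", addr, err)
	}
	<-ctx.Done()

	cleanupCtx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	return deregister(cleanupCtx, addr)
}

// runDemo wires serve to fake register/deregister functions and records the calls.
// The timeout exists only so that the demo terminates without a real signal.
func runDemo(timeout time.Duration) ([]string, error) {
	var calls []string
	register := func(_ context.Context, addr string) error {
		calls = append(calls, "register "+addr)
		return nil
	}
	deregister := func(_ context.Context, addr string) error {
		calls = append(calls, "deregister "+addr)
		return nil
	}

	// Cancel the context on Ctrl+C or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	err := serve(ctx, "192.168.1.10:8080", register, deregister)
	return calls, err
}

func main() {
	calls, err := runDemo(100 * time.Millisecond)
	fmt.Println(calls, err)
}
```

In a real service the timeout in `runDemo` would be dropped, and the two function values would be replaced by the registry's own methods.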

6.3 Service Discovery

// Option 1: query directly
services, _ := r.GetService(context.Background())
for _, svc := range services {
    fmt.Println("endpoint:", svc)
}

// Option 2: watch continuously
w, _ := r.Watch(context.Background())
defer w.Stop()
for {
    services, err := w.Next() // blocks until something changes
    if err != nil {
        break
    }
    fmt.Println("current service list:", services)
}

6.4 gRPC Client Calls

import (
    "context"
    "fmt"
    "log"

    "gitee.com/ha666/golibs/protocol/ip"
)

// Option 1: direct connection
// client, err := ip.NewClient("127.0.0.1:9123")

// Option 2: automatic discovery through etcd, with load balancing
client, err := ip.NewClientWithEtcd()
if err != nil {
    log.Fatal(err)
}
defer client.Close()

ctx := context.WithValue(context.Background(), "trace_id", "req-001")
reply, err := client.GetLocateByIP(ctx, &ip.GetLocateByIPReq{
    Ip: "223.161.208.123",
})
if err != nil {
    log.Fatal(err)
}
if reply.GetError() != nil {
    log.Fatalf("business error: %s - %s", reply.Error.Code, reply.Error.Message)
}
fmt.Printf("location result: %s\n", reply.GetLocate().GetFullAddress())

📝 Document version: v1.0 | Go version: 1.25.0 | Module path: gitee.com/ha666/golibs