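Background
Several recent projects of mine used Redis Stream as a message queue. It turned out to be lightweight, easy to pick up, and quite capable, so I put together a minimal but practical Redis Stream example.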
Three Redis Stream concepts: stream, consumer group, consumer
The key to using Redis Stream is understanding how stream, consumer group, and consumer relate to each other. In short:
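- A stream is the flow of messages, like a conveyor belt that carries goods.
- A consumer group is a team that takes messages off the stream.
- A consumer is a worker inside a consumer group.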
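One stream can have multiple consumer groups, and each consumer group can have multiple consumers.
Before using a stream, call XGroupCreateMkStream with the stream name and the consumer group name to create the stream together with its consumer group. Only consumers that belong to a registered consumer group can read messages through the group.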
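The relationship above can be sketched with a toy in-memory model. This is not Redis and not the go-redis API; the types `Stream` and `Group` and their methods are hypothetical names made up for illustration. It only shows the two properties that matter here: every group sees every message, and inside one group each message goes to a single consumer.

```go
package main

import "fmt"

// Stream is a toy stand-in for a Redis stream: an append-only log.
type Stream struct {
	entries []string
	groups  map[string]*Group
}

// Group tracks how far one consumer group has read, independent of other groups.
type Group struct {
	next int // index of the next entry not yet delivered to this group
}

func NewStream() *Stream {
	return &Stream{groups: map[string]*Group{}}
}

// Add appends a message to the stream (roughly the role of XADD).
func (s *Stream) Add(msg string) { s.entries = append(s.entries, msg) }

// CreateGroup registers a group that starts reading from the beginning.
func (s *Stream) CreateGroup(name string) { s.groups[name] = &Group{} }

// Read hands the group's next undelivered entry to the calling consumer.
// It returns false if the group is not registered or has nothing new.
func (s *Stream) Read(group, consumer string) (string, bool) {
	g, ok := s.groups[group]
	if !ok || g.next >= len(s.entries) {
		return "", false
	}
	msg := s.entries[g.next]
	g.next++
	return msg, true
}

func main() {
	s := NewStream()
	s.CreateGroup("notify")
	s.CreateGroup("audit")
	s.Add("m1")
	s.Add("m2")

	a, _ := s.Read("notify", "worker-1") // first consumer of "notify"
	b, _ := s.Read("notify", "worker-2") // second consumer gets the next message
	c, _ := s.Read("audit", "worker-9")  // another group starts from the top
	fmt.Println(a, b, c)                 // m1 m2 m1
}
```

Reading from a group that was never created returns nothing, which mirrors the rule that only registered groups can consume.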
Pending Entries List (PEL) and two-phase consumption
Redis Stream also supports a two-phase flow (deliver first, then acknowledge manually). This is built on the Pending Entries List (PEL).
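Call XReadGroup with the stream name, consumer group name, and consumer name to fetch messages. Every fetched message is put into the group's Pending Entries List (PEL). The PEL is a holding area for messages that a consumer has received but not yet confirmed; once the consumer finishes processing a message, it still has to ack that message explicitly. You can think of the PEL as git's staging area: a change only counts once it is committed.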
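Messages in the PEL can be picked up again by consumers of the same consumer group, but for a certain period after delivery no other consumer can take them. That period is MinIdle, and it ensures that within the window only one consumer re-processes a given message. Call XAutoClaim with the stream name, consumer group name, consumer name, and MinIdle to claim PEL messages that nobody has received or claimed within the last MinIdle. Claiming resets the message's idle time, so other consumers cannot claim it for another MinIdle.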
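How should consumer groups and consumers be allocated in k8s?
In practice, one type of task (a notification system, for example) uses one stream and one consumer group. Although a stream can have many consumer groups, it is usually more convenient to keep the two one-to-one. A factory line works the same way: one conveyor belt is best dedicated to one kind of product. The relationship between consumer group and consumer, on the other hand, is normally one-to-many: the actual work is shared among several consumers.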
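If each k8s instance runs exactly one consumer (which is the recommended setup: one consumer per instance, and a single consumer can fetch several messages per call), the consumer name can simply be the machine's hostname. If one instance has to run several consumers at the same time, each consumer name can be a random number instead.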
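As a small illustration of the naming idea, the sketch below derives consumer names from the hostname. The helper `consumerName` and the "hostname plus worker index" scheme for the multi-consumer case are assumptions of this sketch, offered as an alternative to the random number mentioned above; either works as long as names are unique inside the group.

```go
package main

import (
	"fmt"
	"os"
)

// consumerName returns the host name when the instance runs a single
// consumer, and "host-index" when it runs several.
func consumerName(host string, worker, total int) string {
	if total <= 1 {
		return host
	}
	return fmt.Sprintf("%s-%d", host, worker)
}

func main() {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown-host" // fallback so the demo still runs
	}
	fmt.Println(consumerName(host, 0, 1)) // single consumer: just the hostname
	for w := 0; w < 2; w++ {
		fmt.Println(consumerName(host, w, 2)) // two consumers: hostname-0, hostname-1
	}
}
```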
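Because k8s instances are restarted and redeployed frequently, consumer names keep accumulating, so consumers left behind by old instances need to be deleted periodically. Call XInfoConsumers with the stream name and group name to list all consumers in the group, then decide for each one whether it can be deleted: pending = 0 and idle greater than some threshold.
pending = 0 means the consumer currently holds no unacknowledged messages. idle is refreshed every time the consumer interacts with the stream, for example through XReadGroup or XAutoClaim, so an idle value above the threshold means the consumer has done nothing for a long time and can be treated as dead.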
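Minimal code example
Directory structure
redis_stream_demo
  - internal
      internal.go          functions that operate on the Redis stream
  - test
      push_task_test.go    simulates a producer sending messages to the Redis stream
  main.go                  server-side consumers that consume tasks in a loop and periodically clean up dead consumers
  go.mod
  go.sum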
Start with the unit test file. It is simple: it simulates a producer pushing messages into the stream and then checks that they arrived.
package test

import (
	"context"
	"github.com/redis/go-redis/v9"
	"redis_stream_demo/internal"
	"testing"
)

func TestPushTask(t *testing.T) {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})
	// Simulate a producer: add 50 messages to the stream
	for i := 0; i < 50; i++ {
		_, err := client.XAdd(ctx, &redis.XAddArgs{
			Stream: internal.SimpleStreamName,
			Values: map[string]interface{}{
				"Msg": i,
			},
		}).Result()
		if err != nil {
			t.Fatal(err)
		}
	}
	// Check that all 50 messages are in the stream
	// (this assumes the stream was empty before the test ran)
	queueSize, err := client.XLen(ctx, internal.SimpleStreamName).Result()
	if err != nil {
		t.Fatal(err)
	}
	if queueSize != 50 {
		t.Fatal("queueSize != 50 ")
	}
	t.Log("add 50 tasks ok")
}
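Next, the main function. It is also simple: it creates a stream with its consumer group and starts two consumers that keep consuming messages, while another goroutine watches for expired consumers and cleans them up. The program exits once all messages have been consumed.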
package main

import (
	"context"
	"fmt"
	"github.com/redis/go-redis/v9"
	"math/rand"
	"redis_stream_demo/internal"
	"time"
)

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})
	// Create the consumer group
	err := internal.RegisterConsumerGroup(client)
	if err != nil {
		panic(err)
	}
	// Start 2 consumers
	consumer1 := fmt.Sprintf("consumer:%d", rand.Int63())
	consumer2 := fmt.Sprintf("consumer:%d", rand.Int63())
	fmt.Printf("[%s]\n", consumer1)
	fmt.Printf("[%s]\n", consumer2)
	go internal.LoopConsume(client, consumer1)
	go internal.LoopConsume(client, consumer2)
	// Start the periodic cleanup of old consumers
	go internal.LoopDeleteDeadConsumer(client)
	// Poll once per second until the stream is empty
	for {
		if internal.IsAllDone(client, ctx) {
			fmt.Println("all task done")
			break
		}
		time.Sleep(time.Second)
	}
}
Now a closer look at how internal.go operates on the Redis stream
package internal

import (
	"context"
	"errors"
	"fmt"
	"github.com/redis/go-redis/v9"
	"math/rand"
	"time"
)

const (
	SimpleStreamName    = "simple-stream"
	SimpleGroupName     = "simple-group"
	taskIdleTimeout     = 5 * time.Second // a pending message idle for 5 seconds can be re-claimed from the PEL
	consumerIdleTimeout = time.Minute     // a consumer with no activity for over 1 minute is considered dead
)

type TaskItem struct {
	MsgId string
	Msg   string
}
// RegisterConsumerGroup creates the consumer group, or does nothing if it already exists.
func RegisterConsumerGroup(client *redis.Client) error {
	groupInfo, err := client.XInfoGroups(context.Background(), SimpleStreamName).Result()
	// "ERR no such key" only means the stream does not exist yet
	if err != nil && err.Error() != "ERR no such key" {
		return err
	}
	for _, group := range groupInfo {
		if group.Name == SimpleGroupName {
			return nil
		}
	}
	// "0" makes the group start from the first entry in the stream
	return client.XGroupCreateMkStream(context.Background(), SimpleStreamName, SimpleGroupName, "0").Err()
}
func LoopConsume(client *redis.Client, consumerName string) {
	for {
		// Handle reclaimable tasks in the PEL first
		task, err := pullPelTask(client, consumerName)
		if err != nil {
			fmt.Println(err)
			time.Sleep(time.Second)
			continue
		}
		if task != nil {
			if err := processMessage(task); err != nil {
				// On failure move on; the message stays in the PEL and is retried later
				fmt.Printf("[%s] processed pel task failed, messageid = %s\n", consumerName, task.MsgId)
				continue
			}
			// On success, XACK
			err = ackSuccessTask(context.Background(), client, SimpleStreamName, SimpleGroupName, task.MsgId)
			if err != nil {
				// On failure move on to the next task
				fmt.Printf("[%s] acked pel task failed, messageid = %s\n", consumerName, task.MsgId)
				continue
			}
			fmt.Printf("[%s] acked pel task ok, messageid = %s\n", consumerName, task.MsgId)
			continue // keep draining the PEL backlog before reading new messages
		}
		// Pull 1 new message
		task, err = pullTask(context.Background(), client, consumerName)
		if err != nil {
			fmt.Println(err)
			time.Sleep(time.Second)
			continue
		}
		if task != nil {
			if err := processMessage(task); err != nil {
				// On failure move on; the message stays in the PEL and is retried later
				fmt.Printf("[%s] processed task failed, messageid = %s\n", consumerName, task.MsgId)
				continue
			}
			// On success, XACK
			err = ackSuccessTask(context.Background(), client, SimpleStreamName, SimpleGroupName, task.MsgId)
			if err != nil {
				// On failure move on to the next task
				fmt.Printf("[%s] acked task failed, messageid = %s\n", consumerName, task.MsgId)
				continue
			}
			fmt.Printf("[%s] acked task ok, messageid = %s\n", consumerName, task.MsgId)
		} else {
			time.Sleep(3 * time.Second) // no messages at all
		}
	}
}
func LoopDeleteDeadConsumer(client *redis.Client) {
	// Scan the group's consumers every 10 seconds
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		consumers, err := readConsumer(context.Background(), client)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if len(consumers) == 0 {
			fmt.Println("no consumers found in group")
			continue
		}
		for _, consumer := range consumers {
			// Condition: pending == 0 and idle above the threshold
			if consumer.Pending == 0 && consumer.Idle > consumerIdleTimeout {
				// Delete the consumer
				_, err = deleteConsumer(context.Background(), client, consumer.Name)
				if err != nil {
					fmt.Println(err)
					continue
				}
				fmt.Printf("[%s] (idle: %s, pending: %d) was deleted\n",
					consumer.Name, consumer.Idle.String(), consumer.Pending)
			}
		}
	}
}
func readConsumer(ctx context.Context, client *redis.Client) ([]redis.XInfoConsumer, error) {
	consumers, err := client.XInfoConsumers(ctx, SimpleStreamName, SimpleGroupName).Result()
	if err != nil {
		return nil, err
	}
	return consumers, nil
}

func deleteConsumer(ctx context.Context, client *redis.Client, consumerName string) (int64, error) {
	deleted, err := client.XGroupDelConsumer(ctx, SimpleStreamName, SimpleGroupName, consumerName).Result()
	if err != nil {
		return 0, err
	}
	return deleted, nil
}

func ackSuccessTask(ctx context.Context, client *redis.Client, streamName, groupName, messageID string) error {
	tx := client.TxPipeline()
	// Ack the task (removes it from the group's PEL)
	tx.XAck(ctx, streamName, groupName, messageID)
	// Delete the message from the stream itself
	tx.XDel(ctx, streamName, messageID)
	_, err := tx.Exec(ctx)
	if err != nil {
		return err
	}
	return nil
}
func pullTask(ctx context.Context, client *redis.Client, consumer string) (*TaskItem, error) {
	streams, err := client.XReadGroup(ctx, &redis.XReadGroupArgs{
		Group:    SimpleGroupName,
		Consumer: consumer,
		Streams:  []string{SimpleStreamName, ">"}, // ">" is a special ID: read only messages never delivered to any consumer in this group
		Count:    1,
		Block:    1000 * time.Millisecond,
	}).Result()
	if err != nil {
		// redis.Nil means the blocking read timed out with no new message
		if errors.Is(err, redis.Nil) {
			return nil, nil
		}
		return nil, err
	}
	if len(streams) == 0 {
		return nil, nil
	}
	if len(streams[0].Messages) == 0 {
		return nil, nil
	}
	message := streams[0].Messages[0]
	messageData := message.Values
	msg, ok := messageData["Msg"].(string)
	if !ok {
		return nil, errors.New("field \"Msg\" is missing or not a string")
	}
	task := &TaskItem{
		MsgId: message.ID,
		Msg:   msg,
	}
	return task, nil
}
func pullPelTask(client *redis.Client, consumerName string) (*TaskItem, error) {
	result, _, err := client.XAutoClaim(context.Background(), &redis.XAutoClaimArgs{
		Stream:   SimpleStreamName,
		Group:    SimpleGroupName,
		Consumer: consumerName,    // the consumer that claims the message
		MinIdle:  taskIdleTimeout, // only claim messages left unacked for at least 5 seconds
		Start:    "0-0",           // special ID: scan the PEL from its very beginning (oldest message)
		Count:    1,               // claim at most 1 message per call
	}).Result()
	if err != nil && !errors.Is(err, redis.Nil) {
		return nil, fmt.Errorf("XAUTOCLAIM failed: %w", err)
	}
	if len(result) == 0 {
		return nil, nil
	}
	messageData := result[0].Values
	msg, ok := messageData["Msg"].(string)
	if !ok {
		return nil, errors.New("field \"Msg\" is missing or not a string")
	}
	return &TaskItem{
		MsgId: result[0].ID,
		Msg:   msg,
	}, nil
}
func IsAllDone(client *redis.Client, ctx context.Context) bool {
	queueSize, err := client.XLen(ctx, SimpleStreamName).Result()
	if err != nil {
		fmt.Println("client.XLen(ctx, streamName).Result() err = ", err)
		return false
	}
	// Acked messages are also deleted, so an empty stream means everything is done
	return queueSize == 0
}

func processMessage(item *TaskItem) error {
	// Fail at random: num is in 0..100, and values below 20 fail (about 20% of calls)
	num := rand.Intn(101)
	if num >= 20 {
		return nil
	}
	return errors.New("random process message error")
}
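Key functions
LoopConsume: loops forever fetching and processing messages. It handles the backlog in the PEL first and reads never-delivered messages only when the PEL has nothing to claim.
LoopDeleteDeadConsumer: periodically removes dead consumers, judged by pending = 0 and idle above a threshold.
processMessage: the function that consumes a message. To make messages stay in the PEL and get fetched a second time, it deliberately fails at random (rand.Intn(101) below 20, roughly 20% of calls), so the message can be picked up again, possibly by another consumer.
IsAllDone: checks whether the Redis stream still contains messages that have not been consumed.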
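The failure rate of processMessage follows directly from its two numbers: rand.Intn(101) yields one of the 101 values 0..100 with equal probability, and the call fails when the value is below 20. The small sketch below enumerates the outcomes instead of sampling, so the result is exact. `failureRate` is a hypothetical helper written for this illustration; the second call shows one combination that would give exactly 10%.

```go
package main

import "fmt"

// failureRate returns the share of the n equally likely values 0..n-1
// that fall below cutoff, i.e. the chance that "value < cutoff" holds.
func failureRate(n, cutoff int) float64 {
	failures := 0
	for v := 0; v < n; v++ {
		if v < cutoff {
			failures++
		}
	}
	return float64(failures) / float64(n)
}

func main() {
	fmt.Printf("%.3f\n", failureRate(101, 20)) // 0.198, the demo's settings
	fmt.Printf("%.3f\n", failureRate(100, 10)) // 0.100, e.g. rand.Intn(100) with cutoff 10
}
```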
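Trying it out
Test procedure
- Run TestPushTask in push_task_test.go to insert 50 messages into the Redis stream.
- Run main.go; it is expected to process all messages.
- Wait two minutes, then run TestPushTask again to insert another 50 messages.
- Run main.go again; it is expected to process all messages and to delete the two consumers from the first run of main.go (the idle threshold is 1 minute).
Output
First run of TestPushTask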
=== RUN TestPushTask
push_task_test.go:37: add 50 tasks ok
--- PASS: TestPushTask (0.03s)
PASS
First run of main.go
[consumer:4012193680385755545]
[consumer:1114439635847877195]
[consumer:4012193680385755545] processed task failed, messageid = 1770889629531-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629531-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629532-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629532-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629533-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629533-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629534-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629534-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629535-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629536-0
[consumer:1114439635847877195] processed task failed, messageid = 1770889629537-0
[consumer:4012193680385755545] acked task ok, messageid = 1770889629536-1
[consumer:4012193680385755545] processed task failed, messageid = 1770889629537-1
[consumer:1114439635847877195] processed task failed, messageid = 1770889629538-0
[consumer:4012193680385755545] processed task failed, messageid = 1770889629539-1
[consumer:1114439635847877195] acked task ok, messageid = 1770889629539-0
[consumer:4012193680385755545] acked task ok, messageid = 1770889629540-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629540-1
[consumer:4012193680385755545] processed task failed, messageid = 1770889629541-0
[consumer:1114439635847877195] processed task failed, messageid = 1770889629541-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629541-2
[consumer:1114439635847877195] acked task ok, messageid = 1770889629542-0
[consumer:4012193680385755545] acked task ok, messageid = 1770889629542-1
[consumer:1114439635847877195] acked task ok, messageid = 1770889629543-0
[consumer:4012193680385755545] acked task ok, messageid = 1770889629543-1
[consumer:1114439635847877195] acked task ok, messageid = 1770889629543-2
[consumer:4012193680385755545] acked task ok, messageid = 1770889629544-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629544-1
[consumer:4012193680385755545] processed task failed, messageid = 1770889629545-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629545-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629546-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629546-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629546-2
[consumer:1114439635847877195] processed task failed, messageid = 1770889629547-0
[consumer:4012193680385755545] processed task failed, messageid = 1770889629547-1
[consumer:1114439635847877195] acked task ok, messageid = 1770889629547-2
[consumer:4012193680385755545] acked task ok, messageid = 1770889629548-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629548-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629549-0
[consumer:1114439635847877195] processed task failed, messageid = 1770889629549-1
[consumer:4012193680385755545] processed task failed, messageid = 1770889629549-2
[consumer:1114439635847877195] acked task ok, messageid = 1770889629550-0
[consumer:4012193680385755545] acked task ok, messageid = 1770889629550-1
[consumer:1114439635847877195] processed task failed, messageid = 1770889629550-2
[consumer:4012193680385755545] acked task ok, messageid = 1770889629551-0
[consumer:1114439635847877195] acked task ok, messageid = 1770889629551-1
[consumer:4012193680385755545] acked task ok, messageid = 1770889629551-2
[consumer:1114439635847877195] acked task ok, messageid = 1770889629552-0
[consumer:4012193680385755545] processed task failed, messageid = 1770889629552-1
[consumer:1114439635847877195] acked task ok, messageid = 1770889629552-2
[consumer:4012193680385755545] processed pel task failed, messageid = 1770889629537-0
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629531-0
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629537-1
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629538-0
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629539-1
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629541-0
[consumer:4012193680385755545] processed pel task failed, messageid = 1770889629541-1
[consumer:1114439635847877195] processed pel task failed, messageid = 1770889629545-0
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629547-0
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629547-1
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629549-1
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629549-2
[consumer:4012193680385755545] processed pel task failed, messageid = 1770889629550-2
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629552-1
[consumer:1114439635847877195] processed pel task failed, messageid = 1770889629537-0
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629541-1
[consumer:1114439635847877195] acked pel task ok, messageid = 1770889629545-0
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629550-2
[consumer:4012193680385755545] acked pel task ok, messageid = 1770889629537-0
all task done
Second run of TestPushTask
=== RUN TestPushTask
push_task_test.go:37: add 50 tasks ok
--- PASS: TestPushTask (0.03s)
PASS
Second run of main.go
[consumer:7319152151140819830]
[consumer:2792941594605994536]
[consumer:7319152151140819830] acked task ok, messageid = 1770889711125-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711126-0
[consumer:2792941594605994536] processed task failed, messageid = 1770889711127-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711126-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711127-1
[consumer:7319152151140819830] acked task ok, messageid = 1770889711128-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711129-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711129-1
[consumer:2792941594605994536] processed task failed, messageid = 1770889711129-2
[consumer:7319152151140819830] acked task ok, messageid = 1770889711130-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711130-1
[consumer:7319152151140819830] acked task ok, messageid = 1770889711131-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711132-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711132-1
[consumer:2792941594605994536] processed task failed, messageid = 1770889711133-0
[consumer:7319152151140819830] processed task failed, messageid = 1770889711133-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711134-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711134-1
[consumer:2792941594605994536] processed task failed, messageid = 1770889711135-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711135-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711135-2
[consumer:2792941594605994536] acked task ok, messageid = 1770889711136-1
[consumer:7319152151140819830] acked task ok, messageid = 1770889711136-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711137-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711136-2
[consumer:7319152151140819830] processed task failed, messageid = 1770889711137-2
[consumer:2792941594605994536] acked task ok, messageid = 1770889711137-1
[consumer:7319152151140819830] acked task ok, messageid = 1770889711138-0
[consumer:2792941594605994536] processed task failed, messageid = 1770889711138-1
[consumer:7319152151140819830] acked task ok, messageid = 1770889711138-2
[consumer:2792941594605994536] acked task ok, messageid = 1770889711139-0
[consumer:2792941594605994536] processed task failed, messageid = 1770889711139-1
[consumer:7319152151140819830] processed task failed, messageid = 1770889711139-2
[consumer:7319152151140819830] acked task ok, messageid = 1770889711140-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711139-3
[consumer:2792941594605994536] processed task failed, messageid = 1770889711140-2
[consumer:7319152151140819830] acked task ok, messageid = 1770889711140-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711140-3
[consumer:7319152151140819830] acked task ok, messageid = 1770889711141-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711141-1
[consumer:7319152151140819830] processed task failed, messageid = 1770889711141-2
[consumer:2792941594605994536] acked task ok, messageid = 1770889711142-0
[consumer:7319152151140819830] acked task ok, messageid = 1770889711141-3
[consumer:7319152151140819830] acked task ok, messageid = 1770889711142-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711142-2
[consumer:7319152151140819830] processed task failed, messageid = 1770889711143-0
[consumer:2792941594605994536] acked task ok, messageid = 1770889711142-3
[consumer:7319152151140819830] acked task ok, messageid = 1770889711143-1
[consumer:2792941594605994536] acked task ok, messageid = 1770889711143-2
[consumer:7319152151140819830] acked task ok, messageid = 1770889711143-3
[consumer:2792941594605994536] processed pel task failed, messageid = 1770889711129-2
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711127-0
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711133-0
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711133-1
[consumer:2792941594605994536] processed pel task failed, messageid = 1770889711135-0
[consumer:2792941594605994536] processed pel task failed, messageid = 1770889711137-2
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711138-1
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711139-1
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711139-2
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711140-2
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711141-2
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711143-0
[consumer:1114439635847877195] (idle: 2m16.696s, pending: 0) was deleted
[consumer:4012193680385755545] (idle: 2m16.695s, pending: 0) was deleted
[consumer:2792941594605994536] processed pel task failed, messageid = 1770889711135-0
[consumer:7319152151140819830] acked pel task ok, messageid = 1770889711129-2
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711137-2
[consumer:2792941594605994536] processed pel task failed, messageid = 1770889711135-0
[consumer:2792941594605994536] acked pel task ok, messageid = 1770889711135-0
all task done
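As the output shows, every task was eventually acked, and the two consumers from the first run, consumer:4012193680385755545 and consumer:1114439635847877195, were deleted during the second run of main.go.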