What You Should Know About Redis Performance Limits

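## Preface

A while ago we had a production incident in which high Redis CPU usage caused a cascading failure across our services. This post is a postmortem of that incident.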

#### Environment

redis: one master, two replicas, three sentinels

Java client: deployed on k8s

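## Postmortem

In the morning an alert arrived in the alerting group: Redis CPU usage was above 90%. Shortly afterwards the Java services deployed on k8s failed their health checks, and k8s killed the pods and started restarting them. Why that happens is something we will analyze in the next article.

### Investigating Redis

Connect to Redis and look at the advice Redis itself gives: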

```
latency doctor
```

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.

Dave, I have observed latency spikes in this Redis instance. You don't mind talking about it, do you Dave?

1. command: 100 latency spikes (average 106ms, mean deviation 6ms, period 168501.73 sec). Worst all time event 147ms.

I have a few advices for you:

- Check your Slow Log to understand what are the commands you are running which are too slow to execute. Please check redis.io/commands/sl… for more information.

- Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.

- I detected a non zero amount of anonymous huge pages used by your process. This creates very serious latency events in different conditions, especially when Redis is persisting on disk. To disable THP support use the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled', make sure to also add it into /etc/rc.local so that the command will be executed again after a reboot. Note that even if you have already disabled THP, you still need to restart the Redis process to get rid of the huge pages already created.

Redis points to a large number of slow commands, so we check its slow log with the following command:

```
slowlog get 5
```

This command prints the most recent entries in the Redis slow log. In our case the entries were `keys` with a pattern and `keys *`.
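To see which commands dominate the slow log, the entries can be grouped by command name. The following is a minimal sketch, assuming each entry is a dict with a `command` (string or bytes) and a `duration` in microseconds, roughly the shape a client such as redis-py returns for `slowlog get`. The function name `summarize_slowlog` and the sample entries are made up for illustration and are not data from the incident.

```python
from collections import defaultdict

def summarize_slowlog(entries):
    """Group slow log entries by command name.

    Returns a list of (command, count, total_microseconds),
    ordered by total duration, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [count, total duration]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "<EMPTY>"
        totals[name][0] += 1
        totals[name][1] += entry["duration"]
    rows = [(name, count, total) for name, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical sample entries, for illustration only.
sample = [
    {"id": 3, "duration": 106000, "command": "KEYS user:*"},
    {"id": 2, "duration": 147000, "command": b"keys *"},
    {"id": 1, "duration": 12000, "command": "HGETALL big:hash"},
]

for name, count, total_us in summarize_slowlog(sample):
    print(f"{name:<8} count={count} total={total_us / 1000:.0f}ms")
```

A summary like this makes it easier to tell whether one command family, such as `keys`, accounts for most of the slow time.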

The time complexity of keys is O(N), where N is the number of keys in the database the command runs against.

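Redis executes commands on a single thread, so only one command runs at a time, and a single long-running command blocks everything queued behind it. (You can simulate a long-running command with `debug sleep 0.1`, which blocks for 100 ms.)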
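The effect of one slow command on the commands behind it can be worked out with simple arithmetic. The following is a minimal sketch, assuming a strictly sequential executor and made-up durations. It is a model of the behaviour described above, not of Redis internals, and `completion_times` is a hypothetical helper name.

```python
def completion_times(durations_ms):
    """Return when each command finishes if commands run strictly one
    after another and all of them arrived at t=0."""
    finished = []
    clock = 0.0
    for duration in durations_ms:
        clock += duration
        finished.append(clock)
    return finished

# Hypothetical workload: one 100 ms KEYS call followed by three 0.1 ms GETs.
queue = [("KEYS *", 100.0), ("GET a", 0.1), ("GET b", 0.1), ("GET c", 0.1)]
times = completion_times([duration for _, duration in queue])
for (name, _), done in zip(queue, times):
    print(f"{name:<7} finishes at {done:.1f} ms")
```

Each GET would take 0.1 ms on its own, but in this model every one of them first waits out the full 100 ms of the command ahead of it.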

Our problem was that heavy use of keys and keys * drove Redis CPU usage up.

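So the fix is to eliminate the commands that show up in the slow log: replace keys with scan, and deal with O(N) Redis commands at the business-design level.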
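Replacing keys with scan means walking the keyspace with a cursor in small batches until the server returns cursor 0. The following is a minimal sketch, assuming a client object whose `scan(cursor, match, count)` method returns `(next_cursor, keys)`, as redis-py's does. The function name `scan_keys` and the default batch size are our own choices.

```python
def scan_keys(client, pattern, count=100):
    """Yield keys matching `pattern` using incremental SCAN calls.

    `client` is assumed to provide scan(cursor, match, count) returning
    (next_cursor, keys). SCAN may return the same key more than once,
    so duplicates are filtered here (at the cost of remembering keys).
    """
    seen = set()
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        for key in keys:
            if key not in seen:
                seen.add(key)
                yield key
        if cursor == 0:  # the server returns cursor 0 when iteration is complete
            break
```

Each call is short, so other commands can run between batches. The total work is still proportional to the number of keys, which is why the business design matters as well.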

If your Redis problem is a different one and the commands above did not reveal it, you can try the following approach.

### Redis flame graph

Sample the Redis process:

```
perf record -g --pid $(pgrep redis-server) -F 999 -- sleep 60
```

Use *perf report* to display the results:

```
perf report -g "graph,0.5,caller"
```

Convert the samples into a flame graph:

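```
# Fetch the FlameGraph scripts
git clone github.com/brendangreg…
cd FlameGraph
# Dump the recorded samples as text
perf script > redis.perf.stacks
# Fold each stack into a single line
./stackcollapse-perf.pl redis.perf.stacks > redis.folded.stacks
# Render the folded stacks as an SVG
./flamegraph.pl redis.folded.stacks > redis.svg
```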

Then inspect the call stacks in the flame graph.
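Besides viewing the SVG, the folded stacks file can be summarized as text to see which functions consume the most samples. The following is a minimal sketch, assuming each line of `redis.folded.stacks` has the form `frame_a;frame_b;frame_c 42` (frames separated by semicolons, then a sample count). The function name `top_leaf_frames` and the example stacks are made up for illustration.

```python
from collections import Counter

def top_leaf_frames(folded_lines, limit=5):
    """Sum sample counts by the innermost (leaf) frame of each folded stack.

    Each line is expected to look like "frame_a;frame_b;frame_c 42".
    """
    samples = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip lines that do not match the folded format
        leaf = stack.split(";")[-1]
        samples[leaf] += int(count)
    return samples.most_common(limit)

# Hypothetical folded stacks, for illustration only.
example = [
    "redis-server;main;aeMain;processCommand;keysCommand;stringmatchlen 700",
    "redis-server;main;aeMain;processCommand;keysCommand;dictNext 200",
    "redis-server;main;aeMain;processCommand;getCommand 100",
    "redis-server;main;aeMain;processCommand;keysCommand;stringmatchlen 50",
]
for frame, count in top_leaf_frames(example):
    print(f"{count:>5}  {frame}")
```

To run it against real data, pass an open file instead of the example list, for instance `top_leaf_frames(open("redis.folded.stacks"))`.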

## Understanding Redis performance limits

![](cdn.jsdelivr.net/gh/filess/i…)
