让 AI 助手帮你监控一切
监控什么?
1. 系统监控
- CPU 使用率
- 内存使用率
- 磁盘空间
- 网络状态
2. 服务监控
- API 可用性
- 数据库连接
- 第三方服务
3. 业务监控
- 订单量
- 错误率
- 响应时间
快速开始
配置监控脚本
# ~/.openclaw/scripts/monitor.sh
#!/bin/bash
# CPU
CPU=$(top -l 1 | grep "CPU usage" | awk '{print $3}' | sed 's/%//')
# 内存
MEM=$(vm_stat | grep "free" | awk '{print $3}' | sed 's/\.//')
MEM_GB=$((MEM * 4096 / 1024 / 1024 / 1024))
# 磁盘
DISK=$(df -h / | tail -1 | awk '{print $5}' | sed 's/%//')
# 判断是否告警
if [ "$CPU" -gt 80 ]; then
echo "⚠️ CPU 使用率过高: ${CPU}%"
fi
if [ "$DISK" -gt 90 ]; then
echo "⚠️ 磁盘空间不足: ${DISK}%"
fi
配置心跳
heartbeat:
tasks:
- name: system-monitor
interval: 5m
action: ~/.openclaw/scripts/monitor.sh
on_alert: send-telegram
告警渠道
Telegram
channels:
- name: alert-bot
type: telegram
token: ${TG_TOKEN}
chat_id: ${TG_CHAT_ID}
邮件
channels:
- name: email-alert
type: email
smtp: smtp.gmail.com
from: alert@example.com
to: admin@example.com
飞书
channels:
- name: feishu-alert
type: feishu
webhook: ${FEISHU_WEBHOOK}
监控模板
服务器监控
heartbeat:
tasks:
- name: server-health
action: |
CPU=$(top -l 1 | grep "CPU" | awk '{print $3}')
if [ "$CPU" -gt 80 ]; then
echo "⚠️ CPU 告警: ${CPU}%"
fi
API 监控
heartbeat:
tasks:
- name: api-health
action: |
curl -f https://api.example.com/health || echo "⚠️ API 不可用"
数据库监控
heartbeat:
tasks:
- name: db-health
action: |
pg_isready -h localhost || echo "⚠️ 数据库不可用"
告警规则
阈值告警
alerts:
- name: high-cpu
condition: cpu > 80%
duration: 5m
action: send-alert
趋势告警
alerts:
- name: disk-filling
condition: disk_growth > 1GB/day
action: send-alert
异常告警
alerts:
- name: error-spike
condition: error_rate > 5%
action: send-alert
告警聚合
避免告警风暴:
alerting:
aggregation:
enabled: true
window: 5m # 5 分钟内相同告警只发一次
静默时段
alerting:
silence:
start: "23:00"
end: "07:00"
# 深夜不发送非紧急告警
监控面板
Prometheus + Grafana
metrics:
enabled: true
port: 9090
path: /metrics
导入 Grafana 仪表盘,可视化监控。
实战案例
案例 1:网站监控
heartbeat:
tasks:
- name: website-monitor
interval: 1m
action: |
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://example.com)
if [ "$STATUS" != "200" ]; then
echo "⚠️ 网站异常: HTTP $STATUS"
fi
案例 2:服务进程监控
heartbeat:
tasks:
- name: process-monitor
action: |
if ! pgrep -x "myapp" > /dev/null; then
echo "⚠️ myapp 进程不存在"
# 自动重启
systemctl restart myapp
fi
案例 3:证书过期监控
heartbeat:
tasks:
- name: cert-monitor
schedule: "0 0 * * *" # 每天检查
action: |
EXPIRE=$(openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -enddate)
echo "证书过期时间: $EXPIRE"
最佳实践
- 分级告警:P1 立即通知,P2 邮件,P3 日志
- 避免噪音:聚合、静默、去重
- 自动修复:能自动修复的尽量自动
- 定期演练:确保告警渠道正常
- 文档记录:每个告警的处理步骤
💬 你监控什么?评论区分享!
🎯 需要监控配置服务?微信:yang1002378395