知识点编号:270
难度系数:⭐⭐⭐
实用指数:💯💯💯💯💯
📖 开篇:一次数据库爆满事故
凌晨3点,运维小哥打来电话:
"兄弟!数据库快炸了!磁盘使用率 98%!😱"
你睡眼惺忪地登录服务器,查看数据库:
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024, 2) AS size_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_database'
ORDER BY DATA_LENGTH DESC;
+----------------+---------+
| TABLE_NAME     | size_gb |
+----------------+---------+
| log_table      |  450.23 |  ← 日志表450GB!
| order_history  |  120.45 |  ← 订单历史120GB!
| user_login_log |   80.12 |  ← 登录日志80GB!
+----------------+---------+
你:
"什么鬼!日志表怎么这么大?😭"
查看最早的数据:
SELECT MIN(created_at) FROM log_table;
-- 结果:2018-01-01(5年前的数据还在!)
产品经理第二天:
"小王啊,这些旧数据都没用了,为啥不删掉?赶紧做个定时清理!"
今天,我们就来学习如何优雅地清理过期数据! 🎯
🎯 为什么要清理过期数据?
1. 存储成本 💰
1TB SSD 云盘:约 1000元/月
日志增长速度:10GB/天
一年累计:约 3.6TB
一年后,仅这部分存储就要约 3600元/月,往后一年就是 43,200元
如果定期清理,只保留最近3个月:
90天 × 10GB = 900GB < 1TB,约 1000元/月
成本节省:约 75% 💰
2. 查询性能 ⚡
-- 1000万条数据的表(user_id 上没有索引,只能全表扫描)
SELECT * FROM log_table WHERE user_id = 123;
-- 耗时:5秒 🐌
-- 清理后剩100万条,扫描量只有原来的1/10
SELECT * FROM log_table WHERE user_id = 123;
-- 耗时:0.5秒 ⚡ (快10倍!)
3. 备份效率 📦
全量备份时间:
- 清理前:500GB → 6小时
- 清理后:50GB → 36分钟
4. 法律合规 ⚖️
《数据安全法》《个人信息保护法》要求:
- 处理目的实现后及时删除个人信息;用户注销后需在承诺期限内(常见为15~30天)删除数据
- 不必要的数据及时清理
🗑️ 需要清理的数据类型
| 数据类型 | 保留时长 | 清理策略 |
|---|---|---|
| 操作日志 📝 | 3-6个月 | 定期删除 |
| 登录日志 🔐 | 1-3个月 | 定期删除 |
| 临时文件 📁 | 7天 | 物理删除 |
| 订单数据 🛒 | 3年 | 归档+删除 |
| 用户行为 👤 | 6个月 | 归档到大数据平台 |
| 短信验证码 📱 | 5分钟 | Redis自动过期 |
| Session 🍪 | 30分钟 | Redis自动过期 |
| 未支付订单 💳 | 30分钟 | 超时自动取消 |
| 回收站数据 🗑️ | 30天 | 物理删除 |
🎨 清理策略选型
┌──────────────────────────────────────────────────────────┐
│ 数据清理策略对比 │
└──────────────────────────────────────────────────────────┘
方案1: 物理删除(DELETE)
- 直接删除,不可恢复
- 适用:日志、临时数据
- 风险:误删无法恢复
方案2: 软删除(逻辑删除)
- 标记deleted=1
- 适用:业务数据
- 优点:可恢复
方案3: 归档(Archive)
- 转移到历史表/冷存储
- 适用:订单、用户数据
- 优点:降成本 + 可追溯
方案4: 分区表(Partition)
- 按时间分区
- 适用:大数据量
- 优点:秒级删除
方案5: TTL自动过期
- Redis/MongoDB自带
- 适用:缓存、会话
- 优点:自动化
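方案5 基本不用写清理代码,靠过期时间自动回收。下面用 Spring Data Redis 给一个验证码的示意(StringRedisTemplate 是 Spring 自带的,key 的命名规则是假设的,按自己项目约定改):
@Service
public class SmsCodeService {
    @Autowired
    private StringRedisTemplate redisTemplate;
    /**
     * 写入时直接带上5分钟TTL,到期Redis自动删除,不需要任何定时任务
     */
    public void saveSmsCode(String phone, String code) {
        redisTemplate.opsForValue()
                .set("sms:code:" + phone, code, Duration.ofMinutes(5));
    }
    public String getSmsCode(String phone) {
        // 过期后返回null,等价于"数据已被清理"
        return redisTemplate.opsForValue().get("sms:code:" + phone);
    }
}
MongoDB 同理:在时间字段上建 TTL 索引,到期文档由后台线程自动删除。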
💻 实战1:定时任务清理(Spring Boot)
1. 使用@Scheduled
@Component
@Slf4j
public class DataCleanTask {
@Autowired
private LogMapper logMapper;
@Autowired
private OrderMapper orderMapper;
/**
* 每天凌晨2点清理日志
* cron表达式:秒 分 时 日 月 周
*/
@Scheduled(cron = "0 0 2 * * ?")
public void cleanLogs() {
log.info("🧹 开始清理日志...");
// 计算3个月前的日期
LocalDateTime threeMonthsAgo = LocalDateTime.now().minusMonths(3);
        // 删除3个月前的日志
        // 注意:大表一条DELETE全删会长时间持锁,生产上建议用下文的分批删除
        int count = logMapper.deleteByCreateTime(threeMonthsAgo);
log.info("✅ 日志清理完成,删除{}条", count);
}
/**
* 每小时清理未支付订单
*/
@Scheduled(cron = "0 0 * * * ?")
public void cleanUnpaidOrders() {
log.info("🧹 开始清理未支付订单...");
// 30分钟前创建且未支付的订单
LocalDateTime halfHourAgo = LocalDateTime.now().minusMinutes(30);
// 更新状态为已取消
int count = orderMapper.cancelUnpaidOrders(halfHourAgo);
log.info("✅ 订单清理完成,取消{}笔", count);
}
/**
* 每月1号归档上月数据
*/
@Scheduled(cron = "0 0 3 1 * ?")
public void archiveLastMonth() {
log.info("🧹 开始归档上月数据...");
YearMonth lastMonth = YearMonth.now().minusMonths(1);
LocalDateTime startTime = lastMonth.atDay(1).atStartOfDay();
        // 用下月1号0点做开区间上界,避免漏掉 23:59:59 之后的数据
        LocalDateTime endTime = lastMonth.plusMonths(1).atDay(1).atStartOfDay();
// 1. 复制到归档表
int archiveCount = logMapper.archiveToHistory(startTime, endTime);
// 2. 删除原表数据
int deleteCount = logMapper.deleteByTimeRange(startTime, endTime);
log.info("✅ 归档完成:归档{}条,删除{}条", archiveCount, deleteCount);
}
}
2. Mapper实现
@Mapper
public interface LogMapper extends BaseMapper<Log> {
/**
* 删除指定时间之前的数据
*/
@Delete("DELETE FROM log WHERE created_at < #{time}")
int deleteByCreateTime(@Param("time") LocalDateTime time);
/**
* 分批删除(避免锁表)
*/
@Delete("DELETE FROM log WHERE created_at < #{time} LIMIT #{batchSize}")
int deleteBatch(@Param("time") LocalDateTime time,
@Param("batchSize") int batchSize);
/**
* 归档到历史表
*/
@Insert("INSERT INTO log_history SELECT * FROM log " +
"WHERE created_at BETWEEN #{startTime} AND #{endTime}")
int archiveToHistory(@Param("startTime") LocalDateTime startTime,
@Param("endTime") LocalDateTime endTime);
/**
* 删除时间范围内的数据
*/
@Delete("DELETE FROM log " +
"WHERE created_at BETWEEN #{startTime} AND #{endTime}")
int deleteByTimeRange(@Param("startTime") LocalDateTime startTime,
@Param("endTime") LocalDateTime endTime);
}
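上面 DataCleanTask 用到的 orderMapper.cancelUnpaidOrders 还没给出,这里补一份示意实现(表名、状态码都是假设:0-待支付、4-已取消,按自己业务调整):
@Mapper
public interface OrderMapper extends BaseMapper<Order> {
    /**
     * 把超时未支付的订单批量置为已取消
     */
    @Update("UPDATE `order` SET status = 4, updated_at = NOW() " +
            "WHERE status = 0 AND created_at < #{time}")
    int cancelUnpaidOrders(@Param("time") LocalDateTime time);
}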
3. 分批删除(避免锁表)
@Service
@Slf4j
public class DataCleanService {
    @Autowired
    private LogMapper logMapper;
    private static final int BATCH_SIZE = 1000; // 每批1000条
    /**
     * 分批删除,避免长时间锁表
     * 注意:这里故意不加 @Transactional——
     * 把整个循环包进一个大事务,锁会一直持有到最后,分批就白分了;
     * 不加事务时每批删除各自提交,锁只保持很短时间
     */
    public int deleteBatch(LocalDateTime beforeTime) {
int totalDeleted = 0;
int deleted;
do {
// 每次删除1000条
deleted = logMapper.deleteBatch(beforeTime, BATCH_SIZE);
totalDeleted += deleted;
log.info("已删除{}条,累计{}条", deleted, totalDeleted);
// 休息100ms,释放锁
try {
Thread.sleep(100);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
} while (deleted > 0);
return totalDeleted;
}
}
🗄️ 实战2:分区表清理(推荐⭐⭐⭐⭐⭐)
1. 创建分区表
-- 按月分区的日志表
-- 注意:MySQL要求分区键必须包含在主键/唯一键里,所以主键用 (id, created_at)
CREATE TABLE log (
    id BIGINT AUTO_INCREMENT,
    user_id BIGINT,
    action VARCHAR(50),
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01'))
    -- ... 更多分区
    -- 注意:不要加 MAXVALUE 兜底分区,加了之后 ADD PARTITION 会报错,
    -- 只能用 REORGANIZE PARTITION 拆分,下文的自动建分区任务就失效了
);
2. 删除整个分区(秒级删除!)
-- 删除2024年1月的数据(瞬间完成!)
ALTER TABLE log DROP PARTITION p202401;
-- 对比:
-- DELETE删除100万条数据:需要5分钟 ⏰
-- DROP PARTITION:不到1秒 ⚡
3. 自动管理分区
@Service
@Slf4j
public class PartitionManageService {
@Autowired
private JdbcTemplate jdbcTemplate;
/**
* 每月自动创建新分区
*/
@Scheduled(cron = "0 0 0 1 * ?") // 每月1号
public void createNextMonthPartition() {
YearMonth nextMonth = YearMonth.now().plusMonths(1);
String partitionName = "p" + nextMonth.toString().replace("-", "");
// 计算分区边界
LocalDate boundary = nextMonth.plusMonths(1).atDay(1);
String sql = String.format(
"ALTER TABLE log ADD PARTITION (" +
"PARTITION %s VALUES LESS THAN (TO_DAYS('%s'))" +
")",
partitionName,
boundary
);
jdbcTemplate.execute(sql);
log.info("✅ 创建分区:{}", partitionName);
}
/**
* 删除3个月前的分区
*/
@Scheduled(cron = "0 0 1 1 * ?") // 每月1号凌晨1点
public void dropOldPartitions() {
YearMonth threeMonthsAgo = YearMonth.now().minusMonths(3);
String partitionName = "p" + threeMonthsAgo.toString().replace("-", "");
// 检查分区是否存在
        String checkSql =
                "SELECT COUNT(*) FROM information_schema.PARTITIONS " +
                "WHERE TABLE_SCHEMA = DATABASE() " +
                "AND TABLE_NAME = 'log' AND PARTITION_NAME = ?";
Integer count = jdbcTemplate.queryForObject(checkSql, Integer.class, partitionName);
if (count != null && count > 0) {
String dropSql = String.format(
"ALTER TABLE log DROP PARTITION %s",
partitionName
);
jdbcTemplate.execute(dropSql);
log.info("✅ 删除分区:{}", partitionName);
}
}
}
4. 查询分区信息
-- 查看所有分区
SELECT
PARTITION_NAME,
PARTITION_DESCRIPTION,
TABLE_ROWS,
DATA_LENGTH / 1024 / 1024 AS size_mb
FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'log'
ORDER BY PARTITION_ORDINAL_POSITION;
+----------------+-----------------------+------------+---------+
| PARTITION_NAME | PARTITION_DESCRIPTION | TABLE_ROWS | size_mb |
+----------------+-----------------------+------------+---------+
| p202401        | 739282                | 1000000    | 120.5   |
| p202402        | 739311                | 1200000    | 145.2   |
| p202403        | 739342                | 1100000    | 132.8   |
+----------------+-----------------------+------------+---------+
(PARTITION_DESCRIPTION 是各分区上界对应的 TO_DAYS 值)
📦 实战3:归档方案
1. 归档表设计
-- 主表(热数据)
CREATE TABLE `order` (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_no VARCHAR(50) NOT NULL,
user_id BIGINT,
amount DECIMAL(10,2),
status TINYINT,
created_at DATETIME,
updated_at DATETIME,
KEY idx_created (created_at)
);
-- 归档表(冷数据)
CREATE TABLE `order_archive` (
id BIGINT PRIMARY KEY,
order_no VARCHAR(50) NOT NULL,
user_id BIGINT,
amount DECIMAL(10,2),
status TINYINT,
created_at DATETIME,
updated_at DATETIME,
archived_at DATETIME DEFAULT CURRENT_TIMESTAMP,
KEY idx_created (created_at),
KEY idx_archived (archived_at)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED; -- 压缩存储
2. 归档任务
@Service
@Slf4j
public class OrderArchiveService {
    @Autowired
    private OrderMapper orderMapper;
    @Autowired
    private OrderArchiveMapper archiveMapper;
    @Autowired
    private TransactionTemplate transactionTemplate;
    /**
     * 归档3年前的订单
     * 注意:不要把整个循环包进一个 @Transactional,
     * 几百万条数据会撑出一个超大事务;这里每批单独提交
     */
    @Scheduled(cron = "0 0 3 * * ?") // 每天凌晨3点
    public void archiveOrders() {
        log.info("🧹 开始归档订单...");
        LocalDateTime threeYearsAgo = LocalDateTime.now().minusYears(3);
        int batchSize = 1000;
        int totalArchived = 0;
        while (true) {
            // 1. 查询需要归档的订单
            List<Order> orders = orderMapper.selectOldOrders(threeYearsAgo, batchSize);
            if (orders.isEmpty()) {
                break;
            }
            // 2+3. 写归档表 + 删主表放在同一个小事务里,保证既不丢也不重
            int archived = transactionTemplate.execute(status -> {
                int count = archiveMapper.batchInsert(orders);
                List<Long> ids = orders.stream()
                        .map(Order::getId)
                        .collect(Collectors.toList());
                orderMapper.deleteBatchIds(ids); // MyBatis-Plus 自带的按ID批量删除
                return count;
            });
            totalArchived += archived;
            log.info("归档进度:本批{}条,累计{}条", archived, totalArchived);
            // 休息,避免持续占用IO和锁
            ThreadUtil.sleep(500);
        }
        log.info("✅ 归档完成,共{}条", totalArchived);
    }
/**
* 查询订单(自动查主表和归档表)
*/
public Order getOrderByNo(String orderNo) {
// 1. 先查主表
Order order = orderMapper.selectByOrderNo(orderNo);
if (order != null) {
return order;
}
// 2. 查归档表
return archiveMapper.selectByOrderNo(orderNo);
}
}
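归档任务里用到的几个Mapper方法上文没有给出,这里补一份示意(实体类 OrderArchive、字段名都是按前面建表语句假设的,批量插入用 MyBatis 的 <script> 语法):
@Mapper
public interface OrderArchiveMapper extends BaseMapper<OrderArchive> {
    /**
     * 批量写入归档表
     */
    @Insert("<script>" +
            "INSERT INTO order_archive " +
            "(id, order_no, user_id, amount, status, created_at, updated_at) VALUES " +
            "<foreach collection='orders' item='o' separator=','>" +
            "(#{o.id}, #{o.orderNo}, #{o.userId}, #{o.amount}, " +
            "#{o.status}, #{o.createdAt}, #{o.updatedAt})" +
            "</foreach>" +
            "</script>")
    int batchInsert(@Param("orders") List<Order> orders);
    /**
     * 按订单号查归档数据
     */
    @Select("SELECT * FROM order_archive WHERE order_no = #{orderNo}")
    Order selectByOrderNo(@Param("orderNo") String orderNo);
}
OrderMapper 里对应补充:
@Select("SELECT * FROM `order` WHERE created_at < #{time} ORDER BY id LIMIT #{limit}")
List<Order> selectOldOrders(@Param("time") LocalDateTime time, @Param("limit") int limit);

@Select("SELECT * FROM `order` WHERE order_no = #{orderNo}")
Order selectByOrderNo(@Param("orderNo") String orderNo);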
3. 归档到OSS/S3(成本最低)
@Service
@Slf4j
public class OssArchiveService {
@Autowired
private OSSClient ossClient;
@Autowired
private LogMapper logMapper;
    /**
     * 归档到OSS(对象存储单价通常只有云盘的1/10左右)
     */
@Scheduled(cron = "0 0 2 * * ?")
public void archiveToOss() {
LocalDate yesterday = LocalDate.now().minusDays(1);
        // 1. 查询昨天的数据(示意:数据量大时应分页/流式导出,别一次全读进内存)
        List<Log> logs = logMapper.selectByDate(yesterday);
// 2. 转JSON
String json = JSON.toJSONString(logs);
        // 3. 压缩(Hutool 的 ZipUtil,gzip格式)
        byte[] compressed = ZipUtil.gzip(json.getBytes(StandardCharsets.UTF_8));
// 4. 上传到OSS
String fileName = String.format("logs/%s.json.gz", yesterday);
ossClient.putObject(
"my-bucket",
fileName,
new ByteArrayInputStream(compressed)
);
// 5. 删除数据库记录
logMapper.deleteByDate(yesterday);
log.info("✅ 归档到OSS:{},大小:{}KB",
fileName,
compressed.length / 1024
);
}
/**
* 从OSS读取归档数据
*/
public List<Log> getLogsFromOss(LocalDate date) {
String fileName = String.format("logs/%s.json.gz", date);
// 1. 从OSS下载
OSSObject ossObject = ossClient.getObject("my-bucket", fileName);
byte[] compressed = IoUtil.readBytes(ossObject.getObjectContent());
        // 2. 解压
        byte[] decompressed = ZipUtil.unGzip(compressed);
        // 3. 解析JSON
        return JSON.parseArray(new String(decompressed, StandardCharsets.UTF_8), Log.class);
}
}
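OssArchiveService 依赖的 selectByDate / deleteByDate(以及后文备份用的 selectBeforeTime)LogMapper 里也没定义过,补一份示意:
// LogMapper 里对应补充(按天查/删):
@Select("SELECT * FROM log WHERE created_at >= #{date} " +
        "AND created_at < DATE_ADD(#{date}, INTERVAL 1 DAY)")
List<Log> selectByDate(@Param("date") LocalDate date);

@Delete("DELETE FROM log WHERE created_at >= #{date} " +
        "AND created_at < DATE_ADD(#{date}, INTERVAL 1 DAY)")
int deleteByDate(@Param("date") LocalDate date);

@Select("SELECT * FROM log WHERE created_at < #{time}")
List<Log> selectBeforeTime(@Param("time") LocalDateTime time);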
🚀 高级技巧
1. 软删除 + 定期物理删除
@Data
@TableName("user")
public class User extends BaseEntity {
private Long id;
private String username;
@TableLogic // MyBatis-Plus逻辑删除
private Integer deleted; // 0-正常 1-已删除
private LocalDateTime deletedAt; // 删除时间
}
@Service
@Slf4j
public class UserCleanService {
@Autowired
private UserMapper userMapper;
/**
* 物理删除30天前软删除的用户
*/
@Scheduled(cron = "0 0 4 * * ?")
public void physicalDelete() {
LocalDateTime thirtyDaysAgo = LocalDateTime.now().minusDays(30);
// 直接执行SQL(绕过MyBatis-Plus逻辑删除)
int count = userMapper.physicalDeleteBefore(thirtyDaysAgo);
log.info("✅ 物理删除{}个用户", count);
}
}
@Mapper
public interface UserMapper extends BaseMapper<User> {
@Delete("DELETE FROM user WHERE deleted = 1 AND deleted_at < #{time}")
int physicalDeleteBefore(@Param("time") LocalDateTime time);
}
2. 数据压缩存储
-- 使用压缩表(节省50%~70%空间)
CREATE TABLE log_archive (
id BIGINT,
content TEXT,
created_at DATETIME
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
KEY_BLOCK_SIZE=8; -- 8KB压缩块
-- 对比:
-- 普通表:10GB
-- 压缩表:3GB(节省70%)
3. 监控清理效果
@Service
public class CleanMonitorService {
    @Autowired
    private JdbcTemplate jdbcTemplate;
    @Autowired
    private DingTalkService dingTalkService; // 钉钉/企业微信通知服务(示意)
/**
* 统计表大小
*/
public Map<String, Long> getTableSizes() {
String sql =
"SELECT TABLE_NAME, " +
" DATA_LENGTH + INDEX_LENGTH AS size " +
"FROM information_schema.TABLES " +
"WHERE TABLE_SCHEMA = DATABASE()";
List<Map<String, Object>> rows = jdbcTemplate.queryForList(sql);
        return rows.stream()
                .collect(Collectors.toMap(
                        row -> (String) row.get("TABLE_NAME"),
                        // JDBC里这个合计值可能返回BigDecimal,统一转long
                        row -> ((Number) row.get("size")).longValue()
                ));
}
/**
* 发送清理报告
*/
@Scheduled(cron = "0 0 9 * * ?") // 每天早上9点
public void sendReport() {
Map<String, Long> sizes = getTableSizes();
StringBuilder report = new StringBuilder();
report.append("📊 数据库表空间统计\n\n");
sizes.forEach((table, size) -> {
double sizeMB = size / 1024.0 / 1024.0;
report.append(String.format("%s: %.2f MB\n", table, sizeMB));
});
// 发送到钉钉/企业微信
dingTalkService.send(report.toString());
}
}
⚠️ 注意事项
1. 删除前备份
@Service
@Slf4j
public class SafeDeleteService {
    @Autowired
    private LogMapper logMapper;
/**
* 安全删除(删除前备份)
*/
@Transactional
public void safeDelete(LocalDateTime beforeTime) {
// 1. 先备份
String backupFile = backupToFile(beforeTime);
log.info("备份完成:{}", backupFile);
// 2. 再删除
int count = logMapper.deleteByCreateTime(beforeTime);
log.info("删除{}条数据", count);
// 3. 验证
if (!verifyBackup(backupFile, count)) {
throw new RuntimeException("备份验证失败,回滚删除");
}
}
private String backupToFile(LocalDateTime beforeTime) {
List<Log> logs = logMapper.selectBeforeTime(beforeTime);
String fileName = String.format("backup_%s.json",
LocalDate.now()
);
FileUtil.writeString(
JSON.toJSONString(logs),
new File("/backup/" + fileName),
StandardCharsets.UTF_8
);
        return fileName;
    }
    /**
     * 校验备份(示意):确认备份文件里的条数和删除条数一致
     */
    private boolean verifyBackup(String fileName, int deletedCount) {
        String json = FileUtil.readString(
                new File("/backup/" + fileName), StandardCharsets.UTF_8);
        return JSON.parseArray(json, Log.class).size() == deletedCount;
    }
}
2. 避免高峰期清理
@Component
@Slf4j
public class SmartScheduler {
    @Autowired
    private MetricsService metricsService;   // QPS指标来源(示意实现见下文)
    @Autowired
    private DataCleanService dataCleanService;
/**
* 智能调度(避开业务高峰)
*/
@Scheduled(cron = "0 */10 * * * ?") // 每10分钟检查一次
public void smartClean() {
// 获取当前QPS
long currentQps = metricsService.getCurrentQps();
// 只在QPS < 100时执行清理
if (currentQps < 100) {
log.info("当前QPS:{},开始清理", currentQps);
            dataCleanService.deleteBatch(LocalDateTime.now().minusMonths(3));
} else {
log.info("当前QPS:{},跳过清理", currentQps);
}
}
}
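上面的 metricsService 是假设存在的组件。如果没有现成的监控系统,可以用一个按秒分桶的计数器粗略估算QPS,示意如下(需要在请求入口的Filter/Interceptor里调用 record()):
@Component
public class MetricsService {
    // 当前这一秒的请求计数
    private final AtomicLong counter = new AtomicLong();
    // 上一秒的快照,即"当前QPS"
    private volatile long lastQps = 0;
    /** 每个请求进来时调用一次 */
    public void record() {
        counter.incrementAndGet();
    }
    /** 每秒把计数快照为QPS并清零 */
    @Scheduled(fixedRate = 1000)
    public void snapshot() {
        lastQps = counter.getAndSet(0);
    }
    public long getCurrentQps() {
        return lastQps;
    }
}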
3. 限速删除
@Service
public class RateLimitedCleanService {
    @Autowired
    private LogMapper logMapper;
    // 使用Guava限流器,控制整体删除速率
    private final RateLimiter rateLimiter = RateLimiter.create(100); // 每秒100条
public int deleteWithRateLimit(LocalDateTime beforeTime) {
int totalDeleted = 0;
int batchSize = 100;
while (true) {
// 等待令牌
rateLimiter.acquire(batchSize);
int deleted = logMapper.deleteBatch(beforeTime, batchSize);
if (deleted == 0) break;
totalDeleted += deleted;
}
return totalDeleted;
}
}
📊 效果对比
清理前 vs 清理后
| 指标 | 清理前 | 清理后 | 提升 |
|---|---|---|---|
| 表大小 | 500GB | 50GB | 90% ↓ |
| 查询时间 | 5秒 | 0.5秒 | 10倍 ⚡ |
| 备份时间 | 6小时 | 36分钟 | 10倍 ⚡ |
| 存储成本 | 5000元/月 | 500元/月 | 90% ↓ |
📝 总结
最佳实践 Checklist
- ✅ 使用分区表(推荐)
- ✅ 定期归档到冷存储
- ✅ 分批删除避免锁表
- ✅ 删除前备份
- ✅ 避开高峰期
- ✅ 监控清理效果
- ✅ 软删除 + 延迟物理删除
方案选型
场景1: 日志数据
→ 分区表 + 定期DROP分区
场景2: 业务数据
→ 归档到历史表 + 软删除
场景3: 大数据量
→ 归档到OSS/S3
场景4: 缓存数据
→ Redis TTL自动过期
让你的数据库始终保持"苗条"! 🎉🎉🎉