🧹 Scheduled Cleanup of Expired Data: The Decluttering Master of the Data World


Knowledge Point No.: 270
Difficulty: ⭐⭐⭐
Practicality: 💯💯💯💯💯


📖 Opening: A Database-Full Incident

At 3 a.m., the on-call ops engineer calls:

"Dude! The database is about to blow up! Disk usage is at 98%! 😱"

Still half asleep, you log into the server and check the database:

SELECT TABLE_NAME, 
       ROUND(DATA_LENGTH/1024/1024/1024, 2) AS size_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_database'
ORDER BY DATA_LENGTH DESC;

+----------------+---------+
| TABLE_NAME     | size_gb |
+----------------+---------+
| log_table      | 450.23  |  ← the log table is 450 GB!
| order_history  | 120.45  |  ← 120 GB of order history!
| user_login_log |  80.12  |  ← 80 GB of login logs!
+----------------+---------+

You:

"What the...! How did the log table get this big? 😭"

Check the oldest record:

SELECT MIN(created_at) FROM log_table;
-- Result: 2018-01-01 (data from 5 years ago is still there!)

The next day, the product manager:

"Xiao Wang, this old data is all useless. Why wasn't it deleted? Set up a scheduled cleanup, now!"


Today, let's learn how to clean up expired data gracefully! 🎯


🎯 Why Clean Up Expired Data?

1. Storage cost 💰

1 TB cloud SSD: ~1,000 CNY/month
Log growth rate: 10 GB/day
Growth per year: ~3.6 TB
Cost: 3,600 CNY/month × 12 = 43,200 CNY/year

With periodic cleanup, keeping only 3 months of data:
90 days × 10 GB = 900 GB < 1 TB
Cost savings: ~75% 💰

2. Query performance ⚡

-- Table with 10 million rows
SELECT * FROM log_table WHERE user_id = 123;
-- Time: 5 s 🐌

-- After cleanup, 1 million rows remain
SELECT * FROM log_table WHERE user_id = 123;
-- Time: 0.5 s ⚡ (10x faster!)

3. Backup efficiency 📦

Full backup time:
- Before cleanup: 500 GB → 6 hours
- After cleanup: 50 GB → 36 minutes

4. Legal compliance ⚖️

China's Data Security Law and Personal Information Protection Law require:
- Deleting a user's data within 30 days after account cancellation
- Cleaning up unnecessary data in a timely manner

🗑️ Data Types That Need Cleanup

+----------------------------+-------------+------------------------------+
| Data Type                  | Retention   | Cleanup Strategy             |
+----------------------------+-------------+------------------------------+
| Operation logs 📝          | 3-6 months  | Periodic delete              |
| Login logs 🔐              | 1-3 months  | Periodic delete              |
| Temp files 📁              | 7 days      | Physical delete              |
| Order data 🛒              | 3 years     | Archive + delete             |
| User behavior 👤           | 6 months    | Archive to big-data platform |
| SMS verification codes 📱  | 5 minutes   | Redis TTL auto-expiry        |
| Sessions 🍪                | 30 minutes  | Redis TTL auto-expiry        |
| Unpaid orders 💳           | 30 minutes  | Soft delete                  |
| Recycle-bin data 🗑️        | 30 days     | Physical delete              |
+----------------------------+-------------+------------------------------+

🎨 Choosing a Cleanup Strategy

┌──────────────────────────────────────────────────────────┐
│             Data Cleanup Strategies Compared             │
└──────────────────────────────────────────────────────────┘

Strategy 1: Physical delete (DELETE)
  - Deletes outright, unrecoverable
  - Good for: logs, temporary data
  - Risk: accidental deletes cannot be undone

Strategy 2: Soft delete (logical delete)
  - Mark rows with deleted=1
  - Good for: business data
  - Upside: recoverable

Strategy 3: Archiving
  - Move to a history table / cold storage
  - Good for: orders, user data
  - Upside: lower cost + still traceable

Strategy 4: Partitioned tables
  - Partition by time
  - Good for: very large tables
  - Upside: drop in seconds

Strategy 5: TTL auto-expiry
  - Built into Redis/MongoDB
  - Good for: caches, sessions
  - Upside: fully automatic
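
Strategy 5 needs no scheduler at all. A minimal sketch using Spring Data Redis (a configured StringRedisTemplate is assumed, and the key prefix is illustrative):

@Service
public class SmsCodeService {

    @Autowired
    private StringRedisTemplate redisTemplate;

    /** Store an SMS code; Redis expires it automatically after 5 minutes. */
    public void saveCode(String phone, String code) {
        redisTemplate.opsForValue().set("sms:code:" + phone, code, Duration.ofMinutes(5));
    }

    /** Returns null once the TTL has elapsed, so no cleanup job is needed. */
    public String getCode(String phone) {
        return redisTemplate.opsForValue().get("sms:code:" + phone);
    }
}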

💻 Hands-On 1: Scheduled Cleanup Task (Spring Boot)

1. Using @Scheduled

@Component
@Slf4j
public class DataCleanTask {
    
    @Autowired
    private LogMapper logMapper;
    
    @Autowired
    private OrderMapper orderMapper;
    
    /**
     * Clean logs at 2 a.m. every day.
     * Cron fields: second minute hour day-of-month month day-of-week
     */
    @Scheduled(cron = "0 0 2 * * ?")
    public void cleanLogs() {
        log.info("🧹 Starting log cleanup...");
        
        // Compute the cutoff: 3 months ago
        LocalDateTime threeMonthsAgo = LocalDateTime.now().minusMonths(3);
        
        // Delete logs older than 3 months
        int count = logMapper.deleteByCreateTime(threeMonthsAgo);
        
        log.info("✅ Log cleanup done, {} rows deleted", count);
    }
    
    /**
     * Clean up unpaid orders every hour.
     */
    @Scheduled(cron = "0 0 * * * ?")
    public void cleanUnpaidOrders() {
        log.info("🧹 Starting unpaid-order cleanup...");
        
        // Orders created more than 30 minutes ago and still unpaid
        LocalDateTime halfHourAgo = LocalDateTime.now().minusMinutes(30);
        
        // Flip their status to cancelled
        int count = orderMapper.cancelUnpaidOrders(halfHourAgo);
        
        log.info("✅ Order cleanup done, {} orders cancelled", count);
    }
    
    /**
     * Archive last month's data on the 1st of every month.
     */
    @Scheduled(cron = "0 0 3 1 * ?")
    public void archiveLastMonth() {
        log.info("🧹 Starting last month's archive...");
        
        YearMonth lastMonth = YearMonth.now().minusMonths(1);
        LocalDateTime startTime = lastMonth.atDay(1).atStartOfDay();
        LocalDateTime endTime = lastMonth.atEndOfMonth().atTime(23, 59, 59);
        
        // 1. Copy into the archive table
        int archiveCount = logMapper.archiveToHistory(startTime, endTime);
        
        // 2. Delete from the live table
        int deleteCount = logMapper.deleteByTimeRange(startTime, endTime);
        
        log.info("✅ Archive done: {} rows archived, {} rows deleted", archiveCount, deleteCount);
    }
}
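
One prerequisite that is easy to forget: @Scheduled does nothing unless scheduling is switched on. A minimal sketch of the bootstrap class (class name assumed):

@SpringBootApplication
@EnableScheduling  // Without this, the @Scheduled methods above never fire
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}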

2. The Mapper

@Mapper
public interface LogMapper extends BaseMapper<Log> {
    
    /**
     * Delete rows created before the given time.
     */
    @Delete("DELETE FROM log WHERE created_at < #{time}")
    int deleteByCreateTime(@Param("time") LocalDateTime time);
    
    /**
     * Batched delete (avoids long locks). Note: DELETE ... LIMIT is MySQL-specific.
     */
    @Delete("DELETE FROM log WHERE created_at < #{time} LIMIT #{batchSize}")
    int deleteBatch(@Param("time") LocalDateTime time, 
                   @Param("batchSize") int batchSize);
    
    /**
     * Copy rows into the history table.
     */
    @Insert("INSERT INTO log_history SELECT * FROM log " +
            "WHERE created_at BETWEEN #{startTime} AND #{endTime}")
    int archiveToHistory(@Param("startTime") LocalDateTime startTime,
                        @Param("endTime") LocalDateTime endTime);
    
    /**
     * Delete rows inside a time range.
     */
    @Delete("DELETE FROM log " +
            "WHERE created_at BETWEEN #{startTime} AND #{endTime}")
    int deleteByTimeRange(@Param("startTime") LocalDateTime startTime,
                         @Param("endTime") LocalDateTime endTime);
}
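
archiveToHistory assumes a log_history table whose columns exactly match log, otherwise the SELECT * insert breaks. One safe way to create it:

-- Clone the schema (columns and indexes) of the live table
CREATE TABLE IF NOT EXISTS log_history LIKE log;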

3. Batched Deletes (Avoid Locking the Table)

@Service
@Slf4j
public class DataCleanService {
    
    @Autowired
    private LogMapper logMapper;
    
    private static final int BATCH_SIZE = 1000;  // 1000 rows per batch
    
    /**
     * Delete in batches to avoid holding locks for a long time.
     * Deliberately NOT @Transactional: each batch commits on its own,
     * so locks and undo logs never pile up across the whole loop.
     */
    public int deleteBatch(LocalDateTime beforeTime) {
        int totalDeleted = 0;
        int deleted;
        
        do {
            // Delete up to 1000 rows per round
            deleted = logMapper.deleteBatch(beforeTime, BATCH_SIZE);
            totalDeleted += deleted;
            
            log.info("Deleted {} rows this batch, {} in total", deleted, totalDeleted);
            
            // Pause 100 ms between batches to let other transactions through
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            
        } while (deleted > 0);
        
        return totalDeleted;
    }
}
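
For the LIMIT-based delete to stay cheap, created_at must be indexed; otherwise every batch rescans the table. Assuming no such index exists yet (index name is illustrative):

ALTER TABLE log ADD INDEX idx_created_at (created_at);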

🗄️ Hands-On 2: Partitioned Tables (Recommended ⭐⭐⭐⭐⭐)

1. Create the Partitioned Table

-- Log table partitioned by month.
-- Note: in MySQL the partitioning column must be part of every unique key,
-- so created_at has to join the primary key.
CREATE TABLE log (
    id BIGINT AUTO_INCREMENT,
    user_id BIGINT,
    action VARCHAR(50),
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
    -- ... more partitions
    PARTITION pmax VALUES LESS THAN MAXVALUE  -- catch-all partition
);

2. Drop a Whole Partition (Deletes in Seconds!)

-- Remove all of January 2024 (near-instant!)
ALTER TABLE log DROP PARTITION p202401;

-- For comparison:
-- DELETE on 1 million rows: ~5 minutes ⏰
-- DROP PARTITION: under 1 second ⚡

3. Managing Partitions Automatically

@Service
@Slf4j
public class PartitionManageService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    /**
     * Create next month's partition automatically.
     * Because the table has a MAXVALUE catch-all partition, a plain
     * ADD PARTITION would fail, so we split pmax with REORGANIZE instead.
     */
    @Scheduled(cron = "0 0 0 1 * ?")  // 1st of every month
    public void createNextMonthPartition() {
        YearMonth nextMonth = YearMonth.now().plusMonths(1);
        String partitionName = "p" + nextMonth.toString().replace("-", "");
        
        // Upper boundary of the new partition
        LocalDate boundary = nextMonth.plusMonths(1).atDay(1);
        
        String sql = String.format(
            "ALTER TABLE log REORGANIZE PARTITION pmax INTO (" +
            "PARTITION %s VALUES LESS THAN (TO_DAYS('%s')), " +
            "PARTITION pmax VALUES LESS THAN MAXVALUE" +
            ")",
            partitionName,
            boundary
        );
        
        jdbcTemplate.execute(sql);
        log.info("✅ Created partition: {}", partitionName);
    }
    
    /**
     * Drop the partition from 3 months ago.
     */
    @Scheduled(cron = "0 0 1 1 * ?")  // 1 a.m. on the 1st of every month
    public void dropOldPartitions() {
        YearMonth threeMonthsAgo = YearMonth.now().minusMonths(3);
        String partitionName = "p" + threeMonthsAgo.toString().replace("-", "");
        
        // Check whether the partition exists
        String checkSql = 
            "SELECT COUNT(*) FROM information_schema.PARTITIONS " +
            "WHERE TABLE_NAME = 'log' AND PARTITION_NAME = ?";
        
        Integer count = jdbcTemplate.queryForObject(checkSql, Integer.class, partitionName);
        
        if (count != null && count > 0) {
            String dropSql = String.format(
                "ALTER TABLE log DROP PARTITION %s", 
                partitionName
            );
            
            jdbcTemplate.execute(dropSql);
            log.info("✅ Dropped partition: {}", partitionName);
        }
    }
}

4. Inspecting Partitions

-- List all partitions
SELECT 
    PARTITION_NAME,
    PARTITION_DESCRIPTION,
    TABLE_ROWS,
    DATA_LENGTH / 1024 / 1024 AS size_mb
FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'log'
ORDER BY PARTITION_ORDINAL_POSITION;

+----------------+-----------------------+------------+---------+
| PARTITION_NAME | PARTITION_DESCRIPTION | TABLE_ROWS | size_mb |
+----------------+-----------------------+------------+---------+
| p202401        | 739252                |    1000000 |  120.5  |
| p202402        | 739282                |    1200000 |  145.2  |
| p202403        | 739311                |    1100000 |  132.8  |
+----------------+-----------------------+------------+---------+

📦 Hands-On 3: Archiving

1. Table Design

-- Live table (hot data)
CREATE TABLE `order` (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    order_no VARCHAR(50) NOT NULL,
    user_id BIGINT,
    amount DECIMAL(10,2),
    status TINYINT,
    created_at DATETIME,
    updated_at DATETIME,
    KEY idx_created (created_at)
);

-- Archive table (cold data)
CREATE TABLE `order_archive` (
    id BIGINT PRIMARY KEY,
    order_no VARCHAR(50) NOT NULL,
    user_id BIGINT,
    amount DECIMAL(10,2),
    status TINYINT,
    created_at DATETIME,
    updated_at DATETIME,
    archived_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    KEY idx_created (created_at),
    KEY idx_archived (archived_at)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;  -- compressed storage
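
Because order_archive carries the extra archived_at column, a SELECT * copy would misalign the columns. Listing them explicitly lets archived_at fall back to its default; a sketch in pure SQL, using the tables above:

INSERT INTO order_archive (id, order_no, user_id, amount, status, created_at, updated_at)
SELECT id, order_no, user_id, amount, status, created_at, updated_at
FROM `order`
WHERE created_at < DATE_SUB(NOW(), INTERVAL 3 YEAR);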

2. The Archive Job

@Service
@Slf4j
public class OrderArchiveService {
    
    @Autowired
    private OrderMapper orderMapper;
    
    @Autowired
    private OrderArchiveMapper archiveMapper;
    
    /**
     * Archive orders older than 3 years.
     * No method-level @Transactional here: one transaction spanning the whole
     * loop (including the sleeps) would hold locks far too long. Each
     * insert+delete pair should instead run in its own short transaction;
     * see the TransactionTemplate sketch after this block.
     */
    @Scheduled(cron = "0 0 3 * * ?")  // 3 a.m. every day
    public void archiveOrders() {
        log.info("🧹 Starting order archive...");
        
        LocalDateTime threeYearsAgo = LocalDateTime.now().minusYears(3);
        
        int batchSize = 1000;
        int totalArchived = 0;
        
        while (true) {
            // 1. Fetch a batch of orders due for archiving
            List<Order> orders = orderMapper.selectOldOrders(
                threeYearsAgo, 
                batchSize
            );
            
            if (orders.isEmpty()) {
                break;
            }
            
            // 2. Write them into the archive table
            int archived = archiveMapper.batchInsert(orders);
            
            // 3. Delete them from the live table
            List<Long> ids = orders.stream()
                .map(Order::getId)
                .collect(Collectors.toList());
            
            orderMapper.deleteBatchByIds(ids);
            
            totalArchived += archived;
            
            log.info("Archive progress: {} this batch, {} in total", archived, totalArchived);
            
            // Pause between batches to avoid hogging locks
            ThreadUtil.sleep(500);
        }
        
        log.info("✅ Archive done, {} rows in total", totalArchived);
    }
    
    /**
     * Look up an order (checks the live table first, then the archive).
     */
    public Order getOrderByNo(String orderNo) {
        // 1. Live table first
        Order order = orderMapper.selectByOrderNo(orderNo);
        
        if (order != null) {
            return order;
        }
        
        // 2. Fall back to the archive table
        return archiveMapper.selectByOrderNo(orderNo);
    }
}
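
A minimal sketch of the per-batch transaction mentioned above, using Spring's TransactionTemplate (assumed injected; mapper names as in the service):

@Autowired
private TransactionTemplate transactionTemplate;

private int archiveOneBatch(List<Order> orders) {
    // Insert + delete succeed or fail together, but only for this one batch
    return transactionTemplate.execute(status -> {
        int archived = archiveMapper.batchInsert(orders);
        List<Long> ids = orders.stream().map(Order::getId).collect(Collectors.toList());
        orderMapper.deleteBatchByIds(ids);
        return archived;
    });
}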

3. Archiving to OSS/S3 (Lowest Cost)

@Service
@Slf4j
public class OssArchiveService {
    
    @Autowired
    private OSSClient ossClient;
    
    @Autowired
    private LogMapper logMapper;
    
    /**
     * Archive to OSS (object storage costs roughly a tenth of database storage).
     */
    @Scheduled(cron = "0 0 2 * * ?")
    public void archiveToOss() {
        LocalDate yesterday = LocalDate.now().minusDays(1);
        
        // 1. Load yesterday's rows
        List<Log> logs = logMapper.selectByDate(yesterday);
        
        // 2. Serialize to JSON
        String json = JSON.toJSONString(logs);
        
        // 3. Compress
        byte[] compressed = GzipUtil.compress(json.getBytes());
        
        // 4. Upload to OSS
        String fileName = String.format("logs/%s.json.gz", yesterday);
        ossClient.putObject(
            "my-bucket", 
            fileName, 
            new ByteArrayInputStream(compressed)
        );
        
        // 5. Delete the rows from the database
        logMapper.deleteByDate(yesterday);
        
        log.info("✅ Archived to OSS: {}, size: {} KB", 
            fileName, 
            compressed.length / 1024
        );
    }
    
    /**
     * Read archived data back from OSS.
     */
    public List<Log> getLogsFromOss(LocalDate date) {
        String fileName = String.format("logs/%s.json.gz", date);
        
        // 1. Download from OSS
        OSSObject ossObject = ossClient.getObject("my-bucket", fileName);
        byte[] compressed = IoUtil.readBytes(ossObject.getObjectContent());
        
        // 2. Decompress
        byte[] decompressed = GzipUtil.decompress(compressed);
        
        // 3. Parse the JSON
        return JSON.parseArray(new String(decompressed), Log.class);
    }
}
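
GzipUtil above is a stand-in for whatever gzip helper your project carries; a minimal version on top of java.util.zip (GZIPOutputStream / GZIPInputStream) looks like this:

public final class GzipUtil {

    /** Gzip-compress a byte array in memory. */
    public static byte[] compress(byte[] data) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(data);
            gzip.finish();  // flush the gzip trailer before reading the buffer
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reverse of compress(). */
    public static byte[] decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}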

🚀 Advanced Tricks

1. Soft Delete + Delayed Physical Delete

@Data
@TableName("user")
public class User extends BaseEntity {
    private Long id;
    private String username;
    
    @TableLogic  // MyBatis-Plus logical delete
    private Integer deleted;  // 0 = active, 1 = deleted
    
    private LocalDateTime deletedAt;  // when the soft delete happened
}

@Service
@Slf4j
public class UserCleanService {
    
    @Autowired
    private UserMapper userMapper;
    
    /**
     * Physically delete users that were soft-deleted more than 30 days ago.
     */
    @Scheduled(cron = "0 0 4 * * ?")
    public void physicalDelete() {
        LocalDateTime thirtyDaysAgo = LocalDateTime.now().minusDays(30);
        
        // Raw SQL on purpose (bypasses MyBatis-Plus logical delete)
        int count = userMapper.physicalDeleteBefore(thirtyDaysAgo);
        
        log.info("✅ Physically deleted {} users", count);
    }
}

@Mapper
public interface UserMapper extends BaseMapper<User> {
    
    @Delete("DELETE FROM user WHERE deleted = 1 AND deleted_at < #{time}")
    int physicalDeleteBefore(@Param("time") LocalDateTime time);
}
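
The nightly physicalDeleteBefore scan filters on deleted and deleted_at; a composite index keeps it from walking the whole table (index name is illustrative):

ALTER TABLE user ADD INDEX idx_deleted_at (deleted, deleted_at);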

2. Compressed Storage

-- Use a compressed table (saves 50-70% of space)
CREATE TABLE log_archive (
    id BIGINT,
    content TEXT,
    created_at DATETIME
) ENGINE=InnoDB 
  ROW_FORMAT=COMPRESSED 
  KEY_BLOCK_SIZE=8;  -- 8 KB compressed pages

-- Comparison:
-- Plain table: 10 GB
-- Compressed table: 3 GB (70% saved)

3. Monitoring the Results

@Service
@Slf4j
public class CleanMonitorService {
    
    @Autowired
    private JdbcTemplate jdbcTemplate;
    
    @Autowired
    private DingTalkService dingTalkService;  // whatever IM notifier you use
    
    /**
     * Collect per-table sizes.
     */
    public Map<String, Long> getTableSizes() {
        String sql = 
            "SELECT TABLE_NAME, " +
            "       DATA_LENGTH + INDEX_LENGTH AS size " +
            "FROM information_schema.TABLES " +
            "WHERE TABLE_SCHEMA = DATABASE()";
        
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(sql);
        
        return rows.stream()
            .collect(Collectors.toMap(
                row -> (String) row.get("TABLE_NAME"),
                // The driver may hand back BigInteger/BigDecimal, so don't cast to Long directly
                row -> ((Number) row.get("size")).longValue()
            ));
    }
    
    /**
     * Send a daily cleanup report.
     */
    @Scheduled(cron = "0 0 9 * * ?")  // 9 a.m. every day
    public void sendReport() {
        Map<String, Long> sizes = getTableSizes();
        
        StringBuilder report = new StringBuilder();
        report.append("📊 Database table sizes\n\n");
        
        sizes.forEach((table, size) -> {
            double sizeMB = size / 1024.0 / 1024.0;
            report.append(String.format("%s: %.2f MB\n", table, sizeMB));
        });
        
        // Push to DingTalk / WeCom
        dingTalkService.send(report.toString());
    }
}

⚠️ Things to Watch Out For

1. Back Up Before Deleting

@Service
@Slf4j
public class SafeDeleteService {
    
    @Autowired
    private LogMapper logMapper;
    
    /**
     * Safe delete: back up first, then delete, then verify.
     */
    @Transactional
    public void safeDelete(LocalDateTime beforeTime) {
        // 1. Back up first
        String backupFile = backupToFile(beforeTime);
        log.info("Backup written: {}", backupFile);
        
        // 2. Then delete
        int count = logMapper.deleteByCreateTime(beforeTime);
        log.info("Deleted {} rows", count);
        
        // 3. Verify; throwing here rolls the delete back
        if (!verifyBackup(backupFile, count)) {
            throw new RuntimeException("Backup verification failed, rolling back the delete");
        }
    }
    
    private String backupToFile(LocalDateTime beforeTime) {
        // Note: loads everything into memory, fine for modest volumes only
        List<Log> logs = logMapper.selectBeforeTime(beforeTime);
        
        String fileName = String.format("backup_%s.json", 
            LocalDate.now()
        );
        
        FileUtil.writeString(
            JSON.toJSONString(logs), 
            new File("/backup/" + fileName), 
            StandardCharsets.UTF_8
        );
        
        return fileName;
    }
}
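
For large tables, serializing JSON in the application is slow; dumping straight from MySQL is often more practical. A hedged sketch (requires the FILE privilege and a path allowed by secure_file_priv; path and cutoff are illustrative):

SELECT * FROM log
WHERE created_at < '2024-01-01'
INTO OUTFILE '/var/lib/mysql-files/log_backup.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';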

2. Avoid Peak Hours

@Component
@Slf4j
public class SmartScheduler {
    
    @Autowired
    private MetricsService metricsService;  // wherever your QPS metric lives
    
    @Autowired
    private DataCleanService dataCleanService;
    
    /**
     * Smart scheduling: only clean when traffic is low.
     */
    @Scheduled(cron = "0 */10 * * * ?")  // check every 10 minutes
    public void smartClean() {
        // Current QPS
        long currentQps = metricsService.getCurrentQps();
        
        // Only clean while QPS < 100
        if (currentQps < 100) {
            log.info("Current QPS: {}, starting cleanup", currentQps);
            dataCleanService.clean();
        } else {
            log.info("Current QPS: {}, skipping cleanup", currentQps);
        }
    }
}

3. Throttled Deletes

@Service
public class RateLimitedCleanService {
    
    @Autowired
    private LogMapper logMapper;
    
    // Guava rate limiter: at most 100 deletes per second
    private final RateLimiter rateLimiter = RateLimiter.create(100);
    
    public int deleteWithRateLimit(LocalDateTime beforeTime) {
        int totalDeleted = 0;
        int batchSize = 100;
        
        while (true) {
            // Block until enough permits are available
            rateLimiter.acquire(batchSize);
            
            int deleted = logMapper.deleteBatch(beforeTime, batchSize);
            if (deleted == 0) break;
            
            totalDeleted += deleted;
        }
        
        return totalDeleted;
    }
}
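
Wiring it into a scheduled job is a one-liner; a sketch assuming the service is injected into any @Slf4j component (cron and retention window are illustrative):

@Scheduled(cron = "0 30 2 * * ?")
public void nightlyThrottledClean() {
    int n = rateLimitedCleanService.deleteWithRateLimit(LocalDateTime.now().minusMonths(3));
    log.info("Throttled cleanup removed {} rows", n);
}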

📊 The Results

Before vs. After Cleanup

+--------------+----------------+---------------+-------------+
| Metric       | Before Cleanup | After Cleanup | Improvement |
+--------------+----------------+---------------+-------------+
| Table size   | 500 GB         | 50 GB         | 90% ↓       |
| Query time   | 5 s            | 0.5 s         | 10x ⚡      |
| Backup time  | 6 hours        | 36 minutes    | 10x ⚡      |
| Storage cost | 5,000 CNY/mo   | 500 CNY/mo    | 90% ↓       |
+--------------+----------------+---------------+-------------+

📝 Summary

Best-Practice Checklist

  • ✅ Use partitioned tables (recommended)
  • ✅ Archive to cold storage regularly
  • ✅ Delete in batches to avoid long locks
  • ✅ Back up before deleting
  • ✅ Stay away from peak hours
  • ✅ Monitor the results
  • ✅ Soft delete + delayed physical delete

Picking a Strategy

Scenario 1: Log data
  → Partitioned table + periodic DROP PARTITION

Scenario 2: Business data
  → Archive to a history table + soft delete

Scenario 3: Huge volumes
  → Archive to OSS/S3

Scenario 4: Cache data
  → Redis TTL auto-expiry

Keep your database slim and trim! 🎉🎉🎉