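Java Rate Limiting Architecture In Depth: Algorithms, Implementations, and Use Cases
What is the last line of defense before a high-concurrency system collapses? This article analyzes the principles and implementations of the four major rate-limiting algorithms, from…

When building a high-concurrency system, have you ever seen an endpoint suddenly flooded with requests, driving system load up, slowing responses, or even bringing the service down? That is exactly why we need rate limiting. This article takes a close look at rate limiting in Java to help you build more robust systems.

What is rate limiting?

Simply put, rate limiting controls the rate at which a system accepts requests, much like a faucet controls the flow of water. When the system faces more requests than it should handle, the rate limiter rejects or delays some of them to keep the system available and stable.

An everyday example: during the morning rush, subway stations limit how many people can enter. That is a form of rate limiting, meant to keep the station orderly and safe.

graph LR
    A[User request] --> B{Rate limiter}
    B -->|Allowed| C[System processes request]
    B -->|Rejected/delayed| D[Request limited]

Why do we need rate limiting?

A system without rate limiting is like a residential compound without access control: anyone can come and go as they please, which easily leads to the following problems:

System overload - CPU, memory, and other resources are exhausted
Increased latency - the user experience degrades
Cascading failures - one unavailable service can bring down the entire system
Malicious attacks - e.g. a DDoS attack can lock out legitimate users

During one platform's promotional event, the absence of proper rate limiting took the system down for 3 minutes at peak traffic, with direct losses of more than ten thousand. This shows how important rate limiting is for high-concurrency systems.

Common rate-limiting algorithms

1. Fixed window counter

This is the simplest rate-limiting algorithm. The idea is straightforward: count requests within a fixed time window (for example, 1 minute), and once the count reaches the threshold, reject further requests until the next window starts.

graph TD
    A[New request arrives] --> B{Count in current window < threshold?}
    B -->|Yes| C[Allow, counter + 1]
    B -->|No| D[Reject request]
    E[Window reset] -.-> F[Counter cleared]

Here is a simple implementation: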

public class FixedWindowRateLimiter {
    private final int limit; // maximum requests per window
    private final long windowSizeInMs; // window size in milliseconds
    private long windowStartTime; // start time of the current window
    private int count; // request count in the current window

    public FixedWindowRateLimiter(int limit, long windowSizeInMs) {
        if (limit <= 0 || windowSizeInMs <= 0) {
            throw new IllegalArgumentException("Arguments must be positive");
        }
        this.limit = limit;
        this.windowSizeInMs = windowSizeInMs;
        this.windowStartTime = System.currentTimeMillis();
        this.count = 0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();

        // Start a new window once the current one has fully elapsed
        if (now - windowStartTime >= windowSizeInMs) {
            count = 0;
            windowStartTime = now;
        }

        if (count < limit) {
            count++;
            return true;
        } else {
            return false;
        }
    }
}

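The drawback of this algorithm is the risk of traffic spikes at window boundaries. For example, with a limit of 100 requests per minute, a user could send 100 requests in the last second of the first minute and another 100 in the first second of the second minute, so 200 requests get through within just 2 seconds.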
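To make the boundary problem concrete, the sketch below is a variant of the counter above with two assumptions made purely for illustration: the current time is passed in as a parameter so the scenario can be replayed without sleeping, and windows are aligned to multiples of the window size (0 to 60 s, 60 to 120 s, and so on) rather than starting at the first request after a reset. Sending 100 requests at t = 59.5 s and another 100 at t = 60.5 s lets all 200 through within one second.

```java
// A fixed window counter that takes the timestamp as a parameter,
// so the boundary burst can be reproduced deterministically.
class FixedWindowCounter {
    private final int limit;
    private final long windowSizeInMs;
    private long windowStart = Long.MIN_VALUE; // aligned start of the current window
    private int count;

    FixedWindowCounter(int limit, long windowSizeInMs) {
        this.limit = limit;
        this.windowSizeInMs = windowSizeInMs;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Align windows to multiples of the window size: [0, 60000), [60000, 120000), ...
        long currentWindowStart = nowMillis - Math.floorMod(nowMillis, windowSizeInMs);
        if (currentWindowStart != windowStart) {
            windowStart = currentWindowStart; // a new window has begun: reset the counter
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }

    // Sends the given number of requests at one timestamp and returns how many were allowed
    static int burst(FixedWindowCounter limiter, long nowMillis, int requests) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            if (limiter.tryAcquire(nowMillis)) {
                allowed++;
            }
        }
        return allowed;
    }
}
```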

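2. Sliding window

The sliding window improves on the fixed window by splitting the time window into smaller sub-windows (slots). As time moves forward, the window slides one slot at a time and the oldest slot drops out. This greatly reduces the boundary spike problem of the fixed window; the remaining imprecision is at most one slot's worth of time.

graph LR
    subgraph "Sliding window"
    A[Slot 1] --> B[Slot 2]
    B --> C[Slot 3]
    C --> D[Slot 4]
    end
    E[Time] -.-> F[Window slides]
    G[Expired slot] -.-> H[New slot]
    I[Sliding direction] --> J["→"]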

The implementation below keeps one counter per slot in a ring buffer and records, for each slot, the start time of the sub-window it currently holds. Slots whose sub-window has slid out of the window are ignored when summing and reset when reused:

import java.util.Arrays;

public class SlidingWindowRateLimiter {
    private final int limit; // maximum requests per (sliding) window
    private final int windowCount; // number of slots
    private final long subWindowSizeInMs; // size of one slot
    private final int[] slotCounts; // request count per slot (ring buffer)
    private final long[] slotStartTimes; // start time of the sub-window each slot currently holds

    public SlidingWindowRateLimiter(int limit, long windowSizeInMs, int windowCount) {
        if (limit <= 0 || windowSizeInMs <= 0 || windowCount <= 0) {
            throw new IllegalArgumentException("Arguments must be positive");
        }
        if (windowCount > 100) {
            throw new IllegalArgumentException("Too many slots; keep it at 100 or fewer");
        }
        if (windowSizeInMs % windowCount != 0) {
            throw new IllegalArgumentException("The window size must be divisible by the number of slots");
        }

        this.limit = limit;
        this.windowCount = windowCount;
        this.subWindowSizeInMs = windowSizeInMs / windowCount;
        this.slotCounts = new int[windowCount];
        this.slotStartTimes = new long[windowCount];
        // Mark every slot as "never used" so nothing is counted at startup
        Arrays.fill(slotStartTimes, Long.MIN_VALUE);
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();

        // Start time of the slot that contains "now", aligned to the slot size
        long currentSlotStart = now - now % subWindowSizeInMs;
        int index = (int) ((now / subWindowSizeInMs) % windowCount);

        // The ring buffer slot still holds an old sub-window: recycle it
        if (slotStartTimes[index] != currentSlotStart) {
            slotStartTimes[index] = currentSlotStart;
            slotCounts[index] = 0;
        }

        // Sum only the slots that belong to the last windowCount sub-windows
        long oldestValidSlotStart = currentSlotStart - (windowCount - 1) * subWindowSizeInMs;
        int totalCount = 0;
        for (int i = 0; i < windowCount; i++) {
            if (slotStartTimes[i] >= oldestValidSlotStart) {
                totalCount += slotCounts[i];
            }
        }

        // Check and increment under the same lock, so concurrent callers cannot overshoot the limit
        if (totalCount < limit) {
            slotCounts[index]++;
            return true;
        } else {
            return false;
        }
    }
}

The check and the increment run under a single lock, so concurrent callers can never push the count past the limit. Summing costs O(windowCount) per request, which is why the number of slots is capped; more slots make the window slide more smoothly. A lock-free variant (for example based on AtomicIntegerArray) is possible, but reading the total and then incrementing in two separate steps lets concurrent threads overshoot the limit, so it needs a CAS loop or similar care.
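When exact counting matters more than memory, a common alternative is the sliding log, a different technique from the slot-based window above: keep the timestamp of every accepted request and count only those inside the last window. The sketch assumes the timestamp is passed in as a parameter (for deterministic replay) and keeps up to limit timestamps in memory. Replaying the boundary scenario from the fixed-window section, the second burst of 100 is rejected because the first 100 are still inside the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding log: remembers the timestamp of each accepted request.
class SlidingLogRateLimiter {
    private final int limit;
    private final long windowSizeInMs;
    private final Deque<Long> acceptedTimestamps = new ArrayDeque<>();

    SlidingLogRateLimiter(int limit, long windowSizeInMs) {
        this.limit = limit;
        this.windowSizeInMs = windowSizeInMs;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that are no longer inside (nowMillis - window, nowMillis]
        while (!acceptedTimestamps.isEmpty()
                && acceptedTimestamps.peekFirst() <= nowMillis - windowSizeInMs) {
            acceptedTimestamps.pollFirst();
        }
        if (acceptedTimestamps.size() < limit) {
            acceptedTimestamps.addLast(nowMillis);
            return true;
        }
        return false;
    }
}
```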

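3. Leaky bucket

Picture a bucket: water (requests) flows in at an unpredictable rate, and the bucket leaks water at a fixed rate. When the bucket is full, any additional water overflows (the request is rejected).

graph TD
    A[Requests] --> B[(Leaky bucket)]
    B --> C[Requests processed at a fixed rate]
    B --> |When full|D[Request rejected]

Implementation: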

public class LeakyBucketRateLimiter {
    private final long capacity; // bucket capacity
    private final double leakRate; // leak rate (requests per second)
    private double water; // current water level (requests in the bucket); double keeps fractional leaks
    private long lastLeakTime; // last time the leak was computed

    public LeakyBucketRateLimiter(long capacity, double leakRate) {
        if (capacity <= 0 || leakRate <= 0) {
            throw new IllegalArgumentException("Arguments must be positive");
        }

        this.capacity = capacity;
        this.leakRate = leakRate;
        this.water = 0;
        this.lastLeakTime = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        // Compute how much water has leaked since the last call
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastLeakTime) / 1000.0;

        // Leak continuously; truncating to a whole number here would lose fractional leaks
        water = Math.max(0, water - elapsedSeconds * leakRate);
        lastLeakTime = now;

        // Accept the request only if it still fits in the bucket
        if (water + 1 <= capacity) {
            water++;
            return true;
        } else {
            return false;
        }
    }
}

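The defining property of the leaky bucket is that no matter how the inflow rate varies, the outflow rate stays constant, which is very useful when a steady processing rate is required. For example, a payment system may need to process orders at a steady pace so that a sudden flood of requests from the frontend cannot overwhelm the backend database. Note that the class above is the "leaky bucket as a meter" variant: it only decides immediately whether to admit a request, so up to capacity requests admitted into an empty bucket still reach the backend at once. A strictly constant outflow requires queuing admitted requests and draining them with a worker at the leak rate.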
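A minimal sketch of that queued variant, assuming the work can be expressed as a Runnable and that running it later on a background thread is acceptable. The bounded ArrayBlockingQueue is the bucket and a single scheduled thread is the leak, draining one task per period, so the outflow never exceeds one task per periodMillis. The class and method names are illustrative, not from any library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Leaky bucket "as a queue": the bounded queue is the bucket, one scheduled thread is the leak.
class QueueLeakyBucket {
    private final BlockingQueue<Runnable> bucket;
    private final ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "leaky-bucket-drainer");
        t.setDaemon(true); // do not keep the JVM alive just for the drainer
        return t;
    });

    QueueLeakyBucket(int capacity) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking: returns false (overflow) when the bucket is full
    boolean submit(Runnable task) {
        return bucket.offer(task);
    }

    // Runs at most one queued task; this is one "drip" of the leak
    void drainOne() {
        Runnable task = bucket.poll();
        if (task != null) {
            task.run();
        }
    }

    // Leak one task every periodMillis, e.g. 100 ms for 10 tasks per second
    void start(long periodMillis) {
        drainer.scheduleAtFixedRate(() -> {
            try {
                drainOne();
            } catch (RuntimeException e) {
                // An exception escaping here would cancel all future runs, so log it instead
                e.printStackTrace();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        drainer.shutdownNow();
    }
}
```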

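4. Token bucket

The token bucket is one of the most widely used rate-limiting algorithms. The system adds tokens to a bucket at a fixed rate, and a request must take a token before it can be processed; requests that cannot get a token are either rejected or made to wait.

graph LR
    A[Token generator] -->|Fixed rate| B[(Token bucket)]
    B -->|Tokens accumulate up to capacity| B
    C[Request] --> D{Acquire token}
    B -.-> D
    D -->|Success| E[Process request]
    D -->|Failure| F[Reject/wait]

Java implementation: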

public class TokenBucketRateLimiter {
    private final long capacity; // bucket capacity (maximum burst size)
    private double refillRate; // refill rate (tokens per second); not final because it can be changed at runtime
    private double availableTokens; // tokens currently available
    private long lastRefillTime; // last time tokens were added

    public TokenBucketRateLimiter(long capacity, double refillRate) {
        if (capacity <= 0 || refillRate <= 0) {
            throw new IllegalArgumentException("Arguments must be positive");
        }

        this.capacity = capacity;
        this.refillRate = refillRate;
        this.availableTokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        refill();

        if (availableTokens >= 1) {
            availableTokens -= 1;
            return true;
        } else {
            return false;
        }
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastRefillTime) / 1000.0;

        // Number of tokens earned during the elapsed time
        double tokensToAdd = elapsedSeconds * refillRate;

        if (tokensToAdd > 0) {
            availableTokens = Math.min(capacity, availableTokens + tokensToAdd);
            lastRefillTime = now;
        }
    }

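    // Current refill rate (used by adaptive adjustment)
    public synchronized double getRefillRate() {
        return refillRate;
    }

    // Change the refill rate at runtime (used by adaptive adjustment)
    public synchronized void setRefillRate(double newRate) {
        // Settle the tokens earned at the old rate first
        refill();
        // Then switch to the new rate
        this.refillRate = Math.max(1.0, newRate); // keep the rate at 1 token per second or more
    }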
}

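The advantage of the token bucket is that it can absorb traffic bursts, because tokens accumulate up to the bucket's capacity. For example, if your site normally sees little traffic, tokens build up in the bucket; when many users suddenly arrive, the accumulated tokens let a burst of requests through at once instead of rejecting them immediately, after which throughput falls back to the refill rate.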
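The burst-then-steady behavior can be checked with a variant of the class above that takes the timestamp as a parameter (an assumption made for deterministic replay). With a capacity of 10 and a refill rate of 2 tokens per second, a bucket that has been idle is full, so 10 requests pass at once; half a second later exactly one more token has been earned.

```java
// Token bucket with an injectable clock, so the burst behavior can be replayed exactly.
class ClockedTokenBucket {
    private final double capacity;
    private final double refillRatePerSecond;
    private double tokens;
    private long lastRefillMillis;

    ClockedTokenBucket(double capacity, double refillRatePerSecond, long startMillis) {
        this.capacity = capacity;
        this.refillRatePerSecond = refillRatePerSecond;
        this.tokens = capacity; // start full, like the class above
        this.lastRefillMillis = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens earned since the last call, never exceeding the capacity
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        if (elapsedSeconds > 0) {
            tokens = Math.min(capacity, tokens + elapsedSeconds * refillRatePerSecond);
            lastRefillMillis = nowMillis;
        }
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```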

Applying rate limiting in real projects

Using Guava RateLimiter

Google's Guava library provides a RateLimiter class based on the token bucket idea, and it is very easy to use:

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class GuavaRateLimiterDemo {
    public static void main(String[] args) {
        // Create a limiter that allows 5 permits (requests) per second
        RateLimiter limiter = RateLimiter.create(5.0);

        // Blocking acquisition - suitable when the task can afford to wait
        double waitTime = limiter.acquire();
        System.out.println("Waited: " + waitTime + " s");

        // Acquisition with a timeout - suitable for API calls that must respond quickly
        boolean acquired = limiter.tryAcquire(100, TimeUnit.MILLISECONDS);
        if (acquired) {
            // Handle the request
            System.out.println("Permit acquired, handling request");
        } else {
            // Rate limited: return a friendly message
            System.out.println("No permit, request rate limited");
        }
    }
}

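Note that acquire() blocks the thread until a permit is available, which can exhaust the thread pool under high concurrency. Imagine your service has 100 threads and all of them are waiting for permits: other requests cannot be processed, even ones that do not touch the rate-limited resource.

For latency-sensitive paths, prefer the non-blocking tryAcquire(): without arguments it returns false immediately when no permit is available, and tryAcquire(timeout, unit) waits at most the given timeout before giving up and returning false.

Rate limiting endpoints with Sentinel

Sentinel, open-sourced by Alibaba, is a more powerful flow-control component: besides rate limiting, it supports circuit breaking and degradation, adaptive system protection, and more.

First, add the Maven dependency: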

<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.8.6</version>
</dependency>

Basic usage:

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

import java.util.ArrayList;
import java.util.List;

public class SentinelDemo {
    public static void main(String[] args) {
        // Configure the rules
        initFlowRules();

        // Simulate 100 requests
        for (int i = 0; i < 100; i++) {
            Entry entry = null;
            try {
                // Try to pass Sentinel's flow-control check
                entry = SphU.entry("HelloWorld");
                // Business logic goes here
                System.out.println("Request passed: " + i);
            } catch (BlockException e) {
                // Rate limited: handle the rejection
                System.out.println("Request limited: " + i);
            } finally {
                if (entry != null) {
                    entry.exit();
                }
            }

            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    private static void initFlowRules() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule();
        rule.setResource("HelloWorld"); // resource name
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS); // limit by QPS
        rule.setCount(5); // allow 5 requests per second
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }
}

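Sentinel hot-parameter rate limiting

Sentinel can rate-limit hot parameters of an endpoint, for example capping the access rate for a specific product ID:

// Hot-parameter rule configuration
// Requires the sentinel-parameter-flow-control module; ParamFlowRule, ParamFlowItem and
// ParamFlowRuleManager are in com.alibaba.csp.sentinel.slots.block.flow.param (plus java.util.Collections)
private static void initParamFlowRules() {
    // Create the hot-parameter rule
    ParamFlowRule rule = new ParamFlowRule("itemDetail")
        .setParamIdx(0)  // first argument (0-based)
        .setCount(5);    // default QPS limit of 5 per parameter value

    // Exception: product ID 10000 gets a QPS limit of 10
    ParamFlowItem item = new ParamFlowItem()
        .setObject("10000")  // parameter value
        .setClassType(String.class.getName())  // parameter type
        .setCount(10);       // QPS threshold for this value
    rule.setParamFlowItemList(Collections.singletonList(item));

    ParamFlowRuleManager.loadRules(Collections.singletonList(rule));
}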

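Usage example (mind the risk that comes with parameter indexes):

// Parameter-index risk: if the method's parameter order changes, setParamIdx(0) in the rule
// will point at the wrong argument. @SentinelResource has no attribute for selecting the parameter;
// it passes the method arguments to Sentinel in declaration order, so the rule's paramIdx
// refers to this method's parameter list. Guard it with a unit test.
@GetMapping("/items/{id}")
@SentinelResource(value = "itemDetail", blockHandler = "handleItemBlock")
public Item getItemDetail(@PathVariable("id") String itemId) {
    return itemService.getItemById(itemId);
}

// Block handler: same parameters as the original method plus a trailing BlockException
public Item handleItemBlock(String itemId, BlockException e) {
    log.warn("Item detail endpoint rate limited, item ID: {}", itemId);
    return new Item().setId(itemId).setName("Item info requested too frequently");
}

This approach works well against abuse of best-selling items on e-commerce platforms: a few hot items cannot drag down the whole system. Keep the parameter-index risk in mind, though: the index is purely positional, so keep the rule close to the method it protects and write a unit test that fails when the parameter list changes. Also note that @SentinelResource only takes effect when Sentinel's annotation support is enabled (for example SentinelResourceAspect from sentinel-annotation-aspectj, which Spring Cloud Alibaba configures automatically).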
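The idea behind hot-parameter limiting, a separate limit for each parameter value with exceptions for specific values, can be sketched in plain Java for a single JVM. This is not how Sentinel implements it internally; the hypothetical HotKeyRateLimiter below simply keeps one per-second fixed-window counter per key in a ConcurrentHashMap, with a default limit and per-value overrides mirroring the ParamFlowItem example above. A real version would also need to evict idle keys, since the map grows with the number of distinct values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-parameter-value limiter for a single JVM: one per-second counter per key.
class HotKeyRateLimiter {
    private final int defaultLimitPerSecond;
    private final Map<String, Integer> overrides; // e.g. item "10000" -> 10 per second
    // state[0] = second the counter belongs to, state[1] = count within that second
    private final ConcurrentHashMap<String, long[]> windows = new ConcurrentHashMap<>();

    HotKeyRateLimiter(int defaultLimitPerSecond, Map<String, Integer> overrides) {
        this.defaultLimitPerSecond = defaultLimitPerSecond;
        this.overrides = new HashMap<>(overrides);
    }

    boolean tryAcquire(String key, long nowMillis) {
        int limit = overrides.getOrDefault(key, defaultLimitPerSecond);
        long second = nowMillis / 1000;
        boolean[] allowed = new boolean[1];
        // compute() runs atomically per key, so check-and-increment cannot race
        windows.compute(key, (k, state) -> {
            if (state == null || state[0] != second) {
                state = new long[] {second, 0}; // a new second: start a fresh counter
            }
            if (state[1] < limit) {
                state[1]++;
                allowed[0] = true;
            }
            return state;
        });
        return allowed[0];
    }
}
```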

Sentinel adaptive system protection

Sentinel also offers adaptive protection based on overall system load:

private static void initSystemRules() {
    List<SystemRule> rules = new ArrayList<>();
    SystemRule rule = new SystemRule();
    // Trigger protection when system CPU usage exceeds 70%
    rule.setHighestCpuUsage(0.7);
    // Trigger protection when the average RT exceeds 50 ms
    rule.setAvgRt(50);
    // Cap inbound QPS at 5000
    rule.setQps(5000);
    // Trigger protection when concurrent threads exceed 300
    rule.setMaxThread(300);
    rules.add(rule);
    SystemRuleManager.loadRules(rules);
}

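Sentinel then adjusts admission based on the current load, with no manual intervention. Note that system rules apply to the application as a whole rather than to individual resources, and only to inbound traffic (entries of type EntryType.IN).

Distributed rate limiting

In a distributed environment, per-instance rate limiting is often not enough and a global limit is needed. Redis is a common tool for implementing distributed rate limiting.

A fixed-window limiter implemented with a Redis Lua script, so that incrementing and checking the counter happen atomically: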

import redis.clients.jedis.Jedis;

public class RedisRateLimiter {
    private final Jedis jedis;
    private final String key;
    private final int limit;
    private final int windowSeconds;

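    // The Lua script runs atomically on the Redis server (fixed-window counter)
    private static final String LIMIT_LUA_SCRIPT =
            "local key = KEYS[1] " +
            "local limit = tonumber(ARGV[1]) " +
            "local window = tonumber(ARGV[2]) " +
            "local current = redis.call('incr', key) " +
            // The first request of a window starts the TTL, so the key expires when the window ends
            "if current == 1 then " +
            "    redis.call('expire', key, window) " +
            "end " +
            "if current <= limit then " +
            "    return 1 " +
            "else " +
            // Publish a notification only for the first rejected request of each window (for monitoring)
            "    if current == limit + 1 then " +
            "        redis.call('publish', 'rate_limit_channel', key) " +
            "    end " +
            "    return 0 " +
            "end";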

    public RedisRateLimiter(Jedis jedis, String key, int limit, int windowSeconds) {
        this.jedis = jedis;
        this.key = key;
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }

    public boolean tryAcquire() {
        try {
            Long result = (Long) jedis.eval(
                LIMIT_LUA_SCRIPT,
                1,
                key,
                String.valueOf(limit),
                String.valueOf(windowSeconds)
            );
            return result == 1;
        } catch (Exception e) {
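            // On failure (e.g. Redis unreachable), fail closed and reject the request for safety
            // Log the exception to help diagnose Redis connection problems
            System.err.println("Redis rate limiter error: " + e.getMessage());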
            return false;
        }
    }
}

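Usage:

try (Jedis jedis = new Jedis("localhost", 6379)) {
    RedisRateLimiter limiter = new RedisRateLimiter(jedis, "rate:limit:api:123", 100, 60);

    if (limiter.tryAcquire()) {
        // Handle the request
    } else {
        // Rate limited: return a friendly message
    }
}

try-with-resources ensures the Jedis connection is closed, avoiding resource leaks. Keep in mind that a Jedis instance is not thread-safe: in a server, borrow one per request from a JedisPool instead of sharing a single connection across threads.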

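Distributed sliding window

Because server clocks may not be synchronized, use the Redis server's clock as the common time reference. Each accepted request is stored in a sorted set (ZSET) with its timestamp as the score:

import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;

public class RedisSlideWindowLimiter {
    private final Jedis jedis;
    private final String key;
    private final int limit;
    private final int windowSeconds;

    // Sliding-window Lua script: score = timestamp in ms, member = unique request ID
    private static final String SLIDING_WINDOW_LUA =
            "local key = KEYS[1] " +
            "local now = tonumber(ARGV[1]) " +
            "local window = tonumber(ARGV[2]) " +
            "local limit = tonumber(ARGV[3]) " +
            "local member = ARGV[4] " +
            // Start of the current window, in milliseconds
            "local windowStart = now - window * 1000 " +
            // Remove entries that have left the window
            "redis.call('ZREMRANGEBYSCORE', key, 0, windowStart) " +
            // Number of requests still inside the window
            "local count = redis.call('ZCARD', key) " +
            "if count < limit then " +
            // Members must be unique: a ZSET keeps one entry per member,
            // so requests sharing a timestamp would otherwise be counted once
            "    redis.call('ZADD', key, now, member) " +
            // The TTL only cleans up idle keys; the window itself is enforced by the scores
            "    redis.call('PEXPIRE', key, window * 1000) " +
            "    return 1 " +
            "else " +
            "    return 0 " +
            "end";

    public RedisSlideWindowLimiter(Jedis jedis, String key, int limit, int windowSeconds) {
        this.jedis = jedis;
        this.key = key;
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }

    public boolean tryAcquire() {
        try {
            // Use the Redis server time as the common reference to avoid client clock skew.
            // TIME returns [seconds, microseconds]; combine both for millisecond precision.
            List<String> time = jedis.time();
            long now = Long.parseLong(time.get(0)) * 1000 + Long.parseLong(time.get(1)) / 1000;

            Long result = (Long) jedis.eval(
                SLIDING_WINDOW_LUA,
                1,
                key,
                String.valueOf(now),
                String.valueOf(windowSeconds),
                String.valueOf(limit),
                now + "-" + UUID.randomUUID()
            );

            return result == 1;
        } catch (Exception e) {
            System.err.println("Redis sliding-window limiter error: " + e.getMessage());
            return false;
        }
    }
}

Two details in this version matter for correctness:

Each ZSET member is a unique request ID (timestamp plus UUID); using the timestamp alone as the member would collapse requests with the same timestamp into a single entry and undercount them
The timestamp combines both parts of the TIME reply, giving millisecond precision; using only the seconds part would put every request within the same second on the same score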

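Case study: rate limiting an endpoint against abuse

An e-commerce site's flash-sale endpoint was being hammered by scripts placing bulk orders, so genuine users could not order. Let's implement a multi-dimensional rate-limiting scheme for a distributed environment:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class DistributedOrderRateLimiter {
    private final JedisPool jedisPool;

    public DistributedOrderRateLimiter(JedisPool jedisPool) {
        this.jedisPool = jedisPool;
    }

    // Fixed-window counter script. The TTL is the window itself: the key must expire
    // exactly when the window ends, so it is not extended with random jitter.
    private static final String LIMIT_SCRIPT =
            "local key = KEYS[1] " +
            "local limit = tonumber(ARGV[1]) " +
            "local window = tonumber(ARGV[2]) " +
            "local current = tonumber(redis.call('get', key) or '0') " +
            "if current < limit then " +
            "    redis.call('INCRBY', key, 1) " +
            "    if current == 0 then " +
            // The first request of a window starts the TTL
            "        redis.call('EXPIRE', key, window) " +
            "    end " +
            "    return 1 " +
            "else " +
            "    return 0 " +
            "end";

    public boolean tryAcquire(String ip, Long userId, String itemId) {
        try (Jedis jedis = jedisPool.getResource()) {
            // 1. Global endpoint limit: at most 1000 order requests per second
            String globalKey = "limit:order:global";
            Long globalResult = (Long) jedis.eval(LIMIT_SCRIPT, 1, globalKey, "1000", "1");
            if (globalResult == 0) return false;

            // 2. Per-IP limit: at most 30 requests per IP per minute
            String ipKey = "limit:order:ip:" + ip;
            Long ipResult = (Long) jedis.eval(LIMIT_SCRIPT, 1, ipKey, "30", "60");
            if (ipResult == 0) return false;

            // 3. Per-user limit: at most 10 order requests per user per day (covers users switching IPs)
            String userKey = "limit:order:user:" + userId;
            Long userResult = (Long) jedis.eval(LIMIT_SCRIPT, 1, userKey, "10", "86400");
            if (userResult == 0) return false;

            // 4. IP + user limit: a tighter short-term cap (5 per hour) for one user from one IP
            String ipUserKey = "limit:order:ipuser:" + ip + ":" + userId;
            Long ipUserResult = (Long) jedis.eval(LIMIT_SCRIPT, 1, ipUserKey, "5", "3600");
            if (ipUserResult == 0) return false;

            // 5. Per-item limit: at most 100 order requests per second for each item (protects hot items)
            if (itemId != null) {
                String itemKey = "limit:order:item:" + itemId;
                Long itemResult = (Long) jedis.eval(LIMIT_SCRIPT, 1, itemKey, "100", "1");
                if (itemResult == 0) return false;
            }

            return true;
        } catch (Exception e) {
            System.err.println("Distributed rate limiter error: " + e.getMessage());
            return false;
        }
    }
}

The checks run one after another, so a request rejected by a later dimension has already consumed quota in the earlier ones. If that matters, move all dimensions into one Lua script that checks every key before incrementing any of them (in Redis Cluster, all keys used by one script must hash to the same slot).

A note on expiration times: random TTL jitter is a standard defense against cache avalanches, where many cached entries expire at the same moment and the backend is flooded with reloads. A rate-limit counter is different: its TTL is the window itself, and when it expires a new window simply starts. Adding 1 to 5 seconds of jitter to the 1-second global window would stretch it to 2 to 6 seconds and silently turn "1000 per second" into 1000 per several seconds, so the TTL here stays equal to the window.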

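Usage example:

@RestController
public class OrderController {

    @Autowired
    private DistributedOrderRateLimiter rateLimiter;

    @Autowired
    private OrderService orderService; // hypothetical service containing the actual order logic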

    @PostMapping("/order/create")
    public Response createOrder(@RequestBody OrderRequest request, HttpServletRequest httpRequest) {
        String ip = getClientIp(httpRequest);
        Long userId = request.getUserId();
        String itemId = request.getItemId();

        if (!rateLimiter.tryAcquire(ip, userId, itemId)) {
            return Response.error("操作太频繁，请稍后再试");
        }

        // Normal order-processing logic
        return orderService.createOrder(request);
    }

    private String getClientIp(HttpServletRequest request) {
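        // Resolve the client IP, taking proxy servers into account.
        // Caution: X-Forwarded-For and similar headers can be forged by clients; trust them only
        // when they are set by your own proxy or load balancer, or attackers can bypass IP limits.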
        String ipAddress = request.getHeader("X-Forwarded-For");
        if (ipAddress == null || ipAddress.isEmpty() || "unknown".equalsIgnoreCase(ipAddress)) {
            ipAddress = request.getHeader("Proxy-Client-IP");
        }
        if (ipAddress == null || ipAddress.isEmpty() || "unknown".equalsIgnoreCase(ipAddress)) {
            ipAddress = request.getHeader("WL-Proxy-Client-IP");
        }
        if (ipAddress == null || ipAddress.isEmpty() || "unknown".equalsIgnoreCase(ipAddress)) {
            ipAddress = request.getRemoteAddr();
        }

        // Take the first IP (the header may contain several, separated by commas)
        if (ipAddress != null && ipAddress.contains(",")) {
            ipAddress = ipAddress.split(",")[0].trim();
        }

        return ipAddress;
    }
}

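This scheme layers several limits and keeps the counters in Redis, so every server in the distributed environment shares the same counters, avoiding the skew that per-instance counters would introduce.

Combining rate limiting with circuit breaking and degradation

Rate limiting solves the "too many requests" problem, while circuit breaking and degradation solve the "dependency unavailable" problem. Combining the two makes the system more robust: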

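@RestController
public class ProductController {

    @Autowired
    private ProductService productService;

    @Autowired
    private JedisPool jedisPool;

    @GetMapping("/product/{id}")
    @HystrixCommand(fallbackMethod = "getProductFallback",
                   commandProperties = {
                       @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"),
                       @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
                       @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000")
                   })
    public Product getProduct(@PathVariable String id) {
        // Rate-limit check first, per product (100 requests per second is an illustrative value)
        try (Jedis jedis = jedisPool.getResource()) {
            RedisRateLimiter rateLimiter = new RedisRateLimiter(jedis, "rate:limit:product:" + id, 100, 1);
            if (!rateLimiter.tryAcquire()) {
                // HystrixBadRequestException does not trigger the fallback and is not counted as a
                // failure, so rate-limit rejections cannot open the circuit breaker
                throw new HystrixBadRequestException("Too many requests, please try again later");
            }
        }

        // Then call the service; failures here count toward opening the circuit
        return productService.getProductById(id);
    }

    // Fallback method
    public Product getProductFallback(String id) {
        // Return cached data or default product information
        return new Product(id, "Fallback product", "Details temporarily unavailable", 0.0);
    }
}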

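In this example:

When there are too many requests, the rate limiter rejects them
When the failure rate of the service call is high, the circuit breaker opens and returns the fallback result directly
Together they protect system resources while preserving the user experience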
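To make the circuit-breaker side concrete, here is a minimal sketch of the state machine that the three Hystrix properties above configure: once at least requestVolumeThreshold calls have been recorded and the error percentage reaches errorThresholdPercentage, the breaker opens; after sleepWindowMillis it lets one trial call through (half-open) and closes again if that call succeeds. It is a simplification, not Hystrix's implementation: Hystrix uses rolling time buckets, while this hypothetical SimpleCircuitBreaker counts calls since the last state change, and the clock is passed in as a parameter for deterministic testing.

```java
// Minimal circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED (or back to OPEN)
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Returns true if a call may proceed at time nowMillis; false means "use the fallback"
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < sleepWindowMillis) {
                return false; // still open: fail fast
            }
            state = State.HALF_OPEN; // sleep window over: allow a single trial call
            return true;
        }
        // In HALF_OPEN only the trial call is allowed until its outcome is recorded
        return state == State.CLOSED;
    }

    synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {
            reset(State.CLOSED); // trial call succeeded: close the circuit
            return;
        }
        calls++;
    }

    synchronized void recordFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {
            trip(nowMillis); // trial call failed: open again
            return;
        }
        calls++;
        failures++;
        if (calls >= requestVolumeThreshold && failures * 100 >= errorThresholdPercentage * calls) {
            trip(nowMillis);
        }
    }

    synchronized State state() {
        return state;
    }

    private void trip(long nowMillis) {
        reset(State.OPEN);
        openedAt = nowMillis;
    }

    private void reset(State newState) {
        state = newState;
        calls = 0;
        failures = 0;
    }
}
```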

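Adaptive rate limiting

Adaptive rate limiting adjusts the limit dynamically according to system state, which is smarter than a fixed threshold. Here is a sketch built on the TokenBucketRateLimiter above: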

import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.IntSupplier;

public class AdaptiveRateLimiter {
    private final TokenBucketRateLimiter limiter;
    private final IntSupplier queueLength; // current backlog, e.g. () -> workerPool.getQueue().size()
    private final int queueThreshold;
    private final double maxRate; // upper bound for the refill rate
    private final OperatingSystemMXBean osBean;

    // Consecutive high-load readings
    private int consecutiveOverloads = 0;
    // Consecutive low-load readings
    private int consecutiveLowLoads = 0;

    public AdaptiveRateLimiter(long capacity, double initialRefillRate, double maxRate,
                               IntSupplier queueLength, int queueThreshold) {
        this.limiter = new TokenBucketRateLimiter(capacity, initialRefillRate);
        this.maxRate = maxRate;
        this.queueLength = queueLength;
        this.queueThreshold = queueThreshold;
        this.osBean = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    }

    // Hot path: only the token bucket, no load sampling per request
    public boolean tryAcquire() {
        return limiter.tryAcquire();
    }

    // Call periodically (e.g. once per second from a ScheduledExecutorService) rather than per request:
    // sampling CPU load on every request is expensive, and "3 consecutive readings" should mean
    // 3 intervals, not 3 requests
    public synchronized void checkAndAdjustRate() {
        // 0.0-1.0, or negative if unavailable (deprecated since JDK 14 in favor of getCpuLoad())
        double cpuLoad = osBean.getSystemCpuLoad();
        if (cpuLoad < 0) {
            return; // no reliable reading: leave the rate unchanged
        }
        int queued = queueLength.getAsInt();
        double currentRate = limiter.getRefillRate();

        if (cpuLoad > 0.8 || queued > queueThreshold) {
            // High load
            consecutiveOverloads++;
            consecutiveLowLoads = 0;

            // Only lower the rate after 3 consecutive high-load readings, to avoid oscillation
            if (consecutiveOverloads >= 3) {
                limiter.setRefillRate(currentRate * 0.8); // lower by 20%
                consecutiveOverloads = 0;
            }
        } else if (cpuLoad < 0.5 && queued < queueThreshold / 2) {
            // Low load
            consecutiveLowLoads++;
            consecutiveOverloads = 0;

            // Only raise the rate after 3 consecutive low-load readings
            if (consecutiveLowLoads >= 3) {
                limiter.setRefillRate(Math.min(currentRate * 1.2, maxRate)); // raise by 20%, capped
                consecutiveLowLoads = 0;
            }
        } else {
            // Normal load: reset the counters
            consecutiveOverloads = 0;
            consecutiveLowLoads = 0;
        }
    }
}

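This approach adjusts the token refill rate dynamically based on system CPU load and request queue length. Requiring several consecutive readings before changing the rate keeps temporary fluctuations from causing constant adjustments. It is as natural as adjusting your speed while driving: slow down when traffic is heavy, speed up when the road is clear.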
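The adjustment rule itself is easier to reason about and test when it is separated from CPU sampling. The hypothetical HysteresisRateController below takes one reading (CPU load and queue length) at a time and returns the new rate, using the same thresholds as the class above: CPU above 0.8 or a long queue counts as overload, CPU below 0.5 with a short queue counts as low load, and only three consecutive readings of the same kind change the rate by ×0.8 or ×1.2, bounded by a minimum and maximum rate.

```java
// Pure decision logic of the adaptive limiter: readings in, new rate out.
class HysteresisRateController {
    private final int queueThreshold;
    private final double minRate;
    private final double maxRate;
    private double rate;
    private int consecutiveOverloads;
    private int consecutiveLowLoads;

    HysteresisRateController(double initialRate, double minRate, double maxRate, int queueThreshold) {
        this.rate = initialRate;
        this.minRate = minRate;
        this.maxRate = maxRate;
        this.queueThreshold = queueThreshold;
    }

    // Feed one periodic reading; returns the (possibly adjusted) rate
    double onReading(double cpuLoad, int queueLength) {
        if (cpuLoad > 0.8 || queueLength > queueThreshold) {
            consecutiveLowLoads = 0;
            if (++consecutiveOverloads >= 3) {
                rate = Math.max(minRate, rate * 0.8); // back off by 20%
                consecutiveOverloads = 0;
            }
        } else if (cpuLoad < 0.5 && queueLength < queueThreshold / 2) {
            consecutiveOverloads = 0;
            if (++consecutiveLowLoads >= 3) {
                rate = Math.min(maxRate, rate * 1.2); // speed up by 20%
                consecutiveLowLoads = 0;
            }
        } else {
            // Normal load breaks any streak
            consecutiveOverloads = 0;
            consecutiveLowLoads = 0;
        }
        return rate;
    }
}
```

In a service, a ScheduledExecutorService could call onReading once per second with the current CPU load and worker queue length, then pass the result to setRefillRate.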

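How to choose a rate-limiting algorithm?

graph TD
    A[Choose an algorithm] --> B{Need to absorb traffic bursts?}
    B -->|Yes| C[Token bucket]
    B -->|No| D{Need smooth, constant-rate processing?}
    D -->|Yes| E[Leaky bucket]
    D -->|No| F{Is simplicity the priority?}
    F -->|Yes| G[Fixed window]
    F -->|No| H[Sliding window]

How to determine the rate limit threshold?

Consider the following factors when setting a threshold:

Load testing - establish the upper bound of what the system can handle
Key metric monitoring - watch CPU usage, response time (RT), GC frequency, and similar metrics
Resource usage alerts - adjust the threshold dynamically when CPU reaches 70% or RT exceeds expectations
Thread pool monitoring - when the thread pool queue grows beyond a threshold, a stricter limit may be needed
Step-by-step tuning - start from a conservative value and adjust gradually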

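For example, in a payment system I worked on, we initially limited an endpoint to 100 QPS. After launch, this turned out to be too low, because:

CPU usage was only around 30%
Response times were very stable, averaging 20 ms
The thread pool queue showed almost no backlog

So we gradually raised the limit to 300 QPS. All system metrics stayed stable, meeting business needs while leaving an adequate safety margin.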
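One quick sanity check when raising a limit is Little's law: average requests in flight = throughput × average response time. With the numbers above, 300 QPS at an average RT of 20 ms means only about 6 requests in flight on average, which is consistent with a thread pool queue that stays nearly empty. The hypothetical helper below just encodes that arithmetic (plus the inverse: a rough QPS ceiling for a given pool size); it assumes a steady state and says nothing about CPU or downstream limits, so load testing remains the real authority.

```java
// Little's law: L = λ × W (average in-flight requests = throughput × average latency)
final class LittlesLaw {
    private LittlesLaw() {}

    // Average number of requests in flight for a given QPS and average response time
    static double inFlight(double qps, double avgResponseMillis) {
        return qps * (avgResponseMillis / 1000.0);
    }

    // Rough QPS ceiling when concurrency is capped by a pool of the given size
    static double maxQpsForPool(int poolSize, double avgResponseMillis) {
        return poolSize / (avgResponseMillis / 1000.0);
    }
}
```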

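How should rate-limited requests be handled?

Reject immediately - suitable when the client can retry

if (!limiter.tryAcquire()) {
    return Response.status(429).entity("Too many requests, please try again later").build();
}

Wait in line - suitable for asynchronous tasks

// Wait up to 500 ms for a permit
if (limiter.tryAcquire(500, TimeUnit.MILLISECONDS)) {
    // Handle the request
} else {
    // Timed out: reject
}

Degrade - return cached data

if (!limiter.tryAcquire()) {
    // Return cached data
    return cacheService.getLatestData(key);
}

Smooth the rate - queue bursts and process them at a steady pace

// Queue the burst on a dedicated executor: blocking acquire() calls should not occupy
// ForkJoinPool.commonPool(), which runAsync(Runnable) uses by default
CompletableFuture.runAsync(() -> {
    limiter.acquire(); // block until a permit is available
    processRequest(request);
}, rateLimitedExecutor); // hypothetical executor reserved for rate-limited work
return Response.accepted().build(); // return 202 Accepted immediately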

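In a video transcoding system, we used the queueing strategy: when requests exceeded the limit, instead of rejecting them outright, we put the tasks into a queue and told users their job had been "added to the processing queue". This kept system load under control without making users feel rejected.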
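A queue only protects the system if it is bounded; otherwise the backlog simply moves the overload into memory. A minimal sketch, assuming tasks can be expressed as Runnables: a ThreadPoolExecutor with a fixed number of workers in front of an ArrayBlockingQueue accepts work until both are full, after which AbortPolicy throws RejectedExecutionException, and the wrapper turns that into false so the caller can answer with a 429 or a "try again later" message. The class name and sizes are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Fixed-size worker pool in front of a bounded queue: run, queue, or reject.
class BoundedTaskQueue {
    private final ThreadPoolExecutor executor;

    BoundedTaskQueue(int workers, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                workers, workers,                        // fixed pool size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // the bound: no unlimited backlog
                r -> {
                    Thread t = new Thread(r, "bounded-worker");
                    t.setDaemon(true);
                    return t;
                },
                new ThreadPoolExecutor.AbortPolicy());   // throw when workers and queue are full
    }

    // true = running or queued; false = rejected, so the caller can say "try again later"
    boolean trySubmit(Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    int queuedTasks() {
        return executor.getQueue().size();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```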

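Summary

Algorithm	How it works	Pros	Cons	Distributed support	Typical use cases	Typical open-source implementation
Fixed window counter	Counts requests within a time window	Simple to implement, low memory	Traffic spikes at window boundaries	Single node / Redis	Simple anti-abuse limits on endpoints	Custom implementation
Sliding window	Counts over finer-grained, sliding sub-windows	Smoother limiting, avoids boundary spikes	Higher implementation complexity, more memory	Single node / Redis	Sensitive scenarios such as financial transactions	Redis + Lua
Leaky bucket	Processes requests at a constant rate	Constant processing rate protects the system	Cannot absorb bursts	Single node / custom	Message queue processing, task scheduling	Custom implementation
Token bucket	Generates tokens at a fixed rate; requests consume tokens	Can absorb traffic bursts	Token generation and consumption are more complex to implement	Single node / Sentinel	E-commerce flash sales, API gateway rate limiting	Guava / Sentinel

Rate limiting is an indispensable protection mechanism for high-concurrency systems. Used well, it protects the system while giving users a stable experience. In practice, choose the algorithm that fits your business characteristics and keep refining the strategy based on monitoring data.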