Java Anti-Crawler Systems in Practice: Building a Multi-Layered Defense


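Every day, thousands of crawlers quietly scrape website data. They consume server resources and can carry off core business data. If your server CPU suddenly spikes, or a competitor somehow obtains your latest product information, it is time to build an effective anti-crawler system. This article takes a hands-on approach and shows how to use Java to build a defense that protects your data without sacrificing user experience.

Anti-Crawler Strategy Overview

An anti-crawler system is an ongoing technical contest, so it needs defense in several layers. This article groups the techniques into three: request rate control, behavioral pattern recognition, and client fingerprinting.

The sections below go through the Java implementation of each layer.

1. Request Rate Control

1.1 Memory-Optimized IP Rate Limiter

Request rate control is the first line of defense in an anti-crawler system. Below is a per-IP rate limiter that keeps its memory footprint in check: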

import com.google.common.util.concurrent.RateLimiter;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
public class IpRateLimitInterceptor implements HandlerInterceptor {
    // Requests allowed per second for each IP
    private static final double PERMITS_PER_SECOND = 10.0;

    // Bundles the limiter with its last access time so that a single map is enough
    private static class RateLimiterWrapper {
        final RateLimiter limiter;
        // volatile: written by request threads, read by the cleanup thread
        volatile long lastAccessTime;

        RateLimiterWrapper(double permitsPerSecond) {
            this.limiter = RateLimiter.create(permitsPerSecond);
            this.lastAccessTime = System.currentTimeMillis();
        }

        void updateAccessTime() {
            this.lastAccessTime = System.currentTimeMillis();
        }

        boolean isExpired(long expiryMillis) {
            return System.currentTimeMillis() - lastAccessTime > expiryMillis;
        }
    }

    // Rate limiter for each IP
    private final Map<String, RateLimiterWrapper> LIMITERS = new ConcurrentHashMap<>();

    // Single-threaded scheduler for the cleanup task
    private final ScheduledExecutorService CLEANER = Executors.newSingleThreadScheduledExecutor();

    public IpRateLimitInterceptor() {
        // Evict unused limiters once an hour so the map cannot grow without bound
        CLEANER.scheduleAtFixedRate(this::cleanupLimiters, 1, 1, TimeUnit.HOURS);
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        String ip = getClientIp(request);
        if (ip == null || ip.isEmpty()) {
            log.warn("Could not determine client IP, rejecting request");
            response.setStatus(HttpServletResponse.SC_FORBIDDEN);
            return false;
        }

        try {
            // Get or create the limiter for this IP
            RateLimiterWrapper wrapper = LIMITERS.computeIfAbsent(
                ip, k -> new RateLimiterWrapper(PERMITS_PER_SECOND));

            // Record the access time
            wrapper.updateAccessTime();

            // Try to take a token; reject the request if none is available
            if (!wrapper.limiter.tryAcquire()) {
                // javax.servlet's HttpServletResponse defines no constant for 429 Too Many Requests
                response.setStatus(429);
                response.getWriter().write("Too many requests, please try again later");
                log.info("IP: {} exceeded the request rate limit", ip);
                return false;
            }

            return true;
        } catch (Exception e) {
            log.error("IP rate limiting failed", e);
            // Fail open so that a fault in the limiter does not block normal traffic
            return true;
        }
    }

    private String getClientIp(HttpServletRequest request) {
        // Trust these headers only when the application sits behind a proxy that
        // overwrites them; a client connecting directly can set them to any value.
        String[] headers = {
            "X-Forwarded-For",
            "Proxy-Client-IP",
            "WL-Proxy-Client-IP",
            "HTTP_X_FORWARDED_FOR",
            "HTTP_CLIENT_IP",
            "REMOTE_ADDR"
        };

        for (String header : headers) {
            String value = request.getHeader(header);
            if (value == null || value.isEmpty() || "unknown".equalsIgnoreCase(value)) {
                continue;
            }
            // With several proxies the header is a comma-separated list;
            // scan right to left and take the first entry that is not "unknown".
            String[] ips = value.split(",");
            for (int i = ips.length - 1; i >= 0; i--) {
                String candidate = ips[i].trim();
                if (!candidate.isEmpty() && !"unknown".equalsIgnoreCase(candidate)) {
                    return candidate;
                }
            }
            // The first populated header held no usable entry; fall back to the socket address
            break;
        }

        return request.getRemoteAddr();
    }

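    private void cleanupLimiters() {
        long expiryMillis = TimeUnit.HOURS.toMillis(1); // evict after 1 hour without access
        int sizeBefore = LIMITERS.size();

        // Remove limiters that have not been used for a long time
        LIMITERS.entrySet().removeIf(entry -> entry.getValue().isExpired(expiryMillis));

        int sizeAfter = LIMITERS.size();
        if (sizeBefore > sizeAfter) {
            log.info("Cleanup finished, limiters released: {} -> {}", sizeBefore, sizeAfter);
        }
    }
}

This version uses the custom RateLimiterWrapper class to hold the limiter and its last access time together, so one map serves both and the two values cannot drift apart as they could in two separate maps.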
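To make the idea behind per-IP token limiting concrete, here is a minimal token bucket written against the plain JDK. It is only a sketch of the general technique, not Guava's actual RateLimiter implementation; the class name TokenBucket and the explicit clock parameter are assumptions made for illustration and testability.

```java
/** Minimal token bucket (hypothetical helper); time is passed in so behavior is deterministic. */
final class TokenBucket {
    private final double permitsPerSecond; // refill rate
    private final double capacity;         // maximum number of stored tokens
    private double tokens;                 // tokens currently available
    private long lastRefillNanos;          // time of the last refill

    TokenBucket(double permitsPerSecond, double capacity, long nowNanos) {
        this.permitsPerSecond = permitsPerSecond;
        this.capacity = capacity;
        this.tokens = capacity;            // start full so a short burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryAcquire(long nowNanos) {
        if (nowNanos > lastRefillNanos) {
            long elapsed = nowNanos - lastRefillNanos;
            // Add the tokens earned since the last call, capped at the capacity
            tokens = Math.min(capacity, tokens + elapsed * permitsPerSecond / 1_000_000_000.0);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production code the caller would pass System.nanoTime(); one such bucket per IP plays the role that the wrapped limiter plays in the interceptor above.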

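1.2 Enhanced Sliding-Window Rate Limiting

A Redis sliding-window rate limiter that handles the boundary cases:

import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.UUID;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Service
@RequiredArgsConstructor
public class RedisRateLimiter {

    private final StringRedisTemplate redisTemplate;

    /**
     * Sliding-window rate limiting with millisecond precision
     * @param key limiter key, usually the client IP
     * @param windowMs window size in milliseconds
     * @param maxRequests maximum number of requests inside the window
     * @return whether the request is allowed
     */
    public boolean isAllowed(String key, int windowMs, int maxRequests) {
        final long now = Instant.now().toEpochMilli(); // millisecond timestamp
        final byte[] requestKey = ("rate_limiter:" + key).getBytes(StandardCharsets.UTF_8);
        // The member must be unique per request; a bare timestamp would merge
        // requests that arrive in the same millisecond into a single entry.
        final byte[] member = (now + ":" + UUID.randomUUID()).getBytes(StandardCharsets.UTF_8);

        // Pipeline the commands to save network round trips.
        // Note: a pipeline is not atomic; concurrent requests may interleave.
        List<Object> results = redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
            // 1. Add the current request to the sorted set, scored by its timestamp
            connection.zAdd(requestKey, now, member);

            // 2. Remove entries that have left the window
            connection.zRemRangeByScore(requestKey, 0, now - windowMs);

            // 3. Cap the set size. Keep maxRequests + 1 entries: trimming to maxRequests
            //    would make the count below never exceed the limit.
            connection.zRemRangeByRank(requestKey, 0, -(maxRequests + 2));

            // 4. Count the requests inside the window
            connection.zCard(requestKey);

            // 5. Set a TTL in milliseconds so idle keys disappear
            //    (windowMs / 1000 would round windows shorter than one second down to 0)
            connection.pExpire(requestKey, windowMs * 2L);

            return null;
        });

        // The fourth result is the return value of zCard, the request count in the window
        Long requestCount = (Long) results.get(3);
        // Guard against a null result
        requestCount = requestCount != null ? requestCount : 0L;

        // Log when the limit is hit
        if (requestCount > maxRequests) {
            log.debug("Rate limit hit: key={}, requests={}, limit={}", key, requestCount, maxRequests);
        }

        return requestCount <= maxRequests;
    }

    /**
     * Fine-grained rate limiting per API path
     * @param ip client IP
     * @param api API path
     * @param windowMs window size in milliseconds
     * @param maxRequests maximum number of requests inside the window
     * @return whether the request is allowed
     */
    public boolean isApiAllowed(String ip, String api, int windowMs, int maxRequests) {
        // Limit each IP + API combination separately
        String key = ip + ":" + api.replaceAll("/", "_");
        return isAllowed(key, windowMs, maxRequests);
    }
}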
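The same sliding-window steps can be tried without Redis. The following in-memory sketch keeps a log of timestamps per limiter; the class name SlidingWindowLog is hypothetical, and unlike the Redis version it does not record rejected requests, which is a deliberate simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory sliding-window log (illustrative; one instance per client key). */
final class SlidingWindowLog {
    private final long windowMs;
    private final int maxRequests;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(long windowMs, int maxRequests) {
        this.windowMs = windowMs;
        this.maxRequests = maxRequests;
    }

    synchronized boolean isAllowed(long nowMs) {
        // 1. Drop entries that have left the window (same bound as the score-based removal)
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMs - windowMs) {
            timestamps.pollFirst();
        }
        // 2. Reject when the window is already full
        if (timestamps.size() >= maxRequests) {
            return false;
        }
        // 3. Record the accepted request
        timestamps.addLast(nowMs);
        return true;
    }
}
```

Not recording rejected requests means a client that keeps hammering is let back in as soon as its accepted requests age out; recording them, as the Redis version does, keeps such a client blocked for longer.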

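1.3 Comparing Three Rate-Limiting Algorithms

A rate-limiting algorithm works like a flow control valve, and picking the right one for the scenario matters:

graph LR
    A[Rate-limiting algorithms] --> B[Fixed window]
    A --> C[Sliding window]
    A --> D[Token bucket]

    B --> B1[Simple to implement]
    B --> B2[Burst risk at window boundaries]
    B --> B3[Fits low-precision scenarios]

    C --> C1[Smooth traffic limiting]
    C --> C2[Higher storage cost]
    C --> C3[Fits precise control]

    D --> D1[Allows short bursts]
    D --> D2[Small memory footprint]
    D --> D3[Fits most web applications]

To use an analogy, think of rate limiting as controlling a water tap:

  • Fixed window: at most 10 buckets of water per calendar minute. Nothing stops you from filling 10 buckets in the last 10 seconds of one minute and 10 more at the start of the next, so up to 20 buckets can pass in a short span around the boundary
  • Sliding window: at most 10 buckets in any consecutive 60 seconds, which is smoother
  • Token bucket: a bucket holds water tickets; one ticket is added per second, up to 10; drawing water costs a ticket, so saved tickets allow a short burst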
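The boundary burst of the fixed window is easy to reproduce. The sketch below is a minimal fixed-window counter under assumed names (FixedWindowCounter, with the clock passed in explicitly); it is meant to demonstrate the weakness, not to be used as a limiter.

```java
/** Minimal fixed-window counter (illustrative) used to show the boundary burst. */
final class FixedWindowCounter {
    private final long windowMs;
    private final int maxRequests;
    private long currentWindow = -1; // index of the window being counted
    private int count;               // requests accepted in that window

    FixedWindowCounter(long windowMs, int maxRequests) {
        this.windowMs = windowMs;
        this.maxRequests = maxRequests;
    }

    synchronized boolean isAllowed(long nowMs) {
        long window = nowMs / windowMs;
        if (window != currentWindow) {
            // A new window starts: the counter is reset regardless of recent traffic
            currentWindow = window;
            count = 0;
        }
        if (count >= maxRequests) {
            return false;
        }
        count++;
        return true;
    }
}
```

With a 60-second window and a limit of 10, ten requests at t = 59.9 s and ten more at t = 60.0 s are all accepted, because the second batch falls into a fresh window. A sliding window would reject the second batch.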

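2. Distributed Behavioral Pattern Recognition

2.1 Session Behavior Analysis Backed by Redis

A distributed implementation of session behavior analysis that works in a clustered environment:

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;

@Component
@Slf4j
@RequiredArgsConstructor
public class RedisSessionTracker implements HandlerInterceptor {

    private final StringRedisTemplate redisTemplate;

    // Definition of suspicious behavior
    private static final int MIN_SAMPLES = 10;              // minimum number of samples
    private static final int FAST_ACCESS_THRESHOLD = 100;   // "fast access" threshold (ms)
    private static final double VARIANCE_THRESHOLD = 0.1;   // threshold on variance / mean^2 of the intervals
    private static final String SESSION_KEY_PREFIX = "session:tracker:";
    private static final int MAX_HISTORY_SIZE = 100;        // maximum access records kept per session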

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        try {
            HttpSession session = request.getSession(true);
            String sessionId = session.getId();
            long now = System.currentTimeMillis();
            String historyKey = SESSION_KEY_PREFIX + sessionId;

            // Append the current access time; rightPush returns the new list length
            Long pushed = redisTemplate.opsForList().rightPush(historyKey, String.valueOf(now));
            long size = pushed != null ? pushed : 0L;

            // Expire the history together with the session
            redisTemplate.expire(historyKey, 30, TimeUnit.MINUTES);

            // Keep only the most recent MAX_HISTORY_SIZE records
            if (size > MAX_HISTORY_SIZE) {
                redisTemplate.opsForList().trim(historyKey, -MAX_HISTORY_SIZE, -1);
            }

            // Not enough samples yet: allow the request
            if (size < MIN_SAMPLES) {
                return true;
            }

            // Check for suspicious behavior
            if (isSuspiciousPattern(sessionId, historyKey)) {
                log.warn("Suspicious access pattern detected: sessionId={}, URI={}", sessionId, request.getRequestURI());
                // javax.servlet's HttpServletResponse defines no constant for 429 Too Many Requests
                response.setStatus(429);
                response.getWriter().write("Abnormal access pattern detected");
                return false;
            }

            return true;
        } catch (Exception e) {
            log.error("Session behavior analysis failed", e);
            // Fail open on errors
            return true;
        }
    }

    /**
     * Decides whether the access pattern looks suspicious
     */
    private boolean isSuspiciousPattern(String sessionId, String historyKey) {
        // Fetch the most recent access history
        List<String> historyStrings = redisTemplate.opsForList().range(historyKey, -MIN_SAMPLES, -1);
        if (historyStrings == null || historyStrings.size() < MIN_SAMPLES) {
            return false;
        }

        // Convert to timestamps
        List<Long> accessTimes = historyStrings.stream()
                .map(Long::parseLong)
                .sorted()
                .collect(Collectors.toList());

        // Compute the intervals between consecutive accesses
        List<Long> intervals = new ArrayList<>();
        for (int i = 1; i < accessTimes.size(); i++) {
            intervals.add(accessTimes.get(i) - accessTimes.get(i-1));
        }

        // Check 1: too many rapid consecutive accesses
        long fastAccessCount = intervals.stream()
                .filter(interval -> interval < FAST_ACCESS_THRESHOLD)
                .count();

        if (fastAccessCount >= MIN_SAMPLES / 2) {
            log.debug("Session {}: rapid consecutive accesses {}/{}", sessionId, fastAccessCount, intervals.size());
            return true;
        }

        // Check 2: intervals that are too regular (typical of bots)
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(0);
        double variance = intervals.stream()
                .mapToDouble(interval -> Math.pow(interval - mean, 2))
                .average().orElse(0);

        // variance / mean^2 (the squared coefficient of variation) below the threshold
        // means the intervals are very regular
        if (mean > 0 && variance / (mean * mean) < VARIANCE_THRESHOLD) {
            log.debug("Session {}: regular access pattern, mean={}, variance={}", sessionId, mean, variance);
            return true;
        }

        return false;
    }
}

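Behavior analysis works like detective work:

  • Crawlers usually send requests at a fixed interval (for example, one every 200 ms)
  • Real users click irregularly (sometimes fast, sometimes slow, with pauses to think)
  • The variance of the intervals between requests therefore helps tell people from machines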
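The regularity check can be tried in isolation. The sketch below computes variance divided by squared mean for a list of ascending timestamps; the class and method names are assumptions, and the 0.1 threshold in the usage note is taken from the tracker above.

```java
import java.util.List;

/** Illustrative helper: how regular are the gaps between ascending timestamps? */
final class IntervalRegularity {

    /** Returns variance / mean^2 of the intervals, or NaN when it cannot be computed. */
    static double squaredCv(List<Long> timestampsMs) {
        int n = timestampsMs.size() - 1; // number of intervals
        if (n < 1) {
            return Double.NaN;
        }
        double sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += timestampsMs.get(i) - timestampsMs.get(i - 1);
        }
        double mean = sum / n;
        if (mean <= 0) {
            return Double.NaN;
        }
        double squaredDiffs = 0;
        for (int i = 1; i <= n; i++) {
            double interval = timestampsMs.get(i) - timestampsMs.get(i - 1);
            squaredDiffs += (interval - mean) * (interval - mean);
        }
        return (squaredDiffs / n) / (mean * mean);
    }

    /** True when the intervals are more regular than the given threshold. */
    static boolean looksAutomated(List<Long> timestampsMs, double threshold) {
        double value = squaredCv(timestampsMs);
        return !Double.isNaN(value) && value < threshold;
    }
}
```

A client requesting exactly every 200 ms yields a value of 0 and is flagged with a threshold of 0.1, while gaps such as 1.5 s, 0.2 s, 7.3 s, 0.4 s and 11.6 s yield a value above 1 and pass. Dividing by the squared mean makes the test independent of how fast the client is overall.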

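2.2 Advanced Honeypot Links

An enhanced honeypot link implementation that adds Referer validation and frequency analysis (getClientIp is the helper shown in section 1.1, assumed here to be shared with the controller):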

@Controller
@Slf4j
@RequiredArgsConstructor
public class HoneypotController {

    private final StringRedisTemplate redisTemplate;

    /**
     * Regular product page; a hidden link is injected dynamically
     */
    @GetMapping("/products")
    public String productsPage(Model model, HttpServletRequest request) {
        // Generate a unique honeypot path
        String honeypotPath = generateHoneypotPath();

        // Store the path in Redis to record honeypot links that were legitimately issued
        String clientIp = getClientIp(request);
        redisTemplate.opsForSet().add("honeypot:valid:" + clientIp, honeypotPath);
        redisTemplate.expire("honeypot:valid:" + clientIp, 30, TimeUnit.MINUTES);

        // Pass it to the page, where JavaScript creates the hidden link
        model.addAttribute("honeypotLink", "/hidden-resource/" + honeypotPath);

        return "products";
    }

    /**
     * Honeypot endpoint; only crawlers are expected to request it
     */
    @GetMapping("/hidden-resource/{path}")
    @ResponseBody
    public ResponseEntity<?> honeypot(
            @PathVariable String path,
            HttpServletRequest request) {

        String clientIp = getClientIp(request);
        String userAgent = request.getHeader("User-Agent");
        String referer = request.getHeader("Referer");

        // Was this honeypot link ever issued to this IP?
        boolean isValidLink = Boolean.TRUE.equals(
                redisTemplate.opsForSet().isMember("honeypot:valid:" + clientIp, path));

        // Does the Referer point to this site? The trailing "/" and ":" keep
        // look-alike hosts such as yourdomain.com.evil.com from matching.
        boolean hasValidReferer = false;
        if (referer != null && (referer.startsWith("https://yourdomain.com/") ||
                                referer.startsWith("http://localhost/") ||
                                referer.startsWith("http://localhost:"))) {
            hasValidReferer = true;
        }

        // Log the visit
        log.warn("Honeypot link visited: IP={}, UA={}, link issued={}, Referer valid={}",
                clientIp, userAgent, isValidLink, hasValidReferer);

        // Assess how suspicious the visit is
        int suspiciousScore = 0;

        // 1. Link that was never issued (guessed or scanned)
        if (!isValidLink) {
            suspiciousScore += 30;
        }

        // 2. Missing Referer, or Referer from another site
        if (!hasValidReferer) {
            suspiciousScore += 20;
        }

        // 3. Repeated visits to honeypot links
        Long honeypotHits = redisTemplate.opsForValue().increment("honeypot:hits:" + clientIp);
        redisTemplate.expire("honeypot:hits:" + clientIp, 1, TimeUnit.DAYS);

        if (honeypotHits != null && honeypotHits > 3) {
            suspiciousScore += 10 * honeypotHits; // penalty grows with every hit
        }

        // Mark the IP as suspicious
        markSuspiciousIp(clientIp, "honeypot resource accessed: " + path, suspiciousScore);

        // Delay the response by 3 seconds to slow the crawler down.
        // Note that this also holds a server thread for the whole delay.
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Return plausible-looking data to lure the crawler further.
        // The data contains more honeypot links, forming a "honeypot network".
        Map<String, Object> result = new HashMap<>();
        result.put("status", "success");
        result.put("totalItems", 128);
        result.put("data", generateFakeData());

        return ResponseEntity.ok(result);
    }

    /**
     * Generates fake data that contains more honeypot links
     */
    private List<Map<String, Object>> generateFakeData() {
        List<Map<String, Object>> items = new ArrayList<>();
        Random random = new Random();

        for (int i = 0; i < 10; i++) {
            Map<String, Object> item = new HashMap<>();
            item.put("id", UUID.randomUUID().toString());
            item.put("name", "Product " + (random.nextInt(1000) + 1));
            item.put("price", random.nextInt(10000) / 100.0);
            item.put("url", "/hidden-resource/" + generateHoneypotPath());
            items.add(item);
        }
        return items;
    }

    /**
     * Marks an IP as suspicious
     */
    private void markSuspiciousIp(String ip, String reason, int score) {
        // Record why the IP is suspicious
        String logEntry = System.currentTimeMillis() + ":" + reason + ":" + score;
        redisTemplate.opsForList().rightPush("honeypot:reasons:" + ip, logEntry);
        redisTemplate.expire("honeypot:reasons:" + ip, 24, TimeUnit.HOURS);

        // Lower the IP's reputation score; increment(key, long) returns Long, not Double
        String scoreKey = "ip:score:" + ip;
        Long currentScore = redisTemplate.opsForValue().increment(scoreKey, (long) -score);
        redisTemplate.expire(scoreKey, 7, TimeUnit.DAYS);

        // Blacklist the IP once the score is too low
        if (currentScore != null && currentScore < -100) {
            redisTemplate.opsForSet().add("blacklist:ip", ip);
            log.warn("IP added to blacklist: {}, current score: {}", ip, currentScore);
        }
    }

    /**
     * Generates a random honeypot path
     */
    private String generateHoneypotPath() {
        return UUID.randomUUID().toString().substring(0, 12);
    }
}

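Front-end JavaScript that dynamically creates a honeypot link users cannot see. Because the link is created by script, only crawlers that execute JavaScript will find it; to catch crawlers that parse the raw HTML, the link also has to be rendered in the server-side template.

document.addEventListener('DOMContentLoaded', function() {
    // Read the honeypot link generated by the server
    const honeypotLink = document.getElementById('honeypot-data').getAttribute('data-link');

    // Create a link that users cannot see but crawlers can discover
    const link = document.createElement('a');
    link.href = honeypotLink;
    link.textContent = 'Special offers';

    // Method 1: place it outside the visible area
    link.style.position = 'absolute';
    link.style.left = '-9999px';
    link.style.fontSize = '0px';

    // Method 2: not clickable for users, still discoverable by crawlers
    link.style.pointerEvents = 'none';
    link.setAttribute('aria-hidden', 'true');

    // Add it to the DOM
    document.body.appendChild(link);

    // Extra bait for crawlers that scan comments. The comment exists only in the
    // live DOM, not in the HTML source sent by the server.
    document.body.appendChild(document.createComment('Hidden resource link: ' + honeypotLink));
});

A honeypot link works like fishing:

  • Regular users never see these links (they are hidden with CSS or placed off screen)
  • Crawlers mechanically extract every link and request it
  • A request to a honeypot link is therefore strong evidence of a crawler
  • Serving fake data with a deliberate delay wastes the crawler's resources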
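A Referer check based on a string prefix is easy to fool with a look-alike host such as yourdomain.com.evil.com. As a hedged alternative, the sketch below parses the header with java.net.URI and compares the host exactly; the class name RefererCheck and the allowed-host set are assumptions for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Illustrative Referer validation that compares the parsed host, not a string prefix. */
final class RefererCheck {

    static boolean isSameSite(String referer, Set<String> allowedHosts) {
        if (referer == null || referer.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(referer);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return false;
            }
            boolean web = scheme.equalsIgnoreCase("https") || scheme.equalsIgnoreCase("http");
            // Exact host match: sub-strings and user-info tricks do not pass
            return web && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // A malformed Referer is treated as invalid
            return false;
        }
    }
}
```

Keep in mind that the Referer header is supplied by the client and can be forged, so it only makes sense as one input to a score, as in the controller above.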

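3. High-Precision Client Fingerprinting

3.1 High-Performance User-Agent Analysis

User-Agent analysis with a cache in front, so that the same string is not analyzed repeatedly: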

import com.blueconic.browscap.BrowsCapField;
import com.blueconic.browscap.Capabilities;
import com.blueconic.browscap.ParseException;
import com.blueconic.browscap.UserAgentParser;
import com.blueconic.browscap.UserAgentService;
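import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import javax.annotation.PostConstruct;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;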

@Component
@Slf4j
public class UserAgentAnalyzer implements HandlerInterceptor {

    private UserAgentParser parser;
    private final LoadingCache<String, Boolean> uaCache;
    private final ThreadLocal<Boolean> bypassCheck = ThreadLocal.withInitial(() -> false);

    public UserAgentAnalyzer() {
        // Cache results per User-Agent string to avoid parsing the same value again
        this.uaCache = CacheBuilder.newBuilder()
                .maximumSize(10000)
                .expireAfterWrite(1, TimeUnit.HOURS)
                .build(new CacheLoader<String, Boolean>() {
                    @Override
                    public Boolean load(String userAgent) {
                        return checkIfCrawlerInternal(userAgent);
                    }
                });
    }

    @PostConstruct
    public void init() {
        try {
            parser = new UserAgentService().loadParser();
            log.info("User-Agent parser initialized");
        } catch (IOException | ParseException e) {
            log.error("Failed to initialize the User-Agent parser, falling back to keyword checks", e);
        }
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        try {
            // Skip the check where it does not apply (static resources, bypass flag).
            // This sits inside the try block so the finally clause always clears the
            // ThreadLocal, even when the request is skipped.
            if (shouldSkipCheck(request)) {
                return true;
            }

            String userAgent = request.getHeader("User-Agent");

            // Basic check: is a User-Agent present at all?
            if (userAgent == null || userAgent.isEmpty()) {
                log.info("Rejecting request without User-Agent: {}", request.getRequestURI());
                response.setStatus(HttpServletResponse.SC_FORBIDDEN);
                return false;
            }

            // Look up the cached crawler verdict
            boolean isCrawler = uaCache.get(userAgent);

            if (isCrawler) {
                log.info("Crawler User-Agent detected: {}", userAgent);
                response.setStatus(HttpServletResponse.SC_FORBIDDEN);
                return false;
            }

            return true;
        } catch (Exception e) {
            log.error("User-Agent analysis failed", e);
            // Fail open on errors
            return true;
        } finally {
            // Clear the ThreadLocal so the flag does not leak to the next request on a pooled thread
            bypassCheck.remove();
        }
    }

    /**
     * Decides whether the check should be skipped
     */
    private boolean shouldSkipCheck(HttpServletRequest request) {
        String path = request.getRequestURI();
        // Skip static resources and whitelisted paths
        return path.matches(".+\\.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$") ||
               path.startsWith("/public/") ||
               path.startsWith("/captcha/") ||
               bypassCheck.get();
    }

    /**
     * Temporarily bypasses the UA check (for internal calls)
     */
    public void setBypassCheck(boolean bypass) {
        bypassCheck.set(bypass);
    }

    /**
     * Decides whether a User-Agent belongs to a crawler
     */
    private boolean checkIfCrawlerInternal(String userAgent) {
        // Check 1: common crawler keywords.
        // Note: this also matches search engine bots such as Googlebot.
        String lowerUA = userAgent.toLowerCase();
        if (lowerUA.contains("bot") && !lowerUA.contains("robot")) {
            return true;
        }

        if (lowerUA.contains("crawler") ||
            lowerUA.contains("spider") ||
            lowerUA.contains("scrape") ||
            (lowerUA.contains("http") && lowerUA.contains("client"))) {
            return true;
        }

        // Check 2: deeper analysis with the browscap library.
        // Verify the field name against the BrowsCapField enum of your browscap-java
        // version, and make sure the parser was loaded with that field.
        if (parser != null) {
            try {
                Capabilities capabilities = parser.parse(userAgent);
                // Return only on a positive match so that check 3 still runs otherwise
                if (Boolean.TRUE.toString().equalsIgnoreCase(
                        capabilities.getValue(BrowsCapField.CRAWLER))) {
                    return true;
                }
            } catch (Exception e) {
                log.warn("Failed to parse User-Agent: {}", e.getMessage());
            }
        }

        // Check 3: does not look like a browser (browser UAs are long and carry version details)
        if (userAgent.length() < 40 &&
            !userAgent.contains("Mozilla") &&
            !userAgent.contains("Chrome") &&
            !userAgent.contains("Safari") &&
            !userAgent.contains("Firefox")) {
            return true;
        }

        return false;
    }
}
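The caching idea does not depend on Guava. As a rough sketch, a bounded memoizer with least-recently-used eviction can be built on LinkedHashMap; the class name LruMemo is hypothetical, and unlike the Guava cache above it has no time-based expiry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative bounded memoizer with least-recently-used eviction. */
final class LruMemo<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> cache;

    LruMemo(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, computing it with the loader on a miss. */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

A size bound matters here because the User-Agent header is chosen by the client: without it, a crawler sending a fresh random string on every request could grow the cache without limit.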

3.2 Asynchronous Browser Fingerprint Collection

Browser fingerprint collection, tuned to keep the impact on user experience low:

@RestController
@RequestMapping("/fingerprint")
@Slf4j
@RequiredArgsConstructor
public class FingerprintController {

    private final RedisTemplate<String, Object> redisTemplate;
    private final ObjectMapper objectMapper;

    /**
     * Generates the fingerprint detection script
     */
    @GetMapping("/check.js")
    public ResponseEntity<String> getFingerprintScript(HttpServletRequest request) {
        // Generate a unique session identifier
        String sessionId = UUID.randomUUID().toString();

        // Record the initial request data
        Map<String, Object> initialData = new HashMap<>();
        initialData.put("ip", getClientIp(request));
        initialData.put("ua", request.getHeader("User-Agent"));
        initialData.put("time", System.currentTimeMillis());
        initialData.put("headers", getHeadersMap(request));

        // Store it in Redis
        String cacheKey = "fp:session:" + sessionId;
        redisTemplate.opsForValue().set(cacheKey, initialData, 10, TimeUnit.MINUTES);

        // Return the asynchronous fingerprint collection script.
        // Spring's MediaType has no JavaScript constant, so the type is parsed from a string.
        return ResponseEntity.ok()
                .contentType(MediaType.parseMediaType("application/javascript"))
                .body(FingerprintUtils.generateAsyncFingerprintScript(sessionId));
    }

    /**
     * Verifies the browser fingerprint
     */
    @PostMapping("/verify")
    public ResponseEntity<?> verifyFingerprint(
            @RequestBody FingerprintVerifyRequest request,
            HttpServletRequest httpRequest) {

        String sessionId = request.getSessionId();
        Map<String, Object> fingerprint = request.getFingerprint();

        try {
            // Load the initial session data
            String cacheKey = "fp:session:" + sessionId;
            Object storedData = redisTemplate.opsForValue().get(cacheKey);
            if (storedData == null) {
                return ResponseEntity.status(403).body(Collections.singletonMap(
                    "error", "Invalid session"));
            }

            @SuppressWarnings("unchecked")
            Map<String, Object> initialData = (Map<String, Object>) storedData;
            String initialIp = (String) initialData.get("ip");
            String initialUa = (String) initialData.get("ua");

            // Data of the current request
            String currentIp = getClientIp(httpRequest);
            String currentUa = httpRequest.getHeader("User-Agent");

            // Basic consistency check
            if (!currentIp.equals(initialIp)) {
                log.warn("Fingerprint verification IP mismatch: initial={}, current={}", initialIp, currentIp);
                return ResponseEntity.status(403).body(Collections.singletonMap(
                    "error", "Client IP changed"));
            }

            // Look for bot features
            Map<String, Object> botFeatures = FingerprintUtils.detectBotFeatures(fingerprint);
            if (!botFeatures.isEmpty()) {
                log.warn("Bot features detected: IP={}, features={}", currentIp, botFeatures);

                // Record the bot features
                String logKey = "botdetection:log:" + UUID.randomUUID().toString();
                Map<String, Object> logData = new HashMap<>();
                logData.put("ip", currentIp);
                logData.put("ua", currentUa);
                logData.put("time", System.currentTimeMillis());
                logData.put("botFeatures", botFeatures);
                redisTemplate.opsForValue().set(logKey, objectMapper.writeValueAsString(logData), 7, TimeUnit.DAYS);

                // Update the IP score
                updateIpScore(currentIp, -20);

                return ResponseEntity.status(403).body(Collections.singletonMap(
                    "error", "Abnormal client behavior detected"));
            }

            // Issue an access token
            String token = generateAccessToken(sessionId, fingerprint);

            // Update the IP score (reward for a normal client)
            updateIpScore(currentIp, 5);

            return ResponseEntity.ok(Collections.singletonMap("token", token));
        } catch (Exception e) {
            log.error("Fingerprint verification failed", e);
            return ResponseEntity.status(500).body(Collections.singletonMap(
                "error", "Server error"));
        }
    }

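    /**
     * Updates the IP reputation score
     */
    private void updateIpScore(String ip, int scoreDelta) {
        String key = "ip:score:" + ip;
        // increment(key, long) returns Long, not Double
        Long currentScore = redisTemplate.opsForValue().increment(key, (long) scoreDelta);

        // Set a TTL if the key does not have one yet
        Long ttl = redisTemplate.getExpire(key);
        if (ttl == null || ttl < 0) {
            redisTemplate.expire(key, 7, TimeUnit.DAYS);
        }

        // Blacklist the IP once the score is too low. Scores start at 0, so the threshold
        // must be negative; a threshold of 30 would blacklist a new client right after its
        // first +5 reward. -100 matches the honeypot controller.
        if (currentScore != null && currentScore < -100) {
            redisTemplate.opsForSet().add("blacklist:ip", ip);
            log.warn("IP added to blacklist: {}, current score: {}", ip, currentScore);
        }
    }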

    /**
     * Generates an access token
     */
    private String generateAccessToken(String sessionId, Map<String, Object> fingerprint) {
        Map<String, Object> tokenData = new HashMap<>();
        tokenData.put("sid", sessionId);
        tokenData.put("fp", FingerprintUtils.generateFingerprintHash(fingerprint));
        tokenData.put("exp", System.currentTimeMillis() + TimeUnit.HOURS.toMillis(24));

        // Serialize the token data
        String tokenJson = null;
        try {
            tokenJson = objectMapper.writeValueAsString(tokenData);
        } catch (Exception e) {
            log.error("Failed to generate token", e);
            tokenJson = sessionId;
        }

        // Encode with an explicit charset instead of the platform default
        String token = Base64.getEncoder().encodeToString(
                tokenJson.getBytes(java.nio.charset.StandardCharsets.UTF_8));

        // Store the token; it is only valid while this entry exists
        redisTemplate.opsForValue().set("fp:token:" + token, "1", 24, TimeUnit.HOURS);

        return token;
    }

    // Request body for /verify
    @Data
    public static class FingerprintVerifyRequest {
        private String sessionId;
        private Map<String, Object> fingerprint;
    }

    /**
     * Collects the request headers into a map
     */
    private Map<String, String> getHeadersMap(HttpServletRequest request) {
        Map<String, String> headers = new HashMap<>();
        Enumeration<String> headerNames = request.getHeaderNames();
        while (headerNames.hasMoreElements()) {
            String headerName = headerNames.nextElement();
            headers.put(headerName, request.getHeader(headerName));
        }
        return headers;
    }
}
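The token above is plain Base64 and is trusted only because a matching entry exists in Redis. A different technique, shown here purely as an alternative sketch, is to sign the payload with an HMAC so that tampering can be detected without a lookup. The class name SignedToken and the token layout (payload, dot, signature) are assumptions; expiry would still have to be checked from the payload by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative HMAC-signed token: base64url(payload) + "." + base64url(signature). */
final class SignedToken {
    private static final String ALGORITHM = "HmacSHA256";
    private final byte[] secret;

    SignedToken(byte[] secret) {
        this.secret = secret.clone();
    }

    String issue(String payload) {
        String body = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return body + "." + sign(body);
    }

    /** Returns the payload when the signature is valid, otherwise null. */
    String verify(String token) {
        int dot = token == null ? -1 : token.indexOf('.');
        if (dot <= 0) {
            return null;
        }
        String body = token.substring(0, dot);
        byte[] given = token.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
        byte[] expected = sign(body).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking how many characters matched
        if (!MessageDigest.isEqual(expected, given)) {
            return null;
        }
        try {
            return new String(Base64.getUrlDecoder().decode(body), StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    private String sign(String body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            byte[] signature = mac.doFinal(body.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC not available", e);
        }
    }
}
```

The trade-off is that a signed token cannot be revoked early without some server-side state, whereas the Redis-backed token can be invalidated by deleting its key.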

Fingerprint utility class (shared logic extracted from the controller):

@UtilityClass
public class FingerprintUtils {

    /**
     * Generates the asynchronous fingerprint collection script
     */
    public String generateAsyncFingerprintScript(String sessionId) {
        return "// Browser fingerprint collection\n" +
               "const _fpSessionId = '" + sessionId + "';\n" +
               "// Run with a delay so page loading is not blocked\n" +
               "window.addEventListener('load', function() {\n" +
               "  setTimeout(function() {\n" +
               "    collectFingerprint();\n" +
               "  }, 1000);\n" +
               "});\n\n" +
               "function collectFingerprint() {\n" +
               "  // Collection needs document and screen, which do not exist inside a\n" +
               "  // Web Worker, so run it on the main thread while the browser is idle.\n" +
               "  const run = function() {\n" +
               "    sendFingerprintData(collectFingerprintData());\n" +
               "  };\n" +
               "  if (window.requestIdleCallback) {\n" +
               "    window.requestIdleCallback(run);\n" +
               "  } else {\n" +
               "    run();\n" +
               "  }\n" +
               "}\n\n" +
               getFingerprintCollectionCode() +
               "\n" +
               "function sendFingerprintData(fingerprint) {\n" +
               "  fetch('/fingerprint/verify', {\n" +
               "    method: 'POST',\n" +
               "    headers: { 'Content-Type': 'application/json' },\n" +
               "    body: JSON.stringify({ sessionId: _fpSessionId, fingerprint })\n" +
               "  })\n" +
               "  .then(res => res.json())\n" +
               "  .then(data => {\n" +
               "    if (data.token) {\n" +
               "      localStorage.setItem('fpToken', data.token);\n" +
               "      // Notify listeners\n" +
               "      document.dispatchEvent(new CustomEvent('fingerprintReady', {detail: data}));\n" +
               "    }\n" +
               "  })\n" +
               "  .catch(err => console.error('Fingerprint verification failed', err));\n" +
               "}";
    }

    /**
     * Returns the fingerprint collection code
     */
    private String getFingerprintCollectionCode() {
        return "function collectFingerprintData() {\n" +
               "  const fingerprint = {};\n" +
               "  try {\n" +
               "    // Basic information\n" +
               "    fingerprint.userAgent = navigator.userAgent;\n" +
               "    fingerprint.language = navigator.language;\n" +
               "    fingerprint.languages = Array.from(navigator.languages || []);\n" +
               "    fingerprint.colorDepth = screen.colorDepth;\n" +
               "    fingerprint.screenSize = [screen.width, screen.height];\n" +
               "    fingerprint.availScreenSize = [screen.availWidth, screen.availHeight];\n" +
               "    fingerprint.timezone = new Date().getTimezoneOffset();\n" +
               "    fingerprint.timezoneStr = Intl.DateTimeFormat().resolvedOptions().timeZone;\n" +
               "    fingerprint.platform = navigator.platform;\n" +
               "    fingerprint.doNotTrack = navigator.doNotTrack;\n" +
               "    fingerprint.cookieEnabled = navigator.cookieEnabled;\n" +
               "    fingerprint.localStorage = !!window.localStorage;\n" +
               "    fingerprint.sessionStorage = !!window.sessionStorage;\n" +
               "    fingerprint.cpuCores = navigator.hardwareConcurrency || 0;\n" +
               "    fingerprint.deviceMemory = navigator.deviceMemory || 0;\n" +
               "    fingerprint.touchPoints = navigator.maxTouchPoints || 0;\n\n" +
               "    // Plugin detection\n" +
               "    try {\n" +
               "      fingerprint.plugins = Array.from(navigator.plugins || []).map(p => p.name);\n" +
               "    } catch(e) { fingerprint.pluginsError = e.toString(); }\n\n" +
               "    // Canvas fingerprint\n" +
               "    try {\n" +
               "      const canvas = document.createElement('canvas');\n" +
               "      canvas.width = 200;\n" +
               "      canvas.height = 50;\n" +
               "      const ctx = canvas.getContext('2d');\n" +
               "      if (ctx) {\n" +
               "        ctx.textBaseline = 'top';\n" +
               "        ctx.font = '14px Arial';\n" +
               "        ctx.fillStyle = '#F98B88';\n" +
               "        ctx.fillRect(0, 0, 120, 30);\n" +
               "        ctx.fillStyle = '#424242';\n" +
               "        ctx.fillText('Browser Fingerprint', 4, 12);\n" +
               "        ctx.strokeStyle = '#2196F3';\n" +
               "        ctx.strokeText('Canvas Test', 60, 28);\n" +
               "        fingerprint.canvasHash = canvas.toDataURL().slice(-64);\n" +
               "      } else {\n" +
               "        fingerprint.canvasSupport = false;\n" +
               "      }\n" +
               "    } catch(e) { fingerprint.canvasError = e.toString(); }\n" +
               "  } catch(e) {\n" +
               "    fingerprint.error = e.toString();\n" +
               "  }\n" +
               "  return fingerprint;\n" +
               "}";
    }

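    /**
     * Detects bot-like features in a client-submitted fingerprint.
     * The map comes from untrusted JSON, so every value is type-checked before use.
     */
    public Map<String, Object> detectBotFeatures(Map<String, Object> fingerprint) {
        Map<String, Object> botFeatures = new HashMap<>();

        // Feature 1: no canvas result at all (neither a hash nor an error)
        if (!fingerprint.containsKey("canvasHash") && !fingerprint.containsKey("canvasError")) {
            botFeatures.put("noCanvas", true);
        }

        // Feature 2: empty plugin list (common among headless crawlers)
        Object plugins = fingerprint.get("plugins");
        if (plugins instanceof List && ((List<?>) plugins).isEmpty()
                && fingerprint.get("pluginsError") == null) {
            botFeatures.put("emptyPlugins", true);
        }

        // Feature 3: implausible screen size. Phones commonly report widths below
        // 640 CSS pixels, so treat this as a weak signal rather than proof.
        Object screenSize = fingerprint.get("screenSize");
        if (screenSize instanceof List) {
            List<?> size = (List<?>) screenSize;
            if (size.size() >= 2 && size.get(0) instanceof Number && size.get(1) instanceof Number
                    && (((Number) size.get(0)).intValue() < 640
                        || ((Number) size.get(1)).intValue() < 480)) {
                botFeatures.put("suspiciousScreenSize", screenSize);
            }
        }

        // Feature 4: the OS in the User-Agent does not match navigator.platform
        Object uaValue = fingerprint.get("userAgent");
        Object platformValue = fingerprint.get("platform");

        if (uaValue instanceof String && platformValue instanceof String) {
            String ua = (String) uaValue;
            String platform = (String) platformValue;
            if ((ua.contains("Windows") && !platform.contains("Win")) ||
                (ua.contains("Macintosh") && !platform.contains("Mac")) ||
                (ua.contains("Linux") && !platform.contains("Linux"))) {
                botFeatures.put("platformMismatch", true);
            }
        }

        return botFeatures;
    }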

    /**
     * Computes a hash of the fingerprint.
     */
    public String generateFingerprintHash(Map<String, Object> fingerprint) {
        try {
            // Keep only the stable key features. A TreeMap fixes the key order, so the
            // serialized JSON (and therefore the hash) is deterministic.
            Map<String, Object> keyFeatures = new java.util.TreeMap<>();
            keyFeatures.put("ua", fingerprint.get("userAgent"));
            keyFeatures.put("screen", fingerprint.get("screenSize"));
            keyFeatures.put("canvas", fingerprint.get("canvasHash"));
            keyFeatures.put("platform", fingerprint.get("platform"));
            keyFeatures.put("language", fingerprint.get("language"));

            String fingerprintStr = new ObjectMapper().writeValueAsString(keyFeatures);
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(fingerprintStr.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (Exception e) {
            // Fall back to a random ID; it will not match the same browser on later visits
            return UUID.randomUUID().toString();
        }
    }
}

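Collecting the browser fingerprint asynchronously keeps it from blocking page rendering. One caveat about moving it into a Web Worker: a Worker cannot access `document` or `window`, so the canvas and storage probes above must run on the main thread (or be rewritten with `OffscreenCanvas`). Only DOM-independent, CPU-intensive work such as hashing can run in the background thread.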
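The hashing step in `generateFingerprintHash` only helps if the same features always produce the same hash. Below is a minimal, dependency-free sketch of that idea. The class name `FingerprintHashSketch` and the `key=value` canonical format are assumptions made for illustration, not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of the original code. */
final class FingerprintHashSketch {

    /** Builds "key=value" lines in sorted key order, then hashes them with SHA-256. */
    static String stableHash(Map<String, String> features) {
        StringBuilder canonical = new StringBuilder();
        // A TreeMap iterates keys in sorted order, regardless of insertion order
        for (Map.Entry<String, String> e : new TreeMap<>(features).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException ex) {
            // SHA-256 is mandatory on every Java platform, so this should not happen
            throw new IllegalStateException(ex);
        }
    }
}
```

Two maps with the same entries yield the same hash even if they were filled in a different order. The sketch does not escape `=` or newlines inside values, which a production version would need to handle.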

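4. Encryption and Protection Strategies

4.1 Data Encryption with Enhanced Key Management

A secure encryption scheme backed by external key management: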

@Service
@Slf4j
public class EncryptionService {

    private final KeyVaultClient keyVaultClient;
    private final String keyIdentifier;
    private final SecretKey localKey;
    private static final int GCM_IV_LENGTH = 12;
    private static final int GCM_TAG_LENGTH = 16;

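    /**
     * Constructor: initializes the key source.
     * Keys can come from a key vault (for example HashiCorp Vault or AWS KMS).
     */
    public EncryptionService(
            @Value("${app.security.key-vault.enabled:false}") boolean keyVaultEnabled,
            @Value("${app.security.key-vault.key-identifier:}") String configKeyIdentifier,
            @Value("${app.security.encryption.local-key:}") String fallbackKeyString,
            @Autowired(required = false) KeyVaultClient keyVaultClientBean) {

        // Honor the enabled flag: ignore the vault bean when the vault is switched off
        this.keyVaultClient = keyVaultEnabled ? keyVaultClientBean : null;
        this.keyIdentifier = configKeyIdentifier;

        boolean hasLocalKey = fallbackKeyString != null && fallbackKeyString.length() >= 32;

        if (this.keyVaultClient != null) {
            log.info("Using the key vault as the encryption key source");
        } else {
            log.info("Using the locally configured encryption key");
            if (!hasLocalKey) {
                log.warn("Local key is missing or shorter than 32 characters; generating a random key (lost on restart)");
                fallbackKeyString = generateRandomKey(32);
                hasLocalKey = true;
            }
        }

        if (hasLocalKey) {
            // Use the first 32 bytes as the AES-256 key; in vault mode this is the fallback key
            byte[] keyBytes = java.util.Arrays.copyOf(
                    fallbackKeyString.getBytes(StandardCharsets.UTF_8), 32);
            this.localKey = new SecretKeySpec(keyBytes, "AES");
        } else {
            this.localKey = null;  // vault only, no local fallback configured
        }
    }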

    /**
     * Encrypts data with AES-GCM (authenticated encryption).
     */
    public String encrypt(String data) throws Exception {
        if (data == null || data.isEmpty()) {
            return data;
        }

        SecretKey key = getEncryptionKey();
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        byte[] iv = generateRandomIV(GCM_IV_LENGTH);
        GCMParameterSpec parameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);

        cipher.init(Cipher.ENCRYPT_MODE, key, parameterSpec);

        byte[] encryptedData = cipher.doFinal(data.getBytes(StandardCharsets.UTF_8));

        // Prepend the IV to the ciphertext
        byte[] combined = new byte[GCM_IV_LENGTH + encryptedData.length];
        System.arraycopy(iv, 0, combined, 0, GCM_IV_LENGTH);
        System.arraycopy(encryptedData, 0, combined, GCM_IV_LENGTH, encryptedData.length);

        return Base64.getEncoder().encodeToString(combined);
    }

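    /**
     * Decrypts data produced by encrypt().
     */
    public String decrypt(String encryptedData) throws Exception {
        if (encryptedData == null || encryptedData.isEmpty()) {
            return encryptedData;
        }

        byte[] combined = Base64.getDecoder().decode(encryptedData);
        // A valid payload holds at least the IV plus the GCM authentication tag
        if (combined.length < GCM_IV_LENGTH + GCM_TAG_LENGTH) {
            throw new IllegalArgumentException("Malformed encrypted data");
        }

        // Split the IV from the ciphertext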
        byte[] iv = new byte[GCM_IV_LENGTH];
        byte[] encrypted = new byte[combined.length - GCM_IV_LENGTH];
        System.arraycopy(combined, 0, iv, 0, GCM_IV_LENGTH);
        System.arraycopy(combined, GCM_IV_LENGTH, encrypted, 0, encrypted.length);

        SecretKey key = getEncryptionKey();
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        GCMParameterSpec parameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);

        cipher.init(Cipher.DECRYPT_MODE, key, parameterSpec);

        byte[] decrypted = cipher.doFinal(encrypted);
        return new String(decrypted, StandardCharsets.UTF_8);
    }

    /**
     * Returns the encryption key.
     * Prefers the key vault and falls back to the local key if the vault fails.
     */
    private SecretKey getEncryptionKey() throws Exception {
        if (keyVaultClient != null) {
            try {
                // Fetch the key from the vault (example implementation)
                String keyStr = keyVaultClient.getSecret(keyIdentifier);
                byte[] keyBytes = Base64.getDecoder().decode(keyStr);
                return new SecretKeySpec(keyBytes, "AES");
            } catch (Exception e) {
                log.error("Failed to fetch the key from the vault; falling back to the local key", e);
                if (localKey == null) {
                    throw new IllegalStateException("Key retrieval failed and no local fallback key is configured");
                }
            }
        }

        return localKey;
    }

    /**
     * Generates a random IV.
     */
    private byte[] generateRandomIV(int length) {
        byte[] iv = new byte[length];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    /**
     * Generates a random key, Base64-encoded.
     */
    private String generateRandomKey(int length) {
        byte[] key = new byte[length];
        new SecureRandom().nextBytes(key);
        return Base64.getEncoder().encodeToString(key);
    }

    /**
     * Key vault client interface.
     * A real implementation could integrate with AWS KMS, HashiCorp Vault, and so on.
     */
    public interface KeyVaultClient {
        String getSecret(String keyIdentifier) throws Exception;
    }
}

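This implementation can fetch keys from an external key management system, which improves security:

  • With the vault enabled, the key is no longer hard-coded in a configuration file
  • Keys can be rotated by replacing the secret in the vault, because the key is fetched on every call (data encrypted under an old key still needs that key to be decrypted)
  • A local key is available as a fallback
  • AES-GCM verifies data integrity in addition to providing confidentiality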
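The integrity check in the last bullet can be seen in a small standalone round trip. This is a sketch using only the JDK. The class `GcmRoundTripSketch`, the generated key, and the choice to wrap checked exceptions are illustrative assumptions; the IV and tag sizes mirror `GCM_IV_LENGTH` and `GCM_TAG_LENGTH` from `EncryptionService`.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical standalone demo; names are illustrative. */
final class GcmRoundTripSketch {
    private static final int IV_LENGTH = 12;  // bytes
    private static final int TAG_BITS = 128;  // 16-byte authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV || ciphertext || tag. */
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv); // a fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
            System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Throws IllegalArgumentException if the payload was modified or the key is wrong. */
    static String decrypt(SecretKey key, byte[] payload) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, payload, 0, IV_LENGTH));
            byte[] plain = cipher.doFinal(payload, IV_LENGTH, payload.length - IV_LENGTH);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (AEADBadTagException e) {
            throw new IllegalArgumentException("authentication tag mismatch", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Flipping a single bit anywhere in the payload makes `decrypt` fail instead of returning altered plaintext, which is the property that distinguishes GCM from unauthenticated modes such as CBC.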

4.2 A/B-Test-Aware Front-End Protection

A dynamic data protection strategy with A/B testing support:

@RestController
@RequestMapping("/api/products")
@Slf4j
@RequiredArgsConstructor
public class SecureProductController {

    private final ProductService productService;
    private final EncryptionService encryptionService;
    private final StringRedisTemplate redisTemplate;
    private final ABTestManager abTestManager;

    /**
     * Product list page. Sensitive data is embedded only at the BASIC level.
     */
    @GetMapping("")
    public ModelAndView getProductsPage(HttpServletRequest request) {
        // Generate a one-time access token
        String accessToken = generateAccessToken(request);

        // Look up the A/B test assignment
        String userId = getUserId(request);
        DataProtectionLevel protectionLevel = abTestManager.getDataProtectionLevel(userId);

        ModelAndView mav = new ModelAndView("products/list");
        mav.addObject("accessToken", accessToken);
        mav.addObject("protectionLevel", protectionLevel.name());

        // Choose how data is delivered based on the protection level
        if (protectionLevel == DataProtectionLevel.BASIC) {
            // Basic protection: return everything in one response
            List<ProductDto> products = productService.getAllProductsWithDetails();
            mav.addObject("products", products);
        } else {
            // Medium and advanced protection: return basic info only; sensitive data is loaded via AJAX
            List<ProductBasicInfo> basicProducts = productService.getBasicProductInfoList();
            mav.addObject("products", basicProducts);
        }

        return mav;
    }

    /**
     * Returns sensitive product data (price, stock, and so on).
     * Used when sensitive data is loaded separately via AJAX.
     */
    @GetMapping("/data")
    @ResponseBody
    public ResponseEntity<?> getProductsData(
            @RequestParam String productId,
            @RequestParam String token,
            HttpServletRequest request) {

        try {
            // Validate the token
            if (!isValidToken(token, request)) {
                return ResponseEntity.status(403).body(Collections.singletonMap(
                    "error", "Invalid access token"));
            }

            // Validate the request itself
            if (!validateRequest(request)) {
                log.warn("Suspicious product data request: IP={}, product={}", getClientIp(request), productId);
                updateRequestMetrics(request, false);
                return ResponseEntity.status(403).body(Collections.singletonMap(
                    "error", "Request rejected"));
            }

            // Load the sensitive data
            ProductSensitiveData productData = productService.getProductSensitiveData(productId);
            if (productData == null) {
                return ResponseEntity.notFound().build();
            }

            // Record the successful request
            updateRequestMetrics(request, true);

            // Look up the user's protection level
            String userId = getUserId(request);
            DataProtectionLevel protectionLevel = abTestManager.getDataProtectionLevel(userId);

            // Build the response
            Map<String, Object> responseData = new HashMap<>();
            responseData.put("id", productId);

            if (protectionLevel == DataProtectionLevel.ADVANCED) {
                // Advanced protection: encrypt the sensitive fields
                responseData.put("price", encryptionService.encrypt(productData.getPrice().toString()));
                responseData.put("stock", encryptionService.encrypt(productData.getStock().toString()));
                if (productData.getDiscount() != null) {
                    responseData.put("discount", encryptionService.encrypt(productData.getDiscount().toString()));
                }
            } else if (protectionLevel == DataProtectionLevel.MEDIUM) {
                // Medium protection: Base64 encoding (obfuscation only, trivially reversible)
                responseData.put("price", Base64.getEncoder().encodeToString(
                        productData.getPrice().toString().getBytes(StandardCharsets.UTF_8)));
                responseData.put("stock", Base64.getEncoder().encodeToString(
                        productData.getStock().toString().getBytes(StandardCharsets.UTF_8)));
            } else {
                // Basic protection: return plaintext
                responseData.put("price", productData.getPrice());
                responseData.put("stock", productData.getStock());
            }

            // Issue a fresh token
            String newToken = generateAccessToken(request);
            responseData.put("token", newToken);

            return ResponseEntity.ok(responseData);
        } catch (Exception e) {
            log.error("Failed to load product data", e);
            return ResponseEntity.status(500).body(Collections.singletonMap(
                "error", "Internal server error"));
        }
    }

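    /**
     * Validates the access token and consumes it, so each token works only once.
     */
    private boolean isValidToken(String token, HttpServletRequest request) {
        String key = "access_token:" + token;
        String storedIp = redisTemplate.opsForValue().get(key);
        if (storedIp == null) {
            return false;
        }

        String currentIp = getClientIp(request);
        if (!storedIp.equals(currentIp)) {
            return false;
        }

        // Consume the token; every successful response carries a fresh one.
        // GET followed by DEL is not atomic; a Lua script would close that gap.
        redisTemplate.delete(key);
        return true;
    }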

    /**
     * Checks that the request looks legitimate.
     */
    private boolean validateRequest(HttpServletRequest request) {
        // Check the Referer
        String referer = request.getHeader("Referer");
        if (referer == null || !referer.startsWith("https://yourdomain.com/")) {
            return false;
        }

        // Check that this is an AJAX request
        String requestedWith = request.getHeader("X-Requested-With");
        if (!"XMLHttpRequest".equals(requestedWith)) {
            return false;
        }

        // Check the fingerprint token, if one was sent
        String fpToken = request.getHeader("X-FP-Token");
        if (fpToken != null && !fpToken.isEmpty()) {
            return Boolean.TRUE.equals(redisTemplate.hasKey("fp:token:" + fpToken));
        }

        return true;
    }

    /**
     * Updates the request metrics used to evaluate the A/B test.
     */
    private void updateRequestMetrics(HttpServletRequest request, boolean success) {
        try {
            String userId = getUserId(request);
            String protectionLevel = abTestManager.getDataProtectionLevel(userId).name();

            // Count successes and failures
            String resultKey = "abtest:result:" + protectionLevel + ":" + (success ? "success" : "fail");
            redisTemplate.opsForValue().increment(resultKey);

            // Record the response time ("requestStartTime" must be set earlier, e.g. by a filter)
            Long startTime = (Long) request.getAttribute("requestStartTime");
            if (startTime != null) {
                long duration = System.currentTimeMillis() - startTime;
                String timeKey = "abtest:time:" + protectionLevel;
                redisTemplate.opsForList().rightPush(timeKey, String.valueOf(duration));
                // Keep only the latest 1000 samples. Trimming unconditionally avoids
                // unboxing size(), which may return null.
                redisTemplate.opsForList().trim(timeKey, -1000, -1);
            }
        } catch (Exception e) {
            log.warn("Failed to update A/B test metrics", e);
        }
    }

    /**
     * Returns the user ID (from the cookie or the session).
     */
    private String getUserId(HttpServletRequest request) {
        // Try the cookie first
        Cookie[] cookies = request.getCookies();
        if (cookies != null) {
            for (Cookie cookie : cookies) {
                if ("uid".equals(cookie.getName())) {
                    return cookie.getValue();
                }
            }
        }

        // Otherwise use the session, creating an ID if needed
        HttpSession session = request.getSession(true);
        String userId = (String) session.getAttribute("userId");
        if (userId == null) {
            userId = UUID.randomUUID().toString();
            session.setAttribute("userId", userId);
        }

        return userId;
    }

    /**
     * Data protection levels.
     */
    public enum DataProtectionLevel {
        BASIC,      // Basic protection: plaintext data
        MEDIUM,     // Medium protection: simple encoding
        ADVANCED    // Advanced protection: encrypted data
    }

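    /**
     * Generates a one-time access token bound to the client IP.
     */
    private String generateAccessToken(HttpServletRequest request) {
        // Minimal version of the earlier implementation. The token must be stored under the
        // key that isValidToken() reads; the 5-minute lifetime is an assumed value.
        String token = UUID.randomUUID().toString();
        redisTemplate.opsForValue().set("access_token:" + token, getClientIp(request), 5, TimeUnit.MINUTES);
        return token;
    }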
}
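
The token flow in this controller (issue a token bound to an IP, accept it once, hand back a new one) can be exercised without Redis. The following is an in-memory sketch under stated assumptions: the class `OneTimeTokenStoreSketch`, its method names, and the injectable clock are hypothetical and only stand in for the Redis-backed logic.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical in-memory stand-in for the Redis-backed token logic. */
final class OneTimeTokenStoreSketch {
    private static final class Entry {
        final String ip;
        final long expiresAtMillis;

        Entry(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    OneTimeTokenStoreSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Issues a new token bound to the given client IP. */
    String issue(String clientIp) {
        String token = UUID.randomUUID().toString();
        tokens.put(token, new Entry(clientIp, clock.getAsLong() + ttlMillis));
        return token;
    }

    /** Returns true at most once per token, and only for the issuing IP before expiry. */
    boolean consume(String token, String clientIp) {
        Entry entry = tokens.get(token);
        if (entry == null || !entry.ip.equals(clientIp)) {
            return false; // unknown token, or presented from another IP (token is kept)
        }
        // remove(key, value) is atomic, so only one concurrent caller can win
        if (!tokens.remove(token, entry)) {
            return false;
        }
        return clock.getAsLong() < entry.expiresAtMillis;
    }
}
```

A request from a different IP does not burn the token, so a third party who guesses or steals it cannot lock the legitimate client out.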

The accompanying A/B test manager:

@Component
@Slf4j
public class ABTestManager {

    private final StringRedisTemplate redisTemplate;
    private final ScheduledExecutorService scheduler;

    // Protection level distribution; volatile because the scheduler thread replaces it
    private volatile Map<String, Integer> protectionDistribution = new HashMap<>();

    public ABTestManager(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();

        // Initial default distribution
        protectionDistribution.put(DataProtectionLevel.BASIC.name(), 10);    // 10%
        protectionDistribution.put(DataProtectionLevel.MEDIUM.name(), 30);   // 30%
        protectionDistribution.put(DataProtectionLevel.ADVANCED.name(), 60); // 60%

        // Re-evaluate the distribution periodically
        scheduler.scheduleAtFixedRate(this::optimizeDistribution, 1, 12, TimeUnit.HOURS);
    }

    /**
     * Returns the data protection level assigned to the given user ID.
     */
    public DataProtectionLevel getDataProtectionLevel(String userId) {
        // Check whether the user already has an assigned level
        String assignedLevel = redisTemplate.opsForValue().get("abtest:user:" + userId);

        if (assignedLevel != null) {
            try {
                return DataProtectionLevel.valueOf(assignedLevel);
            } catch (IllegalArgumentException e) {
                log.warn("Invalid protection level: {}", assignedLevel);
            }
        }

        // New users get a randomly assigned level, remembered for 30 days
        DataProtectionLevel level = assignRandomLevel();
        redisTemplate.opsForValue().set("abtest:user:" + userId, level.name(), 30, TimeUnit.DAYS);

        return level;
    }

    /**
     * Picks a protection level at random, weighted by the current distribution.
     */
    private DataProtectionLevel assignRandomLevel() {
        // Read the field once so a concurrent update cannot change it mid-calculation
        Map<String, Integer> distribution = protectionDistribution;
        int total = distribution.values().stream().mapToInt(Integer::intValue).sum();
        int rand = java.util.concurrent.ThreadLocalRandom.current().nextInt(total) + 1;

        int cumulativeSum = 0;
        for (Map.Entry<String, Integer> entry : distribution.entrySet()) {
            cumulativeSum += entry.getValue();
            if (rand <= cumulativeSum) {
                return DataProtectionLevel.valueOf(entry.getKey());
            }
        }

        // Fall back to the highest level
        return DataProtectionLevel.ADVANCED;
    }

    /**
     * Adjusts the distribution based on success rate and response time.
     */
    private void optimizeDistribution() {
        try {
            log.info("Optimizing the A/B test distribution...");

            Map<String, Double> successRates = new HashMap<>();
            Map<String, Double> avgResponseTimes = new HashMap<>();

            // Compute the success rate for each level
            for (DataProtectionLevel level : DataProtectionLevel.values()) {
                String levelName = level.name();
                long success = getLongValue("abtest:result:" + levelName + ":success");
                long fail = getLongValue("abtest:result:" + levelName + ":fail");

                // With no samples yet, 0/0 would yield NaN; treat the level as healthy
                long total = success + fail;
                double successRate = total == 0 ? 1.0 : (double) success / total;
                successRates.put(levelName, successRate);

                // Compute the average response time
                List<String> timesStr = redisTemplate.opsForList().range("abtest:time:" + levelName, 0, -1);
                if (timesStr != null && !timesStr.isEmpty()) {
                    double avgTime = timesStr.stream()
                            .mapToDouble(Double::parseDouble)
                            .average()
                            .orElse(0);
                    avgResponseTimes.put(levelName, avgTime);
                }
            }

            // Adjust the distribution based on success rate and response time
            Map<String, Integer> newDistribution = new HashMap<>();

            // If the advanced level has a low success rate or slow responses, shift traffic toward basic
            double advSuccessRate = successRates.getOrDefault(DataProtectionLevel.ADVANCED.name(), 0.0);
            double advResponseTime = avgResponseTimes.getOrDefault(DataProtectionLevel.ADVANCED.name(), 0.0);

            if (advSuccessRate < 0.95 || advResponseTime > 500) {
                newDistribution.put(DataProtectionLevel.BASIC.name(), 20);
                newDistribution.put(DataProtectionLevel.MEDIUM.name(), 30);
                newDistribution.put(DataProtectionLevel.ADVANCED.name(), 50);
                log.info("Advanced protection looks unhealthy; increasing the basic share");
            } else {
                // Keep the default distribution
                newDistribution.put(DataProtectionLevel.BASIC.name(), 10);
                newDistribution.put(DataProtectionLevel.MEDIUM.name(), 30);
                newDistribution.put(DataProtectionLevel.ADVANCED.name(), 60);
            }

            // Publish the new distribution
            protectionDistribution = newDistribution;

            log.info("A/B test distribution updated: {}", protectionDistribution);
        } catch (Exception e) {
            log.error("Failed to optimize the A/B test distribution", e);
        }
    }

    private Long getLongValue(String key) {
        Object value = redisTemplate.opsForValue().get(key);
        if (value == null) {
            return 0L;
        }
        return Long.parseLong(value.toString());
    }
}

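A/B testing works like a scientific experiment:

  • Split users into groups and apply a different protection level to each
  • Collect data for every group (for example request success rate and response time)
  • Adjust the strategy based on that data to balance security against user experience
  • Automate the loop so the split keeps adapting (in this example it switches between two preset distributions)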
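
The group assignment and the success-rate calculation are easier to reason about when the random draw is passed in as a parameter. A minimal sketch follows; `WeightedPickSketch` and its method names are hypothetical, and the logic mirrors `assignRandomLevel()` rather than replacing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper mirroring assignRandomLevel() with the random draw passed in. */
final class WeightedPickSketch {

    /**
     * @param weights level name to weight, iterated in insertion order
     * @param draw    a value in [1, sum of weights], e.g. random.nextInt(total) + 1
     */
    static String pick(LinkedHashMap<String, Integer> weights, int draw) {
        int cumulative = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            cumulative += e.getValue();
            if (draw <= cumulative) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("draw exceeds total weight: " + draw);
    }

    /** Success rate that treats "no samples yet" as healthy instead of dividing by zero. */
    static double successRate(long success, long fail) {
        long total = success + fail;
        return total == 0 ? 1.0 : (double) success / total;
    }
}
```

With weights 10/30/60, draws 1 to 10 map to the first level, 11 to 40 to the second, and 41 to 100 to the third, which matches the percentages in the default distribution.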

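5. Deployment and Monitoring

5.1 Multi-Layer Defense Architecture

5.2 User Feedback Monitoring

Adding a user feedback channel improves the accuracy of the anti-crawler system: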

@RestController
@RequestMapping("/feedback")
@Slf4j
@RequiredArgsConstructor
public class UserFeedbackController {

    private final StringRedisTemplate redisTemplate;
    private final ObjectMapper objectMapper;

    /**
     * Accepts a false-positive report from a user.
     */
    @PostMapping("/report-false-positive")
    public ResponseEntity<?> reportFalsePositive(
            @RequestBody FalsePositiveReport report,
            HttpServletRequest request) {

        String clientIp = getClientIp(request);
        String reportId = UUID.randomUUID().toString();

        try {
            // Record the report
            Map<String, Object> feedbackData = new HashMap<>();
            feedbackData.put("ip", clientIp);
            feedbackData.put("ua", request.getHeader("User-Agent"));
            feedbackData.put("time", System.currentTimeMillis());
            feedbackData.put("page", report.getPage());
            feedbackData.put("errorType", report.getErrorType());
            feedbackData.put("description", report.getDescription());

            // Store the report
            String feedbackJson = objectMapper.writeValueAsString(feedbackData);
            redisTemplate.opsForValue().set("feedback:falsepositive:" + reportId, feedbackJson);

            // Temporarily ease restrictions for the user
            if ("BLOCKED".equals(report.getErrorType())) {
                // Add to the temporary whitelist. The expiry applies to the whole set and
                // is refreshed by every report, so individual IPs do not expire on their own.
                redisTemplate.opsForSet().add("whitelist:temp", clientIp);
                redisTemplate.expire("whitelist:temp", 30, TimeUnit.MINUTES);

                // Raise the IP's credit score
                redisTemplate.opsForValue().increment("ip:score:" + clientIp, 50);
            }

            log.info("False-positive report received: IP={}, type={}, page={}", clientIp, report.getErrorType(), report.getPage());

            return ResponseEntity.ok(Collections.singletonMap("reportId", reportId));
        } catch (Exception e) {
            log.error("Failed to handle false-positive report", e);
            return ResponseEntity.status(500).body(Collections.singletonMap(
                "error", "Failed to submit feedback"));
        }
    }

    /**
     * Admin API: returns the most recent false-positive reports.
     */
    @GetMapping("/admin/recent-reports")
    public ResponseEntity<?> getRecentReports(
            @RequestHeader("X-Admin-Token") String adminToken) {

        // Verify the admin token
        if (!isValidAdminToken(adminToken)) {
            return ResponseEntity.status(403).body(Collections.singletonMap(
                "error", "Access denied"));
        }

        try {
            // Load all reports. KEYS scans the whole keyspace and blocks Redis while it runs,
            // so prefer SCAN or a dedicated index on a large instance.
            Set<String> keys = redisTemplate.keys("feedback:falsepositive:*");
            if (keys == null || keys.isEmpty()) {
                return ResponseEntity.ok(Collections.emptyList());
            }

            List<Map<String, Object>> reports = new ArrayList<>();

            for (String key : keys) {
                String reportJson = redisTemplate.opsForValue().get(key);
                if (reportJson != null) {
                    Map<String, Object> report = objectMapper.readValue(reportJson, Map.class);
                    report.put("reportId", key.substring("feedback:falsepositive:".length()));
                    reports.add(report);
                }
            }

            // Sort by time, newest first. Jackson may deserialize numbers as Integer or Long,
            // so go through Number instead of casting to Long.
            reports.sort((r1, r2) -> Long.compare(
                    ((Number) r2.get("time")).longValue(),
                    ((Number) r1.get("time")).longValue()));

            // Return the first 100
            return ResponseEntity.ok(reports.stream().limit(100).collect(Collectors.toList()));
        } catch (Exception e) {
            log.error("Failed to load false-positive reports", e);
            return ResponseEntity.status(500).body(Collections.singletonMap(
                "error", "Failed to load feedback"));
        }
    }

    /**
     * Admin API: processes a false-positive report.
     */
    @PostMapping("/admin/process-report")
    public ResponseEntity<?> processReport(
            @RequestBody ProcessReportRequest request,
            @RequestHeader("X-Admin-Token") String adminToken) {

        // Verify the admin token
        if (!isValidAdminToken(adminToken)) {
            return ResponseEntity.status(403).body(Collections.singletonMap(
                "error", "Access denied"));
        }

        try {
            String reportId = request.getReportId();
            String reportKey = "feedback:falsepositive:" + reportId;

            // Load the report
            String reportJson = redisTemplate.opsForValue().get(reportKey);
            if (reportJson == null) {
                return ResponseEntity.notFound().build();
            }

            Map<String, Object> reportData = objectMapper.readValue(reportJson, Map.class);
            String ip = (String) reportData.get("ip");
            String errorType = (String) reportData.get("errorType");

            // Adjust the system according to the decision
            if ("APPROVE".equals(request.getAction())) {
                // Confirmed false positive: adjust system parameters

                // 1. Add to the whitelist (if the user was wrongly blocked)
                if ("BLOCKED".equals(errorType)) {
                    redisTemplate.opsForSet().add("whitelist:permanent", ip);
                    redisTemplate.opsForSet().remove("blacklist:ip", ip);
                }

                // 2. Raise the IP's credit score
                redisTemplate.opsForValue().increment("ip:score:" + ip, 100);

                // 3. Record the false-positive pattern for later tuning
                //    (String.valueOf guards against a report without an error type)
                String patternKey = "improvement:patterns:"
                        + String.valueOf(errorType).toLowerCase(java.util.Locale.ROOT);
                redisTemplate.opsForList().rightPush(patternKey, reportJson);

                log.info("False-positive report approved: reportId={}, IP={}, type={}", reportId, ip, errorType);
            } else if ("REJECT".equals(request.getAction())) {
                // Rejected: possibly a crawler trying to evade detection
                log.info("False-positive report rejected: reportId={}, IP={}", reportId, ip);

                // Remove the IP from the temporary whitelist, if present
                redisTemplate.opsForSet().remove("whitelist:temp", ip);
            }

            // Mark the report as processed
            reportData.put("processed", true);
            reportData.put("processingAction", request.getAction());
            reportData.put("processingTime", System.currentTimeMillis());
            reportData.put("processingNote", request.getNote());

            redisTemplate.opsForValue().set(reportKey, objectMapper.writeValueAsString(reportData));

            return ResponseEntity.ok(Collections.singletonMap("success", true));
        } catch (Exception e) {
            log.error("Failed to process false-positive report", e);
            return ResponseEntity.status(500).body(Collections.singletonMap(
                "error", "Failed to process feedback"));
        }
    }

    /**
     * Checks whether the admin token is valid.
     */
    private boolean isValidAdminToken(String adminToken) {
        // A real application should use a proper authentication mechanism
        return "valid-admin-token".equals(adminToken);
    }

    /**
     * Returns the client IP.
     */
    private String getClientIp(HttpServletRequest request) {
        // Implementation omitted; same as earlier
        return request.getRemoteAddr();
    }

    // Request body classes
    @Data
    public static class FalsePositiveReport {
        private String page;          // Page where the problem occurred
        private String errorType;     // Error type (BLOCKED, CAPTCHA, SLOWDOWN, etc.)
        private String description;   // User's description
    }

    @Data
    public static class ProcessReportRequest {
        private String reportId;      // Report ID
        private String action;        // Action to take (APPROVE or REJECT)
        private String note;          // Reviewer's note
    }
}

Front-end false-positive feedback component (React):

import React, { useState } from 'react';
import './FeedbackForm.css';

const FalsePositiveReporter = () => {
  const [isOpen, setIsOpen] = useState(false);
  const [errorType, setErrorType] = useState('BLOCKED');
  const [description, setDescription] = useState('');
  const [isSubmitting, setIsSubmitting] = useState(false);
  const [feedbackResult, setFeedbackResult] = useState(null);

  const submitFeedback = async () => {
    setIsSubmitting(true);

    try {
      const response = await fetch('/feedback/report-false-positive', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          page: window.location.href,
          errorType,
          description
        })
      });

      const result = await response.json();

      if (response.ok) {
        setFeedbackResult({
          success: true,
          message: 'Thank you for your feedback. We will look into it as soon as possible!'
        });

        // Close the form after 3 seconds
        setTimeout(() => {
          setIsOpen(false);
          setFeedbackResult(null);
          setDescription('');
        }, 3000);
      } else {
        setFeedbackResult({
          success: false,
          message: result.error || 'Failed to submit feedback. Please try again later.'
        });
      }
    } catch (err) {
      setFeedbackResult({
        success: false,
        message: 'An error occurred while submitting feedback. Please try again later.'
      });
    } finally {
      setIsSubmitting(false);
    }
  };

  return (
    <>
      {!isOpen && (
        <button
          className="feedback-button"
          onClick={() => setIsOpen(true)}
        >
          Having trouble?
        </button>
      )}

      {isOpen && (
        <div className="feedback-overlay">
          <div className="feedback-form">
            <h3>Report a site access problem</h3>

            {feedbackResult ? (
              <div className={`feedback-result ${feedbackResult.success ? 'success' : 'error'}`}>
                {feedbackResult.message}
              </div>
            ) : (
              <>
                <div className="form-group">
                  <label>What problem did you run into?</label>
                  <select
                    value={errorType}
                    onChange={(e) => setErrorType(e.target.value)}
                  >
                    <option value="BLOCKED">Cannot access the site (access denied)</option>
                    <option value="CAPTCHA">CAPTCHA shown too often</option>
                    <option value="SLOWDOWN">Site is very slow</option>
                    <option value="OTHER">Other</option>
                  </select>
                </div>

                <div className="form-group">
                  <label>Briefly describe the problem</label>
                  <textarea
                    rows="3"
                    value={description}
                    onChange={(e) => setDescription(e.target.value)}
                    placeholder="Describe what happened..."
                  ></textarea>
                </div>

                <div className="form-actions">
                  <button
                    className="cancel-button"
                    onClick={() => setIsOpen(false)}
                  >
                    Cancel
                  </button>
                  <button
                    className="submit-button"
                    onClick={submitFeedback}
                    disabled={isSubmitting}
                  >
                    {isSubmitting ? 'Submitting...' : 'Submit feedback'}
                  </button>
                </div>
              </>
            )}
          </div>
        </div>
      )}
    </>
  );
};

export default FalsePositiveReporter;

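With this feedback system:

  • Legitimate users who were misclassified can submit a report and quickly regain access
  • Administrators can see where the system misfires and adjust parameters and rules
  • The collected data feeds continuous tuning of the anti-crawler strategy, reducing the impact on legitimate users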
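
For the "quickly regain access" step, each whitelisted IP should ideally carry its own deadline. The sketch below shows that behavior in memory. `TempWhitelistSketch` and the injectable clock are hypothetical names; with Redis, a comparable effect could be obtained by storing one key per IP with its own TTL instead of a single set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical in-memory temporary whitelist with a separate expiry per IP. */
final class TempWhitelistSketch {
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    TempWhitelistSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Adds the IP, or extends its own deadline if it is already listed. */
    void allow(String ip) {
        expiresAt.put(ip, clock.getAsLong() + ttlMillis);
    }

    /** Returns true only while the IP's own deadline has not passed. */
    boolean isAllowed(String ip) {
        Long deadline = expiresAt.get(ip);
        if (deadline == null) {
            return false;
        }
        if (clock.getAsLong() >= deadline) {
            expiresAt.remove(ip, deadline); // lazily drop the stale entry
            return false;
        }
        return true;
    }

    /** Removes the IP immediately, e.g. when an admin rejects the report. */
    void revoke(String ip) {
        expiresAt.remove(ip);
    }
}
```

Adding a second IP later does not extend the first IP's window, so a steady stream of reports cannot keep early entries whitelisted indefinitely.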

Summary

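| Strategy | Implementation | Pros | Cons | Use case | Bypass difficulty | Resource cost |
|---|---|---|---|---|---|---|
| Rate limiting | IP rate limiting, sliding window | Simple to implement; stops high-frequency crawlers | Struggles to detect distributed crawlers | Baseline protection layer | ★★☆☆☆ | |
| Behavioral analysis | Session analysis, honeypot links | Active defense; identifies advanced crawlers | Complex to implement; may misclassify users | Medium protection needs | ★★★★☆ | |
| Client feature analysis | UA analysis, fingerprinting | Strong identification; hard to imitate perfectly | Sophisticated crawlers can forge features | Comprehensive protection | ★★★☆☆ | |
| CAPTCHA | Image CAPTCHA, behavioral verification | Effectively blocks automated crawlers | Hurts user experience | Protecting sensitive operations | ★★★★★ | |
| Content encryption and obfuscation | Dynamic loading, data encryption | Protects core data; raises scraping cost | Adds server load | Protecting high-value data | ★★★★☆ | |
| Device fingerprinting | Canvas fingerprint, WebRTC detection | Hard to imitate perfectly; high accuracy | Complex to implement; may misclassify users | Advanced protection needs | ★★★★★ | |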

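An anti-crawler system should choose its strategies according to the site's characteristics and the sensitivity of its data. For large sites, layered protection that combines several techniques is recommended:

  1. Layer 1 uses rate limiting to stop basic crawlers
  2. Layer 2 uses behavioral and client features to identify advanced crawlers
  3. Layer 3 protects core data and raises the cost of scraping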
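
One way to picture how the layers compose is a chain that stops at the first layer that objects. The sketch below is illustrative only. `LayeredDefenseSketch`, `RequestInfo`, and the `Verdict` values are hypothetical, and the threshold of 10 requests per second simply reuses the `PERMITS_PER_SECOND` value from section 1.1.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical composition of the defense layers; all names are illustrative. */
final class LayeredDefenseSketch {
    enum Verdict { ALLOW, CHALLENGE, BLOCK }

    /** Minimal request summary; a real system would derive these from earlier checks. */
    static final class RequestInfo {
        final int requestsLastSecond;
        final boolean botFeaturesDetected;

        RequestInfo(int requestsLastSecond, boolean botFeaturesDetected) {
            this.requestsLastSecond = requestsLastSecond;
            this.botFeaturesDetected = botFeaturesDetected;
        }
    }

    // Layer 1: rate limiting blocks clients above 10 requests per second
    static final Function<RequestInfo, Verdict> RATE_LIMIT =
            r -> r.requestsLastSecond > 10 ? Verdict.BLOCK : Verdict.ALLOW;

    // Layer 2: suspicious behavior or client features trigger a challenge such as a CAPTCHA
    static final Function<RequestInfo, Verdict> FEATURE_CHECK =
            r -> r.botFeaturesDetected ? Verdict.CHALLENGE : Verdict.ALLOW;

    /** Runs the layers in order and stops at the first one that does not allow the request. */
    static Verdict evaluate(List<Function<RequestInfo, Verdict>> layers, RequestInfo request) {
        for (Function<RequestInfo, Verdict> layer : layers) {
            Verdict verdict = layer.apply(request);
            if (verdict != Verdict.ALLOW) {
                return verdict;
            }
        }
        return Verdict.ALLOW;
    }
}
```

Layer 3 does not appear in the chain because it acts on the response: requests that pass both checks still receive core data in protected form, for example encrypted as in section 4.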

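Use A/B testing and user feedback to keep tuning the system and balance protection against user experience. Finally, an anti-crawler system should keep evolving: as crawler techniques advance, keep adjusting your strategies so that your site's data stays protected.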