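Every day, large numbers of crawlers quietly scrape website data, consuming server resources and, in the worst case, copying core business data. When your server's CPU usage suddenly spikes, or a competitor somehow gets hold of your latest product information, building an effective anti-crawler system becomes urgent. This article takes a hands-on approach and walks through building, in Java, a defense that protects your data without sacrificing user experience.
The Anti-Crawler Strategy Landscape
An anti-crawler system is, at its core, an ongoing technical arms race, so it needs defense in depth. This article covers four layers: request rate control, behavioral analysis, client fingerprinting, and encryption and data protection.
The sections below go through the Java implementation of each layer.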
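1. Request Rate Control
1.1 A Memory-Optimized IP Rate Limiter
Request rate control is the first line of defense in an anti-crawler system. Below is a memory-optimized per-IP rate limiter: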
import com.google.common.util.concurrent.RateLimiter;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import lombok.extern.slf4j.Slf4j;
@Slf4j
@Component
public class IpRateLimitInterceptor implements HandlerInterceptor {
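// Requests allowed per second for each IP
private static final double PERMITS_PER_SECOND = 10.0;
// Wraps the limiter and its last-access time in one object, so a single map is enough
private static class RateLimiterWrapper {
final RateLimiter limiter;
// volatile: written by request threads, read by the cleanup thread
volatile long lastAccessTime;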
RateLimiterWrapper(double permitsPerSecond) {
this.limiter = RateLimiter.create(permitsPerSecond);
this.lastAccessTime = System.currentTimeMillis();
}
void updateAccessTime() {
this.lastAccessTime = System.currentTimeMillis();
}
boolean isExpired(long expiryMillis) {
return System.currentTimeMillis() - lastAccessTime > expiryMillis;
}
}
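// Per-IP limiters
private final Map<String, RateLimiterWrapper> LIMITERS = new ConcurrentHashMap<>();
// Single daemon thread for the cleanup task, so it never blocks JVM shutdown
private final ScheduledExecutorService CLEANER = Executors.newSingleThreadScheduledExecutor(r -> {
Thread t = new Thread(r, "ip-limiter-cleaner");
t.setDaemon(true);
return t;
});
public IpRateLimitInterceptor() {
// Remove idle limiters once an hour to avoid a memory leak
CLEANER.scheduleAtFixedRate(this::cleanupLimiters, 1, 1, TimeUnit.HOURS);
}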
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
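String ip = getClientIp(request);
if (ip == null || ip.isEmpty()) {
log.warn("Unable to determine client IP, rejecting request");
response.setStatus(HttpServletResponse.SC_FORBIDDEN);
return false;
}
try {
// Get or create the limiter for this IP
RateLimiterWrapper wrapper = LIMITERS.computeIfAbsent(
ip, k -> new RateLimiterWrapper(PERMITS_PER_SECOND));
// Record the access time
wrapper.updateAccessTime();
// Try to take a permit without blocking; reject the request if none is available
if (!wrapper.limiter.tryAcquire()) {
// javax.servlet's HttpServletResponse has no constant for 429, so use the literal
response.setStatus(429);
response.setContentType("text/plain;charset=UTF-8");
response.getWriter().write("Too many requests, please try again later");
log.info("IP {} exceeded the rate limit", ip);
return false;
}
return true;
} catch (Exception e) {
log.error("IP rate limiting failed", e);
// Fail open: a broken limiter should not take down normal traffic
return true;
}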
}
private String getClientIp(HttpServletRequest request) {
// NOTE: these headers are set by the client or by proxies and can be forged.
// Only rely on them when the application sits behind a reverse proxy you control.
String[] headers = {
"X-Forwarded-For",
"Proxy-Client-IP",
"WL-Proxy-Client-IP"
};
for (String header : headers) {
String value = request.getHeader(header);
if (value == null || value.isEmpty() || "unknown".equalsIgnoreCase(value)) {
continue;
}
// With multiple proxies the header is a comma-separated list.
// Scan from right to left and return the first entry that is not "unknown".
String[] ips = value.split(",");
for (int i = ips.length - 1; i >= 0; i--) {
String ipItem = ips[i].trim();
if (!ipItem.isEmpty() && !"unknown".equalsIgnoreCase(ipItem)) {
return ipItem;
}
}
}
// No usable header: fall back to the TCP peer address
return request.getRemoteAddr();
}
private void cleanupLimiters() {
long expiryMillis = TimeUnit.HOURS.toMillis(1); // evict after one hour without access
int sizeBefore = LIMITERS.size();
// Remove limiters that have not been used recently
LIMITERS.entrySet().removeIf(entry -> entry.getValue().isExpired(expiryMillis));
int sizeAfter = LIMITERS.size();
if (sizeBefore > sizeAfter) {
log.info("Cleanup finished, limiters: {} -> {}", sizeBefore, sizeAfter);
}
}
}
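This version wraps the limiter and its last-access time in a single RateLimiterWrapper, so one map entry per IP is enough and the two values cannot drift apart. Idle entries are evicted hourly to bound memory use.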
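Because the limiter keys on the client IP, it is only as reliable as the IP resolution. The following is a minimal sketch, not part of the original interceptor, of one common way to read X-Forwarded-For safely: trust the header only when the direct peer is a known proxy, then walk the list from right to left past your own proxies. The class name `ClientIpResolver` and the exact-match set of proxy addresses are assumptions for illustration; a real deployment would more likely match CIDR ranges.

```java
import java.util.Set;

/** Hypothetical helper: resolves the client IP, trusting X-Forwarded-For only from known proxies. */
final class ClientIpResolver {
    private final Set<String> trustedProxies;

    ClientIpResolver(Set<String> trustedProxies) {
        this.trustedProxies = trustedProxies;
    }

    /**
     * @param remoteAddr   the TCP peer address, e.g. request.getRemoteAddr()
     * @param forwardedFor the raw X-Forwarded-For header value, may be null
     */
    String resolve(String remoteAddr, String forwardedFor) {
        // If the direct peer is not one of our proxies, any header it sent is untrusted.
        if (forwardedFor == null || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        String[] hops = forwardedFor.split(",");
        // Walk right to left: the rightmost entries were appended by our own proxies.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (hop.isEmpty() || "unknown".equalsIgnoreCase(hop)) {
                continue;
            }
            if (!trustedProxies.contains(hop)) {
                return hop; // first address that is not one of our proxies
            }
        }
        // Every hop was a trusted proxy: fall back to the peer address
        return remoteAddr;
    }
}
```

With this approach a client that connects directly and sends a forged header is still limited by its real address, since the header is ignored unless the connection comes from a trusted proxy.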
1.2 An Improved Sliding-Window Rate Limiter
A Redis-based sliding-window rate limiter, with the boundary cases handled:
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.UUID;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
@Slf4j
@Service
@RequiredArgsConstructor
public class RedisRateLimiter {
private final StringRedisTemplate redisTemplate;
/**
* Sliding-window rate limiting with millisecond precision
* @param key limiter key, usually the client IP
* @param windowMs window size in milliseconds
* @param maxRequests maximum requests per window
* @return whether the request is allowed
*/
public boolean isAllowed(String key, int windowMs, int maxRequests) {
final long now = Instant.now().toEpochMilli(); // millisecond timestamp
final byte[] requestKey = ("rate_limiter:" + key).getBytes(StandardCharsets.UTF_8);
// Unique member, so that two requests in the same millisecond are both counted
final byte[] member = (now + ":" + UUID.randomUUID()).getBytes(StandardCharsets.UTF_8);
// Pipeline the commands to save network round trips (note: a pipeline is not atomic)
List<Object> results = redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
// 1. Add the current request to the sorted set, scored by its timestamp
connection.zAdd(requestKey, now, member);
// 2. Drop entries that have fallen out of the window
connection.zRemRangeByScore(requestKey, 0, now - windowMs);
// 3. Cap the set at maxRequests + 1 entries: memory stays bounded,
// and an over-limit count can still be observed in step 4
connection.zRemRangeByRank(requestKey, 0, -(maxRequests + 2));
// 4. Count the requests in the window (rejected requests are counted too)
connection.zCard(requestKey);
// 5. Expire the key after two windows so idle keys do not leak
connection.pExpire(requestKey, windowMs * 2L);
return null;
});
// Index 3 holds the zCard reply: the number of requests in the window
Long requestCount = (Long) results.get(3);
// Defensive default in case the reply is missing
requestCount = requestCount != null ? requestCount : 0L;
// Log when the limit is hit
if (requestCount > maxRequests) {
log.debug("Rate limit hit: key={}, requests in window={}, limit={}", key, requestCount, maxRequests);
}
return requestCount <= maxRequests;
}
/**
* Finer-grained limiting per API path
* @param ip client IP
* @param api API path
* @param windowMs window size in milliseconds
* @param maxRequests maximum requests per window
* @return whether the request is allowed
*/
public boolean isApiAllowed(String ip, String api, int windowMs, int maxRequests) {
// Rate-limit each IP + API combination separately
String key = ip + ":" + api.replace('/', '_');
return isAllowed(key, windowMs, maxRequests);
}
}
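1.3 Comparing Three Rate-Limiting Algorithms
A rate limiter works like a flow-control valve, and the right algorithm depends on the scenario:
graph LR
A[Rate-limiting algorithms] --> B[Fixed window]
A --> C[Sliding window]
A --> D[Token bucket]
B --> B1[Simple to implement]
B --> B2[Bursts possible at window boundaries]
B --> B3[Fine for low-precision needs]
C --> C1[Smooth limiting]
C --> C2[Higher storage cost]
C --> C3[Good for precise control]
D --> D1[Allows short bursts]
D --> D2[Small memory footprint]
D --> D3[Fits most web applications]
To use a plumbing analogy:
- Fixed window: you may draw 10 buckets of water per clock minute, but nothing stops you from drawing 10 in the last seconds of one minute and 10 more in the first seconds of the next
- Sliding window: you may draw 10 buckets in any 60 consecutive seconds, which is smoother
- Token bucket: a bucket of water tickets gains 1 ticket per second and holds at most 10; each draw costs a ticket, so a short burst can draw several at once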
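To make the comparison concrete, here is a small in-memory sketch of the three algorithms. It is illustrative only: the class names are made up for this example, the clock is passed in explicitly so the behavior is easy to reason about, and none of the classes is thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Fixed window: one counter per aligned window; the counter resets at each boundary. */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMs;
    private long windowStart = Long.MIN_VALUE;
    private int count;

    FixedWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    boolean tryAcquire(long nowMs) {
        long start = nowMs - (nowMs % windowMs);
        if (start != windowStart) { // a new window has begun
            windowStart = start;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}

/** Sliding window log: remembers the timestamp of every accepted request. */
final class SlidingWindowLimiter {
    private final int limit;
    private final long windowMs;
    private final Deque<Long> accepted = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    boolean tryAcquire(long nowMs) {
        // Evict timestamps that are no longer inside (now - window, now]
        while (!accepted.isEmpty() && accepted.peekFirst() <= nowMs - windowMs) {
            accepted.pollFirst();
        }
        if (accepted.size() >= limit) {
            return false;
        }
        accepted.addLast(nowMs);
        return true;
    }
}

/** Token bucket: refills continuously, holds at most `capacity` tokens. */
final class TokenBucketLimiter {
    private final double capacity;
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    TokenBucketLimiter(int capacity, double tokensPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    boolean tryAcquire(long nowMs) {
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

With a limit of 10 per 60 seconds, sending 10 requests at t = 59 s and 10 more at t = 60 s gets all 20 through the fixed window but only 10 through the sliding window, which is the boundary burst described above.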
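2. Distributed Behavioral Analysis
2.1 Session Behavior Analysis Backed by Redis
A distributed implementation of session behavior analysis that works in a clustered deployment: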
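import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
@Component
@Slf4j
@RequiredArgsConstructor
public class RedisSessionTracker implements HandlerInterceptor {
private final StringRedisTemplate redisTemplate;
// Definition of suspicious behavior
private static final int MIN_SAMPLES = 10; // minimum number of samples
private static final int FAST_ACCESS_THRESHOLD = 100; // an interval below this (ms) counts as "fast"
private static final double VARIANCE_THRESHOLD = 0.1; // threshold for variance / mean^2 of the intervals
private static final String SESSION_KEY_PREFIX = "session:tracker:";
private static final int MAX_HISTORY_SIZE = 100; // maximum access records kept per session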
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
try {
HttpSession session = request.getSession(true);
String sessionId = session.getId();
long now = System.currentTimeMillis();
String historyKey = SESSION_KEY_PREFIX + sessionId;
// Append the current access time; rightPush returns the new list length
Long size = redisTemplate.opsForList().rightPush(historyKey, String.valueOf(now));
// Expire the history together with the session (30 minutes)
redisTemplate.expire(historyKey, 30, TimeUnit.MINUTES);
// The length is null only inside a pipeline or transaction
if (size == null) {
return true;
}
// Keep only the most recent MAX_HISTORY_SIZE entries
if (size > MAX_HISTORY_SIZE) {
redisTemplate.opsForList().trim(historyKey, -MAX_HISTORY_SIZE, -1);
}
// Not enough samples yet: allow the request
if (size < MIN_SAMPLES) {
return true;
}
// Check for suspicious behavior
if (isSuspiciousPattern(sessionId, historyKey)) {
log.warn("Suspicious access pattern: sessionId={}, URI={}", sessionId, request.getRequestURI());
// javax.servlet's HttpServletResponse has no constant for 429 (Too Many Requests)
response.setStatus(429);
response.setContentType("text/plain;charset=UTF-8");
response.getWriter().write("Unusual access pattern detected");
return false;
}
return true;
} catch (Exception e) {
log.error("Session behavior analysis failed", e);
// Fail open on errors
return true;
}
}
/**
* Decides whether the access pattern looks suspicious
*/
private boolean isSuspiciousPattern(String sessionId, String historyKey) {
// Load the most recent access history
List<String> historyStrings = redisTemplate.opsForList().range(historyKey, -MIN_SAMPLES, -1);
if (historyStrings == null || historyStrings.size() < MIN_SAMPLES) {
return false;
}
// Convert to timestamps
List<Long> accessTimes = historyStrings.stream()
.map(Long::parseLong)
.sorted()
.collect(Collectors.toList());
// Compute the intervals between consecutive accesses
List<Long> intervals = new ArrayList<>();
for (int i = 1; i < accessTimes.size(); i++) {
intervals.add(accessTimes.get(i) - accessTimes.get(i-1));
}
// Check 1: too many rapid consecutive accesses
long fastAccessCount = intervals.stream()
.filter(interval -> interval < FAST_ACCESS_THRESHOLD)
.count();
if (fastAccessCount >= MIN_SAMPLES / 2) {
log.debug("Session {}: rapid consecutive accesses detected: {}/{}", sessionId, fastAccessCount, intervals.size());
return true;
}
// Check 2: intervals that are too regular (typical of bots)
double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(0);
double variance = intervals.stream()
.mapToDouble(interval -> Math.pow(interval - mean, 2))
.average().orElse(0);
// variance / mean^2 (the squared coefficient of variation) below the threshold means very regular intervals
if (mean > 0 && variance / (mean * mean) < VARIANCE_THRESHOLD) {
log.debug("Session {}: regular access pattern detected: mean={}, variance={}", sessionId, mean, variance);
return true;
}
return false;
}
}
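Behavioral analysis works like detective work:
- Crawlers usually send requests at fixed intervals (for example, one every 200 ms)
- Real users click irregularly (sometimes fast, sometimes slow, with pauses to read and think)
- The variance of the intervals, relative to their mean, therefore helps tell people from machines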
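The regularity check can be tried in isolation. The sketch below computes the same statistic as check 2 of the tracker, variance divided by the squared mean of the intervals, on a plain list of timestamps. The class name `IntervalRegularity` is made up for this example, and the timestamps are assumed to be sorted in ascending order.

```java
import java.util.List;

/** Illustrative helper: measures how regular the gaps between request timestamps are. */
final class IntervalRegularity {

    /**
     * Returns variance / mean^2 of the gaps between consecutive timestamps,
     * or NaN when there are fewer than two gaps or the mean gap is not positive.
     */
    static double squaredCv(List<Long> sortedTimestampsMs) {
        int gaps = sortedTimestampsMs.size() - 1;
        if (gaps < 2) {
            return Double.NaN;
        }
        double sum = 0;
        for (int i = 1; i <= gaps; i++) {
            sum += sortedTimestampsMs.get(i) - sortedTimestampsMs.get(i - 1);
        }
        double mean = sum / gaps;
        if (mean <= 0) {
            return Double.NaN;
        }
        double squaredDeviations = 0;
        for (int i = 1; i <= gaps; i++) {
            double d = (sortedTimestampsMs.get(i) - sortedTimestampsMs.get(i - 1)) - mean;
            squaredDeviations += d * d;
        }
        return (squaredDeviations / gaps) / (mean * mean);
    }

    /** True when the gaps are regular enough to look scripted. */
    static boolean looksScripted(List<Long> sortedTimestampsMs, double threshold) {
        double value = squaredCv(sortedTimestampsMs);
        return !Double.isNaN(value) && value < threshold;
    }
}
```

Dividing by the squared mean makes the statistic scale-free: a bot polling every 200 ms and one polling every 20 s both score near zero, while human browsing with a mix of quick clicks and long pauses typically scores well above the 0.1 threshold used in the tracker.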
2.2 Advanced Honeypot Links
An enhanced honeypot link implementation that adds Referer validation and hit-frequency analysis:
@Controller
@Slf4j
@RequiredArgsConstructor
public class HoneypotController {
private final StringRedisTemplate redisTemplate;
/**
* Regular product page, with a hidden link injected dynamically
*/
@GetMapping("/products")
public String productsPage(Model model, HttpServletRequest request) {
// Generate a unique honeypot path
String honeypotPath = generateHoneypotPath();
// Store the path in Redis to record that it was legitimately issued to this IP
String clientIp = getClientIp(request);
redisTemplate.opsForSet().add("honeypot:valid:" + clientIp, honeypotPath);
redisTemplate.expire("honeypot:valid:" + clientIp, 30, TimeUnit.MINUTES);
// Pass it to the page, where JavaScript creates the hidden link
model.addAttribute("honeypotLink", "/hidden-resource/" + honeypotPath);
return "products";
}
/**
* Honeypot endpoint: regular users never see a link to it
*/
@GetMapping("/hidden-resource/{path}")
@ResponseBody
public ResponseEntity<?> honeypot(
@PathVariable String path,
HttpServletRequest request) {
String clientIp = getClientIp(request);
String userAgent = request.getHeader("User-Agent");
String referer = request.getHeader("Referer");
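// Check whether this IP was ever issued this honeypot link
boolean isValidLink = Boolean.TRUE.equals(
redisTemplate.opsForSet().isMember("honeypot:valid:" + clientIp, path));
// Check whether the Referer points at this site. Compare the parsed host:
// a prefix match would also accept https://yourdomain.com.evil.example
boolean hasValidReferer = false;
if (referer != null) {
try {
String host = java.net.URI.create(referer).getHost();
hasValidReferer = "yourdomain.com".equalsIgnoreCase(host)
|| "localhost".equalsIgnoreCase(host);
} catch (IllegalArgumentException e) {
// Malformed Referer: treat it as invalid
}
}
// Log the hit
log.warn("Honeypot hit: IP={}, UA={}, linkIssued={}, refererValid={}",
clientIp, userAgent, isValidLink, hasValidReferer);
// Score how suspicious the request is
int suspiciousScore = 0;
// 1. The link was never issued to this IP (guessing or scanning)
if (!isValidLink) {
suspiciousScore += 30;
}
// 2. Missing Referer, or a Referer from another site
if (!hasValidReferer) {
suspiciousScore += 20;
}
// 3. Repeated honeypot hits
Long honeypotHits = redisTemplate.opsForValue().increment("honeypot:hits:" + clientIp);
redisTemplate.expire("honeypot:hits:" + clientIp, 1, TimeUnit.DAYS);
if (honeypotHits != null && honeypotHits > 3) {
suspiciousScore += 10 * honeypotHits; // the penalty grows with each hit
}
// Mark the IP as suspicious
markSuspiciousIp(clientIp, "Honeypot resource accessed: " + path, suspiciousScore);
// Deliberately delay the response by 3 seconds to slow the crawler down.
// Note: this blocks one server thread per request; under load, prefer an asynchronous delay.
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// Return plausible-looking data to keep the crawler going.
// It contains more honeypot links, forming a "honeypot network"
Map<String, Object> result = new HashMap<>();
result.put("status", "success");
result.put("totalItems", 128);
result.put("data", generateFakeData());
return ResponseEntity.ok(result);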
}
/**
* Generates fake data that contains more honeypot links
*/
private List<Map<String, Object>> generateFakeData() {
List<Map<String, Object>> items = new ArrayList<>();
Random random = new Random();
for (int i = 0; i < 10; i++) {
Map<String, Object> item = new HashMap<>();
item.put("id", UUID.randomUUID().toString());
item.put("name", "Product " + (random.nextInt(1000) + 1));
item.put("price", random.nextInt(10000) / 100.0);
item.put("url", "/hidden-resource/" + generateHoneypotPath());
items.add(item);
}
return items;
}
/**
* Marks an IP as suspicious
*/
private void markSuspiciousIp(String ip, String reason, int score) {
// Record why the IP is suspicious
String logEntry = System.currentTimeMillis() + ":" + reason + ":" + score;
redisTemplate.opsForList().rightPush("honeypot:reasons:" + ip, logEntry);
redisTemplate.expire("honeypot:reasons:" + ip, 24, TimeUnit.HOURS);
// Update the IP's reputation score (increment with an integer delta returns Long)
String scoreKey = "ip:score:" + ip;
Long currentScore = redisTemplate.opsForValue().increment(scoreKey, -score);
redisTemplate.expire(scoreKey, 7, TimeUnit.DAYS);
// Score too low: add the IP to the blacklist
if (currentScore != null && currentScore < -100) {
redisTemplate.opsForSet().add("blacklist:ip", ip);
log.warn("IP added to blacklist: {}, current score: {}", ip, currentScore);
}
}
/**
* Generates a random honeypot path
*/
private String generateHoneypotPath() {
return UUID.randomUUID().toString().substring(0, 12);
}
}
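Front-end JavaScript that dynamically creates a honeypot link users cannot see:
document.addEventListener('DOMContentLoaded', function() {
// Read the honeypot link generated by the server
const honeypotLink = document.getElementById('honeypot-data').getAttribute('data-link');
// Create a link that users cannot see but DOM-walking crawlers will find
const link = document.createElement('a');
link.href = honeypotLink;
link.textContent = 'Special offers';
// Technique 1: move it outside the visible area
link.style.position = 'absolute';
link.style.left = '-9999px';
link.style.fontSize = '0px';
// Technique 2: make it unreachable for users (mouse, keyboard, screen readers)
link.style.pointerEvents = 'none';
link.setAttribute('aria-hidden', 'true');
link.setAttribute('tabindex', '-1');
// Add it to the DOM
document.body.appendChild(link);
// Extra bait: a comment node that contains the link. Because it is added by script,
// only crawlers that execute JavaScript will see it; to target crawlers that parse
// the raw HTML, render the link in the server-side template instead.
document.body.appendChild(document.createComment('Hidden resource link: ' + honeypotLink));
});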
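Honeypot links work like fishing:
- Regular users never see these links (they are hidden with CSS or placed off-screen)
- Crawlers mechanically extract every link and follow it
- A request to a honeypot link is therefore a strong signal that the client is a crawler, which is why the controller scores hits instead of banning on the first one
- Serving fake data with a deliberate delay wastes the crawler's resources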
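The Referer check in the honeypot controller deserves a closer look, because a plain prefix comparison is easy to fool. The sketch below, with a made-up class name `RefererValidator` and an assumed allowlist of host names, parses the header with `java.net.URI` and compares the host exactly.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Illustrative helper: accepts a Referer only if its host is on an allowlist. */
final class RefererValidator {
    private final Set<String> allowedHosts; // lower-case host names, e.g. "yourdomain.com"

    RefererValidator(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isSameSite(String referer) {
        if (referer == null || referer.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(referer);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return false; // relative or opaque URI
            }
            if (!scheme.equalsIgnoreCase("https") && !scheme.equalsIgnoreCase("http")) {
                return false;
            }
            // Exact host match: "yourdomain.com.evil.example" must not pass
            return allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed header
        }
    }
}
```

A value such as `https://yourdomain.com.evil.example/x` starts with `https://yourdomain.com` and would pass a `startsWith` check, but its parsed host is different, so this validator rejects it. Keep in mind that the Referer header is fully client-controlled, so it can only ever add to a suspicion score, not prove legitimacy.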
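3. High-Precision Client Identification
3.1 High-Performance User-Agent Analysis
User-Agent analysis with a cache in front of it, so the same User-Agent string is not analyzed repeatedly: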
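import com.blueconic.browscap.BrowsCapField;
import com.blueconic.browscap.Capabilities;
import com.blueconic.browscap.ParseException;
import com.blueconic.browscap.UserAgentParser;
import com.blueconic.browscap.UserAgentService;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import javax.annotation.PostConstruct;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;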
@Component
@Slf4j
public class UserAgentAnalyzer implements HandlerInterceptor {
private UserAgentParser parser;
private final LoadingCache<String, Boolean> uaCache;
private final ThreadLocal<Boolean> bypassCheck = ThreadLocal.withInitial(() -> false);
public UserAgentAnalyzer() {
// Cache the verdict per User-Agent string, so identical strings are analyzed only once
this.uaCache = CacheBuilder.newBuilder()
.maximumSize(10000)
.expireAfterWrite(1, TimeUnit.HOURS)
.build(new CacheLoader<String, Boolean>() {
@Override
public Boolean load(String userAgent) {
return checkIfCrawlerInternal(userAgent);
}
});
}
@PostConstruct
public void init() {
try {
parser = new UserAgentService().loadParser();
log.info("User-Agent parser initialized");
} catch (IOException | ParseException e) {
log.error("Failed to initialize the User-Agent parser; falling back to keyword checks only", e);
}
}
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
String userAgent = request.getHeader("User-Agent");
try {
// Skip the check where it is not needed (e.g. static resources)
if (shouldSkipCheck(request)) {
return true;
}
// Basic check: a User-Agent must be present
if (userAgent == null || userAgent.isEmpty()) {
log.info("Rejected request without User-Agent: {}", request.getRequestURI());
response.setStatus(HttpServletResponse.SC_FORBIDDEN);
return false;
}
// Look up the cached verdict
boolean isCrawler = uaCache.get(userAgent);
if (isCrawler) {
log.info("Crawler User-Agent detected: {}", userAgent);
response.setStatus(HttpServletResponse.SC_FORBIDDEN);
return false;
}
return true;
} catch (Exception e) {
log.error("User-Agent analysis failed", e);
// Fail open on errors
return true;
} finally {
// Always clear the ThreadLocal, including on the skip path
bypassCheck.remove();
}
}
/**
* Decides whether the check should be skipped
*/
private boolean shouldSkipCheck(HttpServletRequest request) {
String path = request.getRequestURI();
// Skip static resources and allowlisted paths
return path.matches(".+\\.(css|js|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$") ||
path.startsWith("/public/") ||
path.startsWith("/captcha/") ||
bypassCheck.get();
}
/**
* Temporarily bypasses the UA check (for internal calls)
*/
public void setBypassCheck(boolean bypass) {
bypassCheck.set(bypass);
}
/**
* Decides whether a User-Agent belongs to a crawler
*/
private boolean checkIfCrawlerInternal(String userAgent) {
// Check 1: common crawler keywords
String lowerUA = userAgent.toLowerCase();
if (lowerUA.contains("bot") && !lowerUA.contains("robot")) {
return true;
}
if (lowerUA.contains("crawler") ||
lowerUA.contains("spider") ||
lowerUA.contains("scrape") ||
(lowerUA.contains("http") && lowerUA.contains("client"))) {
return true;
}
// Check 2: deeper analysis with the browscap library.
// browscap-java only populates the fields requested when the parser is loaded,
// so IS_CRAWLER has to be included there for this check to have any effect.
if (parser != null) {
try {
Capabilities capabilities = parser.parse(userAgent);
if (Boolean.TRUE.toString().equalsIgnoreCase(
capabilities.getValue(BrowsCapField.IS_CRAWLER))) {
return true;
}
} catch (Exception e) {
log.warn("Failed to parse User-Agent: {}", e.getMessage());
}
}
// Check 3: does not look like a browser (browser UAs are usually long and carry version info)
if (userAgent.length() < 40 &&
!userAgent.contains("Mozilla") &&
!userAgent.contains("Chrome") &&
!userAgent.contains("Safari") &&
!userAgent.contains("Firefox")) {
return true;
}
return false;
}
}
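Keyword heuristics are cheap but blunt, so it helps to see what they actually classify. The sketch below re-implements only checks 1 and 3 from the analyzer above as a standalone function (no browscap, no cache); the class name `UaHeuristics` and the sample User-Agent strings are illustrative.

```java
import java.util.Locale;

/** Standalone copy of the keyword and "looks like a browser" checks, for experimentation. */
final class UaHeuristics {

    static boolean looksLikeCrawler(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return true; // the interceptor rejects requests without a User-Agent
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Check 1: common crawler keywords
        if (ua.contains("bot") && !ua.contains("robot")) {
            return true;
        }
        if (ua.contains("crawler") || ua.contains("spider") || ua.contains("scrape")
                || (ua.contains("http") && ua.contains("client"))) {
            return true;
        }
        // Check 3: short strings without any browser token
        return userAgent.length() < 40
                && !userAgent.contains("Mozilla")
                && !userAgent.contains("Chrome")
                && !userAgent.contains("Safari")
                && !userAgent.contains("Firefox");
    }
}
```

Two consequences are worth noting. First, search engine crawlers such as Googlebot match the "bot" keyword and are blocked as well, so if search visibility matters they need an explicit allowlist (ideally verified by reverse DNS rather than by User-Agent alone). Second, substring matching produces false positives: any device whose model name happens to contain "bot" is flagged even though it is a regular browser.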
3.2 Asynchronous Browser Fingerprinting
A fingerprint collection flow designed to keep the impact on user experience small:
@RestController
@RequestMapping("/fingerprint")
@Slf4j
@RequiredArgsConstructor
public class FingerprintController {
private final RedisTemplate<String, Object> redisTemplate;
private final ObjectMapper objectMapper;
/**
* Serves the fingerprint collection script
*/
@GetMapping("/check.js")
public ResponseEntity<String> getFingerprintScript(HttpServletRequest request) {
// Generate a unique session identifier
String sessionId = UUID.randomUUID().toString();
// Record the initial request data
Map<String, Object> initialData = new HashMap<>();
initialData.put("ip", getClientIp(request));
initialData.put("ua", request.getHeader("User-Agent"));
initialData.put("time", System.currentTimeMillis());
initialData.put("headers", getHeadersMap(request));
// Store it in Redis
String cacheKey = "fp:session:" + sessionId;
redisTemplate.opsForValue().set(cacheKey, initialData, 10, TimeUnit.MINUTES);
// Return the asynchronous fingerprint collection script
// (Spring's MediaType has no JavaScript constant, so parse the type from a string)
return ResponseEntity.ok()
.contentType(MediaType.parseMediaType("application/javascript"))
.body(FingerprintUtils.generateAsyncFingerprintScript(sessionId));
}
/**
* Verifies a browser fingerprint
*/
@PostMapping("/verify")
public ResponseEntity<?> verifyFingerprint(
@RequestBody FingerprintVerifyRequest request,
HttpServletRequest httpRequest) {
String sessionId = request.getSessionId();
Map<String, Object> fingerprint = request.getFingerprint();
try {
// Load the initial session data
String cacheKey = "fp:session:" + sessionId;
Object storedData = redisTemplate.opsForValue().get(cacheKey);
if (storedData == null) {
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "Invalid session"));
}
Map<String, Object> initialData = (Map<String, Object>) storedData;
String initialIp = (String) initialData.get("ip");
String initialUa = (String) initialData.get("ua");
// Data from the current request
String currentIp = getClientIp(httpRequest);
String currentUa = httpRequest.getHeader("User-Agent");
// Basic consistency check
if (!currentIp.equals(initialIp)) {
log.warn("Fingerprint verification: IP mismatch: initial={}, current={}", initialIp, currentIp);
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "Client IP changed"));
}
// Look for bot characteristics
Map<String, Object> botFeatures = FingerprintUtils.detectBotFeatures(fingerprint);
if (!botFeatures.isEmpty()) {
log.warn("Bot characteristics detected: IP={}, features={}", currentIp, botFeatures);
// Record the detection
String logKey = "botdetection:log:" + UUID.randomUUID().toString();
Map<String, Object> logData = new HashMap<>();
logData.put("ip", currentIp);
logData.put("ua", currentUa);
logData.put("time", System.currentTimeMillis());
logData.put("botFeatures", botFeatures);
redisTemplate.opsForValue().set(logKey, objectMapper.writeValueAsString(logData), 7, TimeUnit.DAYS);
// Lower the IP's score
updateIpScore(currentIp, -20);
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "Abnormal client behavior detected"));
}
// Issue an access token
String token = generateAccessToken(sessionId, fingerprint);
// Raise the IP's score (reward for a normal client)
updateIpScore(currentIp, 5);
return ResponseEntity.ok(Collections.singletonMap("token", token));
} catch (Exception e) {
log.error("Fingerprint verification failed", e);
return ResponseEntity.status(500).body(Collections.singletonMap(
"error", "Internal server error"));
}
}
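/**
* Updates the IP's reputation score
*/
private void updateIpScore(String ip, int scoreDelta) {
String key = "ip:score:" + ip;
// increment with an integer delta returns Long
Long currentScore = redisTemplate.opsForValue().increment(key, scoreDelta);
// Set an expiry if the key has none yet
Long ttl = redisTemplate.getExpire(key);
if (ttl == null || ttl < 0) {
redisTemplate.expire(key, 7, TimeUnit.DAYS);
}
// Blacklist the IP once its score drops too low. Scores start at 0,
// so the cutoff must be negative; -100 matches the honeypot controller.
if (currentScore != null && currentScore < -100) {
redisTemplate.opsForSet().add("blacklist:ip", ip);
log.warn("IP added to blacklist: {}, current score: {}", ip, currentScore);
}
}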
/**
* Issues an access token
*/
private String generateAccessToken(String sessionId, Map<String, Object> fingerprint) {
Map<String, Object> tokenData = new HashMap<>();
tokenData.put("sid", sessionId);
tokenData.put("fp", FingerprintUtils.generateFingerprintHash(fingerprint));
tokenData.put("exp", System.currentTimeMillis() + TimeUnit.HOURS.toMillis(24));
// Serialize the token. It is not signed: it is only valid because it is stored in Redis below.
String tokenJson = null;
try {
tokenJson = objectMapper.writeValueAsString(tokenData);
} catch (Exception e) {
log.error("Failed to serialize the token", e);
tokenJson = sessionId;
}
String token = Base64.getEncoder().encodeToString(tokenJson.getBytes(StandardCharsets.UTF_8));
// Store the token
redisTemplate.opsForValue().set("fp:token:" + token, "1", 24, TimeUnit.HOURS);
return token;
}
// Request body for /verify
@Data
public static class FingerprintVerifyRequest {
private String sessionId;
private Map<String, Object> fingerprint;
}
/**
* Collects the request headers into a map
*/
private Map<String, String> getHeadersMap(HttpServletRequest request) {
Map<String, String> headers = new HashMap<>();
Enumeration<String> headerNames = request.getHeaderNames();
while (headerNames.hasMoreElements()) {
String headerName = headerNames.nextElement();
headers.put(headerName, request.getHeader(headerName));
}
return headers;
}
}
Fingerprint utility class (shared logic extracted from the controller):
@UtilityClass
public class FingerprintUtils {
/**
* Builds the asynchronous fingerprint collection script
*/
public String generateAsyncFingerprintScript(String sessionId) {
return "// Browser fingerprint collection\n" +
"const _fpSessionId = '" + sessionId + "';\n" +
"// Run after load so page rendering is not blocked\n" +
"window.addEventListener('load', function() {\n" +
" setTimeout(function() {\n" +
" collectFingerprint();\n" +
" }, 1000);\n" +
"});\n\n" +
"function collectFingerprint() {\n" +
" // document, screen and canvas do not exist inside a Web Worker,\n" +
" // so collect on the main thread, preferably while the browser is idle\n" +
" const run = function() {\n" +
" sendFingerprintData(collectFingerprintData());\n" +
" };\n" +
" if (window.requestIdleCallback) {\n" +
" window.requestIdleCallback(run, { timeout: 2000 });\n" +
" } else {\n" +
" run();\n" +
" }\n" +
"}\n\n" +
getFingerprintCollectionCode() +
"\n" +
"function sendFingerprintData(fingerprint) {\n" +
" fetch('/fingerprint/verify', {\n" +
" method: 'POST',\n" +
" headers: { 'Content-Type': 'application/json' },\n" +
" body: JSON.stringify({ sessionId: _fpSessionId, fingerprint })\n" +
" })\n" +
" .then(res => res.json())\n" +
" .then(data => {\n" +
" if (data.token) {\n" +
" localStorage.setItem('fpToken', data.token);\n" +
" // Notify listeners\n" +
" document.dispatchEvent(new CustomEvent('fingerprintReady', {detail: data}));\n" +
" }\n" +
" })\n" +
" .catch(err => console.error('Fingerprint verification failed', err));\n" +
"}";
}
/**
* Returns the fingerprint collection code
*/
private String getFingerprintCollectionCode() {
return "function collectFingerprintData() {\n" +
" const fingerprint = {};\n" +
" try {\n" +
" // Basic information\n" +
" fingerprint.userAgent = navigator.userAgent;\n" +
" fingerprint.language = navigator.language;\n" +
" fingerprint.languages = Array.from(navigator.languages || []);\n" +
" fingerprint.colorDepth = screen.colorDepth;\n" +
" fingerprint.screenSize = [screen.width, screen.height];\n" +
" fingerprint.availScreenSize = [screen.availWidth, screen.availHeight];\n" +
" fingerprint.timezone = new Date().getTimezoneOffset();\n" +
" fingerprint.timezoneStr = Intl.DateTimeFormat().resolvedOptions().timeZone;\n" +
" fingerprint.platform = navigator.platform;\n" +
" fingerprint.doNotTrack = navigator.doNotTrack;\n" +
" fingerprint.cookieEnabled = navigator.cookieEnabled;\n" +
" fingerprint.localStorage = !!window.localStorage;\n" +
" fingerprint.sessionStorage = !!window.sessionStorage;\n" +
" fingerprint.cpuCores = navigator.hardwareConcurrency || 0;\n" +
" fingerprint.deviceMemory = navigator.deviceMemory || 0;\n" +
" fingerprint.touchPoints = navigator.maxTouchPoints || 0;\n\n" +
" // Plugin detection\n" +
" try {\n" +
" fingerprint.plugins = Array.from(navigator.plugins || []).map(p => p.name);\n" +
" } catch(e) { fingerprint.pluginsError = e.toString(); }\n\n" +
" // Canvas fingerprint\n" +
" try {\n" +
" const canvas = document.createElement('canvas');\n" +
" canvas.width = 200;\n" +
" canvas.height = 50;\n" +
" const ctx = canvas.getContext('2d');\n" +
" if (ctx) {\n" +
" ctx.textBaseline = 'top';\n" +
" ctx.font = '14px Arial';\n" +
" ctx.fillStyle = '#F98B88';\n" +
" ctx.fillRect(0, 0, 120, 30);\n" +
" ctx.fillStyle = '#424242';\n" +
" ctx.fillText('Browser Fingerprint', 4, 12);\n" +
" ctx.strokeStyle = '#2196F3';\n" +
" ctx.strokeText('Canvas Test', 60, 28);\n" +
" fingerprint.canvasHash = canvas.toDataURL().slice(-64);\n" +
" } else {\n" +
" fingerprint.canvasSupport = false;\n" +
" }\n" +
" } catch(e) { fingerprint.canvasError = e.toString(); }\n" +
" } catch(e) {\n" +
" fingerprint.error = e.toString();\n" +
" }\n" +
" return fingerprint;\n" +
"}";
}
/**
* Detects bot characteristics
*/
public Map<String, Object> detectBotFeatures(Map<String, Object> fingerprint) {
Map<String, Object> botFeatures = new HashMap<>();
// Feature 1: no canvas result at all
if (!fingerprint.containsKey("canvasHash") && !fingerprint.containsKey("canvasError")) {
botFeatures.put("noCanvas", true);
}
// Feature 2: empty plugin list (common for headless browsers).
// Many mobile browsers also report no plugins, so touch devices are exempt.
Object touch = fingerprint.get("touchPoints");
boolean touchDevice = touch instanceof Number && ((Number) touch).intValue() > 0;
List<String> plugins = (List<String>) fingerprint.get("plugins");
if (!touchDevice && plugins != null && plugins.isEmpty() && fingerprint.get("pluginsError") == null) {
botFeatures.put("emptyPlugins", true);
}
// Feature 3: impossible screen size. Phones legitimately report widths
// well below 640 CSS pixels, so only non-positive dimensions are flagged.
List<Integer> screenSize = (List<Integer>) fingerprint.get("screenSize");
if (screenSize != null && screenSize.size() >= 2) {
if (screenSize.get(0) <= 0 || screenSize.get(1) <= 0) {
botFeatures.put("suspiciousScreenSize", screenSize);
}
}
// Feature 4: operating system in the UA does not match navigator.platform
String ua = (String) fingerprint.get("userAgent");
String platform = (String) fingerprint.get("platform");
if (ua != null && platform != null) {
if ((ua.contains("Windows") && !platform.contains("Win")) ||
(ua.contains("Macintosh") && !platform.contains("Mac")) ||
(ua.contains("Linux") && !platform.contains("Linux"))) {
botFeatures.put("platformMismatch", true);
}
}
return botFeatures;
}
/**
* Computes the fingerprint hash
*/
public String generateFingerprintHash(Map<String, Object> fingerprint) {
try {
// Use only the stable key features, to reduce noise between visits
Map<String, Object> keyFeatures = new HashMap<>();
keyFeatures.put("ua", fingerprint.get("userAgent"));
keyFeatures.put("screen", fingerprint.get("screenSize"));
keyFeatures.put("canvas", fingerprint.get("canvasHash"));
keyFeatures.put("platform", fingerprint.get("platform"));
keyFeatures.put("language", fingerprint.get("language"));
String fingerprintStr = new ObjectMapper().writeValueAsString(keyFeatures);
MessageDigest md = MessageDigest.getInstance("SHA-256");
byte[] digest = md.digest(fingerprintStr.getBytes(StandardCharsets.UTF_8));
return Base64.getEncoder().encodeToString(digest);
} catch (Exception e) {
return UUID.randomUUID().toString();
}
}
}
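Collecting the fingerprint asynchronously, after the page has loaded, keeps the impact on user experience small. Keep in mind that document, screen, and DOM canvas elements are not available inside a Web Worker, so any collection step that needs them has to run on the main thread.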
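One detail of `generateFingerprintHash` is worth isolating: the hash is only useful if the same features always produce the same value. The sketch below is a dependency-free variant under stated assumptions (string-valued features, a made-up class name `FingerprintHasher`); it sorts the keys before hashing so the result does not depend on map iteration order, and it fails loudly instead of falling back to a random value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper: deterministic SHA-256 hash over a set of fingerprint features. */
final class FingerprintHasher {

    static String hash(Map<String, String> features) {
        // TreeMap gives a canonical key order regardless of how the input map was built
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(features).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform; a random fallback
            // would silently make fingerprints incomparable
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Two maps with the same entries inserted in a different order yield the same hash, while changing any single feature changes it. The simple `key=value` line format assumes that keys and values contain no newline characters.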
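4. Encryption and Data Protection
4.1 Data Encryption with Stronger Key Management
An encryption scheme that takes its key from external key management: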
@Service
@Slf4j
public class EncryptionService {
private final KeyVaultClient keyVaultClient;
private final String keyIdentifier;
private final SecretKey localKey;
private static final int GCM_IV_LENGTH = 12;
private static final int GCM_TAG_LENGTH = 16;
/**
* Constructor: selects the key source
* The key can come from a key vault (such as HashiCorp Vault or AWS KMS)
*/
public EncryptionService(
@Value("${app.security.key-vault.enabled:false}") boolean keyVaultEnabled,
@Value("${app.security.key-vault.key-identifier:}") String configKeyIdentifier,
@Value("${app.security.encryption.local-key:}") String fallbackKeyString,
@Autowired(required = false) KeyVaultClient keyVaultClientBean) {
this.keyVaultClient = keyVaultClientBean;
this.keyIdentifier = configKeyIdentifier;
// Use the key vault if enabled, otherwise the locally configured key
if (keyVaultEnabled && keyVaultClient != null) {
log.info("Using the key vault as the encryption key source");
this.localKey = null; // fetched from the vault when needed
} else {
log.info("Using the locally configured encryption key");
if (fallbackKeyString == null || fallbackKeyString.length() < 32) {
log.warn("Local key is missing or too short; generating a random key (lost on restart)");
fallbackKeyString = generateRandomKey(32);
}
// The first 32 characters are used as a 256-bit AES key
byte[] keyBytes = fallbackKeyString.substring(0, 32).getBytes(StandardCharsets.UTF_8);
this.localKey = new SecretKeySpec(keyBytes, "AES");
}
}
/**
* Encrypts data with AES-GCM (authenticated encryption)
*/
public String encrypt(String data) throws Exception {
if (data == null || data.isEmpty()) {
return data;
}
SecretKey key = getEncryptionKey();
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
byte[] iv = generateRandomIV(GCM_IV_LENGTH);
GCMParameterSpec parameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);
cipher.init(Cipher.ENCRYPT_MODE, key, parameterSpec);
byte[] encryptedData = cipher.doFinal(data.getBytes(StandardCharsets.UTF_8));
// Prepend the IV to the ciphertext
byte[] combined = new byte[GCM_IV_LENGTH + encryptedData.length];
System.arraycopy(iv, 0, combined, 0, GCM_IV_LENGTH);
System.arraycopy(encryptedData, 0, combined, GCM_IV_LENGTH, encryptedData.length);
return Base64.getEncoder().encodeToString(combined);
}
/**
* Decrypts data
*/
public String decrypt(String encryptedData) throws Exception {
if (encryptedData == null || encryptedData.isEmpty()) {
return encryptedData;
}
byte[] combined = Base64.getDecoder().decode(encryptedData);
// A valid payload holds at least the IV and the GCM authentication tag
if (combined.length < GCM_IV_LENGTH + GCM_TAG_LENGTH) {
throw new IllegalArgumentException("Malformed encrypted data");
}
// Split the IV from the ciphertext
byte[] iv = new byte[GCM_IV_LENGTH];
byte[] encrypted = new byte[combined.length - GCM_IV_LENGTH];
System.arraycopy(combined, 0, iv, 0, GCM_IV_LENGTH);
System.arraycopy(combined, GCM_IV_LENGTH, encrypted, 0, encrypted.length);
SecretKey key = getEncryptionKey();
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
GCMParameterSpec parameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);
cipher.init(Cipher.DECRYPT_MODE, key, parameterSpec);
byte[] decrypted = cipher.doFinal(encrypted);
return new String(decrypted, StandardCharsets.UTF_8);
}
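/**
* Returns the encryption key:
* the local key if one was configured, otherwise the key from the vault
*/
private SecretKey getEncryptionKey() throws Exception {
// localKey is null only when the key vault was selected in the constructor
if (localKey != null) {
return localKey;
}
try {
// Fetch the key from the vault (example implementation; assumes a Base64-encoded AES key)
String keyStr = keyVaultClient.getSecret(keyIdentifier);
byte[] keyBytes = Base64.getDecoder().decode(keyStr);
return new SecretKeySpec(keyBytes, "AES");
} catch (Exception e) {
// There is no safe fallback: data encrypted under the vault key
// cannot be decrypted with any other key
throw new IllegalStateException("Failed to load the encryption key from the key vault", e);
}
}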
/**
* Generates a random IV
*/
private byte[] generateRandomIV(int length) {
byte[] iv = new byte[length];
new SecureRandom().nextBytes(iv);
return iv;
}
/**
* Generates a random key
*/
private String generateRandomKey(int length) {
byte[] key = new byte[length];
new SecureRandom().nextBytes(key);
return Base64.getEncoder().encodeToString(key);
}
/**
* Key vault client interface
* In practice this would be implemented against AWS KMS, HashiCorp Vault, or similar
*/
public interface KeyVaultClient {
String getSecret(String keyIdentifier) throws Exception;
}
}
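This implementation can fetch its key from an external key management system, which improves security:
- The key no longer has to be hardcoded in a configuration file
- Keys can be rotated in the vault
- A locally configured key can be used when no key vault is available
- AES-GCM authenticates the ciphertext, so tampering is detected on decryption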
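The integrity guarantee in the last point can be demonstrated with the JDK alone. The following sketch mirrors the layout used by the service above (a 12-byte IV followed by ciphertext and a 128-bit tag, Base64-encoded) but is otherwise independent of it; the class name `AesGcmDemo` is illustrative and the key is generated in memory rather than loaded from a vault.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

/** Minimal AES-GCM round trip: output = Base64(IV || ciphertext || tag). */
final class AesGcmDemo {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    static String encrypt(SecretKey key, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV per message is mandatory for GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = Arrays.copyOf(iv, IV_BYTES + ciphertext.length);
        System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
        return Base64.getEncoder().encodeToString(out);
    }

    static String decrypt(SecretKey key, String encoded) throws GeneralSecurityException {
        byte[] in = Base64.getDecoder().decode(encoded);
        if (in.length < IV_BYTES + TAG_BITS / 8) {
            throw new IllegalArgumentException("ciphertext too short");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        // doFinal throws AEADBadTagException if the data was modified
        byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }
}
```

Flipping a single bit anywhere in the encoded payload makes `decrypt` throw `javax.crypto.AEADBadTagException` instead of returning corrupted plaintext, which is the practical difference between GCM and unauthenticated modes such as CBC.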
4.2 A/B-Test-Aware Front-End Protection
A dynamic data-protection strategy with A/B testing support:
@RestController
@RequestMapping("/api/products")
@Slf4j
@RequiredArgsConstructor
public class SecureProductController {
private final ProductService productService;
private final EncryptionService encryptionService;
private final StringRedisTemplate redisTemplate;
private final ABTestManager abTestManager;
/**
* Product list page, without sensitive data
*/
@GetMapping("")
public ModelAndView getProductsPage(HttpServletRequest request) {
// Issue a one-time access token
String accessToken = generateAccessToken(request);
// Look up the A/B test assignment
String userId = getUserId(request);
DataProtectionLevel protectionLevel = abTestManager.getDataProtectionLevel(userId);
ModelAndView mav = new ModelAndView("products/list");
mav.addObject("accessToken", accessToken);
mav.addObject("protectionLevel", protectionLevel.name());
// Choose how the data is delivered based on the protection level
if (protectionLevel == DataProtectionLevel.BASIC) {
// Basic protection: return all data in one response
List<ProductDto> products = productService.getAllProductsWithDetails();
mav.addObject("products", products);
} else {
// Advanced protection: return basic info only; sensitive data is loaded via AJAX
List<ProductBasicInfo> basicProducts = productService.getBasicProductInfoList();
mav.addObject("products", basicProducts);
}
return mav;
}
/**
* Fetch a product's sensitive data (price, stock, etc.)
* Used when sensitive data is loaded separately from the page
*/
@GetMapping("/data")
@ResponseBody
public ResponseEntity<?> getProductsData(
@RequestParam String productId,
@RequestParam String token,
HttpServletRequest request) {
try {
// Validate the token
if (!isValidToken(token, request)) {
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "无效的访问令牌"));
}
// Check that the request looks legitimate
if (!validateRequest(request)) {
log.warn("可疑的产品数据请求: IP={}, 产品={}", getClientIp(request), productId);
updateRequestMetrics(request, false);
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "请求已被拒绝"));
}
// Load the sensitive data
ProductSensitiveData productData = productService.getProductSensitiveData(productId);
if (productData == null) {
return ResponseEntity.notFound().build();
}
// Record the successful request
updateRequestMetrics(request, true);
// Look up the user's protection level
String userId = getUserId(request);
DataProtectionLevel protectionLevel = abTestManager.getDataProtectionLevel(userId);
// Build the response
Map<String, Object> responseData = new HashMap<>();
responseData.put("id", productId);
if (protectionLevel == DataProtectionLevel.ADVANCED) {
// Advanced protection: encrypt the sensitive fields
responseData.put("price", encryptionService.encrypt(productData.getPrice().toString()));
responseData.put("stock", encryptionService.encrypt(productData.getStock().toString()));
if (productData.getDiscount() != null) {
responseData.put("discount", encryptionService.encrypt(productData.getDiscount().toString()));
}
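} else if (protectionLevel == DataProtectionLevel.MEDIUM) {
// Medium protection: Base64 encoding (obfuscation only, not encryption).
// An explicit charset avoids depending on the platform default.
responseData.put("price", Base64.getEncoder().encodeToString(
productData.getPrice().toString().getBytes(java.nio.charset.StandardCharsets.UTF_8)));
responseData.put("stock", Base64.getEncoder().encodeToString(
productData.getStock().toString().getBytes(java.nio.charset.StandardCharsets.UTF_8)));
} else {
// Basic protection: return plaintext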
responseData.put("price", productData.getPrice());
responseData.put("stock", productData.getStock());
}
// Issue a fresh token for the next request
String newToken = generateAccessToken(request);
responseData.put("token", newToken);
return ResponseEntity.ok(responseData);
} catch (Exception e) {
log.error("获取产品数据异常", e);
return ResponseEntity.status(500).body(Collections.singletonMap(
"error", "服务器处理异常"));
}
}
/**
* Validate the access token
*/
private boolean isValidToken(String token, HttpServletRequest request) {
String storedIp = redisTemplate.opsForValue().get("access_token:" + token);
if (storedIp == null) {
return false;
}
String currentIp = getClientIp(request);
return storedIp.equals(currentIp);
}
/**
* Check that the request looks legitimate
*/
private boolean validateRequest(HttpServletRequest request) {
// Check the Referer
String referer = request.getHeader("Referer");
if (referer == null || !referer.startsWith("https://yourdomain.com/")) {
return false;
}
// Check that this is an AJAX request
String requestedWith = request.getHeader("X-Requested-With");
if (!"XMLHttpRequest".equals(requestedWith)) {
return false;
}
// Check the fingerprint token, if present
String fpToken = request.getHeader("X-FP-Token");
if (fpToken != null && !fpToken.isEmpty()) {
return Boolean.TRUE.equals(redisTemplate.hasKey("fp:token:" + fpToken));
}
return true;
}
/**
* Update request metrics, used to evaluate the A/B test
*/
private void updateRequestMetrics(HttpServletRequest request, boolean success) {
try {
String userId = getUserId(request);
String protectionLevel = abTestManager.getDataProtectionLevel(userId).name();
// Count successes and failures
String resultKey = "abtest:result:" + protectionLevel + ":" + (success ? "success" : "fail");
redisTemplate.opsForValue().increment(resultKey);
// Record the response time
Long startTime = (Long) request.getAttribute("requestStartTime");
if (startTime != null) {
long duration = System.currentTimeMillis() - startTime;
String timeKey = "abtest:time:" + protectionLevel;
redisTemplate.opsForList().rightPush(timeKey, String.valueOf(duration));
// Keep only the newest 1000 samples. LTRIM leaves shorter lists untouched,
// so no separate size() check (whose Long result may be null) is needed.
redisTemplate.opsForList().trim(timeKey, -1000, -1);
}
} catch (Exception e) {
log.warn("更新AB测试指标异常", e);
}
}
/**
* Get the user ID (from a cookie or the session)
*/
private String getUserId(HttpServletRequest request) {
// Try the cookie first
Cookie[] cookies = request.getCookies();
if (cookies != null) {
for (Cookie cookie : cookies) {
if ("uid".equals(cookie.getName())) {
return cookie.getValue();
}
}
}
// Fall back to the session
HttpSession session = request.getSession(true);
String userId = (String) session.getAttribute("userId");
if (userId == null) {
userId = UUID.randomUUID().toString();
session.setAttribute("userId", userId);
}
return userId;
}
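/**
* Data protection levels
*/
public enum DataProtectionLevel {
BASIC, // Basic protection: plaintext data
MEDIUM, // Medium protection: simple encoding
ADVANCED // Advanced protection: encrypted data
}
/**
* Generate a one-time access token
*/
private String generateAccessToken(HttpServletRequest request) {
// Implementation omitted; same as the earlier version.
// Note: isValidToken() expects the client IP to be stored in Redis under "access_token:" + token.
return UUID.randomUUID().toString();
}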
}
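The controller calls its token "one-time" but leaves the issuing side out. As a rough sketch of what issue-and-consume semantics could look like, the example below uses an in-memory map in place of Redis. The class name `OneTimeTokenSketch`, the TTL parameter, and the rule that a token presented from the wrong IP is also burnt are assumptions for illustration, not the article's implementation.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the Redis-backed token store.
final class OneTimeTokenSketch {
    private static final class Grant {
        final String ip;
        final long expiresAtMillis;

        Grant(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Grant> store = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long ttlMillis;

    OneTimeTokenSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Issues an unguessable token bound to the client IP. */
    String issue(String clientIp) {
        byte[] raw = new byte[24];
        random.nextBytes(raw);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        store.put(token, new Grant(clientIp, System.currentTimeMillis() + ttlMillis));
        return token;
    }

    /** Returns true at most once per token, and only for the issuing IP before expiry. */
    boolean consume(String token, String clientIp) {
        if (token == null || clientIp == null) {
            return false;
        }
        // remove() is atomic, so two concurrent requests cannot both redeem the token
        Grant grant = store.remove(token);
        return grant != null
                && grant.expiresAtMillis >= System.currentTimeMillis()
                && grant.ip.equals(clientIp);
    }
}
```

With Redis, the same effect needs an atomic read-and-delete (for example the `GETDEL` command in Redis 6.2 and later) plus a key TTL; this sketch does not purge tokens that expire without ever being presented. A strictly one-time token also means the page has to chain requests, using the token returned by each response for the next one.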
The accompanying A/B test manager:
@Component
@Slf4j
public class ABTestManager {
private final StringRedisTemplate redisTemplate;
private final ScheduledExecutorService scheduler;
// Traffic share per protection level; volatile because the scheduler thread replaces the map
private volatile Map<String, Integer> protectionDistribution = new HashMap<>();
public ABTestManager(StringRedisTemplate redisTemplate) {
this.redisTemplate = redisTemplate;
this.scheduler = Executors.newSingleThreadScheduledExecutor();
// Initialize the default distribution
protectionDistribution.put(DataProtectionLevel.BASIC.name(), 10); // 10%
protectionDistribution.put(DataProtectionLevel.MEDIUM.name(), 30); // 30%
protectionDistribution.put(DataProtectionLevel.ADVANCED.name(), 60); // 60%
// Re-evaluate the distribution periodically
scheduler.scheduleAtFixedRate(this::optimizeDistribution, 1, 12, TimeUnit.HOURS);
}
/**
* Get the data protection level for a user ID
*/
public DataProtectionLevel getDataProtectionLevel(String userId) {
// Check whether the user already has an assigned level
String assignedLevel = redisTemplate.opsForValue().get("abtest:user:" + userId);
if (assignedLevel != null) {
try {
return DataProtectionLevel.valueOf(assignedLevel);
} catch (IllegalArgumentException e) {
log.warn("无效的保护级别: {}", assignedLevel);
}
}
// New user: assign a level at random
DataProtectionLevel level = assignRandomLevel();
redisTemplate.opsForValue().set("abtest:user:" + userId, level.name(), 30, TimeUnit.DAYS);
return level;
}
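/**
* Assign a protection level at random, weighted by the current distribution
*/
private DataProtectionLevel assignRandomLevel() {
// Read the field once so the sum and the loop see the same map,
// even if optimizeDistribution() swaps it in between
Map<String, Integer> distribution = protectionDistribution;
int total = distribution.values().stream().mapToInt(Integer::intValue).sum();
// ThreadLocalRandom avoids allocating a new Random on every call
int rand = java.util.concurrent.ThreadLocalRandom.current().nextInt(total) + 1;
int cumulativeSum = 0;
for (Map.Entry<String, Integer> entry : distribution.entrySet()) {
cumulativeSum += entry.getValue();
if (rand <= cumulativeSum) {
return DataProtectionLevel.valueOf(entry.getKey());
}
}
// Fall back to the highest level
return DataProtectionLevel.ADVANCED;
}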
/**
* Re-balance the distribution based on success rate and response time
*/
private void optimizeDistribution() {
try {
log.info("开始优化AB测试分布...");
Map<String, Double> successRates = new HashMap<>();
Map<String, Double> avgResponseTimes = new HashMap<>();
// Compute the success rate of each level
for (DataProtectionLevel level : DataProtectionLevel.values()) {
String levelName = level.name();
long success = getLongValue("abtest:result:" + levelName + ":success");
long fail = getLongValue("abtest:result:" + levelName + ":fail");
long total = success + fail;
// Skip levels without samples; 0 / 0 would otherwise produce NaN
if (total > 0) {
successRates.put(levelName, (double) success / total);
}
// Compute the average response time
List<String> timesStr = redisTemplate.opsForList().range("abtest:time:" + levelName, 0, -1);
if (timesStr != null && !timesStr.isEmpty()) {
double avgTime = timesStr.stream()
.mapToDouble(Double::parseDouble)
.average()
.orElse(0);
avgResponseTimes.put(levelName, avgTime);
}
}
// Adjust the distribution based on success rate and response time
Map<String, Integer> newDistribution = new HashMap<>();
// If ADVANCED has a low success rate or slow responses, shift traffic toward BASIC.
// Without samples, ADVANCED is treated as healthy (rate 1.0, time 0.0).
double advSuccessRate = successRates.getOrDefault(DataProtectionLevel.ADVANCED.name(), 1.0);
double advResponseTime = avgResponseTimes.getOrDefault(DataProtectionLevel.ADVANCED.name(), 0.0);
if (advSuccessRate < 0.95 || advResponseTime > 500) {
newDistribution.put(DataProtectionLevel.BASIC.name(), 20);
newDistribution.put(DataProtectionLevel.MEDIUM.name(), 30);
newDistribution.put(DataProtectionLevel.ADVANCED.name(), 50);
log.info("检测到高级保护问题,增加基础保护比例");
} else {
// Keep the default distribution
newDistribution.put(DataProtectionLevel.BASIC.name(), 10);
newDistribution.put(DataProtectionLevel.MEDIUM.name(), 30);
newDistribution.put(DataProtectionLevel.ADVANCED.name(), 60);
}
// Publish the new distribution
protectionDistribution = newDistribution;
log.info("AB测试分布已优化: {}", protectionDistribution);
} catch (Exception e) {
log.error("优化AB测试分布异常", e);
}
}
private Long getLongValue(String key) {
Object value = redisTemplate.opsForValue().get(key);
if (value == null) {
return 0L;
}
return Long.parseLong(value.toString());
}
}
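A/B testing works like a scientific experiment:
- Split users into groups and apply a different protection level to each
- Collect data per group (such as request success rate and response time)
- Adjust the strategy based on the data to balance security against user experience
- Automate the process so the traffic split adjusts itself; in this sample the optimizer only switches between two preset distributions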
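The first step, splitting users into groups, can also be done without storing an assignment per user. The sketch below swaps in a different technique from the manager above: deterministic bucketing by hash instead of a random draw saved in Redis. The class `BucketSketch`, its `Level` enum, and the `experiment` salt parameter are hypothetical names for illustration.

```java
// Hypothetical alternative to random assignment: deterministic hash bucketing.
final class BucketSketch {
    enum Level { BASIC, MEDIUM, ADVANCED }

    private BucketSketch() {}

    /**
     * Maps a user to a level using a stable bucket in [0, 100).
     * basicPct and mediumPct are percentages; the remainder goes to ADVANCED.
     */
    static Level assign(String experiment, String userId, int basicPct, int mediumPct) {
        if (basicPct < 0 || mediumPct < 0 || basicPct + mediumPct > 100) {
            throw new IllegalArgumentException("percentages must be >= 0 and sum to at most 100");
        }
        // String.hashCode() is specified by the language, so the bucket is the same on every JVM.
        // floorMod keeps the result non-negative even for negative hash codes.
        int bucket = Math.floorMod((experiment + ":" + userId).hashCode(), 100);
        if (bucket < basicPct) {
            return Level.BASIC;
        }
        if (bucket < basicPct + mediumPct) {
            return Level.MEDIUM;
        }
        return Level.ADVANCED;
    }
}
```

The trade-off is that nothing needs to be stored or expired, but changing the percentages moves the users whose buckets sit near a boundary into a different group, whereas the Redis-backed assignment keeps each user fixed for 30 days.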
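5. Deployment and Monitoring
5.1 Multi-Layer Defense Architecture
5.2 User Feedback Monitoring
Adding a user feedback channel improves the precision of the anti-crawler system: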
@RestController
@RequestMapping("/feedback")
@Slf4j
@RequiredArgsConstructor
public class UserFeedbackController {
private final StringRedisTemplate redisTemplate;
private final ObjectMapper objectMapper;
/**
* Users report a false positive
*/
@PostMapping("/report-false-positive")
public ResponseEntity<?> reportFalsePositive(
@RequestBody FalsePositiveReport report,
HttpServletRequest request) {
String clientIp = getClientIp(request);
String reportId = UUID.randomUUID().toString();
try {
// Collect the report fields
Map<String, Object> feedbackData = new HashMap<>();
feedbackData.put("ip", clientIp);
feedbackData.put("ua", request.getHeader("User-Agent"));
feedbackData.put("time", System.currentTimeMillis());
feedbackData.put("page", report.getPage());
feedbackData.put("errorType", report.getErrorType());
feedbackData.put("description", report.getDescription());
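// Store the feedback
String feedbackJson = objectMapper.writeValueAsString(feedbackData);
redisTemplate.opsForValue().set("feedback:falsepositive:" + reportId, feedbackJson);
// Temporarily ease restrictions for the reporter
if ("BLOCKED".equals(report.getErrorType())) {
// This endpoint is unauthenticated, so grant relief at most once per IP per day
// (the window and key name are assumptions; tune them). Otherwise a crawler could
// whitelist itself and inflate its score by reporting repeatedly.
Boolean firstRelief = redisTemplate.opsForValue().setIfAbsent(
"feedback:relief:" + clientIp, "1", 1, TimeUnit.DAYS);
if (Boolean.TRUE.equals(firstRelief)) {
// Add to the temporary whitelist; note that expire() resets the TTL of the whole set
redisTemplate.opsForSet().add("whitelist:temp", clientIp);
redisTemplate.expire("whitelist:temp", 30, TimeUnit.MINUTES);
// Raise the IP's credit score
redisTemplate.opsForValue().increment("ip:score:" + clientIp, 50);
}
}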
log.info("收到误判反馈: IP={}, 类型={}, 页面={}", clientIp, report.getErrorType(), report.getPage());
return ResponseEntity.ok(Collections.singletonMap("reportId", reportId));
} catch (Exception e) {
log.error("处理误判反馈异常", e);
return ResponseEntity.status(500).body(Collections.singletonMap(
"error", "提交反馈失败"));
}
}
/**
* Admin API: list recent false-positive reports
*/
@GetMapping("/admin/recent-reports")
public ResponseEntity<?> getRecentReports(
@RequestHeader("X-Admin-Token") String adminToken) {
// Validate the admin token
if (!isValidAdminToken(adminToken)) {
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "无权访问"));
}
try {
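// Load the reports. KEYS scans the whole keyspace and blocks Redis while it runs,
// so prefer SCAN or a dedicated index in production.
Set<String> keys = redisTemplate.keys("feedback:falsepositive:*");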
if (keys == null || keys.isEmpty()) {
return ResponseEntity.ok(Collections.emptyList());
}
List<Map<String, Object>> reports = new ArrayList<>();
for (String key : keys) {
String reportJson = redisTemplate.opsForValue().get(key);
if (reportJson != null) {
Map<String, Object> report = objectMapper.readValue(reportJson, Map.class);
report.put("reportId", key.substring("feedback:falsepositive:".length()));
reports.add(report);
}
}
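// Sort by time, newest first
reports.sort((r1, r2) -> {
// Jackson may deserialize JSON numbers as Integer or Long, so go through Number
long t1 = ((Number) r1.get("time")).longValue();
long t2 = ((Number) r2.get("time")).longValue();
return Long.compare(t2, t1); // descending
});
// Return the first 100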
return ResponseEntity.ok(reports.stream().limit(100).collect(Collectors.toList()));
} catch (Exception e) {
log.error("获取误判反馈异常", e);
return ResponseEntity.status(500).body(Collections.singletonMap(
"error", "获取反馈失败"));
}
}
/**
* Admin API: process a false-positive report
*/
@PostMapping("/admin/process-report")
public ResponseEntity<?> processReport(
@RequestBody ProcessReportRequest request,
@RequestHeader("X-Admin-Token") String adminToken) {
// Validate the admin token
if (!isValidAdminToken(adminToken)) {
return ResponseEntity.status(403).body(Collections.singletonMap(
"error", "无权访问"));
}
try {
String reportId = request.getReportId();
String reportKey = "feedback:falsepositive:" + reportId;
// Load the report
String reportJson = redisTemplate.opsForValue().get(reportKey);
if (reportJson == null) {
return ResponseEntity.notFound().build();
}
Map<String, Object> reportData = objectMapper.readValue(reportJson, Map.class);
String ip = (String) reportData.get("ip");
String errorType = (String) reportData.get("errorType");
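// Adjust the system according to the reviewer's decision
if ("APPROVE".equals(request.getAction())) {
// Confirmed false positive: adjust system state
// 1. Add to the whitelist (if the user was wrongly blocked)
if ("BLOCKED".equals(errorType)) {
redisTemplate.opsForSet().add("whitelist:permanent", ip);
redisTemplate.opsForSet().remove("blacklist:ip", ip);
}
// 2. Raise the IP's credit score
redisTemplate.opsForValue().increment("ip:score:" + ip, 100);
// 3. Record the false-positive pattern for later tuning.
// errorType comes from the stored report and may be null, so convert it defensively.
String patternKey = "improvement:patterns:" + String.valueOf(errorType).toLowerCase(java.util.Locale.ROOT);
redisTemplate.opsForList().rightPush(patternKey, reportJson);
log.info("批准误判反馈: 报告ID={}, IP={}, 类型={}", reportId, ip, errorType);
} else if ("REJECT".equals(request.getAction())) {
// Rejected report: possibly a crawler trying to evade detection
log.info("拒绝误判反馈: 报告ID={}, IP={}", reportId, ip);
// Remove the IP from the temporary whitelist, if present
redisTemplate.opsForSet().remove("whitelist:temp", ip);
}
// Mark the report as processed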
reportData.put("processed", true);
reportData.put("processingAction", request.getAction());
reportData.put("processingTime", System.currentTimeMillis());
reportData.put("processingNote", request.getNote());
redisTemplate.opsForValue().set(reportKey, objectMapper.writeValueAsString(reportData));
return ResponseEntity.ok(Collections.singletonMap("success", true));
} catch (Exception e) {
log.error("处理误判反馈异常", e);
return ResponseEntity.status(500).body(Collections.singletonMap(
"error", "处理反馈失败"));
}
}
/**
* Check whether the admin token is valid
*/
private boolean isValidAdminToken(String adminToken) {
// Placeholder only: a real deployment should use a proper authentication mechanism
return "valid-admin-token".equals(adminToken);
}
/**
* Get the client IP
*/
private String getClientIp(HttpServletRequest request) {
// Implementation omitted; same as the earlier version
return request.getRemoteAddr();
}
// Request body classes
@Data
public static class FalsePositiveReport {
private String page; // Page where the problem occurred
private String errorType; // Error type (BLOCKED, CAPTCHA, SLOWDOWN, etc.)
private String description; // User's description
}
@Data
public static class ProcessReportRequest {
private String reportId; // Report ID
private String action; // Action to take (APPROVE or REJECT)
private String note; // Reviewer's note
}
}
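The hard-coded admin token above is only a placeholder. If a shared-secret header is kept at all, one small hardening step is to load the secret from configuration and compare it without leaking timing information. The sketch below assumes a hypothetical `AdminTokenSketch` class; it is a stopgap for illustration, not a substitute for real authentication.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: holds the expected admin secret and checks presented tokens.
final class AdminTokenSketch {
    private final byte[] expected;

    /** The expected token should come from configuration or a secret store, not source code. */
    AdminTokenSketch(String expectedToken) {
        if (expectedToken == null || expectedToken.isEmpty()) {
            throw new IllegalArgumentException("admin token must be configured");
        }
        this.expected = expectedToken.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns true only if the presented token equals the configured one. */
    boolean matches(String presented) {
        if (presented == null) {
            return false;
        }
        // MessageDigest.isEqual in current JDKs does not stop at the first differing byte,
        // unlike String.equals, which makes timing-based guessing harder
        return MessageDigest.isEqual(expected, presented.getBytes(StandardCharsets.UTF_8));
    }
}
```

In the controller this could be constructed once, for example from an environment variable (the name `ADMIN_TOKEN` would be an assumption), and `isValidAdminToken` would delegate to `matches`.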
Front-end false-positive feedback component (React):
import React, { useState } from 'react';
import './FeedbackForm.css';
const FalsePositiveReporter = () => {
const [isOpen, setIsOpen] = useState(false);
const [errorType, setErrorType] = useState('BLOCKED');
const [description, setDescription] = useState('');
const [isSubmitting, setIsSubmitting] = useState(false);
const [feedbackResult, setFeedbackResult] = useState(null);
const submitFeedback = async () => {
setIsSubmitting(true);
try {
const response = await fetch('/feedback/report-false-positive', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
page: window.location.href,
errorType,
description
})
});
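// The body may not be JSON (for example an HTML block page), so fall back to an empty object
const result = await response.json().catch(() => ({}));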
if (response.ok) {
setFeedbackResult({
success: true,
message: '感谢您的反馈,我们会尽快处理!'
});
// Close the form after 3 seconds
setTimeout(() => {
setIsOpen(false);
setFeedbackResult(null);
setDescription('');
}, 3000);
} else {
setFeedbackResult({
success: false,
message: result.error || '提交反馈失败,请稍后再试'
});
}
} catch (err) {
setFeedbackResult({
success: false,
message: '提交反馈出现错误,请稍后再试'
});
} finally {
setIsSubmitting(false);
}
};
return (
<>
{!isOpen && (
<button
className="feedback-button"
onClick={() => setIsOpen(true)}
>
遇到问题?
</button>
)}
{isOpen && (
<div className="feedback-overlay">
<div className="feedback-form">
<h3>反馈网站访问问题</h3>
{feedbackResult ? (
<div className={`feedback-result ${feedbackResult.success ? 'success' : 'error'}`}>
{feedbackResult.message}
</div>
) : (
<>
<div className="form-group">
<label>您遇到了什么问题?</label>
<select
value={errorType}
onChange={(e) => setErrorType(e.target.value)}
>
<option value="BLOCKED">无法访问(显示被禁止)</option>
<option value="CAPTCHA">频繁显示验证码</option>
<option value="SLOWDOWN">网站访问非常缓慢</option>
<option value="OTHER">其他问题</option>
</select>
</div>
<div className="form-group">
<label>请简要描述问题</label>
<textarea
rows="3"
value={description}
onChange={(e) => setDescription(e.target.value)}
placeholder="请描述您遇到的具体情况..."
></textarea>
</div>
<div className="form-actions">
<button
className="cancel-button"
onClick={() => setIsOpen(false)}
>
取消
</button>
<button
className="submit-button"
onClick={submitFeedback}
disabled={isSubmitting}
>
{isSubmitting ? '提交中...' : '提交反馈'}
</button>
</div>
</>
)}
</div>
</div>
)}
</>
);
};
export default FalsePositiveReporter;
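With this feedback system:
- Legitimate users who were misclassified can submit a report and quickly regain access
- Administrators can see where the system produces false positives and adjust parameters and rules
- The collected data feeds ongoing tuning of the anti-crawler strategy, reducing the impact on normal users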
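As one possible way to turn the collected reports into a tuning signal, the sketch below counts reports per `errorType`, so an administrator can see which mechanism misfires most often. It works on the same `Map<String, Object>` shape the controller deserializes; the class name `FeedbackStatsSketch` and the `UNKNOWN` bucket are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical reporting helper for the admin side.
final class FeedbackStatsSketch {
    private FeedbackStatsSketch() {}

    /** Counts reports per error type; reports without a type are grouped under "UNKNOWN". */
    static Map<String, Long> countByErrorType(List<Map<String, Object>> reports) {
        return reports.stream()
                .map(report -> {
                    Object type = report.get("errorType");
                    return type == null ? "UNKNOWN" : type.toString();
                })
                // TreeMap keeps the result sorted by error type for stable output
                .collect(Collectors.groupingBy(type -> type, TreeMap::new, Collectors.counting()));
    }
}
```

A spike in one bucket, say `CAPTCHA`, would suggest revisiting the threshold of that specific layer rather than loosening the whole system.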
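Summary
| Strategy | Implementation | Strengths | Weaknesses | Use case | Bypass difficulty | Resource cost |
|---|---|---|---|---|---|---|
| Request rate control | IP rate limiting, sliding window | Simple to implement; stops high-frequency crawlers | Struggles to identify distributed crawlers | Baseline protection layer | ★★☆☆☆ | Low |
| Behavioral analysis | Session analysis, honeypot links | Active defense; identifies advanced crawlers | Complex to implement; may produce false positives | Moderate protection needs | ★★★★☆ | Medium |
| Client feature analysis | UA analysis, fingerprinting | Distinctive signals that are hard to imitate perfectly | Sophisticated crawlers can forge features | Comprehensive protection | ★★★☆☆ | Medium |
| CAPTCHA | Image CAPTCHA, behavioral verification | Effectively blocks automated crawlers | Hurts user experience | Protecting sensitive operations | ★★★★★ | High |
| Content encryption and obfuscation | Dynamic loading, data encryption | Protects core data; raises scraping cost | Adds server load | High-value data protection | ★★★★☆ | High |
| Device fingerprinting | Canvas fingerprint, WebRTC detection | Hard to imitate perfectly; high accuracy | Complex to implement; may produce false positives | Advanced protection needs | ★★★★★ | Medium |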
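Choose anti-crawler strategies according to the site's characteristics and the sensitivity of its data. For large sites, a layered defense that combines several techniques is recommended:
- Layer 1 uses rate limiting to stop basic crawlers
- Layer 2 uses behavioral and client features to identify advanced crawlers
- Layer 3 protects core data and raises the cost of scraping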
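The layering can be pictured as an ordered chain of checks in which cheap checks run first and the first rejection ends the evaluation. The sketch below is a simplified illustration; `LayeredDefenseSketch`, its `Request` type, and the example thresholds are hypothetical and stand in for the interceptors built earlier in the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical chain of defense layers, evaluated in registration order.
final class LayeredDefenseSketch {
    /** Minimal request view used by the example layers. */
    static final class Request {
        final String ip;
        final String userAgent;
        final int requestsLastSecond;

        Request(String ip, String userAgent, int requestsLastSecond) {
            this.ip = ip;
            this.userAgent = userAgent;
            this.requestsLastSecond = requestsLastSecond;
        }
    }

    private static final class Layer {
        final String name;
        final Predicate<Request> allows;

        Layer(String name, Predicate<Request> allows) {
            this.name = name;
            this.allows = allows;
        }
    }

    private final List<Layer> layers = new ArrayList<>();

    /** Registers a layer; layers run in the order they were added. */
    LayeredDefenseSketch addLayer(String name, Predicate<Request> allows) {
        layers.add(new Layer(name, allows));
        return this;
    }

    /** Returns the name of the first layer that rejects the request, or empty if all allow it. */
    Optional<String> firstRejection(Request request) {
        for (Layer layer : layers) {
            if (!layer.allows.test(request)) {
                return Optional.of(layer.name);
            }
        }
        return Optional.empty();
    }
}
```

A chain could be assembled as `new LayeredDefenseSketch().addLayer("rate-limit", r -> r.requestsLastSecond <= 10).addLayer("client-features", r -> r.userAgent != null)`. Returning the rejecting layer's name makes it easy to log which layer fired, which is the kind of data the A/B tests and feedback reports rely on.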
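Use A/B testing and user feedback to keep tuning the system and to balance protection against user experience. Finally, an anti-crawler system should keep evolving: adjust its strategies as crawler techniques develop so that it continues to protect your site's data assets.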