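Time: 40 minutes | Difficulty: ⭐⭐⭐⭐ | Week 4 Day 23
In production, LLM applications face many sources of uncertainty: API rate limits, network timeouts, model hallucinations, malformed output, and more. Robust error handling and fault tolerance are key to keeping the system stable. This article walks through how to build a production-grade fault-tolerance architecture for LangChain4J applications.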
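📋 Learning objectives
- Understand the common error types in LLM applications and how to classify them
- Master the design and implementation of retries with exponential backoff
- Learn to implement the circuit breaker pattern with Resilience4j
- Implement multi-tier fallback strategies to keep the service available
- Design sensible rate-limiting protection
- Build a user-friendly error response system
- Combine multiple fault-tolerance mechanisms into a complete defense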
⚠️ Common error categories in LLM applications
In LangChain4J applications, errors fall into three broad categories:
Error classification table
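| Error category | Specific error | Retryable | Recommended strategy |
|---|---|---|---|
| API errors | 429 Rate Limit | ✅ | Retry with exponential backoff |
| API errors | 500 Internal Server Error | ✅ | Retry + fallback |
| API errors | 503 Service Unavailable | ✅ | Retry + circuit breaker |
| API errors | Network Timeout | ✅ | Retry + timeout configuration |
| API errors | 401 Unauthorized | ❌ | Fail immediately + alert |
| Business errors | JSON parsing failure | ⚠️ | Retry + format-correction prompt |
| Business errors | Output validation failure | ⚠️ | Regenerate + tighter constraints |
| Business errors | Hallucination check failure | ❌ | Fallback + human review |
| Business errors | Content safety violation | ❌ | Reject + log |
| System errors | Out of memory | ❌ | Rate limiting + scaling out |
| System errors | Database connection failure | ✅ | Retry via connection pool |
| System errors | Cache service unavailable | ✅ | Degrade to cache-less mode |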
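As a rough illustration of the API-error rows in the table, the mapping from HTTP status code to handling strategy can be written as a small lookup. This is a sketch in plain Java and not part of LangChain4J: the `HttpStatusStrategy` class and its `Strategy` enum are hypothetical names, and treating every unlisted status code as non-retryable is an assumption.

```java
/** Hypothetical helper mirroring the API-error rows of the table above. */
class HttpStatusStrategy {

    enum Strategy {
        BACKOFF_RETRY,              // 429: retry with exponential backoff
        RETRY_THEN_FALLBACK,        // 500: retry, then fall back
        RETRY_WITH_CIRCUIT_BREAKER, // 503: retry behind a circuit breaker
        FAIL_FAST_AND_ALERT         // 401: fail immediately and alert
    }

    static Strategy forStatus(int status) {
        if (status == 429) return Strategy.BACKOFF_RETRY;
        if (status == 500) return Strategy.RETRY_THEN_FALLBACK;
        if (status == 503) return Strategy.RETRY_WITH_CIRCUIT_BREAKER;
        // Assumption: 401 and every status not listed in the table is not retried.
        return Strategy.FAIL_FAST_AND_ALERT;
    }

    static boolean isRetryable(int status) {
        return forStatus(status) != Strategy.FAIL_FAST_AND_ALERT;
    }

    public static void main(String[] args) {
        for (int status : new int[] {429, 500, 503, 401}) {
            System.out.println(status + " -> " + forStatus(status));
        }
    }
}
```

Keeping the decision in one place makes it easy to audit which failures are ever retried.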
Error classification code example
package com.example.langchain4j.error;
import dev.langchain4j.exception.*;
import lombok.extern.slf4j.Slf4j;
@Slf4j
public class ErrorClassifier {
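/**
* Decides whether an error is retryable.
* Exception class names follow the original example; verify them against the
* LangChain4j version you use.
*/
public static boolean isRetryable(Exception ex) {
// API rate limiting: retryable
if (ex instanceof RateLimitException) {
return true;
}
// Network timeout: retryable
if (ex instanceof TimeoutException) {
return true;
}
// Temporary server-side error: retryable
if (ex instanceof ServiceUnavailableException) {
return true;
}
// Authentication error: not retryable
if (ex instanceof AuthenticationException) {
log.error("Authentication failed - check API key", ex);
return false;
}
// Business validation error: conditionally retryable
if (ex instanceof ValidationException) {
return isBusinessRetryable((ValidationException) ex);
}
// Do not retry by default
return false;
}
/**
* Decides whether a business (validation) error is retryable.
*/
private static boolean isBusinessRetryable(ValidationException ex) {
// getMessage() may be null, so guard before lower-casing
String message = ex.getMessage() == null ? "" : ex.getMessage().toLowerCase();
// Content safety violations are never retried, even if the message also mentions formatting
if (message.contains("safety") || message.contains("policy")) {
return false;
}
// JSON parsing and format errors can be retried
return message.contains("json") || message.contains("format");
}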
/**
* Returns the severity level of an error.
*/
public static ErrorSeverity getSeverity(Exception ex) {
if (ex instanceof AuthenticationException) {
return ErrorSeverity.CRITICAL;
}
if (ex instanceof RateLimitException) {
return ErrorSeverity.WARNING;
}
if (ex instanceof TimeoutException) {
return ErrorSeverity.WARNING;
}
return ErrorSeverity.ERROR;
}
public enum ErrorSeverity {
CRITICAL, // requires immediate action
ERROR, // requires an alert
WARNING, // requires monitoring
INFO // log only
}
}
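🔄 Retry mechanism design
Exponential backoff algorithm
Exponential backoff is a standard practice for handling transient errors: each retry waits longer than the previous one, giving the failing service time to recover:
Retry 1: wait 1 second
Retry 2: wait 2 seconds
Retry 3: wait 4 seconds
Retry 4: wait 8 seconds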
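The schedule above follows `delay = base × multiplier^(n-1)` for the n-th retry. A minimal sketch of that calculation in plain Java, independent of Resilience4j: the `BackoffSchedule` class name is hypothetical, and the `maxDelay` cap is an addition not present in the schedule above, included because uncapped exponential delays grow quickly.

```java
import java.time.Duration;

/** Hypothetical helper that computes exponential backoff delays. */
class BackoffSchedule {

    /** Delay before the n-th retry (1-based): base * multiplier^(n-1), capped at maxDelay. */
    static Duration delayBeforeRetry(int retryNumber, Duration base, double multiplier, Duration maxDelay) {
        if (retryNumber < 1) {
            throw new IllegalArgumentException("retryNumber must be >= 1");
        }
        double millis = base.toMillis() * Math.pow(multiplier, retryNumber - 1);
        // Assumption: never wait longer than maxDelay
        long capped = (long) Math.min(millis, (double) maxDelay.toMillis());
        return Duration.ofMillis(capped);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            Duration d = delayBeforeRetry(n, Duration.ofSeconds(1), 2.0, Duration.ofSeconds(30));
            System.out.println("Retry " + n + ": wait " + d.getSeconds() + "s");
        }
    }
}
```

In practice a small random jitter is often added to each delay so that many clients do not retry at the same instant.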
Resilience4j integration
<!-- pom.xml (the annotation-based examples also need spring-boot-starter-aop on the classpath) -->
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-spring-boot3</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-retry</artifactId>
<version>2.1.0</version>
</dependency>
Retry configuration
# application.yml
resilience4j:
  retry:
    instances:
      llmService:
        # max-attempts counts the initial call, so 4 means at most 3 retries
        max-attempts: 4
        wait-duration: 1s
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - dev.langchain4j.exception.RateLimitException
          - dev.langchain4j.exception.TimeoutException
          - dev.langchain4j.exception.ServiceUnavailableException
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException
          - dev.langchain4j.exception.InvalidRequestException
Retry service implementation
package com.example.langchain4j.service;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.retry.annotation.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
@Slf4j
@Service
@RequiredArgsConstructor
public class ResilientLLMService {
private final ChatLanguageModel model;
/**
* LLM call with retries.
*/
@Retry(name = "llmService", fallbackMethod = "generateFallback")
public String generate(String prompt) {
log.info("Calling LLM with prompt: {}", prompt);
try {
// generate(String) returns the reply text directly, so no AiMessage unwrapping is needed
String response = model.generate(prompt);
log.info("LLM call succeeded");
return response;
} catch (Exception ex) {
log.warn("LLM call failed: {}", ex.getMessage());
throw ex; // let Resilience4j handle the retry
}
}
/**
* Fallback method, invoked when the call ultimately fails.
*/
private String generateFallback(String prompt, Exception ex) {
log.error("All retry attempts failed for prompt: {}", prompt, ex);
return "Sorry, the service is temporarily unavailable. Please try again later.";
}
}
Custom retry configuration
package com.example.langchain4j.config;
import com.example.langchain4j.error.ErrorClassifier;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.time.Duration;
@Slf4j
@Configuration
public class RetryConfiguration {
@Bean
public RetryRegistry retryRegistry() {
RetryConfig config = RetryConfig.custom()
// maxAttempts includes the initial call: up to 3 retries
.maxAttempts(4)
// Exponential backoff: 1s, 2s, 4s between attempts
.intervalFunction(io.github.resilience4j.core.IntervalFunction
.ofExponentialBackoff(Duration.ofSeconds(1), 2))
// retryOnException takes a Predicate<Throwable>, so adapt the Exception-based classifier
.retryOnException(t -> t instanceof Exception
&& ErrorClassifier.isRetryable((Exception) t))
.build();
return RetryRegistry.of(config);
}
@Bean
public Retry llmRetry(RetryRegistry registry) {
Retry retry = registry.retry("llmService");
// Event callbacks are registered on the Retry's event publisher, not on RetryConfig
retry.getEventPublisher()
.onRetry(event -> log.warn("Retry attempt {} for operation: {}",
event.getNumberOfRetryAttempts(),
event.getName()))
.onSuccess(event -> log.info("Operation succeeded after {} attempts",
event.getNumberOfRetryAttempts()))
.onError(event -> log.error("All retry attempts failed for operation: {}",
event.getName(), event.getLastThrowable()));
return retry;
}
}
Programmatic retry usage
package com.example.langchain4j.service;
import io.github.resilience4j.retry.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.util.function.Supplier;
@Slf4j
@Service
@RequiredArgsConstructor
public class ProgrammaticRetryService {
private final Retry llmRetry;
/**
* Programmatic call with retries.
*/
public String callWithRetry(Supplier<String> operation) {
Supplier<String> decoratedSupplier = Retry
.decorateSupplier(llmRetry, operation);
try {
return decoratedSupplier.get();
} catch (Exception ex) {
log.error("Operation failed after all retries", ex);
throw ex;
}
}
/**
* Usage example.
*/
public String generateText(String prompt) {
return callWithRetry(() -> {
log.info("Executing LLM call");
// The actual LLM call
return performLLMCall(prompt);
});
}
private String performLLMCall(String prompt) {
// Placeholder for the real implementation
return "Generated text";
}
}
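🔌 Circuit breaker pattern
A circuit breaker stops the system from repeatedly calling a service that is already failing: calls fail fast and the application degrades instead.
Circuit breaker state machine
CLOSED (normal operation) → OPEN (fail fast) → HALF_OPEN (probing recovery) → CLOSED
State transition rules
- CLOSED → OPEN: the failure rate reaches the threshold (e.g. 50%)
- OPEN → HALF_OPEN: the wait time elapses (e.g. 60 seconds)
- HALF_OPEN → CLOSED: trial calls succeed
- HALF_OPEN → OPEN: trial calls fail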
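The four transition rules can be sketched as a small state machine in plain Java. This is a teaching sketch, not Resilience4j's implementation: the `SimpleCircuitBreaker` class is hypothetical, it evaluates fixed batches of calls rather than a true sliding window, a single trial call decides the HALF_OPEN outcome, and the caller passes the current time explicitly so the behavior is easy to test.

```java
/** Minimal count-based circuit breaker; a teaching sketch only. */
class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;           // calls evaluated per batch
    private final double failureThreshold;  // e.g. 0.5 for 50%
    private final long openWaitMillis;      // time to stay OPEN before probing
    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int windowSize, double failureThreshold, long openWaitMillis) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.openWaitMillis = openWaitMillis;
    }

    /** Returns true if a call may proceed at the given time. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= openWaitMillis) {
            state = State.HALF_OPEN;        // OPEN -> HALF_OPEN: wait time elapsed
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess(long nowMillis) {
        if (state == State.HALF_OPEN) {
            close();                        // HALF_OPEN -> CLOSED: trial call succeeded
        } else if (state == State.CLOSED) {
            calls++;
            evaluateWindow(nowMillis);
        }
    }

    synchronized void recordFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {
            open(nowMillis);                // HALF_OPEN -> OPEN: trial call failed
        } else if (state == State.CLOSED) {
            calls++;
            failures++;
            evaluateWindow(nowMillis);
        }
    }

    synchronized State state() {
        return state;
    }

    private void evaluateWindow(long nowMillis) {
        if (calls < windowSize) {
            return;                         // not enough calls yet
        }
        if ((double) failures / calls >= failureThreshold) {
            open(nowMillis);                // CLOSED -> OPEN: failure rate reached threshold
        } else {
            calls = 0;                      // healthy batch: start counting afresh
            failures = 0;
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        calls = 0;
        failures = 0;
    }

    private void close() {
        state = State.CLOSED;
        calls = 0;
        failures = 0;
    }
}
```

A caller would check `allowRequest(...)` before each LLM call and report the outcome with `recordSuccess(...)` or `recordFailure(...)`.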
Circuit breaker configuration
# application.yml
resilience4j:
  circuitbreaker:
    instances:
      llmService:
        # Failure rate threshold (50%)
        failure-rate-threshold: 50
        # Slow call rate threshold (50%)
        slow-call-rate-threshold: 50
        # Calls slower than this count as slow (5 seconds)
        slow-call-duration-threshold: 5s
        # Sliding window size
        sliding-window-size: 10
        # Sliding window type (count-based)
        sliding-window-type: count_based
        # Minimum number of calls before the failure rate is calculated
        minimum-number-of-calls: 5
        # Time to stay in OPEN before probing
        wait-duration-in-open-state: 60s
        # Number of calls permitted in HALF_OPEN
        permitted-number-of-calls-in-half-open-state: 3
        # Transition from OPEN to HALF_OPEN automatically
        automatic-transition-from-open-to-half-open-enabled: true
        # Exceptions recorded as failures
        record-exceptions:
          - java.lang.Exception
        # Exceptions that are ignored
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException
Circuit breaker service implementation
package com.example.langchain4j.service;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
@Slf4j
@Service
@RequiredArgsConstructor
public class CircuitBreakerLLMService {
private final ChatLanguageModel primaryModel;
private final ChatLanguageModel backupModel;
/**
* LLM call protected by a circuit breaker.
*/
@CircuitBreaker(name = "llmService", fallbackMethod = "generateFallback")
public String generate(String prompt) {
log.info("Calling primary LLM model");
// generate(String) already returns the reply text
return primaryModel.generate(prompt);
}
/**
* Fallback invoked when the primary call fails or the circuit breaker is open.
*/
private String generateFallback(String prompt, Exception ex) {
log.warn("Primary model unavailable ({}), using backup model",
ex.getClass().getSimpleName());
try {
return backupModel.generate(prompt);
} catch (Exception backupEx) {
log.error("Backup model also failed", backupEx);
return "The service is temporarily unavailable. Please try again later.";
}
}
}
Circuit breaker event listener
package com.example.langchain4j.config;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.circuitbreaker.event.CircuitBreakerOnStateTransitionEvent;
import jakarta.annotation.PostConstruct;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
@Slf4j
@Component
@RequiredArgsConstructor
public class CircuitBreakerEventListener {
private final CircuitBreakerRegistry registry;
@PostConstruct
public void init() {
registry.circuitBreaker("llmService")
.getEventPublisher()
.onStateTransition(this::onStateTransition)
.onSuccess(event -> log.info("Call succeeded"))
.onError(event -> log.warn("Call failed: {}",
event.getThrowable().getMessage()))
.onIgnoredError(event -> log.debug("Error ignored: {}",
event.getThrowable().getMessage()));
}
private void onStateTransition(CircuitBreakerOnStateTransitionEvent event) {
log.warn("Circuit breaker state changed: {} -> {}",
event.getStateTransition().getFromState(),
event.getStateTransition().getToState());
// Send an alert notification
if (event.getStateTransition().getToState() == CircuitBreaker.State.OPEN) {
sendAlert("Circuit breaker opened for: " + event.getCircuitBreakerName());
}
}
private void sendAlert(String message) {
// Implement the alerting logic (email, DingTalk, Slack, etc.)
log.error("ALERT: {}", message);
}
}
Querying circuit breaker state
package com.example.langchain4j.controller;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import java.util.HashMap;
import java.util.Map;
@RestController
@RequestMapping("/api/circuit-breaker")
@RequiredArgsConstructor
public class CircuitBreakerController {
private final CircuitBreakerRegistry registry;
/**
* Returns the circuit breaker's state and metrics.
*/
@GetMapping("/status/{name}")
public Map<String, Object> getStatus(@PathVariable String name) {
CircuitBreaker circuitBreaker = registry.circuitBreaker(name);
CircuitBreaker.Metrics metrics = circuitBreaker.getMetrics();
Map<String, Object> status = new HashMap<>();
status.put("state", circuitBreaker.getState().toString());
status.put("failureRate", metrics.getFailureRate());
status.put("slowCallRate", metrics.getSlowCallRate());
status.put("numberOfBufferedCalls", metrics.getNumberOfBufferedCalls());
status.put("numberOfFailedCalls", metrics.getNumberOfFailedCalls());
status.put("numberOfSuccessfulCalls", metrics.getNumberOfSuccessfulCalls());
status.put("numberOfSlowCalls", metrics.getNumberOfSlowCalls());
return status;
}
/**
* Manually switches the circuit breaker's state.
*/
@PostMapping("/transition/{name}")
public String transitionState(
@PathVariable String name,
@RequestParam String toState) {
CircuitBreaker circuitBreaker = registry.circuitBreaker(name);
switch (toState.toUpperCase()) {
case "CLOSED" -> circuitBreaker.transitionToClosedState();
case "OPEN" -> circuitBreaker.transitionToOpenState();
case "HALF_OPEN" -> circuitBreaker.transitionToHalfOpenState();
case "DISABLED" -> circuitBreaker.transitionToDisabledState();
case "FORCED_OPEN" -> circuitBreaker.transitionToForcedOpenState();
default -> throw new IllegalArgumentException("Invalid state: " + toState);
}
return "Transitioned to " + toState;
}
}
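📉 Degradation strategies
Degradation is the last line of defense for availability. When the primary service is unavailable, a fallback strategy provides limited but usable service.
Multi-tier fallback architecture
Tier 1: GPT-4 (high quality, high cost)
↓ on failure
Tier 2: GPT-3.5 (medium quality, medium cost)
↓ on failure
Tier 3: Local model (basic quality, low cost)
↓ on failure
Tier 4: Static response (last resort)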
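The tiered diagram boils down to "try each tier in order and return the first result that does not fail". A minimal, library-free sketch of that idea, assuming each tier can be modeled as a `Supplier<String>` (the `FallbackChain` class and `firstSuccessful` method are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: returns the first tier result that does not throw. */
class FallbackChain {

    static String firstSuccessful(List<Supplier<String>> tiers, String staticResponse) {
        for (Supplier<String> tier : tiers) {
            try {
                return tier.get();          // first tier that succeeds wins
            } catch (RuntimeException ex) {
                // This tier failed: fall through to the next one
            }
        }
        return staticResponse;              // last resort when every tier failed
    }

    public static void main(String[] args) {
        Supplier<String> failing = () -> { throw new IllegalStateException("model down"); };
        Supplier<String> local = () -> "answer from local model";
        System.out.println(firstSuccessful(Arrays.asList(failing, failing, local), "static response"));
    }
}
```

Ordering the list from most capable to cheapest keeps quality high in the common case while still guaranteeing some response.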
Model fallback chain implementation
package com.example.langchain4j.service;
import dev.langchain4j.model.chat.ChatLanguageModel;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.util.List;
@Slf4j
@Service
public class FallbackChainService {
private final List<ModelTier> modelTiers;
public FallbackChainService(
ChatLanguageModel gpt4Model,
ChatLanguageModel gpt35Model,
ChatLanguageModel localModel) {
this.modelTiers = List.of(
new ModelTier("GPT-4", gpt4Model, 1),
new ModelTier("GPT-3.5", gpt35Model, 2),
new ModelTier("Local Model", localModel, 3)
);
}
/**
* Cascading fallback call.
*/
public String generateWithFallback(String prompt) {
Exception lastException = null;
for (ModelTier tier : modelTiers) {
try {
log.info("Trying tier {}: {}", tier.level(), tier.name());
// generate(String) already returns the reply text
String result = tier.model().generate(prompt);
log.info("Tier {} succeeded", tier.level());
return result;
} catch (Exception ex) {
log.warn("Tier {} failed: {}", tier.level(), ex.getMessage());
lastException = ex;
// Continue with the next tier
}
}
// Every tier failed: return the static response
log.error("All model tiers failed", lastException);
return getFallbackResponse(prompt);
}
/**
* Final static fallback response.
*/
private String getFallbackResponse(String prompt) {
return "Sorry, the AI service is temporarily unavailable. Your request has been recorded and will be processed later.";
}
private record ModelTier(
String name,
ChatLanguageModel model,
int level
) {}
}
Feature-level degradation
package com.example.langchain4j.service;
import dev.langchain4j.model.chat.ChatLanguageModel;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;
import java.util.Optional;
@Slf4j
@Service
@RequiredArgsConstructor
public class FeatureFallbackService {
private final ChatLanguageModel realtimeModel;
private final CacheManager cacheManager;
/**
* Feature degradation: real-time → cache → static.
*/
public String getAnswer(String question) {
// Tier 1: try real-time generation
try {
log.info("Attempting real-time generation");
// generate(String) already returns the reply text
String answer = realtimeModel.generate(question);
cacheAnswer(question, answer);
return answer;
} catch (Exception ex) {
log.warn("Real-time generation failed: {}", ex.getMessage());
}
// Tier 2: read from the cache
Optional<String> cachedAnswer = getCachedAnswer(question);
if (cachedAnswer.isPresent()) {
log.info("Returning cached answer");
return cachedAnswer.get() + "\n\n(from cache)";
}
// Tier 3: return a static answer
log.warn("Returning static fallback");
return getStaticFallback(question);
}
private void cacheAnswer(String question, String answer) {
Cache cache = cacheManager.getCache("llm-answers");
if (cache != null) {
cache.put(question, answer);
}
}
private Optional<String> getCachedAnswer(String question) {
Cache cache = cacheManager.getCache("llm-answers");
if (cache != null) {
Cache.ValueWrapper wrapper = cache.get(question);
if (wrapper != null) {
return Optional.ofNullable((String) wrapper.get()); // a cached null value must not throw
}
}
return Optional.empty();
}
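private String getStaticFallback(String question) {
return """
Sorry, a real-time answer cannot be generated right now.
For common questions, see:
- Product documentation: https://docs.example.com
- Technical support: support@example.com
- Customer hotline: 400-XXX-XXXX
""";
}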
}
Smart degradation decider
package com.example.langchain4j.service;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
@Slf4j
@Component
public class DegradationDecider {
private final AtomicInteger errorCount = new AtomicInteger(0);
private final AtomicReference<Instant> lastErrorTime =
new AtomicReference<>(Instant.now());
// Degradation thresholds
private static final int ERROR_THRESHOLD = 5;
private static final Duration ERROR_WINDOW = Duration.ofMinutes(5);
/**
* Decides whether the service should degrade.
*/
public boolean shouldDegrade() {
int errors = errorCount.get();
Instant lastError = lastErrorTime.get();
Duration timeSinceLastError = Duration.between(lastError, Instant.now());
// If the error window has expired, reset the counter
if (timeSinceLastError.compareTo(ERROR_WINDOW) > 0) {
errorCount.set(0);
return false;
}
// Degrade once the error count reaches the threshold
return errors >= ERROR_THRESHOLD;
}
/**
* Records an error.
*/
public void recordError() {
int total = errorCount.incrementAndGet();
lastErrorTime.set(Instant.now());
log.warn("Error recorded. Total errors: {}", total);
}
/**
* Records a success, allowing gradual recovery.
*/
public void recordSuccess() {
// Atomic decrement that never drops below zero, even under concurrent calls
int previous = errorCount.getAndUpdate(count -> Math.max(0, count - 1));
if (previous > 0) {
log.info("Success recorded. Remaining errors: {}", previous - 1);
}
}
/**
* Resets the state.
*/
public void reset() {
errorCount.set(0);
log.info("Degradation state reset");
}
}
Degradation mode controller
package com.example.langchain4j.controller;
import com.example.langchain4j.service.DegradationDecider;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
@RestController
@RequestMapping("/api/degradation")
@RequiredArgsConstructor
public class DegradationController {
private final DegradationDecider decider;
/**
* Returns the current degradation status.
*/
@GetMapping("/status")
public Map<String, Object> getStatus() {
return Map.of(
"degraded", decider.shouldDegrade(),
"mode", decider.shouldDegrade() ? "DEGRADED" : "NORMAL"
);
}
/**
* Manually triggers degradation.
*/
@PostMapping("/enable")
public String enableDegradation() {
for (int i = 0; i < 10; i++) {
decider.recordError();
}
return "Degradation mode enabled";
}
/**
* Restores normal mode.
*/
@PostMapping("/disable")
public String disableDegradation() {
decider.reset();
return "Normal mode restored";
}
}
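🚰 Rate limiting
Rate limiting prevents overload, protects backend services, and keeps costs under control.
Rate limiting algorithms compared
| Algorithm | How it works | Pros | Cons | Typical use |
|---|---|---|---|---|
| Fixed window | Fixed number of requests per window (e.g. per minute) | Simple to implement | Bursts at window boundaries | Simple scenarios |
| Sliding window | Window slides smoothly over time | More accurate | Higher memory usage | Precise control |
| Token bucket | Tokens are added at a constant rate | Allows bursts | Moderate complexity | API gateways |
| Leaky bucket | Requests drain at a constant rate | Smooth traffic | No bursts allowed | Traffic shaping |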
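To make the token bucket row concrete, here is a minimal sketch in plain Java. It is not how Resilience4j implements its rate limiter: the `TokenBucket` class is hypothetical, and the caller supplies the current time in milliseconds so the refill logic is deterministic and easy to test.

```java
/** Minimal token bucket: refills at a constant rate and allows bursts up to capacity. */
class TokenBucket {

    private final long capacity;          // maximum burst size
    private final double refillPerMilli;  // tokens added per millisecond
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Takes one token if available; returns false when the caller should be throttled. */
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0L, nowMillis - lastRefillMillis);
        // Add the tokens earned since the last call, never exceeding capacity
        tokens = Math.min((double) capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a capacity of 2 and a refill rate of 1 token per second, two back-to-back calls succeed, a third is rejected, and capacity is restored after a couple of seconds of idle time.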
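Resilience4j rate limiter configuration
# application.yml
resilience4j:
  ratelimiter:
    instances:
      llmService:
        # Refresh period of the limit (every second)
        limit-refresh-period: 1s
        # Requests allowed per period
        limit-for-period: 10
        # How long a call waits for a permit before being rejected
        timeout-duration: 5s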
Rate limiter implementation
package com.example.langchain4j.service;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
@Slf4j
@Service
@RequiredArgsConstructor
public class RateLimitedLLMService {
private final ChatLanguageModel model;
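/**
* Rate-limited LLM call.
*/
@RateLimiter(name = "llmService", fallbackMethod = "rateLimitFallback")
public String generate(String prompt) {
log.info("Processing request within rate limit");
// generate(String) already returns the reply text
return model.generate(prompt);
}
/**
* Rate-limit fallback. Typed to RequestNotPermitted so that model failures
* are not misreported as rate limiting.
*/
private String rateLimitFallback(String prompt,
io.github.resilience4j.ratelimiter.RequestNotPermitted ex) {
log.warn("Rate limit exceeded for request");
return "Too many requests. Please try again later.";
}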
}
Per-user rate limiting
package com.example.langchain4j.service;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
@Slf4j
@Service
@RequiredArgsConstructor
public class PerUserRateLimiter {
private final ConcurrentHashMap<String, RateLimiter> userLimiters =
new ConcurrentHashMap<>();
private final RateLimiterRegistry registry;
/**
* Gets or creates the rate limiter dedicated to a user.
*/
private RateLimiter getUserLimiter(String userId, int requestsPerMinute) {
return userLimiters.computeIfAbsent(userId, id -> {
RateLimiterConfig config = RateLimiterConfig.custom()
.limitRefreshPeriod(Duration.ofMinutes(1))
.limitForPeriod(requestsPerMinute)
.timeoutDuration(Duration.ofSeconds(5))
.build();
return registry.rateLimiter("user-" + id, config);
});
}
/**
* Runs an operation under the user's rate limit.
*/
public <T> T executeWithUserLimit(
String userId,
int requestsPerMinute,
Supplier<T> operation) {
RateLimiter limiter = getUserLimiter(userId, requestsPerMinute);
Supplier<T> decoratedSupplier = RateLimiter
.decorateSupplier(limiter, operation);
try {
return decoratedSupplier.get();
} catch (io.github.resilience4j.ratelimiter.RequestNotPermitted ex) {
// Only a rejected permit means the limit was hit; other failures propagate unchanged
log.warn("Rate limit exceeded for user: {}", userId);
throw new RateLimitExceededException(
"Too many requests from user " + userId);
}
}
/**
* Usage example.
*/
public String generateForUser(String userId, String prompt) {
return executeWithUserLimit(userId, 10, () -> {
log.info("Generating for user: {}", userId);
// The actual LLM call
return "Generated content";
});
}
public static class RateLimitExceededException extends RuntimeException {
public RateLimitExceededException(String message) {
super(message);
}
}
}
Tiered rate limiting strategy
package com.example.langchain4j.service;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
@Service
@RequiredArgsConstructor
public class TieredRateLimitService {
private final PerUserRateLimiter rateLimiter;
/**
* Returns the per-minute request limit for the user's tier.
*/
public int getRateLimitForUser(String userId, UserTier tier) {
// The limit is defined once, on the UserTier enum below
return tier.getRequestsPerMinute();
}
/**
* Call with tier-based rate limiting.
*/
public String generateWithTier(
String userId,
UserTier tier,
String prompt) {
int limit = getRateLimitForUser(userId, tier);
return rateLimiter.executeWithUserLimit(userId, limit, () -> {
// The actual LLM call
return performGeneration(prompt);
});
}
private String performGeneration(String prompt) {
// Placeholder for the real implementation
return "Generated content";
}
@Getter
@RequiredArgsConstructor
public enum UserTier {
FREE("Free", 10),
BASIC("Basic", 50),
PRO("Pro", 200),
ENTERPRISE("Enterprise", 1000);
private final String displayName;
private final int requestsPerMinute;
}
}
Dynamic rate limit adjustment
package com.example.langchain4j.service;
import io.github.resilience4j.ratelimiter.RateLimiter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import java.time.LocalDateTime;
import java.time.LocalTime;
@Slf4j
@Service
@RequiredArgsConstructor
public class DynamicRateLimitService {
// Assumes a RateLimiter bean is available, e.g. one obtained from RateLimiterRegistry
private final RateLimiter rateLimiter;
/**
* Adjusts the rate limit by time of day:
* stricter during peak hours, more relaxed off-peak.
*/
@Scheduled(fixedRate = 60000) // check once a minute
public void adjustRateLimit() {
LocalTime now = LocalTime.now();
int newLimit;
// Peak hours (9:00-12:00, 14:00-18:00)
if (isPeakHour(now)) {
newLimit = 5; // stricter limit
log.info("Peak hour detected, setting strict rate limit: {}", newLimit);
}
// Off-peak hours
else {
newLimit = 20; // more relaxed limit
log.info("Off-peak hour, relaxing rate limit: {}", newLimit);
}
// Update the limit at runtime
rateLimiter.changeLimitForPeriod(newLimit);
}
private boolean isPeakHour(LocalTime time) {
// Half-open ranges [09:00, 12:00) and [14:00, 18:00), so 09:00:00 itself counts as peak
return (!time.isBefore(LocalTime.of(9, 0)) &&
time.isBefore(LocalTime.of(12, 0))) ||
(!time.isBefore(LocalTime.of(14, 0)) &&
time.isBefore(LocalTime.of(18, 0)));
}
/**
* Adjusts the rate limit based on system load.
*/
public void adjustBasedOnLoad(double cpuUsage, double memoryUsage) {
int newLimit;
if (cpuUsage > 0.8 || memoryUsage > 0.8) {
newLimit = 3; // high load: strict limit
log.warn("High system load detected, reducing rate limit to: {}", newLimit);
} else if (cpuUsage > 0.6 || memoryUsage > 0.6) {
newLimit = 10; // medium load
log.info("Medium system load, setting moderate rate limit: {}", newLimit);
} else {
newLimit = 20; // low load: relaxed limit
log.info("Low system load, setting relaxed rate limit: {}", newLimit);
}
rateLimiter.changeLimitForPeriod(newLimit);
}
}
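💬 Graceful error responses
User-friendly error messages noticeably improve the user experience: they tell the user what went wrong and what to do next.
Error code scheme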
package com.example.langchain4j.error;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
@Getter
@RequiredArgsConstructor
public enum ErrorCode {
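// 1xxx: client errors
INVALID_REQUEST(1001, "Invalid request parameters"),
RATE_LIMIT_EXCEEDED(1002, "Too many requests, please try again later"),
UNAUTHORIZED(1003, "Authentication failed, please check your API key"),
QUOTA_EXCEEDED(1004, "Usage quota reached"),
// 2xxx: business errors
CONTENT_POLICY_VIOLATION(2001, "Content violates the usage policy"),
OUTPUT_VALIDATION_FAILED(2002, "Output validation failed"),
CONTEXT_LENGTH_EXCEEDED(2003, "Context length limit exceeded"),
// 3xxx: server errors
SERVICE_UNAVAILABLE(3001, "Service temporarily unavailable"),
MODEL_OVERLOADED(3002, "Model is overloaded"),
TIMEOUT(3003, "Request timed out"),
INTERNAL_ERROR(3999, "Internal error");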
private final int code;
private final String message;
public String getCodeString() {
return "LLM-" + code;
}
}
Unified error response
package com.example.langchain4j.dto;
import com.example.langchain4j.error.ErrorCode;
import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;
import java.time.Instant;
@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class ErrorResponse {
private String code; // error code
private String message; // user-friendly error message
private String technicalDetail; // technical detail (optional)
private String suggestion; // suggested resolution
private Instant timestamp; // timestamp
private String requestId; // request tracing ID
public static ErrorResponse from(ErrorCode errorCode) {
return ErrorResponse.builder()
.code(errorCode.getCodeString())
.message(errorCode.getMessage())
.timestamp(Instant.now())
.build();
}
public static ErrorResponse from(ErrorCode errorCode, String suggestion) {
return ErrorResponse.builder()
.code(errorCode.getCodeString())
.message(errorCode.getMessage())
.suggestion(suggestion)
.timestamp(Instant.now())
.build();
}
public static ErrorResponse fromException(
ErrorCode errorCode,
Exception ex,
String requestId) {
return ErrorResponse.builder()
.code(errorCode.getCodeString())
.message(errorCode.getMessage())
.technicalDetail(ex.getMessage())
.timestamp(Instant.now())
.requestId(requestId)
.build();
}
}
Global exception handler
package com.example.langchain4j.exception;
import com.example.langchain4j.dto.ErrorResponse;
import com.example.langchain4j.error.ErrorCode;
import dev.langchain4j.exception.*;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import java.util.UUID;
@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(RateLimitException.class)
public ResponseEntity<ErrorResponse> handleRateLimit(RateLimitException ex) {
log.warn("Rate limit exceeded: {}", ex.getMessage());
ErrorResponse response = ErrorResponse.from(
ErrorCode.RATE_LIMIT_EXCEEDED,
"Please wait a moment and retry, or consider upgrading to a higher service tier"
);
return ResponseEntity
.status(HttpStatus.TOO_MANY_REQUESTS)
.body(response);
}
@ExceptionHandler(TimeoutException.class)
public ResponseEntity<ErrorResponse> handleTimeout(TimeoutException ex) {
log.warn("Request timeout: {}", ex.getMessage());
ErrorResponse response = ErrorResponse.from(
ErrorCode.TIMEOUT,
"Please try simplifying your request, or retry"
);
return ResponseEntity
.status(HttpStatus.GATEWAY_TIMEOUT)
.body(response);
}
@ExceptionHandler(AuthenticationException.class)
public ResponseEntity<ErrorResponse> handleAuthentication(
AuthenticationException ex) {
log.error("Authentication failed: {}", ex.getMessage());
ErrorResponse response = ErrorResponse.from(
ErrorCode.UNAUTHORIZED,
"Please check that your API key is configured correctly"
);
return ResponseEntity
.status(HttpStatus.UNAUTHORIZED)
.body(response);
}
@ExceptionHandler(ServiceUnavailableException.class)
public ResponseEntity<ErrorResponse> handleServiceUnavailable(
ServiceUnavailableException ex) {
String requestId = UUID.randomUUID().toString();
log.error("Service unavailable (requestId: {}): {}", requestId, ex.getMessage());
ErrorResponse response = ErrorResponse.fromException(
ErrorCode.SERVICE_UNAVAILABLE,
ex,
requestId
);
response.setSuggestion("Our engineers have been notified. Please try again later");
return ResponseEntity
.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(response);
}
@ExceptionHandler(Exception.class)
public ResponseEntity<ErrorResponse> handleGenericException(Exception ex) {
String requestId = UUID.randomUUID().toString();
log.error("Unexpected error (requestId: {}): {}", requestId, ex.getMessage(), ex);
ErrorResponse response = ErrorResponse.fromException(
ErrorCode.INTERNAL_ERROR,
ex,
requestId
);
response.setSuggestion("Please contact technical support and provide the request ID: " + requestId);
return ResponseEntity
.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(response);
}
}
Internationalized error messages
package com.example.langchain4j.service;
import com.example.langchain4j.error.ErrorCode;
import lombok.RequiredArgsConstructor;
import org.springframework.context.MessageSource;
import org.springframework.context.i18n.LocaleContextHolder;
import org.springframework.stereotype.Service;
import java.util.Locale;
@Service
@RequiredArgsConstructor
public class ErrorMessageService {
private final MessageSource messageSource;
/**
* Returns the localized error message.
*/
public String getMessage(ErrorCode errorCode) {
return getMessage(errorCode, LocaleContextHolder.getLocale());
}
public String getMessage(ErrorCode errorCode, Locale locale) {
return messageSource.getMessage(
"error." + errorCode.name().toLowerCase(),
null,
errorCode.getMessage(), // default message
locale
);
}
/**
* Returns the error message together with a suggestion.
*/
public String getMessageWithSuggestion(ErrorCode errorCode) {
String message = getMessage(errorCode);
String suggestion = getSuggestion(errorCode);
if (suggestion != null && !suggestion.isEmpty()) {
return message + "\n\nSuggestion: " + suggestion;
}
return message;
}
private String getSuggestion(ErrorCode errorCode) {
Locale locale = LocaleContextHolder.getLocale();
return messageSource.getMessage(
"error." + errorCode.name().toLowerCase() + ".suggestion",
null,
"",
locale
);
}
}
# messages_zh_CN.properties
error.rate_limit_exceeded=请求过于频繁,请稍后重试
error.rate_limit_exceeded.suggestion=您可以等待1-5分钟后重试,或升级到更高的服务等级
error.timeout=请求处理超时
error.timeout.suggestion=请尝试简化您的问题,或将长文本分段处理
error.service_unavailable=服务暂时不可用
error.service_unavailable.suggestion=我们正在努力恢复服务,请稍后重试
error.unauthorized=认证失败
error.unauthorized.suggestion=请检查您的API密钥是否正确配置,或联系管理员
# messages_en_US.properties
error.rate_limit_exceeded=Rate limit exceeded, please try again later
error.rate_limit_exceeded.suggestion=You can wait 1-5 minutes or upgrade your service tier
error.timeout=Request timeout
error.timeout.suggestion=Try simplifying your request or processing long texts in segments
error.service_unavailable=Service temporarily unavailable
error.service_unavailable.suggestion=We're working to restore service, please try again later
error.unauthorized=Authentication failed
error.unauthorized.suggestion=Please check your API key configuration or contact administrator
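🏗️ Production-grade fault-tolerance architecture
Combining all of the mechanisms above yields a complete, layered defense.
Layered defense diagram
User request
↓
[1. Rate limiting] → prevents overload
↓
[2. Retry] → handles transient errors
↓
[3. Circuit breaker] → fails fast
↓
[4. Fallback] → last-resort service
↓
Response returned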
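One way to read the layered diagram is as nested decorators: each layer wraps the next, so a request enters the outermost layer first and the response unwinds in reverse. The sketch below illustrates only that nesting in plain Java; `LayeredCall` and `layer` are hypothetical names, and the layers merely record a trace instead of implementing real rate limiting, retry, or circuit breaking. With Resilience4j's Spring annotations the effective nesting is determined by the configured aspect order rather than by the order in which annotations are written, so it is worth checking the library documentation before relying on a particular order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper showing how wrapped layers are entered and exited. */
class LayeredCall {

    /** Wraps 'next' in a named layer that records when it is entered and exited. */
    static <T> Supplier<T> layer(String name, List<String> trace, Supplier<T> next) {
        return () -> {
            trace.add("enter " + name);
            try {
                return next.get();
            } finally {
                trace.add("exit " + name);
            }
        };
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Supplier<String> llmCall = () -> "response";
        // Outermost layer first, matching the diagram from top to bottom
        Supplier<String> chain =
                layer("rate-limit", trace,
                        layer("retry", trace,
                                layer("circuit-breaker", trace, llmCall)));
        System.out.println(chain.get());
        System.out.println(trace);
    }
}
```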
Complete fault-tolerant service
package com.example.langchain4j.service;
import com.example.langchain4j.dto.ErrorResponse;
import com.example.langchain4j.error.ErrorCode;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
@Slf4j
@Service
@RequiredArgsConstructor
public class ResilientLLMServiceComplete {
private final ChatLanguageModel primaryModel;
private final ChatLanguageModel backupModel;
private final FallbackChainService fallbackChain;
private final ErrorMessageService errorMessageService;
/**
* Full resilience call chain: rate limiting, retry, circuit breaking, fallback.
* The nesting is decided by Resilience4j's aspect order, not by the order of the
* annotations. By default Retry wraps CircuitBreaker, which wraps RateLimiter, so the
* fallback is declared on @Retry (the outermost layer). A fallback on @CircuitBreaker
* would turn every failure into a normal return value and Retry would never trigger.
*/
@RateLimiter(name = "llmService")
@CircuitBreaker(name = "llmService")
@Retry(name = "llmService", fallbackMethod = "completeFallback")
public String generate(String prompt) {
log.info("Processing request with full resilience");
try {
// Primary model call; generate(String) already returns the reply text
return primaryModel.generate(prompt);
} catch (Exception ex) {
log.warn("Primary model failed: {}", ex.getMessage());
throw ex; // let Resilience4j handle it
}
}
/**
* Complete fallback chain.
*/
private String completeFallback(String prompt, Exception ex) {
log.warn("Entering complete fallback chain", ex);
// Fallback tier 1: try the backup model
try {
log.info("Trying backup model");
return backupModel.generate(prompt);
} catch (Exception backupEx) {
log.warn("Backup model failed: {}", backupEx.getMessage());
}
// Fallback tier 2: use the fallback chain
try {
log.info("Trying fallback chain");
return fallbackChain.generateWithFallback(prompt);
} catch (Exception chainEx) {
log.warn("Fallback chain failed: {}", chainEx.getMessage());
}
// Fallback tier 3: return a friendly error message
log.error("All fallback attempts failed", ex);
return errorMessageService.getMessageWithSuggestion(
ErrorCode.SERVICE_UNAVAILABLE
);
}
}
Monitoring and alerting
package com.example.langchain4j.monitoring;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryRegistry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.util.HashMap;
import java.util.Map;
@Slf4j
@Component
@RequiredArgsConstructor
public class ResilienceMonitor {

    private final CircuitBreakerRegistry circuitBreakerRegistry;
    private final RetryRegistry retryRegistry;
    private final RateLimiterRegistry rateLimiterRegistry;

    /**
     * Periodically collects health metrics.
     * Requires scheduling to be enabled (for example via @EnableScheduling).
     */
    @Scheduled(fixedRate = 30000) // every 30 seconds
    public void collectMetrics() {
        Map<String, Object> metrics = new HashMap<>();

        // Collect circuit breaker metrics
        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            metrics.put("circuitbreaker." + cb.getName() + ".state",
                    cb.getState().toString());
            metrics.put("circuitbreaker." + cb.getName() + ".failureRate",
                    cb.getMetrics().getFailureRate());
            metrics.put("circuitbreaker." + cb.getName() + ".slowCallRate",
                    cb.getMetrics().getSlowCallRate());
        });

        // Collect rate limiter metrics
        rateLimiterRegistry.getAllRateLimiters().forEach(rl -> {
            metrics.put("ratelimiter." + rl.getName() + ".availablePermissions",
                    rl.getMetrics().getAvailablePermissions());
            metrics.put("ratelimiter." + rl.getName() + ".waitingThreads",
                    rl.getMetrics().getNumberOfWaitingThreads());
        });

        // Log the metrics
        log.info("Resilience metrics: {}", metrics);

        // Check alert conditions
        checkAlerts(metrics);
    }

    /**
     * Checks alert conditions.
     */
    private void checkAlerts(Map<String, Object> metrics) {
        // Check circuit breaker state
        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            if (cb.getState() == CircuitBreaker.State.OPEN) {
                sendAlert("Circuit breaker open: " + cb.getName());
            }
            // getFailureRate() returns -1 until enough calls have been recorded,
            // so this check stays quiet on a cold breaker.
            if (cb.getMetrics().getFailureRate() > 30) {
                sendAlert("Failure rate too high: " + cb.getName() +
                        " - " + cb.getMetrics().getFailureRate() + "%");
            }
        });

        // Check rate limiter state
        rateLimiterRegistry.getAllRateLimiters().forEach(rl -> {
            if (rl.getMetrics().getNumberOfWaitingThreads() > 10) {
                sendAlert("Too many threads waiting on rate limiter: " + rl.getName());
            }
        });
    }

    /**
     * Sends an alert.
     */
    private void sendAlert(String message) {
        log.error("ALERT: {}", message);
        // Real implementation: send email, DingTalk, Slack, etc.
    }
}
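The `failureRate` that the monitor compares against 30 is computed over the breaker's sliding window, and Resilience4j reports -1 until the minimum number of calls has been recorded. A minimal sketch of a count-based window with that behaviour follows. `CountBasedFailureWindow` is a hypothetical class written for this illustration and only approximates what the library does internally.

```java
/** Minimal count-based sliding window (hypothetical, for illustration only). */
final class CountBasedFailureWindow {

    private final boolean[] outcomes; // ring buffer, true = failed call
    private final int minimumCalls;   // calls required before a rate is reported
    private int recorded;             // outcomes currently stored, capped at the window size
    private int next;                 // slot that the next outcome overwrites
    private int failures;             // failed calls currently inside the window

    CountBasedFailureWindow(int windowSize, int minimumCalls) {
        if (windowSize < 1 || minimumCalls < 1) {
            throw new IllegalArgumentException("windowSize and minimumCalls must be >= 1");
        }
        this.outcomes = new boolean[windowSize];
        this.minimumCalls = minimumCalls;
    }

    /** Records one call outcome, evicting the oldest outcome once the window is full. */
    void record(boolean callFailed) {
        if (recorded == outcomes.length) {
            if (outcomes[next]) {
                failures--; // the oldest outcome leaves the window
            }
        } else {
            recorded++;
        }
        outcomes[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % outcomes.length;
    }

    /** Failure rate in percent, or -1 while fewer than minimumCalls outcomes are recorded. */
    float failureRate() {
        if (recorded < minimumCalls) {
            return -1f;
        }
        return 100f * failures / recorded;
    }
}
```

With a window of 10 and a minimum of 5 calls, four failures alone report -1, and a fifth call (a success) makes the rate jump to 80%. A threshold check such as `> 30` therefore cannot fire before the minimum is reached.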
Health check endpoint
package com.example.langchain4j.controller;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.HashMap;
import java.util.Map;
@Component
@RequiredArgsConstructor
class LLMServiceHealthIndicator implements HealthIndicator {

    private final CircuitBreakerRegistry registry;

    @Override
    public Health health() {
        Map<String, Object> details = new HashMap<>();
        boolean allHealthy = true;
        for (CircuitBreaker cb : registry.getAllCircuitBreakers()) {
            String state = cb.getState().toString();
            details.put(cb.getName(), state);
            if (cb.getState() == CircuitBreaker.State.OPEN) {
                allHealthy = false;
            }
        }
        if (allHealthy) {
            return Health.up().withDetails(details).build();
        } else {
            return Health.down().withDetails(details).build();
        }
    }
}

@RestController
@RequestMapping("/api/health")
@RequiredArgsConstructor
class HealthCheckController {

    private final LLMServiceHealthIndicator healthIndicator;
    private final CircuitBreakerRegistry circuitBreakerRegistry;

    @GetMapping("/detailed")
    public Map<String, Object> detailedHealth() {
        Map<String, Object> health = new HashMap<>();
        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            Map<String, Object> cbHealth = new HashMap<>();
            cbHealth.put("state", cb.getState().toString());
            cbHealth.put("failureRate", cb.getMetrics().getFailureRate());
            cbHealth.put("slowCallRate", cb.getMetrics().getSlowCallRate());
            cbHealth.put("bufferedCalls", cb.getMetrics().getNumberOfBufferedCalls());
            cbHealth.put("failedCalls", cb.getMetrics().getNumberOfFailedCalls());
            cbHealth.put("successfulCalls", cb.getMetrics().getNumberOfSuccessfulCalls());
            health.put(cb.getName(), cbHealth);
        });
        return health;
    }
}
Complete configuration example
# application-production.yml
spring:
  application:
    name: langchain4j-resilient-service

# Complete Resilience4j configuration
resilience4j:
  # Retry (max-attempts includes the initial call)
  retry:
    instances:
      llmService:
        max-attempts: 4
        wait-duration: 1s
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - dev.langchain4j.exception.RateLimitException
          - dev.langchain4j.exception.TimeoutException
          - dev.langchain4j.exception.ServiceUnavailableException
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException

  # Circuit breaker
  circuitbreaker:
    instances:
      llmService:
        failure-rate-threshold: 50
        slow-call-rate-threshold: 50
        slow-call-duration-threshold: 5s
        sliding-window-size: 10
        sliding-window-type: COUNT_BASED
        minimum-number-of-calls: 5
        wait-duration-in-open-state: 60s
        permitted-number-of-calls-in-half-open-state: 3
        automatic-transition-from-open-to-half-open-enabled: true

  # Rate limiter
  ratelimiter:
    instances:
      llmService:
        limit-refresh-period: 1s
        limit-for-period: 10
        timeout-duration: 5s

  # Bulkhead isolation (takes effect only on methods annotated with @Bulkhead)
  bulkhead:
    instances:
      llmService:
        max-concurrent-calls: 5
        max-wait-duration: 10s

# Monitoring
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,circuitbreakers,ratelimiters
  health:
    circuitbreakers:
      enabled: true
  # Spring Boot 2.x property path; Spring Boot 3 uses
  # management.prometheus.metrics.export.enabled instead.
  metrics:
    export:
      prometheus:
        enabled: true

# Logging
logging:
  level:
    com.example.langchain4j: INFO
    io.github.resilience4j: DEBUG
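To see what the retry settings above mean in wall-clock terms, the waits can be computed by hand. The sketch below assumes the usual exponential rule (each wait is the previous one times the multiplier, with `max-attempts` counting the initial call). `BackoffSchedule` is a hypothetical helper written for this illustration, not part of Resilience4j.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Computes the waits between attempts for an exponential backoff policy (hypothetical helper). */
final class BackoffSchedule {

    /** Returns maxAttempts - 1 waits: initial, initial * m, initial * m^2, ... */
    static List<Duration> waits(int maxAttempts, Duration initialWait, double multiplier) {
        if (maxAttempts < 1 || multiplier < 1.0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and multiplier >= 1.0");
        }
        List<Duration> waits = new ArrayList<>();
        double millis = initialWait.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return waits;
    }

    /** Sum of all waits, i.e. the extra latency added when every attempt fails. */
    static Duration total(List<Duration> waits) {
        Duration sum = Duration.ZERO;
        for (Duration wait : waits) {
            sum = sum.plus(wait);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Duration> waits = waits(4, Duration.ofSeconds(1), 2.0);
        System.out.println(waits);        // prints [PT1S, PT2S, PT4S]
        System.out.println(total(waits)); // prints PT7S
    }
}
```

With `max-attempts: 4`, `wait-duration: 1s` and a multiplier of 2, a request that fails on every attempt spends 7 seconds waiting in addition to the time of the four calls themselves. Any upstream timeout needs to allow for that.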
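💡 Hands-on exercises
Exercise 1: Implement a complete fault-tolerant service
Create an LLM service that integrates retry, circuit breaking, rate limiting, and fallback.
Tasks:
- Configure all Resilience4j components
- Implement a multi-level fallback strategy
- Add monitoring and alerting
- Write integration tests that verify each error scenario
Exercise 2: Simulate failure scenarios
Create a fault injector to test how effective the fault-tolerance mechanisms are.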
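// Exception types follow the dev.langchain4j.exception names used by ErrorClassifier earlier in this article.
import dev.langchain4j.exception.*;
import org.springframework.stereotype.Service;
import java.util.Random;

@Service
public class ChaosEngineeringService {

    private final Random random = new Random();

    /**
     * Randomly throws or delays to exercise the resilience mechanisms.
     */
    public void injectChaos(String prompt) {
        int scenario = random.nextInt(5);
        switch (scenario) {
            case 0 -> throw new RateLimitException("Simulated rate limit");
            case 1 -> throw new TimeoutException("Simulated timeout");
            case 2 -> simulateSlowResponse();
            case 3 -> throw new ServiceUnavailableException("Simulated service unavailable");
            default -> {} // proceed normally
        }
    }

    private void simulateSlowResponse() {
        try {
            Thread.sleep(6000); // simulate a slow response (above the 5s slow-call threshold)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}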
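The injector above draws from an unseeded `Random`, so a failing test run cannot be replayed. One possible variation is to take the seed as a parameter and return the chosen fault instead of throwing directly, which keeps the plan reproducible. `SeededFaultInjector` and its `Fault` enum are hypothetical names introduced for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Reproducible fault picker for tests (hypothetical sketch). */
final class SeededFaultInjector {

    /** The scenarios from the exercise; NONE means the call proceeds normally. */
    enum Fault { RATE_LIMIT, TIMEOUT, SLOW_RESPONSE, SERVICE_UNAVAILABLE, NONE }

    private final Random random;

    SeededFaultInjector(long seed) {
        this.random = new Random(seed); // same seed, same sequence of faults
    }

    /** Picks the next fault with equal probability for each scenario. */
    Fault next() {
        Fault[] faults = Fault.values();
        return faults[random.nextInt(faults.length)];
    }

    /** The first n faults a fresh injector with the given seed would produce. */
    static List<Fault> plan(long seed, int n) {
        SeededFaultInjector injector = new SeededFaultInjector(seed);
        List<Fault> plan = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            plan.add(injector.next());
        }
        return plan;
    }
}
```

A test can log the seed it used. Re-running with that seed yields the same sequence of faults, and the caller maps each `Fault` to the exception or delay it wants to simulate.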
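Exercise 3: Improve error responses
Design a complete error code scheme and user-friendly error messages for your application.
Requirements:
- Define at least 20 error codes
- Provide bilingual support (Chinese and English)
- Give every error a suggested resolution
- Implement error classification and severity levels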
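As a starting point for these requirements, an enum can carry the code, severity, bilingual message, and suggestion together. The sketch below shows three entries only. `ErrorCatalog`, the `LLM-xxxx` identifiers, and the message texts are all made up for illustration and are not the article's `ErrorCode` type.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

/** Illustrative error catalogue with severity and bilingual messages (hypothetical). */
final class ErrorCatalog {

    enum Severity { LOW, MEDIUM, HIGH, CRITICAL }

    enum Code {
        RATE_LIMITED("LLM-1001", Severity.MEDIUM,
                "The service is busy.", "服务繁忙。",
                "Please retry in a few seconds.", "请稍后重试。"),
        INVALID_OUTPUT("LLM-2001", Severity.LOW,
                "The model returned an invalid format.", "模型返回的格式无效。",
                "Please rephrase the request and try again.", "请调整请求后重试。"),
        AUTH_FAILED("LLM-1401", Severity.CRITICAL,
                "Authentication failed.", "认证失败。",
                "Please contact the administrator to check the API key.", "请联系管理员检查 API 密钥。");

        private final String id;
        private final Severity severity;
        private final String messageEn;
        private final String messageZh;
        private final String suggestionEn;
        private final String suggestionZh;

        Code(String id, Severity severity, String messageEn, String messageZh,
             String suggestionEn, String suggestionZh) {
            this.id = id;
            this.severity = severity;
            this.messageEn = messageEn;
            this.messageZh = messageZh;
            this.suggestionEn = suggestionEn;
            this.suggestionZh = suggestionZh;
        }

        String id() { return id; }

        Severity severity() { return severity; }

        /** Chinese text for any zh locale, English otherwise. */
        String message(Locale locale) {
            return isChinese(locale) ? messageZh : messageEn;
        }

        String suggestion(Locale locale) {
            return isChinese(locale) ? suggestionZh : suggestionEn;
        }

        /** Looks a code up by its identifier, e.g. when parsing a log line. */
        static Optional<Code> find(String id) {
            return Arrays.stream(values()).filter(c -> c.id.equals(id)).findFirst();
        }
    }

    private static boolean isChinese(Locale locale) {
        return Locale.CHINESE.getLanguage().equals(locale.getLanguage());
    }
}
```

Keeping the texts next to the code makes missing translations obvious at compile time. For 20 or more codes, moving the strings into resource bundles keyed by `id` may be easier to maintain.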
Last updated: 2026-03-09 | Word count: 5,000 | Estimated reading time: 40 minutes