23 - Error Handling and Fault Tolerance Best Practices

Time: 40 minutes | Difficulty: ⭐⭐⭐⭐ | Week 4 Day 23

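In production, LLM applications face many sources of uncertainty: API rate limits, network timeouts, model hallucinations, malformed output, and more. Robust error handling and fault tolerance are key to keeping the system stable. This article walks through how to build a production-grade fault-tolerance architecture for a LangChain4J application.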

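📋 Learning objectives

  • Understand the common error types in LLM applications and how to classify them
  • Design and implement retries with exponential backoff
  • Use Resilience4j to implement the circuit breaker pattern
  • Implement multi-tier fallback strategies to keep the service available
  • Design sensible rate limiting
  • Build user-friendly error responses
  • Combine these mechanisms into a complete layered defense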

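⚠️ Common error categories in LLM applications

Errors in a LangChain4J application fall into three broad categories: API errors, business errors, and system errors.

Error classification table

Category        | Specific error              | Retryable      | Recommended strategy
API errors      | 429 Rate Limit              | Yes            | Retry with exponential backoff
                | 500 Internal Server Error   | Yes            | Retry + fallback
                | 503 Service Unavailable     | Yes            | Retry + circuit breaker
                | Network Timeout             | Yes            | Retry + timeout configuration
                | 401 Unauthorized            | No             | Fail immediately + alert
Business errors | JSON parsing failure        | ⚠️ Conditional | Retry + format-correction prompt
                | Output validation failure   | ⚠️ Conditional | Regenerate + tighter constraints
                | Hallucination check failure |                | Fallback + manual review
                | Content safety violation    | No             | Reject + log
System errors   | Out of memory               |                | Rate limit + scale out
                | Database connection failure | Yes            | Retry via the connection pool
                | Cache service unavailable   |                | Fall back to no-cache mode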

Error classification code example

package com.example.langchain4j.error;

import dev.langchain4j.exception.*;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ErrorClassifier {

    /**
     * Decide whether an error is retryable.
     */
    public static boolean isRetryable(Exception ex) {
        // API rate limiting: retryable
        if (ex instanceof RateLimitException) {
            return true;
        }

        // Network timeout: retryable
        if (ex instanceof TimeoutException) {
            return true;
        }

        // Temporary server-side error: retryable
        if (ex instanceof ServiceUnavailableException) {
            return true;
        }

        // Authentication error: not retryable
        if (ex instanceof AuthenticationException) {
            log.error("Authentication failed - check API key", ex);
            return false;
        }

        // Business validation error: conditionally retryable
        if (ex instanceof ValidationException) {
            return isBusinessRetryable((ValidationException) ex);
        }

        // Do not retry by default
        return false;
    }

    /**
     * Decide whether a business error is retryable.
     */
    private static boolean isBusinessRetryable(ValidationException ex) {
        // getMessage() may return null, so guard before lower-casing
        String message = ex.getMessage() == null ? "" : ex.getMessage().toLowerCase();

        // Content safety violations are never retried, so check them first
        if (message.contains("safety") || message.contains("policy")) {
            return false;
        }

        // JSON parsing and format errors can be retried
        return message.contains("json") || message.contains("format");
    }

    /**
     * Get the severity level of an error.
     */
    public static ErrorSeverity getSeverity(Exception ex) {
        if (ex instanceof AuthenticationException) {
            return ErrorSeverity.CRITICAL;
        }
        if (ex instanceof RateLimitException) {
            return ErrorSeverity.WARNING;
        }
        if (ex instanceof TimeoutException) {
            return ErrorSeverity.WARNING;
        }
        return ErrorSeverity.ERROR;
    }

    public enum ErrorSeverity {
        CRITICAL,  // Needs immediate attention
        ERROR,     // Needs an alert
        WARNING,   // Needs monitoring
        INFO       // Log only
    }
}

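🔄 Retry mechanism design

Exponential backoff algorithm

Exponential backoff is the standard way to handle transient errors: each retry waits longer than the previous one, which gives the failing service time to recover. With an initial delay of 1 second and a multiplier of 2:

Retry 1: wait 1 second
Retry 2: wait 2 seconds
Retry 3: wait 4 seconds
Retry 4: wait 8 seconds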
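
The schedule above can be sketched in plain Java. This is a minimal illustration, not Resilience4j's implementation: `BackoffSchedule`, its constructor parameters, and the 30-second cap used in the usage note are hypothetical names and values chosen for the example. It also shows "full jitter" (a random delay up to the computed value), a common addition that keeps many clients from retrying at the same instant.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper that computes exponential backoff delays (not a Resilience4j API). */
final class BackoffSchedule {

    private final Duration initialDelay;
    private final double multiplier;
    private final Duration maxDelay;

    BackoffSchedule(Duration initialDelay, double multiplier, Duration maxDelay) {
        this.initialDelay = initialDelay;
        this.multiplier = multiplier;
        this.maxDelay = maxDelay;
    }

    /** Delay before the given retry (1-based): initial * multiplier^(retry - 1), capped at maxDelay. */
    Duration delayForRetry(int retryNumber) {
        if (retryNumber < 1) {
            throw new IllegalArgumentException("retryNumber must be >= 1");
        }
        double millis = initialDelay.toMillis() * Math.pow(multiplier, retryNumber - 1);
        return Duration.ofMillis((long) Math.min(millis, maxDelay.toMillis()));
    }

    /** Full jitter: a random delay in [0, delayForRetry(retryNumber)]. */
    Duration jitteredDelayForRetry(int retryNumber) {
        long upperBound = delayForRetry(retryNumber).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(upperBound + 1));
    }
}
```

With `new BackoffSchedule(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(30))`, `delayForRetry(1)` through `delayForRetry(4)` return 1, 2, 4, and 8 seconds, matching the list above; later retries are capped at 30 seconds.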

Resilience4j integration

<!-- pom.xml -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-retry</artifactId>
    <version>2.1.0</version>
</dependency>

Retry configuration

# application.yml
resilience4j:
  retry:
    instances:
      llmService:
        # Counts the initial call, so this allows at most 3 retries
        max-attempts: 4
        wait-duration: 1s
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - dev.langchain4j.exception.RateLimitException
          - dev.langchain4j.exception.TimeoutException
          - dev.langchain4j.exception.ServiceUnavailableException
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException
          - dev.langchain4j.exception.InvalidRequestException

Retry service implementation

package com.example.langchain4j.service;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.data.message.AiMessage;
import io.github.resilience4j.retry.annotation.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Slf4j
@Service
@RequiredArgsConstructor
public class ResilientLLMService {

    private final ChatLanguageModel model;

    /**
     * LLM call with retries.
     */
    @Retry(name = "llmService", fallbackMethod = "generateFallback")
    public String generate(String prompt) {
        log.info("Calling LLM with prompt: {}", prompt);
        try {
            // generate(String) returns the reply text directly
            String response = model.generate(prompt);
            log.info("LLM call succeeded");
            return response;
        } catch (Exception ex) {
            log.warn("LLM call failed: {}", ex.getMessage());
            throw ex; // Let Resilience4j handle the retry
        }
    }

    /**
     * Fallback method, invoked once all retry attempts have failed.
     */
    private String generateFallback(String prompt, Exception ex) {
        log.error("All retry attempts failed for prompt: {}", prompt, ex);
        return "Sorry, the service is temporarily unavailable. Please try again later.";
    }
}

Custom retry configuration

package com.example.langchain4j.config;

import com.example.langchain4j.error.ErrorClassifier;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Slf4j
@Configuration
public class RetryConfiguration {

    @Bean
    public RetryRegistry retryRegistry() {
        RetryConfig config = RetryConfig.custom()
            .maxAttempts(4)
            // Exponential backoff: 1s initial delay, doubled on each retry.
            // waitDuration() is omitted: the interval function already defines
            // the delays, and configuring both can be rejected by the builder.
            .intervalFunction(io.github.resilience4j.core.IntervalFunction
                .ofExponentialBackoff(Duration.ofSeconds(1), 2))
            // retryOnException expects a Predicate<Throwable>
            .retryOnException(t -> t instanceof Exception ex
                && ErrorClassifier.isRetryable(ex))
            .build();

        return RetryRegistry.of(config);
    }

    @Bean
    public Retry llmRetry(RetryRegistry registry) {
        Retry retry = registry.retry("llmService");

        // Event consumers are registered on the Retry instance, not on RetryConfig
        retry.getEventPublisher()
            .onRetry(event -> log.warn("Retry attempt {} for operation: {}",
                event.getNumberOfRetryAttempts(), event.getName()))
            .onSuccess(event -> log.info("Operation succeeded after {} attempts",
                event.getNumberOfRetryAttempts()))
            .onError(event -> log.error("All retry attempts failed for operation: {}",
                event.getName(), event.getLastThrowable()));

        return retry;
    }
}

Programmatic retry usage

package com.example.langchain4j.service;

import io.github.resilience4j.retry.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.util.function.Supplier;

@Slf4j
@Service
@RequiredArgsConstructor
public class ProgrammaticRetryService {

    private final Retry llmRetry;

    /**
     * Programmatic retry call.
     */
    public String callWithRetry(Supplier<String> operation) {
        Supplier<String> decoratedSupplier = Retry
            .decorateSupplier(llmRetry, operation);

        try {
            return decoratedSupplier.get();
        } catch (Exception ex) {
            log.error("Operation failed after all retries", ex);
            throw ex;
        }
    }

    /**
     * Usage example.
     */
    public String generateText(String prompt) {
        return callWithRetry(() -> {
            log.info("Executing LLM call");
            // The actual LLM call
            return performLLMCall(prompt);
        });
    }

    private String performLLMCall(String prompt) {
        // Actual implementation goes here
        return "Generated text";
    }
}

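🔌 Circuit breaker pattern

A circuit breaker stops the system from repeatedly calling a service that is already failing: calls fail fast and are routed to a fallback.

Circuit breaker state machine

CLOSED → OPEN → HALF_OPEN → CLOSED
   ↓       ↓         ↓
normal  fail fast  probe for recovery

State transition rules

  • CLOSED → OPEN: the failure rate exceeds the threshold (e.g. 50%)
  • OPEN → HALF_OPEN: the wait time elapses (e.g. 60 seconds)
  • HALF_OPEN → CLOSED: the trial calls succeed
  • HALF_OPEN → OPEN: the trial calls fail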
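
The transitions above can be illustrated with a small, self-contained state machine. This is a simplified sketch, not how Resilience4j works internally: it opens after a number of consecutive failures instead of tracking a failure rate over a sliding window, and it closes again after a single successful trial call. `SimpleCircuitBreaker` and its parameters are hypothetical names for this example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, simplified circuit breaker that illustrates the state transitions only. */
final class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // consecutive failures that open the circuit
    private final long openDurationMillis;  // how long to stay OPEN before probing
    private final LongSupplier clock;       // injectable time source in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // OPEN -> HALF_OPEN once the wait time has elapsed
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openDurationMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the failing service
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException ex) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // HALF_OPEN -> CLOSED, or stay CLOSED
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // CLOSED -> OPEN, or HALF_OPEN -> OPEN
            openedAt = clock.getAsLong();
        }
    }
}
```

The clock is injected so the OPEN → HALF_OPEN transition can be exercised in a test without sleeping; in an application it could simply be `System::currentTimeMillis`.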

Circuit breaker configuration

# application.yml
resilience4j:
  circuitbreaker:
    instances:
      llmService:
        # Failure rate threshold (50%)
        failure-rate-threshold: 50
        # Slow call rate threshold (50%)
        slow-call-rate-threshold: 50
        # Calls slower than this count as slow (5 seconds)
        slow-call-duration-threshold: 5s
        # Sliding window size
        sliding-window-size: 10
        # Sliding window type (count-based)
        sliding-window-type: count_based
        # Minimum number of calls before the failure rate is calculated
        minimum-number-of-calls: 5
        # Time to stay in the OPEN state
        wait-duration-in-open-state: 60s
        # Calls permitted in the HALF_OPEN state
        permitted-number-of-calls-in-half-open-state: 3
        # Transition from OPEN to HALF_OPEN automatically
        automatic-transition-from-open-to-half-open-enabled: true
        # Exceptions recorded as failures
        record-exceptions:
          - java.lang.Exception
        # Exceptions that are ignored
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException

Circuit breaker service implementation

package com.example.langchain4j.service;

import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Slf4j
@Service
@RequiredArgsConstructor
public class CircuitBreakerLLMService {

    private final ChatLanguageModel primaryModel;
    private final ChatLanguageModel backupModel;

    /**
     * LLM call protected by a circuit breaker.
     */
    @CircuitBreaker(name = "llmService", fallbackMethod = "generateFallback")
    public String generate(String prompt) {
        log.info("Calling primary LLM model");
        return primaryModel.generate(prompt);
    }

    /**
     * Fallback, invoked when the primary call fails or the circuit breaker is open.
     */
    private String generateFallback(String prompt, Exception ex) {
        log.warn("Primary model unavailable ({}), using backup model", ex.getMessage());
        try {
            return backupModel.generate(prompt);
        } catch (Exception backupEx) {
            log.error("Backup model also failed", backupEx);
            return "The service is temporarily unavailable. Please try again later.";
        }
    }
}

Circuit breaker event listener

package com.example.langchain4j.config;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.circuitbreaker.event.CircuitBreakerOnStateTransitionEvent;
import jakarta.annotation.PostConstruct;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

@Slf4j
@Component
@RequiredArgsConstructor
public class CircuitBreakerEventListener {

    private final CircuitBreakerRegistry registry;

    @PostConstruct
    public void init() {
        registry.circuitBreaker("llmService")
            .getEventPublisher()
            .onStateTransition(this::onStateTransition)
            .onSuccess(event -> log.info("Call succeeded"))
            .onError(event -> log.warn("Call failed: {}",
                event.getThrowable().getMessage()))
            .onIgnoredError(event -> log.debug("Error ignored: {}",
                event.getThrowable().getMessage()));
    }

    private void onStateTransition(CircuitBreakerOnStateTransitionEvent event) {
        log.warn("Circuit breaker state changed: {} -> {}",
            event.getStateTransition().getFromState(),
            event.getStateTransition().getToState());

        // Send an alert notification
        if (event.getStateTransition().getToState() == CircuitBreaker.State.OPEN) {
            sendAlert("Circuit breaker opened for: " + event.getCircuitBreakerName());
        }
    }

    private void sendAlert(String message) {
        // Implement the alerting logic (email, DingTalk, Slack, etc.)
        log.error("ALERT: {}", message);
    }
}

Querying circuit breaker state

package com.example.langchain4j.controller;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;

import java.util.HashMap;
import java.util.Map;

@RestController
@RequestMapping("/api/circuit-breaker")
@RequiredArgsConstructor
public class CircuitBreakerController {

    private final CircuitBreakerRegistry registry;

    /**
     * Get the circuit breaker state.
     */
    @GetMapping("/status/{name}")
    public Map<String, Object> getStatus(@PathVariable String name) {
        CircuitBreaker circuitBreaker = registry.circuitBreaker(name);
        CircuitBreaker.Metrics metrics = circuitBreaker.getMetrics();

        Map<String, Object> status = new HashMap<>();
        status.put("state", circuitBreaker.getState().toString());
        status.put("failureRate", metrics.getFailureRate());
        status.put("slowCallRate", metrics.getSlowCallRate());
        status.put("numberOfBufferedCalls", metrics.getNumberOfBufferedCalls());
        status.put("numberOfFailedCalls", metrics.getNumberOfFailedCalls());
        status.put("numberOfSuccessfulCalls", metrics.getNumberOfSuccessfulCalls());
        status.put("numberOfSlowCalls", metrics.getNumberOfSlowCalls());

        return status;
    }

    /**
     * Manually switch the circuit breaker state.
     */
    @PostMapping("/transition/{name}")
    public String transitionState(
            @PathVariable String name,
            @RequestParam String toState) {

        CircuitBreaker circuitBreaker = registry.circuitBreaker(name);

        switch (toState.toUpperCase()) {
            case "CLOSED" -> circuitBreaker.transitionToClosedState();
            case "OPEN" -> circuitBreaker.transitionToOpenState();
            case "HALF_OPEN" -> circuitBreaker.transitionToHalfOpenState();
            case "DISABLED" -> circuitBreaker.transitionToDisabledState();
            case "FORCED_OPEN" -> circuitBreaker.transitionToForcedOpenState();
            default -> throw new IllegalArgumentException("Invalid state: " + toState);
        }

        return "Transitioned to " + toState;
    }
}

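📉 Fallback strategies

Fallback (graceful degradation) is the last line of defense for availability. When the primary service is unavailable, a fallback strategy provides a limited but usable service.

Multi-tier fallback architecture

Tier 1: GPT-4 (high quality, high cost)
  ↓ on failure
Tier 2: GPT-3.5 (medium quality, medium cost)
  ↓ on failure
Tier 3: local model (basic quality, low cost)
  ↓ on failure
Tier 4: static response (last resort)

Model fallback chain implementation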

package com.example.langchain4j.service;

import dev.langchain4j.model.chat.ChatLanguageModel;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.util.List;

@Slf4j
@Service
public class FallbackChainService {

    private final List<ModelTier> modelTiers;

    public FallbackChainService(
            ChatLanguageModel gpt4Model,
            ChatLanguageModel gpt35Model,
            ChatLanguageModel localModel) {

        this.modelTiers = List.of(
            new ModelTier("GPT-4", gpt4Model, 1),
            new ModelTier("GPT-3.5", gpt35Model, 2),
            new ModelTier("Local Model", localModel, 3)
        );
    }

    /**
     * Cascading fallback call.
     */
    public String generateWithFallback(String prompt) {
        Exception lastException = null;

        for (ModelTier tier : modelTiers) {
            try {
                log.info("Trying tier {}: {}", tier.level, tier.name);
                String result = tier.model.generate(prompt);
                log.info("Tier {} succeeded", tier.level);
                return result;

            } catch (Exception ex) {
                log.warn("Tier {} failed: {}", tier.level, ex.getMessage());
                lastException = ex;
                // Continue with the next tier
            }
        }

        // Every tier failed, so return the static response
        log.error("All model tiers failed", lastException);
        return getFallbackResponse(prompt);
    }

    /**
     * Final static fallback response.
     */
    private String getFallbackResponse(String prompt) {
        return "Sorry, the AI service is temporarily unavailable. Your request has been recorded and will be processed later.";
    }

    private record ModelTier(
        String name,
        ChatLanguageModel model,
        int level
    ) {}
}

Feature-level fallback

package com.example.langchain4j.service;

import dev.langchain4j.model.chat.ChatLanguageModel;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

@Slf4j
@Service
@RequiredArgsConstructor
public class FeatureFallbackService {

    private final ChatLanguageModel realtimeModel;
    private final CacheManager cacheManager;

    /**
     * Feature fallback: real-time → cache → static.
     */
    public String getAnswer(String question) {
        // Tier 1: try real-time generation
        try {
            log.info("Attempting real-time generation");
            String answer = realtimeModel.generate(question);
            cacheAnswer(question, answer);
            return answer;
        } catch (Exception ex) {
            log.warn("Real-time generation failed: {}", ex.getMessage());
        }

        // Tier 2: read from the cache
        Optional<String> cachedAnswer = getCachedAnswer(question);
        if (cachedAnswer.isPresent()) {
            log.info("Returning cached answer");
            return cachedAnswer.get() + "\n\n(from cache)";
        }

        // Tier 3: return a static answer
        log.warn("Returning static fallback");
        return getStaticFallback(question);
    }

    private void cacheAnswer(String question, String answer) {
        Cache cache = cacheManager.getCache("llm-answers");
        if (cache != null) {
            cache.put(question, answer);
        }
    }

    private Optional<String> getCachedAnswer(String question) {
        Cache cache = cacheManager.getCache("llm-answers");
        if (cache != null) {
            Cache.ValueWrapper wrapper = cache.get(question);
            if (wrapper != null) {
                return Optional.of((String) wrapper.get());
            }
        }
        return Optional.empty();
    }

    private String getStaticFallback(String question) {
        return """
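            Sorry, a real-time answer cannot be generated right now.

            For common questions, see:
            - Product documentation: https://docs.example.com
            - Technical support: support@example.com
            - Customer hotline: 400-XXX-XXXX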
            """;
    }
}

Smart degradation decider

package com.example.langchain4j.service;

import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

@Slf4j
@Component
public class DegradationDecider {

    private final AtomicInteger errorCount = new AtomicInteger(0);
    private final AtomicReference<Instant> lastErrorTime =
        new AtomicReference<>(Instant.now());

    // Degradation thresholds
    private static final int ERROR_THRESHOLD = 5;
    private static final Duration ERROR_WINDOW = Duration.ofMinutes(5);

    /**
     * Decide whether the service should degrade.
     */
    public boolean shouldDegrade() {
        int errors = errorCount.get();
        Instant lastError = lastErrorTime.get();
        Duration timeSinceLastError = Duration.between(lastError, Instant.now());

        // If the error window has expired, reset the counter
        if (timeSinceLastError.compareTo(ERROR_WINDOW) > 0) {
            errorCount.set(0);
            return false;
        }

        // Degrade once the error count reaches the threshold
        return errors >= ERROR_THRESHOLD;
    }

    /**
     * Record an error.
     */
    public void recordError() {
        errorCount.incrementAndGet();
        lastErrorTime.set(Instant.now());
        log.warn("Error recorded. Total errors: {}", errorCount.get());
    }

    /**
     * Record a success (allows gradual recovery).
     */
    public void recordSuccess() {
        if (errorCount.get() > 0) {
            errorCount.decrementAndGet();
            log.info("Success recorded. Remaining errors: {}", errorCount.get());
        }
    }

    /**
     * Reset the state.
     */
    public void reset() {
        errorCount.set(0);
        log.info("Degradation state reset");
    }
}

Degradation mode controller

package com.example.langchain4j.controller;

import com.example.langchain4j.service.DegradationDecider;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;

import java.util.Map;

@RestController
@RequestMapping("/api/degradation")
@RequiredArgsConstructor
public class DegradationController {

    private final DegradationDecider decider;

    /**
     * Get the degradation status.
     */
    @GetMapping("/status")
    public Map<String, Object> getStatus() {
        return Map.of(
            "degraded", decider.shouldDegrade(),
            "mode", decider.shouldDegrade() ? "DEGRADED" : "NORMAL"
        );
    }

    /**
     * Manually trigger degradation.
     */
    @PostMapping("/enable")
    public String enableDegradation() {
        for (int i = 0; i < 10; i++) {
            decider.recordError();
        }
        return "Degradation mode enabled";
    }

    /**
     * Restore normal mode.
     */
    @PostMapping("/disable")
    public String disableDegradation() {
        decider.reset();
        return "Normal mode restored";
    }
}

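🚰 Rate limiting

Rate limiting prevents overload, protects backend services, and keeps costs under control.

Rate limiting algorithms compared

Algorithm      | How it works                      | Pros                | Cons                   | Typical use
Fixed window   | Fixed number of calls per minute  | Simple to implement | Bursts at window edges | Simple scenarios
Sliding window | Smoothly moving time window       | More precise        | Higher memory usage    | Precise control
Token bucket   | Tokens added at a constant rate   | Allows bursts       | Moderate complexity    | API gateways
Leaky bucket   | Requests drain at a constant rate | Smooths traffic     | No bursts allowed      | Traffic shaping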
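
As a concrete illustration of one row in the table, here is a minimal token bucket sketch. `TokenBucket`, its parameters, and the injectable clock are hypothetical names for this example; Resilience4j's own rate limiter is configured differently, in terms of permits per refresh period.

```java
import java.util.function.LongSupplier;

/** Hypothetical token bucket: one token is added per refill interval, up to a fixed capacity. */
final class TokenBucket {

    private final long capacity;             // maximum burst size
    private final long refillIntervalNanos;  // time needed to earn one token
    private final LongSupplier nanoClock;    // injectable time source in nanoseconds

    private long tokens;
    private long lastRefill;

    TokenBucket(long capacity, long refillIntervalNanos, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillIntervalNanos = refillIntervalNanos;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should be throttled. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long newTokens = (now - lastRefill) / refillIntervalNanos;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            // Advance by whole intervals only, so partial progress is not lost
            lastRefill += newTokens * refillIntervalNanos;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

A bucket with capacity 2 and a one-second refill interval admits two calls back to back, rejects the third, and admits one more call after a second has passed. In an application the clock could be `System::nanoTime`.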

Resilience4j rate limiter configuration

# application.yml
resilience4j:
  ratelimiter:
    instances:
      llmService:
        # Refresh period (every second)
        limit-refresh-period: 1s
        # Requests allowed per period
        limit-for-period: 10
        # How long to wait for a permit before giving up
        timeout-duration: 5s

Rate limiter implementation

package com.example.langchain4j.service;

import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Slf4j
@Service
@RequiredArgsConstructor
public class RateLimitedLLMService {

    private final ChatLanguageModel model;

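    /**
     * LLM call protected by a rate limiter.
     */
    @RateLimiter(name = "llmService", fallbackMethod = "rateLimitFallback")
    public String generate(String prompt) {
        log.info("Processing request within rate limit");
        return model.generate(prompt);
    }

    /**
     * Rate-limit fallback. The parameter type restricts it to rejected permits,
     * so other failures are not misreported as rate limiting.
     */
    private String rateLimitFallback(
            String prompt,
            io.github.resilience4j.ratelimiter.RequestNotPermitted ex) {
        log.warn("Rate limit exceeded for request");
        return "Too many requests. Please try again later.";
    }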
}

Per-user rate limiting

package com.example.langchain4j.service;

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

@Slf4j
@Service
@RequiredArgsConstructor
public class PerUserRateLimiter {

    private final ConcurrentHashMap<String, RateLimiter> userLimiters =
        new ConcurrentHashMap<>();

    private final RateLimiterRegistry registry;

    /**
     * Get or create the rate limiter dedicated to a user.
     */
    private RateLimiter getUserLimiter(String userId, int requestsPerMinute) {
        return userLimiters.computeIfAbsent(userId, id -> {
            RateLimiterConfig config = RateLimiterConfig.custom()
                .limitRefreshPeriod(Duration.ofMinutes(1))
                .limitForPeriod(requestsPerMinute)
                .timeoutDuration(Duration.ofSeconds(5))
                .build();

            return registry.rateLimiter("user-" + id, config);
        });
    }

    /**
     * Run an operation under the user's rate limit.
     */
    public <T> T executeWithUserLimit(
            String userId,
            int requestsPerMinute,
            Supplier<T> operation) {

        RateLimiter limiter = getUserLimiter(userId, requestsPerMinute);

        Supplier<T> decoratedSupplier = RateLimiter
            .decorateSupplier(limiter, operation);

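        try {
            return decoratedSupplier.get();
        } catch (io.github.resilience4j.ratelimiter.RequestNotPermitted ex) {
            // Only a rejected permit means the limit was hit; other failures propagate unchanged
            log.warn("Rate limit exceeded for user: {}", userId);
            throw new RateLimitExceededException(
                "User " + userId + " is sending requests too frequently");
        }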
    }

    /**
     * Usage example.
     */
    public String generateForUser(String userId, String prompt) {
        return executeWithUserLimit(userId, 10, () -> {
            log.info("Generating for user: {}", userId);
            // The actual LLM call
            return "Generated content";
        });
    }

    public static class RateLimitExceededException extends RuntimeException {
        public RateLimitExceededException(String message) {
            super(message);
        }
    }
}

Tiered rate limiting strategy

package com.example.langchain4j.service;

import lombok.Getter;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

import java.time.Duration;

@Service
@RequiredArgsConstructor
public class TieredRateLimitService {

    private final PerUserRateLimiter rateLimiter;

    /**
     * Look up the rate limit for a user tier.
     */
    public int getRateLimitForUser(String userId, UserTier tier) {
        // The per-minute limit is defined once, on the UserTier enum
        return tier.getRequestsPerMinute();
    }

    /**
     * Call with tiered rate limiting.
     */
    public String generateWithTier(
            String userId,
            UserTier tier,
            String prompt) {

        int limit = getRateLimitForUser(userId, tier);

        return rateLimiter.executeWithUserLimit(userId, limit, () -> {
            // The actual LLM call
            return performGeneration(prompt);
        });
    }

    private String performGeneration(String prompt) {
        // Actual implementation goes here
        return "Generated content";
    }

    @Getter
    @RequiredArgsConstructor
    public enum UserTier {
        FREE("Free", 10),
        BASIC("Basic", 50),
        PRO("Pro", 200),
        ENTERPRISE("Enterprise", 1000);

        private final String displayName;
        private final int requestsPerMinute;
    }
}

Dynamic rate limit adjustment

package com.example.langchain4j.service;

import io.github.resilience4j.ratelimiter.RateLimiter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import java.time.LocalDateTime;
import java.time.LocalTime;

@Slf4j
@Service
@RequiredArgsConstructor
public class DynamicRateLimitService {

    private final RateLimiter rateLimiter;

    /**
     * Adjust the rate limit by time of day:
     * stricter during peak hours, more relaxed off-peak.
     */
    @Scheduled(fixedRate = 60000) // Check once a minute
    public void adjustRateLimit() {
        LocalTime now = LocalTime.now();
        int newLimit;

        // Peak hours (9:00-12:00, 14:00-18:00)
        if (isPeakHour(now)) {
            newLimit = 5; // Stricter limit
            log.info("Peak hour detected, setting strict rate limit: {}", newLimit);
        }
        // Off-peak hours
        else {
            newLimit = 20; // More relaxed limit
            log.info("Off-peak hour, relaxing rate limit: {}", newLimit);
        }

        // Update the limiter configuration at runtime
        rateLimiter.changeLimitForPeriod(newLimit);
    }

    private boolean isPeakHour(LocalTime time) {
        // Half-open ranges [09:00, 12:00) and [14:00, 18:00), so 09:00:00 itself counts as peak
        return (!time.isBefore(LocalTime.of(9, 0)) &&
                time.isBefore(LocalTime.of(12, 0))) ||
               (!time.isBefore(LocalTime.of(14, 0)) &&
                time.isBefore(LocalTime.of(18, 0)));
    }

    /**
     * Adjust the rate limit based on system load.
     */
    public void adjustBasedOnLoad(double cpuUsage, double memoryUsage) {
        int newLimit;

        if (cpuUsage > 0.8 || memoryUsage > 0.8) {
            newLimit = 3; // High load: strict limit
            log.warn("High system load detected, reducing rate limit to: {}", newLimit);
        } else if (cpuUsage > 0.6 || memoryUsage > 0.6) {
            newLimit = 10; // Medium load
            log.info("Medium system load, setting moderate rate limit: {}", newLimit);
        } else {
            newLimit = 20; // Low load: relaxed limit
            log.info("Low system load, setting relaxed rate limit: {}", newLimit);
        }

        rateLimiter.changeLimitForPeriod(newLimit);
    }
}

💬 Graceful error responses

User-friendly error messages noticeably improve the user experience.

Error code scheme

package com.example.langchain4j.error;

import lombok.Getter;
import lombok.RequiredArgsConstructor;

@Getter
@RequiredArgsConstructor
public enum ErrorCode {

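    // 1xxx: client errors
    INVALID_REQUEST(1001, "Invalid request parameters"),
    RATE_LIMIT_EXCEEDED(1002, "Too many requests, please try again later"),
    UNAUTHORIZED(1003, "Authentication failed, please check your API key"),
    QUOTA_EXCEEDED(1004, "Usage quota exceeded"),

    // 2xxx: business errors
    CONTENT_POLICY_VIOLATION(2001, "Content violates the usage policy"),
    OUTPUT_VALIDATION_FAILED(2002, "Output validation failed"),
    CONTEXT_LENGTH_EXCEEDED(2003, "Context length limit exceeded"),

    // 3xxx: server errors
    SERVICE_UNAVAILABLE(3001, "Service temporarily unavailable"),
    MODEL_OVERLOADED(3002, "Model is overloaded"),
    TIMEOUT(3003, "Request timed out"),
    INTERNAL_ERROR(3999, "Internal error");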

    private final int code;
    private final String message;

    public String getCodeString() {
        return "LLM-" + code;
    }
}

Unified error response

package com.example.langchain4j.dto;

import com.example.langchain4j.error.ErrorCode;
import com.fasterxml.jackson.annotation.JsonInclude;
import lombok.Builder;
import lombok.Data;

import java.time.Instant;

@Data
@Builder
@JsonInclude(JsonInclude.Include.NON_NULL)
public class ErrorResponse {

    private String code;           // Error code
    private String message;        // User-friendly error message
    private String technicalDetail; // Technical detail (optional)
    private String suggestion;     // Suggested resolution
    private Instant timestamp;     // Timestamp
    private String requestId;      // Request tracing ID

    public static ErrorResponse from(ErrorCode errorCode) {
        return ErrorResponse.builder()
            .code(errorCode.getCodeString())
            .message(errorCode.getMessage())
            .timestamp(Instant.now())
            .build();
    }

    public static ErrorResponse from(ErrorCode errorCode, String suggestion) {
        return ErrorResponse.builder()
            .code(errorCode.getCodeString())
            .message(errorCode.getMessage())
            .suggestion(suggestion)
            .timestamp(Instant.now())
            .build();
    }

    public static ErrorResponse fromException(
            ErrorCode errorCode,
            Exception ex,
            String requestId) {

        return ErrorResponse.builder()
            .code(errorCode.getCodeString())
            .message(errorCode.getMessage())
            .technicalDetail(ex.getMessage())
            .timestamp(Instant.now())
            .requestId(requestId)
            .build();
    }
}

Global exception handler

package com.example.langchain4j.exception;

import com.example.langchain4j.dto.ErrorResponse;
import com.example.langchain4j.error.ErrorCode;
import dev.langchain4j.exception.*;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.util.UUID;

@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(RateLimitException.class)
    public ResponseEntity<ErrorResponse> handleRateLimit(RateLimitException ex) {
        log.warn("Rate limit exceeded: {}", ex.getMessage());

        ErrorResponse response = ErrorResponse.from(
            ErrorCode.RATE_LIMIT_EXCEEDED,
            "Please wait a moment and retry, or consider upgrading to a higher service tier"
        );

        return ResponseEntity
            .status(HttpStatus.TOO_MANY_REQUESTS)
            .body(response);
    }

    @ExceptionHandler(TimeoutException.class)
    public ResponseEntity<ErrorResponse> handleTimeout(TimeoutException ex) {
        log.warn("Request timeout: {}", ex.getMessage());

        ErrorResponse response = ErrorResponse.from(
            ErrorCode.TIMEOUT,
            "Please try simplifying your request, or retry"
        );

        return ResponseEntity
            .status(HttpStatus.GATEWAY_TIMEOUT)
            .body(response);
    }

    @ExceptionHandler(AuthenticationException.class)
    public ResponseEntity<ErrorResponse> handleAuthentication(
            AuthenticationException ex) {

        log.error("Authentication failed: {}", ex.getMessage());

        ErrorResponse response = ErrorResponse.from(
            ErrorCode.UNAUTHORIZED,
            "Please check that your API key is configured correctly"
        );

        return ResponseEntity
            .status(HttpStatus.UNAUTHORIZED)
            .body(response);
    }

    @ExceptionHandler(ServiceUnavailableException.class)
    public ResponseEntity<ErrorResponse> handleServiceUnavailable(
            ServiceUnavailableException ex) {

        String requestId = UUID.randomUUID().toString();
        log.error("Service unavailable (requestId: {}): {}", requestId, ex.getMessage());

        ErrorResponse response = ErrorResponse.fromException(
            ErrorCode.SERVICE_UNAVAILABLE,
            ex,
            requestId
        );
        response.setSuggestion("Our engineers have been notified; please try again later");

        return ResponseEntity
            .status(HttpStatus.SERVICE_UNAVAILABLE)
            .body(response);
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleGenericException(Exception ex) {
        String requestId = UUID.randomUUID().toString();
        log.error("Unexpected error (requestId: {}): {}", requestId, ex.getMessage(), ex);

        ErrorResponse response = ErrorResponse.fromException(
            ErrorCode.INTERNAL_ERROR,
            ex,
            requestId
        );
        response.setSuggestion("Please contact technical support and provide the request ID: " + requestId);

        return ResponseEntity
            .status(HttpStatus.INTERNAL_SERVER_ERROR)
            .body(response);
    }
}

Internationalized error messages

package com.example.langchain4j.service;

import com.example.langchain4j.error.ErrorCode;
import lombok.RequiredArgsConstructor;
import org.springframework.context.MessageSource;
import org.springframework.context.i18n.LocaleContextHolder;
import org.springframework.stereotype.Service;

import java.util.Locale;

@Service
@RequiredArgsConstructor
public class ErrorMessageService {

    private final MessageSource messageSource;

    /**
     * Returns the localized error message for the current request locale.
     */
    public String getMessage(ErrorCode errorCode) {
        return getMessage(errorCode, LocaleContextHolder.getLocale());
    }

    public String getMessage(ErrorCode errorCode, Locale locale) {
        return messageSource.getMessage(
            // Locale.ROOT keeps the key stable regardless of the JVM default locale
            "error." + errorCode.name().toLowerCase(Locale.ROOT),
            null,
            errorCode.getMessage(), // default message when the key is missing
            locale
        );
    }

    /**
     * Returns the error message followed by a suggestion, if one is defined.
     */
    public String getMessageWithSuggestion(ErrorCode errorCode) {
        String message = getMessage(errorCode);
        String suggestion = getSuggestion(errorCode);

        if (suggestion != null && !suggestion.isEmpty()) {
            return message + "\n\nSuggestion: " + suggestion;
        }

        return message;
    }

    private String getSuggestion(ErrorCode errorCode) {
        Locale locale = LocaleContextHolder.getLocale();
        return messageSource.getMessage(
            "error." + errorCode.name().toLowerCase(Locale.ROOT) + ".suggestion",
            null,
            "",
            locale
        );
    }
}

# messages_zh_CN.properties
error.rate_limit_exceeded=请求过于频繁,请稍后重试
error.rate_limit_exceeded.suggestion=您可以等待1-5分钟后重试,或升级到更高的服务等级

error.timeout=请求处理超时
error.timeout.suggestion=请尝试简化您的问题,或将长文本分段处理

error.service_unavailable=服务暂时不可用
error.service_unavailable.suggestion=我们正在努力恢复服务,请稍后重试

error.unauthorized=认证失败
error.unauthorized.suggestion=请检查您的API密钥是否正确配置,或联系管理员

# messages_en_US.properties
error.rate_limit_exceeded=Rate limit exceeded, please try again later
error.rate_limit_exceeded.suggestion=You can wait 1-5 minutes or upgrade your service tier

error.timeout=Request timeout
error.timeout.suggestion=Try simplifying your request or processing long texts in segments

error.service_unavailable=Service temporarily unavailable
error.service_unavailable.suggestion=We're working to restore service, please try again later

error.unauthorized=Authentication failed
error.unauthorized.suggestion=Please check your API key configuration or contact administrator
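
The two bundles above do not define an entry for every error code (there is no key for an internal error, for example), so the default-message path matters as much as the lookup itself. The following is a minimal plain-JDK sketch of that key derivation and fallback. `MessageKeyDemo` and `DemoErrorCode` are hypothetical stand-ins for illustration and are not part of Spring or of the article's code; a `Map` plays the role of the message bundle.

```java
import java.util.Locale;
import java.util.Map;

final class MessageKeyDemo {

    // Hypothetical stand-in for the article's ErrorCode enum
    enum DemoErrorCode { RATE_LIMIT_EXCEEDED, INTERNAL_ERROR }

    /** Builds the bundle key, e.g. RATE_LIMIT_EXCEEDED -> error.rate_limit_exceeded. */
    static String keyFor(DemoErrorCode code) {
        // Locale.ROOT avoids locale-specific case rules (such as the Turkish dotless i)
        return "error." + code.name().toLowerCase(Locale.ROOT);
    }

    /** Looks the key up in a bundle and falls back to the default message when it is missing. */
    static String resolve(Map<String, String> bundle, DemoErrorCode code, String defaultMessage) {
        return bundle.getOrDefault(keyFor(code), defaultMessage);
    }

    public static void main(String[] args) {
        Map<String, String> enUs = Map.of(
            "error.rate_limit_exceeded", "Rate limit exceeded, please try again later");

        System.out.println(resolve(enUs, DemoErrorCode.RATE_LIMIT_EXCEEDED, "default"));
        // INTERNAL_ERROR has no entry in the bundle, so the default message is used
        System.out.println(resolve(enUs, DemoErrorCode.INTERNAL_ERROR, "Internal error"));
    }
}
```

Lower-casing with an explicit `Locale.ROOT` matters here because the key is an identifier, not display text: under a Turkish default locale a plain `toLowerCase()` would map `I` to a dotless `ı` and the lookup would silently miss.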

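🏗️ Production-grade fault-tolerance architecture

Combine all of the fault-tolerance mechanisms into one complete defense system.

Multi-layer defense architecture diagram

User request
    ↓
[1. Rate limiting layer] → prevents overload
    ↓
[2. Retry layer] → handles transient errors
    ↓
[3. Circuit breaker layer] → fails fast
    ↓
[4. Fallback layer] → guarantees a baseline service
    ↓
Response returned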

Complete fault-tolerant service

package com.example.langchain4j.service;

import com.example.langchain4j.dto.ErrorResponse;
import com.example.langchain4j.error.ErrorCode;
import dev.langchain4j.model.chat.ChatLanguageModel;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Slf4j
@Service
@RequiredArgsConstructor
public class ResilientLLMServiceComplete {

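    // Two beans of the same type: this assumes they are registered under the
    // bean names primaryModel and backupModel so that Spring can tell them apart
    private final ChatLanguageModel primaryModel;
    private final ChatLanguageModel backupModel;
    private final FallbackChainService fallbackChain;
    private final ErrorMessageService errorMessageService;

    /**
     * Full fault-tolerant call chain:
     * 1. Rate limiting
     * 2. Retry
     * 3. Circuit breaking
     * 4. Fallback
     *
     * By default Resilience4j nests its aspects as
     * Retry ( CircuitBreaker ( RateLimiter ( ... ) ) ), regardless of the
     * order of the annotations. The fallback is therefore declared on @Retry,
     * the outermost aspect: declared on @CircuitBreaker it would turn every
     * failure into a normal return value before the retry could see it.
     */
    @RateLimiter(name = "llmService")
    @Retry(name = "llmService", fallbackMethod = "completeFallback")
    @CircuitBreaker(name = "llmService")
    public String generate(String prompt) {
        log.info("Processing request with full resilience");

        try {
            // Primary model call; generate(String) already returns the reply text
            return primaryModel.generate(prompt);

        } catch (Exception ex) {
            log.warn("Primary model failed: {}", ex.getMessage());
            throw ex; // let Resilience4j handle it
        }
    }

    /**
     * Full fallback chain, invoked once retries are exhausted or the call is rejected
     */
    private String completeFallback(String prompt, Exception ex) {
        log.warn("Entering complete fallback chain", ex);

        // Fallback level 1: try the backup model
        try {
            log.info("Trying backup model");
            return backupModel.generate(prompt);
        } catch (Exception backupEx) {
            log.warn("Backup model failed: {}", backupEx.getMessage());
        }

        // Fallback level 2: use the fallback chain
        try {
            log.info("Trying fallback chain");
            return fallbackChain.generateWithFallback(prompt);
        } catch (Exception chainEx) {
            log.warn("Fallback chain failed: {}", chainEx.getMessage());
        }

        // Fallback level 3: return a friendly error message
        log.error("All fallback attempts failed", ex);
        return errorMessageService.getMessageWithSuggestion(
            ErrorCode.SERVICE_UNAVAILABLE
        );
    }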
}
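
Whether retries ever run depends on which layer wraps which. Resilience4j's default aspect order puts Retry outermost, around CircuitBreaker, so a fallback attached to an inner layer converts the failure into a normal return value before the retry can see it. The following plain-JDK sketch illustrates this effect by counting how often a failing call is invoked under both nestings. `withRetry` and `withFallback` are hypothetical helpers written for this illustration only; they are not Resilience4j API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LayerOrderDemo {

    /** Retries the supplier up to maxAttempts times and rethrows the last failure. */
    static <T> Supplier<T> withRetry(Supplier<T> inner, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException ex) {
                    last = ex; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    /** Converts any failure of the supplier into the given fallback value. */
    static <T> Supplier<T> withFallback(Supplier<T> inner, T fallbackValue) {
        return () -> {
            try {
                return inner.get();
            } catch (RuntimeException ex) {
                return fallbackValue;
            }
        };
    }

    /** Fallback inside retry: the retry layer only ever sees a successful result. */
    static int callsWithFallbackInsideRetry() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> failing = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated failure");
        };
        withRetry(withFallback(failing, "fallback"), 4).get();
        return calls.get();
    }

    /** Fallback outside retry: all attempts are used before the fallback applies. */
    static int callsWithFallbackOutsideRetry() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> failing = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("simulated failure");
        };
        withFallback(withRetry(failing, 4), "fallback").get();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("fallback inside retry:  " + callsWithFallbackInsideRetry());
        System.out.println("fallback outside retry: " + callsWithFallbackOutsideRetry());
    }
}
```

With the fallback inside, the failing call runs once; with the fallback outside, it runs four times before the fallback value is returned. In the annotated service this corresponds to declaring `fallbackMethod` on the outermost aspect.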

Monitoring and alerting

package com.example.langchain4j.monitoring;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryRegistry;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.HashMap;
import java.util.Map;

@Slf4j
@Component
@RequiredArgsConstructor
public class ResilienceMonitor {

    private final CircuitBreakerRegistry circuitBreakerRegistry;
    private final RetryRegistry retryRegistry;
    private final RateLimiterRegistry rateLimiterRegistry;

    /**
     * Periodically collects health metrics.
     * Requires @EnableScheduling on a configuration class.
     */
    @Scheduled(fixedRate = 30000) // every 30 seconds
    public void collectMetrics() {
        Map<String, Object> metrics = new HashMap<>();

        // Collect circuit breaker metrics
        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            metrics.put("circuitbreaker." + cb.getName() + ".state",
                cb.getState().toString());
            metrics.put("circuitbreaker." + cb.getName() + ".failureRate",
                cb.getMetrics().getFailureRate());
            metrics.put("circuitbreaker." + cb.getName() + ".slowCallRate",
                cb.getMetrics().getSlowCallRate());
        });

        // Collect rate limiter metrics
        rateLimiterRegistry.getAllRateLimiters().forEach(rl -> {
            metrics.put("ratelimiter." + rl.getName() + ".availablePermissions",
                rl.getMetrics().getAvailablePermissions());
            metrics.put("ratelimiter." + rl.getName() + ".waitingThreads",
                rl.getMetrics().getNumberOfWaitingThreads());
        });

        // Log the metrics
        log.info("Resilience metrics: {}", metrics);

        // Check alert conditions
        checkAlerts(metrics);
    }

    /**
     * Checks alert conditions
     */
    private void checkAlerts(Map<String, Object> metrics) {
        // Check circuit breaker state
        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            if (cb.getState() == CircuitBreaker.State.OPEN) {
                sendAlert("Circuit breaker open: " + cb.getName());
            }

            // getFailureRate() is a percentage (-1 until enough calls are recorded)
            if (cb.getMetrics().getFailureRate() > 30) {
                sendAlert("Failure rate too high: " + cb.getName() +
                    " - " + cb.getMetrics().getFailureRate() + "%");
            }
        });

        // Check rate limiter state
        rateLimiterRegistry.getAllRateLimiters().forEach(rl -> {
            if (rl.getMetrics().getNumberOfWaitingThreads() > 10) {
                sendAlert("Too many threads waiting on rate limiter: " + rl.getName());
            }
        });
    }

    /**
     * Sends an alert
     */
    private void sendAlert(String message) {
        log.error("ALERT: {}", message);
        // Real implementation: send email, DingTalk, Slack, etc.
    }
}

Health check endpoint

package com.example.langchain4j.controller;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import lombok.RequiredArgsConstructor;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;
import java.util.Map;

@Component
@RequiredArgsConstructor
class LLMServiceHealthIndicator implements HealthIndicator {

    private final CircuitBreakerRegistry registry;

    @Override
    public Health health() {
        Map<String, Object> details = new HashMap<>();
        boolean allHealthy = true;

        for (CircuitBreaker cb : registry.getAllCircuitBreakers()) {
            String state = cb.getState().toString();
            details.put(cb.getName(), state);

            if (cb.getState() == CircuitBreaker.State.OPEN) {
                allHealthy = false;
            }
        }

        if (allHealthy) {
            return Health.up().withDetails(details).build();
        } else {
            return Health.down().withDetails(details).build();
        }
    }
}

@RestController
@RequestMapping("/api/health")
@RequiredArgsConstructor
class HealthCheckController {

    private final LLMServiceHealthIndicator healthIndicator;
    private final CircuitBreakerRegistry circuitBreakerRegistry;

    @GetMapping("/detailed")
    public Map<String, Object> detailedHealth() {
        Map<String, Object> health = new HashMap<>();

        circuitBreakerRegistry.getAllCircuitBreakers().forEach(cb -> {
            Map<String, Object> cbHealth = new HashMap<>();
            cbHealth.put("state", cb.getState().toString());
            cbHealth.put("failureRate", cb.getMetrics().getFailureRate());
            cbHealth.put("slowCallRate", cb.getMetrics().getSlowCallRate());
            cbHealth.put("bufferedCalls", cb.getMetrics().getNumberOfBufferedCalls());
            cbHealth.put("failedCalls", cb.getMetrics().getNumberOfFailedCalls());
            cbHealth.put("successfulCalls", cb.getMetrics().getNumberOfSuccessfulCalls());

            health.put(cb.getName(), cbHealth);
        });

        return health;
    }
}

Complete configuration example

# application-production.yml
spring:
  application:
    name: langchain4j-resilient-service

# Full Resilience4j configuration
resilience4j:
  # Retry configuration
  retry:
    instances:
      llmService:
        max-attempts: 4
        wait-duration: 1s
        enable-exponential-backoff: true
        exponential-backoff-multiplier: 2
        retry-exceptions:
          - dev.langchain4j.exception.RateLimitException
          - dev.langchain4j.exception.TimeoutException
          - dev.langchain4j.exception.ServiceUnavailableException
        ignore-exceptions:
          - dev.langchain4j.exception.AuthenticationException

  # Circuit breaker configuration
  circuitbreaker:
    instances:
      llmService:
        failure-rate-threshold: 50
        slow-call-rate-threshold: 50
        slow-call-duration-threshold: 5s
        sliding-window-size: 10
        sliding-window-type: count_based
        minimum-number-of-calls: 5
        wait-duration-in-open-state: 60s
        permitted-number-of-calls-in-half-open-state: 3
        automatic-transition-from-open-to-half-open-enabled: true

  # Rate limiter configuration
  ratelimiter:
    instances:
      llmService:
        limit-refresh-period: 1s
        limit-for-period: 10
        timeout-duration: 5s

  # Bulkhead isolation (applies only to methods annotated with @Bulkhead(name = "llmService"))
  bulkhead:
    instances:
      llmService:
        max-concurrent-calls: 5
        max-wait-duration: 10s

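# Monitoring configuration
management:
  endpoints:
    web:
      exposure:
        # prometheus must be exposed for the exported metrics to be scraped
        include: health,metrics,prometheus,circuitbreakers,ratelimiters
  health:
    circuitbreakers:
      enabled: true
  metrics:
    export:
      prometheus:
        # Spring Boot 2.x key; Spring Boot 3.x uses management.prometheus.metrics.export.enabled
        enabled: true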

# Logging configuration
logging:
  level:
    com.example.langchain4j: INFO
    io.github.resilience4j: DEBUG
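
To see what the retry settings above mean in practice, it helps to work out the wait schedule. With `max-attempts: 4`, `wait-duration: 1s` and a multiplier of 2 there are three waits (the first attempt has none), each twice as long as the previous one. The sketch below reproduces that arithmetic in plain Java; `BackoffSchedule` is a hypothetical helper for illustration, not a Resilience4j class, and it ignores any randomized jitter.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {

    /** Waits between attempts: initial, initial*m, initial*m^2, ... (maxAttempts - 1 entries). */
    static List<Duration> waits(int maxAttempts, Duration initial, double multiplier) {
        List<Duration> result = new ArrayList<>();
        double millis = initial.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier; // each wait grows by the multiplier
        }
        return result;
    }

    /** Total time spent waiting if every attempt fails. */
    static Duration total(List<Duration> waits) {
        return waits.stream().reduce(Duration.ZERO, Duration::plus);
    }

    public static void main(String[] args) {
        List<Duration> waits = waits(4, Duration.ofSeconds(1), 2.0);
        System.out.println("waits: " + waits);        // 1s, 2s, 4s
        System.out.println("total: " + total(waits)); // 7s
    }
}
```

In the worst case a caller therefore spends 7 seconds waiting, on top of the duration of the four calls themselves, before the failure surfaces. That total is worth comparing against any upstream request timeout.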

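💡 Hands-on exercises

Exercise 1: Implement a complete fault-tolerant service

Create an LLM service that combines retry, circuit breaking, rate limiting, and fallback.

Tasks

  1. Configure all of the Resilience4j components
  2. Implement a multi-layer fallback strategy
  3. Add monitoring and alerting
  4. Write integration tests that verify the various error scenarios

Exercise 2: Simulate failure scenarios

Create a fault injector to test how effective the fault-tolerance mechanisms are.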

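import dev.langchain4j.exception.*;
import org.springframework.stereotype.Service;

import java.util.Random;

@Service
public class ChaosEngineeringService {

    private final Random random = new Random();

    /**
     * Randomly injects one of four faults, or none.
     */
    public void injectChaos(String prompt) {
        int scenario = random.nextInt(5); // 0..4, each equally likely

        switch (scenario) {
            case 0 -> throw new RateLimitException("simulated rate limit");
            case 1 -> throw new TimeoutException("simulated timeout");
            case 2 -> simulateSlowResponse();
            case 3 -> throw new ServiceUnavailableException("simulated service unavailable");
            default -> {} // normal execution
        }
    }

    private void simulateSlowResponse() {
        try {
            Thread.sleep(6000); // slow response, above the 5s slow-call threshold
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}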
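
A purely random injector makes test failures hard to reproduce. One possible refinement, sketched below under the assumption that a fixed fault plan is acceptable for your tests, is to derive the sequence of faults from a seed so the same run can be replayed. `SeededChaos` and its `Fault` enum are hypothetical names for this illustration and mirror the five scenarios of the injector above.

```java
import java.util.Random;

final class SeededChaos {

    // Mirrors the five scenarios of the injector: four faults plus normal execution
    enum Fault { RATE_LIMIT, TIMEOUT, SLOW, UNAVAILABLE, NONE }

    /** Produces a reproducible sequence of faults: the same seed yields the same plan. */
    static Fault[] plan(long seed, int length) {
        Random random = new Random(seed);
        Fault[] faults = new Fault[length];
        for (int i = 0; i < length; i++) {
            faults[i] = Fault.values()[random.nextInt(Fault.values().length)];
        }
        return faults;
    }

    public static void main(String[] args) {
        // Print a short plan; rerunning with seed 42 prints the same sequence
        for (Fault fault : plan(42L, 10)) {
            System.out.println(fault);
        }
    }
}
```

A test can log the seed it used and, when an assertion fails, rerun with that seed to replay the exact sequence of injected faults.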

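Exercise 3: Improve error responses

Design a complete error-code scheme and user-friendly error messages for your application.

Requirements

  1. Define at least 20 error codes
  2. Provide both Chinese and English messages
  3. Give every error a suggested resolution
  4. Implement error categories and severity levels

Last updated: 2026-03-09 | Word count: 5,000 | Estimated reading time: 40 minutes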