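Time: 45 minutes | Difficulty: ⭐⭐⭐⭐ | Week 4 Day 27
📋 Learning Objectives
- Design a production-grade LangChain4J project architecture
- Understand security best practices and API key management
- Build a complete testing strategy (unit, integration, and E2E tests)
- Deploy with Docker and Kubernetes
- Get familiar with 50 production best practices
- Understand the migration path from demo to production
- Establish a Production Readiness Checklist
🏗️ Production-Grade Project Architecture
Complete Project Structure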
langchain4j-production/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/example/ai/
│   │   │       ├── config/  # Configuration layer
│   │   │       │   ├── LangChainConfig.java
│   │   │       │   ├── VectorStoreConfig.java
│   │   │       │   ├── SecurityConfig.java
│   │   │       │   └── MonitoringConfig.java
│   │   │       ├── domain/  # Domain layer
│   │   │       │   ├── model/
│   │   │       │   │   ├── ChatMessage.java
│   │   │       │   │   ├── Document.java
│   │   │       │   │   └── SearchResult.java
│   │   │       │   ├── service/
│   │   │       │   │   ├── ChatService.java
│   │   │       │   │   ├── EmbeddingService.java
│   │   │       │   │   └── DocumentService.java
│   │   │       │   └── repository/
│   │   │       │       ├── VectorStoreRepository.java
│   │   │       │       └── ConversationRepository.java
│   │   │       ├── infrastructure/  # Infrastructure layer
│   │   │       │   ├── ai/
│   │   │       │   │   ├── LLMClient.java
│   │   │       │   │   ├── EmbeddingClient.java
│   │   │       │   │   └── VectorStoreClient.java
│   │   │       │   ├── cache/
│   │   │       │   │   ├── CacheManager.java
│   │   │       │   │   └── EmbeddingCache.java
│   │   │       │   ├── security/
│   │   │       │   │   ├── InputValidator.java
│   │   │       │   │   ├── OutputSanitizer.java
│   │   │       │   │   └── RateLimiter.java
│   │   │       │   └── monitoring/
│   │   │       │       ├── MetricsCollector.java
│   │   │       │       └── HealthIndicator.java
│   │   │       ├── api/  # API layer
│   │   │       │   ├── controller/
│   │   │       │   │   ├── ChatController.java
│   │   │       │   │   ├── DocumentController.java
│   │   │       │   │   └── HealthController.java
│   │   │       │   ├── dto/
│   │   │       │   │   ├── ChatRequest.java
│   │   │       │   │   ├── ChatResponse.java
│   │   │       │   │   └── ErrorResponse.java
│   │   │       │   └── exception/
│   │   │       │       ├── GlobalExceptionHandler.java
│   │   │       │       └── RateLimitException.java
│   │   │       └── Application.java
│   │   └── resources/
│   │       ├── application.yml
│   │       ├── application-dev.yml
│   │       ├── application-prod.yml
│   │       ├── logback-spring.xml
│   │       └── db/
│   │           └── migration/
│   │               └── V1__init_schema.sql
│   └── test/
│       ├── java/
│       │   └── com/example/ai/
│       │       ├── integration/  # Integration tests
│       │       │   ├── ChatServiceIntegrationTest.java
│       │       │   └── VectorStoreIntegrationTest.java
│       │       ├── e2e/  # E2E tests
│       │       │   └── ChatFlowE2ETest.java
│       │       └── unit/  # Unit tests
│       │           ├── ChatServiceTest.java
│       │           ├── RateLimiterTest.java
│       │           └── InputValidatorTest.java
│       └── resources/
│           ├── application-test.yml
│           └── test-data/
├── docker/
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── init-scripts/
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   └── ingress.yaml
├── scripts/
│   ├── deploy.sh
│   ├── rollback.sh
│   └── health-check.sh
├── docs/
│   ├── API.md
│   ├── DEPLOYMENT.md
│   └── TROUBLESHOOTING.md
├── build.gradle.kts
├── settings.gradle.kts
├── .env.example
├── .gitignore
└── README.md
Dependency Management (build.gradle.kts)
plugins {
    id("org.springframework.boot") version "3.2.0"
    id("io.spring.dependency-management") version "1.1.4"
    kotlin("jvm") version "1.9.20"
    kotlin("plugin.spring") version "1.9.20"
}

group = "com.example.ai"
version = "1.0.0"
java.sourceCompatibility = JavaVersion.VERSION_21

repositories {
    mavenCentral()
}

dependencies {
    // Spring Boot core
    implementation("org.springframework.boot:spring-boot-starter-web")
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    implementation("org.springframework.boot:spring-boot-starter-validation")
    implementation("org.springframework.boot:spring-boot-starter-cache")
    implementation("org.springframework.boot:spring-boot-starter-data-jpa")
    // Required for the @Aspect-based rate limiting shown later
    implementation("org.springframework.boot:spring-boot-starter-aop")
    // LangChain4J core
    implementation("dev.langchain4j:langchain4j:0.36.2")
    implementation("dev.langchain4j:langchain4j-spring-boot-starter:0.36.2")
    implementation("dev.langchain4j:langchain4j-open-ai:0.36.2")
    implementation("dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:0.36.2")
    // Vector stores
    implementation("dev.langchain4j:langchain4j-pgvector:0.36.2")
    implementation("dev.langchain4j:langchain4j-qdrant:0.36.2")
    // Caching
    implementation("com.github.ben-manes.caffeine:caffeine:3.1.8")
    // Monitoring and metrics
    implementation("io.micrometer:micrometer-registry-prometheus")
    implementation("io.micrometer:micrometer-tracing-bridge-brave")
    // Security
    implementation("org.springframework.boot:spring-boot-starter-security")
    implementation("io.jsonwebtoken:jjwt-api:0.12.3")
    runtimeOnly("io.jsonwebtoken:jjwt-impl:0.12.3")
    runtimeOnly("io.jsonwebtoken:jjwt-jackson:0.12.3")
    // Rate limiting
    implementation("com.bucket4j:bucket4j-core:8.7.0")
    implementation("com.bucket4j:bucket4j-redis:8.7.0")
    // Database
    implementation("org.postgresql:postgresql")
    implementation("org.flywaydb:flyway-core")
    // Utilities (Lombok is only needed at compile time)
    compileOnly("org.projectlombok:lombok")
    annotationProcessor("org.projectlombok:lombok")
    // Testing
    testImplementation("org.springframework.boot:spring-boot-starter-test")
    testImplementation("org.springframework.security:spring-security-test")
    testImplementation("org.testcontainers:testcontainers:1.19.3")
    testImplementation("org.testcontainers:postgresql:1.19.3")
    testImplementation("org.testcontainers:junit-jupiter:1.19.3")
    testImplementation("org.mockito:mockito-core")
    testImplementation("org.mockito:mockito-junit-jupiter")
    // The standalone artifact shades Jetty, which avoids clashes with the
    // Jetty/Jakarta versions managed by Spring Boot 3
    testImplementation("com.github.tomakehurst:wiremock-jre8-standalone:2.35.0")
}

tasks.withType<Test> {
    useJUnitPlatform()
}
Core Configuration Class
@Configuration
@EnableConfigurationProperties(LangChainProperties.class)
public class LangChainConfig {
private final LangChainProperties properties;
public LangChainConfig(LangChainProperties properties) {
this.properties = properties;
}
@Bean
public ChatLanguageModel chatLanguageModel() {
return OpenAiChatModel.builder()
.apiKey(properties.getApiKey())
.modelName(properties.getModelName())
.temperature(properties.getTemperature())
.timeout(Duration.ofSeconds(properties.getTimeout()))
.maxRetries(properties.getMaxRetries())
.logRequests(properties.isLogRequests())
.logResponses(properties.isLogResponses())
.build();
}
@Bean
public EmbeddingModel embeddingModel() {
return OpenAiEmbeddingModel.builder()
.apiKey(properties.getApiKey())
.modelName(properties.getEmbeddingModelName())
.timeout(Duration.ofSeconds(properties.getTimeout()))
.build();
}
@Bean
public EmbeddingStore<TextSegment> embeddingStore(DataSource dataSource) {
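// The DataSource-based variant is exposed through datasourceBuilder();
// builder() expects host, port, user, password, and database instead
return PgVectorEmbeddingStore.datasourceBuilder()
.datasource(dataSource)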
.table(properties.getVectorStore().getTable())
.dimension(properties.getVectorStore().getDimension())
.build();
}
@Bean
public ContentRetriever contentRetriever(
EmbeddingStore<TextSegment> embeddingStore,
EmbeddingModel embeddingModel) {
return EmbeddingStoreContentRetriever.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.maxResults(properties.getRetrieval().getMaxResults())
.minScore(properties.getRetrieval().getMinScore())
.build();
}
}
🔒 Security Best Practices
1. API Key Management
Environment variables (development)
# application.yml
langchain4j:
  api-key: ${OPENAI_API_KEY}
  model-name: ${OPENAI_MODEL_NAME:gpt-4}
spring:
  datasource:
    url: ${DATABASE_URL}
    username: ${DATABASE_USERNAME}
    password: ${DATABASE_PASSWORD}
# .env file (do not commit it to Git)
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
OPENAI_MODEL_NAME=gpt-4
DATABASE_URL=jdbc:postgresql://localhost:5432/langchain4j
DATABASE_USERNAME=postgres
DATABASE_PASSWORD=secret
Vault integration (production)
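@Configuration
public class VaultConfig {

    @Bean
    public VaultTemplate vaultTemplate() {
        VaultEndpoint endpoint = VaultEndpoint.create("vault.example.com", 8200);
        // A static token from the environment keeps the example short
        VaultToken token = VaultToken.of(System.getenv("VAULT_TOKEN"));
        return new VaultTemplate(endpoint, new TokenAuthentication(token));
    }

    @Bean
    public String openAiApiKey(VaultTemplate vaultTemplate) {
        // "secret/data/..." is a KV version 2 path: the stored key/value pairs
        // are nested under a "data" entry, next to "metadata"
        VaultResponse response = vaultTemplate.read("secret/data/langchain4j/openai");
        if (response == null) {
            throw new IllegalStateException("Vault secret for the OpenAI API key was not found");
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> secret =
            (Map<String, Object>) response.getRequiredData().get("data");
        return (String) secret.get("api-key");
    }
}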
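2. Input Validation
@Component
public class InputValidator {

    private static final int MAX_INPUT_LENGTH = 4000;
    private static final long MAX_FILE_SIZE_BYTES = 10L * 1024 * 1024; // 10 MB

    // Coarse keyword heuristic. Words such as "update" or "delete" also occur in
    // ordinary questions, so expect false positives; it does not detect prompt injection.
    private static final Pattern INJECTION_PATTERN =
        Pattern.compile("(DROP|DELETE|UPDATE|INSERT|EXEC|SCRIPT)", Pattern.CASE_INSENSITIVE);

    private static final List<String> ALLOWED_CONTENT_TYPES = List.of(
        "application/pdf",
        "text/plain",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    );

    public void validateChatInput(String input) {
        // Reject null or blank input
        if (input == null || input.trim().isEmpty()) {
            throw new ValidationException("Input must not be empty");
        }
        // Enforce the length limit
        if (input.length() > MAX_INPUT_LENGTH) {
            throw new ValidationException(
                "Input exceeds the maximum length of " + MAX_INPUT_LENGTH);
        }
        // Check for injection keywords
        if (INJECTION_PATTERN.matcher(input).find()) {
            throw new SecurityException("Potential injection attack detected");
        }
        // Check for disallowed characters
        if (containsMaliciousCharacters(input)) {
            throw new SecurityException("Input contains disallowed special characters");
        }
    }

    private boolean containsMaliciousCharacters(String input) {
        // Rejects ISO control characters other than newline, carriage return, and tab.
        // Zero-width characters are format characters, not ISO control characters,
        // so this check does not catch them.
        return input.codePoints().anyMatch(cp ->
            Character.isISOControl(cp) && cp != '\n' && cp != '\r' && cp != '\t'
        );
    }

    public void validateDocumentUpload(MultipartFile file) {
        // Check the file size
        if (file.getSize() > MAX_FILE_SIZE_BYTES) {
            throw new ValidationException("File size exceeds the 10MB limit");
        }
        // Check the declared file type (this value is supplied by the client)
        String contentType = file.getContentType();
        if (!ALLOWED_CONTENT_TYPES.contains(contentType)) {
            throw new ValidationException("Unsupported file type: " + contentType);
        }
        // Check the file name for path traversal
        String filename = file.getOriginalFilename();
        if (filename == null || filename.contains("..")) {
            throw new SecurityException("Illegal file name");
        }
    }
}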
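The content type checked in validateDocumentUpload comes from the client, so it can be spoofed. One common complement is to look at the first bytes of the upload. The sketch below is a minimal illustration under stated assumptions: the class name FileSignatureSniffer and its methods are hypothetical, and it relies only on two well-known signatures (PDF files start with "%PDF-", and DOCX files are ZIP containers that start with "PK" followed by bytes 0x03 0x04). Plain text has no signature, so it cannot be verified this way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks leading "magic bytes" instead of trusting Content-Type. */
final class FileSignatureSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04}; // "PK" 0x03 0x04

    /** True when the data starts with the PDF header. */
    static boolean looksLikePdf(byte[] head) {
        return startsWith(head, PDF_MAGIC);
    }

    /** True when the data starts with a ZIP local file header (DOCX is a ZIP container). */
    static boolean looksLikeZipContainer(byte[] head) {
        return startsWith(head, ZIP_MAGIC);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

In a Spring controller the first few bytes could be read from the upload's input stream and passed to these methods before the file is parsed. A ZIP signature only shows that the file is a ZIP container, not that it is a valid Word document.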
3. Output Filtering
@Component
public class OutputSanitizer {

    private static final Pattern PII_PATTERN = Pattern.compile(
        // Email address
        "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b|" +
        // Mobile number (mainland China)
        "\\b1[3-9]\\d{9}\\b|" +
        // Resident ID card number
        "\\b\\d{17}[\\dXx]\\b"
    );

    private static final Pattern API_KEY_PATTERN = Pattern.compile(
        "(?i)(api[_-]?key|secret|token|password)\\s*[:=]\\s*['\"]?([\\w\\-]+)['\"]?"
    );

    public String sanitizeOutput(String output) {
        if (output == null) {
            return null;
        }
        String sanitized = output;
        // Remove PII
        sanitized = removePII(sanitized);
        // Remove API keys
        sanitized = removeApiKeys(sanitized);
        // HTML-escape the result (XSS protection)
        sanitized = HtmlUtils.htmlEscape(sanitized);
        return sanitized;
    }

    private String removePII(String text) {
        return PII_PATTERN.matcher(text).replaceAll("[hidden]");
    }

    private String removeApiKeys(String text) {
        // Keep the label (group 1) and redact the value
        return API_KEY_PATTERN.matcher(text).replaceAll("$1: [REDACTED]");
    }
}
4. Per-User Rate Limiting
@Component
public class UserRateLimiter {

    // Buckets that stay idle for an hour are evicted by Caffeine,
    // so no scheduled cleanup job is needed
    private final Cache<String, Bucket> buckets = Caffeine.newBuilder()
        .expireAfterAccess(Duration.ofHours(1))
        .build();
    private final RateLimitConfig config;

    public UserRateLimiter(RateLimitConfig config) {
        this.config = config;
    }

    public boolean allowRequest(String userId) {
        Bucket bucket = buckets.get(userId, this::createBucket);
        return bucket.tryConsume(1);
    }

    private Bucket createBucket(String userId) {
        // Each user tier gets its own rate limit
        UserTier tier = getUserTier(userId);
        Bandwidth limit = switch (tier) {
            case FREE -> Bandwidth.builder()
                .capacity(10)                              // 10 requests
                .refillGreedy(10, Duration.ofMinutes(1))   // per minute
                .build();
            case PRO -> Bandwidth.builder()
                .capacity(100)
                .refillGreedy(100, Duration.ofMinutes(1))
                .build();
            case ENTERPRISE -> Bandwidth.builder()
                .capacity(1000)
                .refillGreedy(1000, Duration.ofMinutes(1))
                .build();
        };
        return Bucket.builder()
            .addLimit(limit)
            .build();
    }

    private UserTier getUserTier(String userId) {
        // Look up the user tier from the database or a cache
        return config.getUserTier(userId);
    }
}

@Aspect
@Component
public class RateLimitInterceptor {

    private final UserRateLimiter rateLimiter;

    public RateLimitInterceptor(UserRateLimiter rateLimiter) {
        this.rateLimiter = rateLimiter;
    }

    // The unqualified annotation name resolves only when RateLimited lives in the
    // same package as this aspect; otherwise use the fully qualified name
    @Around("@annotation(RateLimited)")
    public Object checkRateLimit(ProceedingJoinPoint joinPoint) throws Throwable {
        String userId = SecurityContextHolder.getContext()
            .getAuthentication()
            .getName();
        if (!rateLimiter.allowRequest(userId)) {
            throw new RateLimitException("Rate limit exceeded, please try again later");
        }
        return joinPoint.proceed();
    }
}

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimited {
}
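To make the capacity and greedy-refill settings above more concrete, here is a minimal, dependency-free token bucket. It is a sketch for illustration only, not how Bucket4j is implemented: the class name SimpleTokenBucket, its constructor parameters, and the injected millisecond clock are all assumptions made for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical greedy-refill token bucket; illustrates the idea, not Bucket4j internals. */
final class SimpleTokenBucket {

    private final long capacity;
    private final long refillTokens;
    private final long refillPeriodMillis;
    private final LongSupplier clockMillis; // injected so tests can control time
    private long available;
    private long lastRefillMillis;

    SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodMillis,
                      LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillPeriodMillis = refillPeriodMillis;
        this.clockMillis = clockMillis;
        this.available = capacity; // a new bucket starts full
        this.lastRefillMillis = clockMillis.getAsLong();
    }

    /** Takes the requested tokens if enough are available; otherwise leaves the bucket unchanged. */
    synchronized boolean tryConsume(long tokens) {
        refill();
        if (available < tokens) {
            return false;
        }
        available -= tokens;
        return true;
    }

    private void refill() {
        long elapsed = clockMillis.getAsLong() - lastRefillMillis;
        // Whole tokens earned since the last refill, spread evenly over the period
        long newTokens = elapsed * refillTokens / refillPeriodMillis;
        if (newTokens > 0) {
            available = Math.min(capacity, available + newTokens);
            // Advance only by the time those tokens account for, so partial progress is kept
            lastRefillMillis += newTokens * refillPeriodMillis / refillTokens;
        }
    }
}
```

With a capacity of 10 and a refill of 10 tokens per 60,000 ms, a client that drains the bucket gets one new token roughly every six seconds instead of waiting a full minute for all ten.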
🧪 Testing Strategy
1. Unit Tests (Mock LLM)
@ExtendWith(MockitoExtension.class)
class ChatServiceTest {

    @Mock
    private ChatLanguageModel chatModel;
    @Mock
    private InputValidator inputValidator;
    @Mock
    private OutputSanitizer outputSanitizer;
    @InjectMocks
    private ChatService chatService;

    @Test
    @DisplayName("Should process a valid chat request")
    void shouldProcessValidChatRequest() {
        // Given
        String userMessage = "What is LangChain4J?";
        String expectedResponse = "LangChain4J is an AI framework for Java...";
        // generate(String) returns the reply text directly as a String
        when(chatModel.generate(userMessage)).thenReturn(expectedResponse);
        when(outputSanitizer.sanitizeOutput(expectedResponse))
            .thenReturn(expectedResponse);
        // When
        String result = chatService.chat(userMessage);
        // Then
        assertEquals(expectedResponse, result);
        verify(inputValidator).validateChatInput(userMessage);
        verify(chatModel).generate(userMessage);
        verify(outputSanitizer).sanitizeOutput(expectedResponse);
    }

    @Test
    @DisplayName("Should reject input that is too long")
    void shouldRejectTooLongInput() {
        // Given
        String longMessage = "x".repeat(5000);
        doThrow(new ValidationException("Input too long"))
            .when(inputValidator).validateChatInput(longMessage);
        // When & Then
        assertThrows(ValidationException.class, () -> chatService.chat(longMessage));
        // anyString() selects the generate(String) overload unambiguously
        verify(chatModel, never()).generate(anyString());
    }

    @Test
    @DisplayName("Should handle an LLM timeout")
    void shouldHandleLLMTimeout() {
        // Given
        String message = "Test message";
        when(chatModel.generate(message))
            .thenThrow(new RuntimeException("Timeout"));
        // When & Then
        assertThrows(ChatServiceException.class, () -> chatService.chat(message));
    }

    @Test
    @DisplayName("Should filter sensitive information from the output")
    void shouldFilterSensitiveInfoInOutput() {
        // Given
        String message = "My contact details";
        String rawResponse =
            "Your email is test@example.com and your phone number is 13812345678";
        String sanitizedResponse =
            "Your email is [hidden] and your phone number is [hidden]";
        when(chatModel.generate(message)).thenReturn(rawResponse);
        when(outputSanitizer.sanitizeOutput(rawResponse))
            .thenReturn(sanitizedResponse);
        // When
        String result = chatService.chat(message);
        // Then
        assertEquals(sanitizedResponse, result);
        assertFalse(result.contains("test@example.com"));
        assertFalse(result.contains("13812345678"));
    }
}
2. Integration Tests (Real LLM)
// These tests call the real model, so OPENAI_API_KEY must be set. Assertions on
// the reply text can be flaky because model output is not deterministic.
@SpringBootTest
@Testcontainers
@ActiveProfiles("test")
class ChatServiceIntegrationTest {

    // The pgvector image ships the vector extension that PgVectorEmbeddingStore needs
    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>(
            DockerImageName.parse("pgvector/pgvector:pg16")
                .asCompatibleSubstituteFor("postgres"))
        .withDatabaseName("testdb")
        .withUsername("test")
        .withPassword("test");

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private ChatService chatService;
    @Autowired
    private DocumentService documentService;
    @Autowired
    private ConversationRepository conversationRepository;

    @Test
    @DisplayName("The end-to-end chat flow should work")
    void endToEndChatFlowShouldWork() {
        // Given
        String conversationId = UUID.randomUUID().toString();
        String message = "What is a vector database?";
        // When
        ChatResponse response = chatService.chat(conversationId, message);
        // Then
        assertNotNull(response);
        assertNotNull(response.getMessage());
        assertFalse(response.getMessage().isEmpty());
        // Verify that the conversation was saved
        Optional<Conversation> saved = conversationRepository.findById(conversationId);
        assertTrue(saved.isPresent());
        assertEquals(2, saved.get().getMessages().size()); // user message + AI reply
    }

    @Test
    @DisplayName("Should maintain conversation context")
    void shouldMaintainConversationContext() {
        // Given
        String conversationId = UUID.randomUUID().toString();
        // When - first turn
        chatService.chat(conversationId, "My name is Zhang San");
        // When - second turn
        ChatResponse response = chatService.chat(conversationId, "What is my name?");
        // Then
        assertTrue(
            response.getMessage().contains("Zhang San"),
            "The AI should remember the name from the earlier turn"
        );
    }

    @Test
    @DisplayName("The RAG flow should retrieve relevant documents")
    void ragFlowShouldRetrieveRelevantDocuments() {
        // Given - add some documents first
        documentService.addDocument(
            "LangChain4J is a Java framework for building AI applications"
        );
        documentService.addDocument(
            "Vector databases store and retrieve embedding vectors"
        );
        // When
        ChatResponse response = chatService.chatWithRAG("What is LangChain4J?");
        // Then
        assertNotNull(response);
        assertNotNull(response.getSources());
        assertFalse(response.getSources().isEmpty());
        assertTrue(
            response.getMessage().contains("Java framework"),
            "The answer should be based on the retrieved documents"
        );
    }
}
3. E2E Tests (TestContainers)
@SpringBootTest(webEnvironment = WebEnvironment.RANDOM_PORT)
@Testcontainers
@AutoConfigureMockMvc
// The rate limiter keys on the authenticated user name, so requests need a principal
@WithMockUser(username = "e2e-user")
class ChatFlowE2ETest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withDatabaseName("testdb");

    // Pinned to the same version as docker-compose.yml; Qdrant serves its health check at /healthz
    @Container
    static GenericContainer<?> qdrant = new GenericContainer<>("qdrant/qdrant:v1.7.4")
        .withExposedPorts(6333)
        .waitingFor(Wait.forHttp("/healthz").forStatusCode(200));

    @Autowired
    private MockMvc mockMvc;
    @Autowired
    private ObjectMapper objectMapper;

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
        registry.add("langchain4j.qdrant.host", qdrant::getHost);
        registry.add("langchain4j.qdrant.port", qdrant::getFirstMappedPort);
    }

    @Test
    @DisplayName("Complete RAG flow E2E test")
    void completeRAGFlowE2E() throws Exception {
        // Step 1: upload a document
        MockMultipartFile file = new MockMultipartFile(
            "file",
            "test.txt",
            "text/plain",
            "LangChain4J is a Java AI framework".getBytes(StandardCharsets.UTF_8)
        );
        mockMvc.perform(multipart("/api/documents")
                .file(file))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.documentId").exists());
        // Step 2: wait for the document to be indexed
        // (a fixed sleep is simple but can be flaky; polling is more robust)
        Thread.sleep(2000);
        // Step 3: send a query
        ChatRequest request = new ChatRequest();
        request.setMessage("What is LangChain4J?");
        request.setUseRAG(true);
        MvcResult result = mockMvc.perform(post("/api/chat")
                .contentType(MediaType.APPLICATION_JSON)
                .content(objectMapper.writeValueAsString(request)))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.message").exists())
            .andExpect(jsonPath("$.sources").isArray())
            .andReturn();
        // Step 4: verify the response
        ChatResponse response = objectMapper.readValue(
            result.getResponse().getContentAsString(),
            ChatResponse.class
        );
        assertFalse(response.getSources().isEmpty());
        assertTrue(response.getMessage().contains("Java"));
    }

    @Test
    @DisplayName("Rate limiting E2E test")
    void rateLimitingE2E() throws Exception {
        ChatRequest request = new ChatRequest();
        request.setMessage("Test message");
        // Send requests until the limit triggers (assumes the FREE tier: 10 per minute)
        for (int i = 0; i < 15; i++) {
            ResultActions result = mockMvc.perform(post("/api/chat")
                .contentType(MediaType.APPLICATION_JSON)
                .content(objectMapper.writeValueAsString(request)));
            if (i < 10) {
                result.andExpect(status().isOk());
            } else {
                result.andExpect(status().isTooManyRequests());
            }
        }
    }

    @Test
    @DisplayName("Health check E2E test")
    void healthCheckE2E() throws Exception {
        mockMvc.perform(get("/actuator/health"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.status").value("UP"))
            .andExpect(jsonPath("$.components.db.status").value("UP"))
            .andExpect(jsonPath("$.components.vectorStore.status").value("UP"));
    }
}
🐳 Deployment and Operations
1. Dockerfile
# Multi-stage build
FROM gradle:8.5-jdk21 AS builder
WORKDIR /app
# Copy build files
COPY build.gradle.kts settings.gradle.kts ./
COPY src ./src
# Build only the executable Spring Boot jar. A full `build` also produces a
# *-plain.jar, and the wildcard COPY below would then match two files and fail.
RUN gradle clean bootJar --no-daemon
# Runtime image
FROM eclipse-temurin:21-jre-alpine
# Add a non-root user
RUN addgroup -S spring && adduser -S spring -G spring
WORKDIR /app
# Copy the build artifact
COPY --from=builder /app/build/libs/*.jar app.jar
# Change ownership
RUN chown -R spring:spring /app
# Switch to the non-root user
USER spring:spring
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:8080/actuator/health || exit 1
# Expose the port
EXPOSE 8080
# JVM tuning
ENV JAVA_OPTS="-XX:+UseContainerSupport \
  -XX:MaxRAMPercentage=75.0 \
  -XX:+UseG1GC \
  -XX:+ExitOnOutOfMemoryError \
  -Djava.security.egd=file:/dev/./urandom"
# Start the application; `exec` replaces the shell so the JVM receives SIGTERM
# directly, which graceful shutdown depends on
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
2. docker-compose.yml
version: '3.8'
services:
  app:
    # This file lives in docker/ (see the project structure), while the
    # Dockerfile copies files from the project root, so the context is ..
    build:
      context: ..
      dockerfile: docker/Dockerfile
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=prod
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DATABASE_URL=jdbc:postgresql://postgres:5432/langchain4j
      - DATABASE_USERNAME=langchain4j
      - DATABASE_PASSWORD=${DB_PASSWORD}
      - QDRANT_HOST=qdrant
      - QDRANT_PORT=6333
    depends_on:
      postgres:
        condition: service_healthy
      qdrant:
        condition: service_started
    restart: unless-stopped
    networks:
      - langchain4j-network
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      - POSTGRES_DB=langchain4j
      - POSTGRES_USER=langchain4j
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U langchain4j"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - langchain4j-network
  qdrant:
    image: qdrant/qdrant:v1.7.4
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant-data:/qdrant/storage
    networks:
      - langchain4j-network
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    networks:
      - langchain4j-network
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - langchain4j-network
volumes:
  postgres-data:
  qdrant-data:
  prometheus-data:
  grafana-data:
networks:
  langchain4j-network:
    driver: bridge
3. Kubernetes Deployment
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain4j-app
  namespace: production
  labels:
    app: langchain4j
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: langchain4j
  template:
    metadata:
      labels:
        app: langchain4j
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      serviceAccountName: langchain4j-sa
      # Init container: wait until the database is reachable
      initContainers:
        - name: wait-for-postgres
          image: busybox:1.36
          command:
            - sh
            - -c
            - |
              until nc -z postgres-service 5432; do
                echo "Waiting for PostgreSQL..."
                sleep 2
              done
      containers:
        - name: langchain4j
          image: your-registry.com/langchain4j:v1.0.0
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: langchain4j-secrets
                  key: openai-api-key
            - name: DATABASE_URL
              valueFrom:
                configMapKeyRef:
                  name: langchain4j-config
                  key: database-url
            - name: DATABASE_USERNAME
              valueFrom:
                secretKeyRef:
                  name: langchain4j-secrets
                  key: db-username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: langchain4j-secrets
                  key: db-password
          # Resource limits
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "2Gi"
          # Liveness probe
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          # Readiness probe
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          # Startup probe
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 30
          # Graceful shutdown: give the load balancer time to stop routing traffic
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
      # Grace period: must cover the 15s preStop sleep plus the 20s Spring
      # shutdown phase configured in the ConfigMap
      terminationGracePeriodSeconds: 45
      # Pod anti-affinity: prefer spreading replicas across nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - langchain4j
                topologyKey: kubernetes.io/hostname
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: langchain4j-service
  namespace: production
  labels:
    app: langchain4j
spec:
  type: ClusterIP
  selector:
    app: langchain4j
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: langchain4j-config
  namespace: production
data:
  database-url: "jdbc:postgresql://postgres-service:5432/langchain4j"
  qdrant-host: "qdrant-service"
  qdrant-port: "6333"
  application.yml: |
    server:
      port: 8080
      shutdown: graceful
    spring:
      application:
        name: langchain4j-app
      lifecycle:
        timeout-per-shutdown-phase: 20s
    management:
      endpoints:
        web:
          exposure:
            include: health,prometheus,info,metrics
      health:
        livenessState:
          enabled: true
        readinessState:
          enabled: true
      # Spring Boot 3 moved the export switch from management.metrics.export.prometheus
      prometheus:
        metrics:
          export:
            enabled: true
    langchain4j:
      model-name: gpt-4
      temperature: 0.7
      max-tokens: 2000
      timeout: 60
      max-retries: 3
secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: langchain4j-secrets
  namespace: production
type: Opaque
stringData:
  openai-api-key: "sk-proj-xxxxxxxxxxxxx"
  db-username: "langchain4j"
  db-password: "your-secure-password"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: langchain4j-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limiting uses limit-rps / limit-rpm / limit-connections;
    # requests per second per client IP is assumed here
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: langchain4j-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: langchain4j-service
                port:
                  number: 80
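📋 50 Best Practices Checklist
Code Quality (10)
- Use the builder pattern: create all LangChain4J objects with .builder() for readability
- Interface segregation: define clear interfaces for LLM interactions so they are easy to mock and test
- Layered exception handling: handle business, technical, and external-service exceptions separately
- Logging conventions: use structured logs that include request ID, user ID, and operation type
- Externalized configuration: inject every setting through configuration files or environment variables
- Versioning: version the API (/v1/chat) and stay backward compatible
- Code reuse: extract common prompt templates and utility functions into shared modules
- Type safety: use strong types instead of Map<String, Object>
- Immutable objects: make domain models immutable so they are thread-safe
- Documentation comments: document complex prompt engineering and RAG configuration in detail
Performance (10)
- Embedding cache: cache the embeddings of identical text in Redis
- Batch processing: use batch APIs for document processing to cut network round trips
- Asynchronous processing: run long operations asynchronously and return a task ID
- Connection pools: size database and HTTP client connection pools sensibly
- Paginated queries: page vector search results to avoid loading too much data at once
- Lazy loading: load conversation history on demand rather than all at once
- Compression: enable GZIP to reduce network transfer
- Index tuning: choose the vector index type deliberately (HNSW vs IVF)
- Cache warm-up: preload frequently used embeddings at startup
- Timeouts: set a reasonable timeout on every external call (LLM 60s, DB 5s)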
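The embedding cache item can be sketched without any infrastructure. The example below is a minimal in-process stand-in for the Redis cache mentioned in the list, under stated assumptions: the class name EmbeddingCacheSketch is hypothetical, the embedding model is represented by a plain function from text to float[], and the map is unbounded (a real cache would need size limits and expiry). Keying by a SHA-256 hash keeps keys short and fixed-length even for long documents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory embedding cache keyed by the SHA-256 hash of the text. */
final class EmbeddingCacheSketch {

    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> embedder; // stands in for the embedding model call

    EmbeddingCacheSketch(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    /** Returns the cached vector for identical text, calling the embedder only on a miss. */
    float[] embed(String text) {
        return cache.computeIfAbsent(sha256Hex(text), key -> embedder.apply(text));
    }

    /** Lowercase hex SHA-256 of the UTF-8 bytes of the text. */
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

The same key scheme carries over to Redis: the hash becomes the Redis key and the serialized vector becomes the value. Including the embedding model name in the key avoids mixing vectors from different models.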
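Cost Control (10)
- Token counting: estimate the token count before each request to stay within limits
- Model selection: match the model to task complexity (GPT-3.5 for simple tasks)
- Prompt optimization: trim the system prompt to remove unnecessary tokens
- Result caching: cache answers to similar questions for 30 minutes
- Tiered rate limits: give each user tier its own quota
- Batch embedding: use the batch embedding API to lower cost
- Vector dimension: choose the embedding dimension to fit the need (768 vs 1536)
- Monitoring and alerts: alert when API spend exceeds a threshold
- Downgrade strategy: fall back to a cheaper model when spend exceeds budget
- Regular audits: review API usage weekly and optimize expensive queries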
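The result-caching item can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions, not a recommended production design: the class name TtlAnswerCache is hypothetical, the LLM call is passed in as a function, and "similar" is reduced to "identical after normalizing case and whitespace". Matching semantically similar questions would need an embedding lookup instead. The clock is injected so expiry can be tested without waiting.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical answer cache with a fixed time-to-live per entry. */
final class TtlAnswerCache {

    private record Entry(String answer, long expiresAtMillis) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;

    TtlAnswerCache(Duration ttl, LongSupplier clockMillis) {
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    /** Returns a cached answer if it has not expired; otherwise calls the LLM and stores the result. */
    String getOrCompute(String question, Function<String, String> llmCall) {
        String key = normalize(question);
        long now = clockMillis.getAsLong();
        Entry cached = entries.get(key);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.answer();
        }
        // Two concurrent misses may both call the LLM; that is acceptable for a sketch
        String answer = llmCall.apply(question);
        entries.put(key, new Entry(answer, now + ttlMillis));
        return answer;
    }

    /** Trims, lowercases, and collapses whitespace so trivially different questions share a key. */
    static String normalize(String question) {
        return question.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

With a TTL of 30 minutes, a repeated question within that window costs no tokens, while answers still refresh often enough to pick up new documents.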
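Reliability (10)
- Retries: retry failed API calls automatically with exponential backoff
- Circuit breaker: open the circuit after consecutive failures to protect downstream services
- Fallback: return a preset answer when the LLM is unavailable
- Idempotency: make every write operation idempotent to prevent duplicate submissions
- Transaction management: keep the vector store and the metadata store consistent
- Health checks: check the health of the LLM, the database, and the vector store
- Graceful shutdown: on SIGTERM, finish in-flight requests before shutting down
- Backups: back up vector data and conversation history regularly
- Failure recovery: resume unfinished tasks automatically after a restart
- Multi-zone deployment: deploy critical services across multiple availability zones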
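The retry item can be sketched in a few lines of plain Java. This is an illustration under stated assumptions rather than a replacement for the maxRetries setting of the model builder or for a resilience library: the class name RetrySketch and its methods are hypothetical, the delay doubles with each attempt up to a cap, and jitter (random variation that keeps many clients from retrying in lockstep) is left out for brevity.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical retry helper with capped exponential backoff (no jitter). */
final class RetrySketch {

    /** Delay before the next try: base for attempt 1, then doubling, never above max. */
    static Duration backoffDelay(Duration base, int attempt, Duration max) {
        int shift = Math.max(0, Math.min(attempt - 1, 20)); // bound the shift to avoid overflow
        long millis = Math.min(max.toMillis(), base.toMillis() * (1L << shift));
        return Duration.ofMillis(millis);
    }

    /** Calls the supplier up to maxAttempts times, sleeping between failed attempts. */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, Duration base, Duration max) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(backoffDelay(base, attempt, max));
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }

    private static void sleep(Duration delay) {
        try {
            Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

In practice only transient failures (timeouts, HTTP 429 or 5xx responses) are worth retrying; validation errors should fail immediately.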
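Security (10)
- API key encryption: use a secrets manager such as Vault in production
- Input validation: validate user input strictly to prevent injection attacks
- Output filtering: filter sensitive information (PII, API keys)
- Access control: role-based access control (RBAC)
- Audit logging: record every sensitive operation (create, delete, modify)
- Enforce HTTPS: require HTTPS in production
- CORS configuration: restrict cross-origin request sources strictly
- SQL injection protection: use parameterized queries and never concatenate SQL
- XSS protection: HTML-escape output to prevent cross-site scripting
- Regular updates: update dependencies promptly to fix security vulnerabilities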
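🎯 Project Checklist
Production Readiness Checklist
Security
- API keys are not hard-coded
- Production secrets are managed with Vault or a similar service
- HTTPS and TLS 1.2+ are enabled
- Input validation and output filtering are in place
- A CORS allowlist is configured
- Rate limiting is enabled
- Audit logs record every sensitive operation
Reliability
- Health check endpoints work
- Retries and circuit breakers are configured
- Graceful shutdown is implemented
- The database connection pool is sized sensibly
- Reasonable timeouts are set
- Resource limits (CPU, memory) are configured
Monitoring and Observability
- Prometheus metrics are exposed
- Key operations are logged
- Alert rules are configured
- Distributed tracing is set up
- Error rate is monitored
- Performance metrics (latency, throughput) are monitored
Testing
- Unit test coverage > 80%
- Integration tests cover the main flows
- E2E tests verify key scenarios
- Load tests verify performance
- Security testing (penetration tests)
Documentation
- API documentation is complete (OpenAPI/Swagger)
- Deployment documentation is clear
- Troubleshooting guide
- Architecture Decision Records (ADR)
- Operations runbook
Cost Optimization
- Result caching is configured
- Models are chosen to fit the task
- Cost alerts are set
- API usage is audited regularly
Compliance
- GDPR/privacy compliance
- Data retention policy
- User data export/deletion
- Terms of service and privacy policy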
🚀 From Demo to Production
Characteristics of the Demo Stage
// Demo code example
public class DemoChat {
    public static void main(String[] args) {
        // ❌ Hard-coded API key
        String apiKey = "sk-proj-xxxxx";
        // ❌ Used directly, with no error handling
        ChatLanguageModel model = OpenAiChatModel.builder()
            .apiKey(apiKey)
            .modelName("gpt-4")
            .build();
        // ❌ No input validation
        String response = model.generate("user input");
        // ❌ Printed directly, with no filtering
        System.out.println(response);
    }
}
Production Rewrite
// Production-grade code
@Service
@Slf4j
public class ProductionChatService {

    private final ChatLanguageModel model;
    private final InputValidator validator;
    private final OutputSanitizer sanitizer;
    private final MetricsCollector metrics;
    private final CircuitBreaker circuitBreaker;

    // ✅ Dependency injection, externalized configuration
    public ProductionChatService(
            ChatLanguageModel model,
            InputValidator validator,
            OutputSanitizer sanitizer,
            MetricsCollector metrics,
            CircuitBreakerRegistry circuitBreakerRegistry) {
        this.model = model;
        this.validator = validator;
        this.sanitizer = sanitizer;
        this.metrics = metrics;
        this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("chat-service");
    }

    @Transactional
    @RateLimited
    public ChatResponse chat(ChatRequest request) {
        // ✅ Validate parameters
        validator.validateChatInput(request.getMessage());
        // ✅ Record metrics
        Timer.Sample sample = Timer.start(metrics.getRegistry());
        try {
            // ✅ Protect the call with a circuit breaker
            String response = circuitBreaker.executeSupplier(() -> {
                try {
                    return model.generate(request.getMessage());
                } catch (Exception e) {
                    log.error("LLM call failed", e);
                    throw new ChatServiceException("The AI service is temporarily unavailable", e);
                }
            });
            // ✅ Filter the output
            String sanitized = sanitizer.sanitizeOutput(response);
            // ✅ Record success
            sample.stop(metrics.timer("chat.success"));
            return ChatResponse.builder()
                .message(sanitized)
                .timestamp(Instant.now())
                .build();
        } catch (CallNotPermittedException e) {
            // ✅ Fall back when the circuit is open
            log.warn("Circuit breaker is open, returning the fallback response");
            sample.stop(metrics.timer("chat.circuit_open"));
            return getFallbackResponse();
        } catch (Exception e) {
            // ✅ Error handling
            log.error("Chat service error", e);
            sample.stop(metrics.timer("chat.error"));
            throw new ChatServiceException("An error occurred while processing the request", e);
        }
    }

    private ChatResponse getFallbackResponse() {
        return ChatResponse.builder()
            .message("Sorry, the AI service is busy right now. Please try again later.")
            .isFallback(true)
            .timestamp(Instant.now())
            .build();
    }
}
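The circuit breaker used by the service above comes from a resilience library. To show the state machine it relies on, here is a deliberately simplified, dependency-free sketch. Everything in it is an assumption made for illustration: the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the injected clock. Library implementations such as Resilience4j evaluate a failure rate over a sliding window and offer many more options.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker: CLOSED, OPEN, HALF_OPEN. */
final class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDurationMillis;
    private final LongSupplier clockMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
        this.clockMillis = clockMillis;
    }

    /** Current state; an OPEN breaker becomes HALF_OPEN once the wait time has passed. */
    synchronized State state() {
        if (state == State.OPEN && clockMillis.getAsLong() - openedAtMillis >= openDurationMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the action unless the breaker is open; an open breaker or a failure yields the fallback. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the downstream service
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed trial call in HALF_OPEN re-opens the breaker immediately
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMillis = clockMillis.getAsLong();
        }
    }
}
```

While the breaker is open, callers receive the fallback at once instead of waiting for an LLM timeout, which is what keeps request threads free during an outage.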
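Migration Checklist
1. Configuration Management
| Demo | Production |
|---|---|
| Hard-coded configuration | Environment variables / configuration files |
| Single configuration | Per-environment configuration (dev/test/prod) |
| Plaintext secrets | Vault / encryption |
2. Error Handling
| Demo | Production |
|---|---|
| No exception handling | Thorough try-catch |
| Printed stack traces | Structured logging |
| Application crashes | Graceful degradation |
3. Reliability
| Demo | Production |
|---|---|
| Single call | Retry + circuit breaker |
| Synchronous, blocking | Asynchronous + timeouts |
| No monitoring | Full monitoring and alerting |
4. Performance
| Demo | Production |
|---|---|
| No caching | Multi-level caching |
| Serial processing | Batching + asynchronous processing |
| No rate limiting | Rate limiting + fallback |
5. Security
| Demo | Production |
|---|---|
| No validation | Input validation + output filtering |
| No authentication | JWT/OAuth2 |
| HTTP | HTTPS + certificates |
6. Testing
| Demo | Production |
|---|---|
| Manual testing | Automated test suite |
| No tests | 80%+ coverage |
| Local verification | CI/CD pipeline |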
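💡 Hands-On Exercises
Exercise 1: Build a Production-Grade RAG System
Requirements:
- Support uploading PDF, Word, and TXT documents
- Store vectors in PGVector
- Implement caching and rate limiting
- Provide complete test coverage
Hints:
- Define the interfaces and tests first (TDD)
- Implement document parsing and vectorization
- Add a caching layer (Caffeine + Redis)
- Implement rate limiting and monitoring
- Write integration tests
Exercise 2: Implement Conversation Context Management
Requirements:
- Maintain each user's conversation history
- Support conversation summaries (for long conversations)
- Clean up expired sessions automatically
- Persist to the database
Hints:
- Design the Conversation and Message entities
- Implement a sliding-window strategy
- Integrate ChatMemory
- Add a scheduled cleanup task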
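As a starting point for the sliding-window hint in Exercise 2, the sketch below keeps only the most recent messages of a conversation. It is a plain-Java illustration under stated assumptions: the class name SlidingWindowHistory is hypothetical and messages are simple strings. LangChain4J ships its own window-based memory (MessageWindowChatMemory), which is the natural choice once the idea is clear.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sliding-window history that retains the newest maxMessages entries. */
final class SlidingWindowHistory {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    SlidingWindowHistory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones once the window is full. */
    synchronized void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** Immutable copy of the current window, oldest message first. */
    synchronized List<String> snapshot() {
        return List.copyOf(messages);
    }
}
```

A message-count window is the simplest policy. A token-based window, or summarizing evicted messages as the exercise requires for long conversations, builds on the same eviction step.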
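Exercise 3: Deploy to Kubernetes
Requirements:
- Write the complete K8s configuration
- Configure HPA (horizontal scaling)
- Implement rolling updates
- Configure monitoring and alerting
Hints:
- Start with the Dockerfile
- Write the deployment and service
- Configure the ConfigMap and Secret
- Manage configuration with Helm
- Configure Prometheus monitoring
Last updated: 2026-03-09 | Word count: 5,500 | Estimated reading time: 45 minutes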