Module 5 – AI System Architecture Design | Lecture 37: Module Capstone – Full Deployment of CodeSentinel's RAG Engine and LLM Review Service
Goal of this lecture: complete the platform-level wrap-up of Module 5. In a single docker-compose.yml we bring up the FastAPI application, ChromaDB, Redis, and PostgreSQL; wire the code indexing service (following lecture 33), the RAG retrieval service (hybrid retrieval + reranking, following lecture 34), the LLM review service (streaming and cost control, lecture 35), and the observability layer (lecture 36) into a closed loop through clearly defined integration points; and ship a runnable end-to-end script: index → review → metrics. The emphasis is on "deployable, debuggable, load-testable": the key step from an AI architecture diagram to a deliverable.
Opening: Having Every Component Is Not the Same as Having a Usable System
Many teams fall into an illusion in Module 5: the indexer runs, the vector store works, the RAG demo answers, the inference endpoint responds; yet when you assemble them into CodeSentinel, problems erupt all at once. Webhook-triggered incremental indexing and reviews write to the store concurrently; RAG retrieval blocks FastAPI workers during slow spans; LLM streaming events drift out of sync with task state; observability data is scattered across stdout. Platform-grade delivery requires explicit integration contracts: where events come from, how context is assembled, how failures are retried, which paths must be asynchronous, and which metrics must be aggregatable.
This lecture presents a "pedagogical but complete" engineering layout: a single FastAPI process containing three cleanly bounded modules (IndexingService, RagService, ReviewService); ChromaDB on a persistent volume; Redis as a lightweight queue and cache placeholder; PostgreSQL recording reviews and index_jobs. You will see how a Git webhook triggers reindexing (an incremental placeholder implementation), how RAG injects top_k snippets into the review prompt, how the review calls a placeholder LLM with a streaming callback, and how /metrics and /traces/{id} are exposed.
After this lecture you should be able to answer four questions: how data enters the index, how retrieval serves review, how inference costs are controlled, and how to monitor health in production. Three diagrams below pin down the overall Module 5 architecture, the deployment topology, and the end-to-end sequence; then come the docker-compose.yml, the full Python service code, performance tuning knobs, and test scripts.
Global View: the Complete Module 5 Architecture (Mermaid)
flowchart TB
subgraph Edge["Edge triggers"]
GH["Git Webhook\n(placeholder HTTP)"]
API["FastAPI Gateway"]
end
subgraph Data["Data plane"]
IDX["IndexingService\nchunk + embed + upsert"]
CH["ChromaDB\npersistent volume"]
PG["PostgreSQL\nreviews / jobs"]
end
subgraph AI["Intelligence plane"]
RAG["RagService\nhybrid + rerank (simplified for teaching)"]
REV["ReviewService\nprompt assembly + stream"]
LLM["LLM Provider\n(FakeLLM placeholder)"]
end
subgraph Plat["Platform capabilities"]
RD["Redis\nqueue/cache"]
OBS["Observability\nmetrics + traces"]
end
GH --> API
API --> IDX --> CH
API --> RAG
RAG --> CH
API --> REV
REV --> RAG
REV --> LLM
REV --> PG
API --> RD
REV --> OBS
IDX --> OBS
Deployment Topology: Docker Compose Component Relationships (Mermaid)
flowchart LR
subgraph Host["Developer machine / single-node VM"]
DC["docker compose"]
end
subgraph Ctn["containers"]
APP["app: FastAPI\n:8080"]
CHM["chroma\n:8001"]
RED["redis\n:6379"]
PDB["postgres\n:5432"]
end
DC --> APP
DC --> CHM
DC --> RED
DC --> PDB
APP --> CHM
APP --> RED
APP --> PDB
End-to-End Sequence: Index → Review → Observe (Mermaid)
sequenceDiagram
participant W as Webhook/script
participant A as FastAPI
participant I as IndexingService
participant C as ChromaDB
participant R as RagService
participant V as ReviewService
participant P as Postgres
W->>A: POST /hooks/git (demo)
A->>I: reindex_repo(repo_id, files[])
I->>C: upsert embeddings
I->>P: insert index_jobs row
W->>A: POST /reviews (code, repo_id)
A->>R: retrieve(query code)
R->>C: query top_k
R-->>A: contexts[]
A->>V: review(code, contexts, trace_id)
V->>P: insert review record (streaming...)
V-->>A: summary + metrics
A-->>W: JSON response
Core Principles: the Four Integration Points and Engineering Constraints
1. Indexing service and webhook: incremental indexing is the product, not a plugin
The teaching implementation simulates git diff output with a POSTed file list; production should parse the push event and selectively re-embed only the changed paths instead of rebuilding from scratch. Integration point: IndexingService.upsert_repo_files must be idempotent (the same chunk_id overwrites in place), otherwise Chroma bloats with duplicates.
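The idempotency contract can be sketched as follows (hypothetical helper; the capstone code below derives ids similarly via stable_id). The chunk id comes from the content itself, so re-indexing the same file overwrites rather than appends.

```python
import hashlib

def chunk_id(repo_id: str, path: str, chunk_text: str) -> str:
    # Content-derived id: the same (repo, path, chunk) always maps to the
    # same id, so a vector-store upsert overwrites instead of duplicating.
    # Never use Python's built-in hash() here -- it is salted per process.
    h = hashlib.sha256(f"{repo_id}|{path}|{chunk_text}".encode("utf-8"))
    return h.hexdigest()[:32]
```

Because the id is a pure function of its inputs, replaying the same webhook payload twice leaves the collection unchanged.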
2. RAG and review: the context window is a hard constraint
The review prompt = system instructions + user code + RAG snippets + metadata (path, language, rule version). Token-budget trimming must happen server-side: accumulate snippets from highest similarity downward until you approach the limit. Integration point: RagService.build_context.
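The greedy budget fill can be sketched like this (a simplified stand-in for RagService.build_context in the capstone code; the 4-chars-per-token ratio is a rough heuristic, not a real tokenizer):

```python
from typing import Dict, List

def build_context(hits: List[Dict[str, str]], max_tokens: int = 1500,
                  chars_per_token: int = 4) -> str:
    """Greedy budget fill: hits must arrive sorted best-first by similarity.
    chars_per_token ~= 4 is a crude heuristic; use a real tokenizer in prod."""
    budget = max_tokens * chars_per_token
    parts: List[str] = []
    used = 0
    for h in hits:
        block = f"FILE: {h['path']}\n{h['text']}\n---\n"
        if used + len(block) > budget:
            break  # adding this snippet would overflow the context window
        parts.append(block)
        used += len(block)
    return "".join(parts)
```

Sorting best-first before filling means the snippets dropped at the budget edge are always the least relevant ones.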
3. Inference service and queue: don't let Uvicorn take the hit at peak load
For simplicity this lecture runs the review inline inside the FastAPI process; production should split the work out to Workers (the queue pattern from lecture 35). Integration point: ReviewService should be able to switch between a sync strategy and an enqueue strategy.
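The strategy switch can be sketched with a hypothetical ReviewDispatcher whose in-process asyncio.Queue stands in for a real broker (a production deployment would use the Redis worker split from lecture 35 instead):

```python
import asyncio
from typing import Tuple

class ReviewDispatcher:
    """Sketch of a sync/enqueue switch (hypothetical class). 'sync' awaits the
    review inline; 'queue' hands it to an in-process asyncio.Queue."""

    def __init__(self, mode: str = "sync") -> None:
        self.mode = mode
        self.queue: asyncio.Queue = asyncio.Queue()

    async def _do_review(self, code: str) -> str:
        await asyncio.sleep(0)  # stand-in for the actual LLM call
        return f"reviewed:{len(code)}"

    async def submit(self, code: str) -> str:
        if self.mode == "sync":
            return await self._do_review(code)
        await self.queue.put(code)  # fast path: caller returns immediately
        return "queued"

    async def worker_once(self) -> str:
        # A background worker task would loop over this.
        return await self._do_review(await self.queue.get())

async def demo() -> Tuple[str, str]:
    d = ReviewDispatcher(mode="queue")
    return await d.submit("def f(): pass"), await d.worker_once()
```

The handler code stays identical in both modes; only the dispatcher's configuration changes, which is exactly the property you want when promoting the demo to a worker deployment.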
4. Observability: write the trace_id into the database
The reviews table stores trace_id, which is what lets you align a user complaint with the exact prompt and model version. Integration point: the API middleware injects it → the review path persists it.
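One dependency-free way to carry the trace_id from the middleware down to the persistence layer is a ContextVar (a sketch with hypothetical helper names; the capstone code below passes trace_id explicitly instead):

```python
import uuid
from contextvars import ContextVar

# Hypothetical sketch: middleware sets the trace_id in a ContextVar; any code
# deeper in the request's call stack reads it when writing database rows.
current_trace_id: ContextVar[str] = ContextVar("trace_id", default="")

def inject_trace(headers: dict) -> str:
    # Reuse an incoming x-trace-id header, or mint a fresh one.
    tid = headers.get("x-trace-id") or uuid.uuid4().hex
    current_trace_id.set(tid)
    return tid

def persist_review(summary: str) -> dict:
    # The persisted row carries the same trace_id the middleware assigned,
    # so a user complaint can be joined back to the exact prompt run.
    return {"summary": summary, "trace_id": current_trace_id.get()}
```

Explicit parameter passing (as in the capstone) is easier to test; a ContextVar avoids threading the id through every signature. Either way the contract is the same: the id assigned at the edge must reach the row.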
5. Performance tuning knobs
- batch size: trade embedding batch size off against GPU/CPU capacity and memory.
- cache hit: skip re-embedding for "same repo + same file hash".
- concurrency: number of Uvicorn workers × per-worker LLM concurrency cap.
- Chroma: keep the persistence directory on a volume so container rebuilds don't lose the index.
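The cache-hit knob above can be sketched as a content-hash guard in front of the embedding call (hypothetical EmbedCache; in production the dict would live in Redis with a TTL):

```python
import hashlib
from typing import Callable, Dict, List, Tuple

class EmbedCache:
    """Sketch: skip re-embedding when the same (repo, path, content-hash)
    was already embedded during this process's lifetime."""

    def __init__(self) -> None:
        self._seen: Dict[str, List[float]] = {}

    def _key(self, repo_id: str, path: str, content: str) -> str:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        return f"{repo_id}:{path}:{digest}"

    def get_or_embed(self, repo_id: str, path: str, content: str,
                     embed_fn: Callable[[str], List[float]]) -> Tuple[List[float], bool]:
        k = self._key(repo_id, path, content)
        if k in self._seen:
            return self._seen[k], True  # cache hit: embedding call skipped
        vec = embed_fn(content)
        self._seen[k] = vec
        return vec, False
```

Because the key includes the content hash, editing a file naturally invalidates its entry, while unchanged files skip the (expensive) embedding call entirely.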
Hands-On Code: the Complete Runnable module5_capstone/ Project
1. Directory structure
module5_capstone/
  docker-compose.yml
  app/
    Dockerfile
    main.py
    requirements.txt
  scripts/
    e2e_demo.py
2. docker-compose.yml
services:
postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: codesentinel
POSTGRES_PASSWORD: codesentinel
POSTGRES_DB: codesentinel
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
redis:
image: redis:7-alpine
ports:
- "6379:6379"
chroma:
image: chromadb/chroma:0.5.5
ports:
- "8001:8000"
volumes:
- chroma_data:/chroma/chroma
app:
build:
context: ./app
environment:
DATABASE_URL: postgresql+asyncpg://codesentinel:codesentinel@postgres:5432/codesentinel
REDIS_URL: redis://redis:6379/0
CHROMA_URL: http://chroma:8000
ports:
- "8080:8080"
depends_on:
- postgres
- redis
- chroma
volumes:
pgdata:
chroma_data:
3. app/Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
4. app/requirements.txt
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
pydantic>=2.6.0
httpx>=0.27.0
redis>=5.0.0
sqlalchemy[asyncio]>=2.0.0
asyncpg>=0.29.0
chromadb>=0.5.5
numpy>=1.26.0
5. app/main.py (full implementation)
from __future__ import annotations
import asyncio
import hashlib
import os
import time
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple
import httpx
import numpy as np
import redis.asyncio as redis
import uvicorn
from fastapi import FastAPI, Request
from pydantic import BaseModel, Field
from sqlalchemy import String, Text, Float, Integer, update
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
DATABASE_URL = os.getenv("DATABASE_URL", "postgresql+asyncpg://codesentinel:codesentinel@localhost:5432/codesentinel")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")
CHROMA_URL = os.getenv("CHROMA_URL", "http://localhost:8001")
class Base(DeclarativeBase):
pass
class IndexJob(Base):
__tablename__ = "index_jobs"
id: Mapped[str] = mapped_column(String(64), primary_key=True)
repo_id: Mapped[str] = mapped_column(String(128), index=True)
status: Mapped[str] = mapped_column(String(32))
created_at: Mapped[float] = mapped_column(Float)
class ReviewRow(Base):
__tablename__ = "reviews"
id: Mapped[str] = mapped_column(String(64), primary_key=True)
repo_id: Mapped[str] = mapped_column(String(128), index=True)
trace_id: Mapped[str] = mapped_column(String(64), index=True)
summary: Mapped[str] = mapped_column(Text())
prompt_tokens: Mapped[int] = mapped_column(Integer)
completion_tokens: Mapped[int] = mapped_column(Integer)
created_at: Mapped[float] = mapped_column(Float)
engine = create_async_engine(DATABASE_URL, echo=False)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)
async def init_db() -> None:
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)
def stable_id(*parts: str) -> str:
h = hashlib.sha256()
for p in parts:
h.update(p.encode("utf-8", errors="ignore"))
h.update(b"|")
return h.hexdigest()[:32]
class Metrics:
def __init__(self) -> None:
self.lock = asyncio.Lock()
self.counters: Dict[str, float] = {
"reviews_total": 0,
"index_jobs_total": 0,
"rag_queries_total": 0,
"llm_calls_total": 0,
}
self.hist_latency_ms: List[float] = []
async def inc(self, key: str, delta: float = 1.0) -> None:
async with self.lock:
self.counters[key] = self.counters.get(key, 0.0) + delta
async def observe_review_latency(self, ms: float) -> None:
async with self.lock:
self.hist_latency_ms.append(ms)
if len(self.hist_latency_ms) > 2000:
self.hist_latency_ms = self.hist_latency_ms[-2000:]
async def snapshot(self) -> Dict[str, Any]:
async with self.lock:
lat = self.hist_latency_ms[-512:]
p95 = float(np.percentile(lat, 95)) if lat else 0.0
return {"counters": dict(self.counters), "review_latency_p95_ms": p95}
METRICS = Metrics()
class TraceStore:
def __init__(self, max_n: int = 2000) -> None:
self._items: List[Dict[str, Any]] = []
self._max = max_n
self._lock = asyncio.Lock()
async def add(self, ev: Dict[str, Any]) -> None:
async with self._lock:
self._items.append(ev)
if len(self._items) > self._max:
self._items = self._items[-self._max :]
async def by_trace(self, trace_id: str) -> List[Dict[str, Any]]:
async with self._lock:
return [x for x in self._items if x.get("trace_id") == trace_id]
TRACES = TraceStore()
def embed_texts(texts: List[str], dim: int = 64) -> List[List[float]]:
    """Deterministic pseudo-embedding for teaching (no model download); in production swap in sentence-transformers or OpenAI embeddings."""
out: List[List[float]] = []
for t in texts:
v = np.zeros(dim, dtype=np.float32)
s = (t or "").lower()
for i in range(len(s) - 1):
idx = (ord(s[i]) * 31 + ord(s[i + 1])) % dim
v[idx] += 1.0
n = float(np.linalg.norm(v) or 1.0)
out.append((v / n).tolist())
return out
def cosine(a: List[float], b: List[float]) -> float:
aa = np.array(a, dtype=np.float32)
bb = np.array(b, dtype=np.float32)
return float(np.dot(aa, bb) / ((np.linalg.norm(aa) * np.linalg.norm(bb)) or 1.0))
class ChromaClient:
    """Collection operations over the Chroma HTTP API (v1); falls back to an in-memory index on failure."""
def __init__(self, base_url: str, collection: str = "codesentinel_repo") -> None:
self.base = base_url.rstrip("/")
self.collection = collection
self._mem_docs: List[Dict[str, Any]] = []
async def ensure_collection(self) -> None:
async with httpx.AsyncClient(timeout=30.0) as client:
r = await client.get(f"{self.base}/api/v1/collections")
if r.status_code >= 400:
return
cols = r.json()
names = [c.get("name") for c in cols]
if self.collection not in names:
await client.post(
f"{self.base}/api/v1/collections",
json={"name": self.collection, "metadata": {"project": "CodeSentinel"}},
)
async def upsert(self, ids: List[str], documents: List[str], metadatas: List[Dict[str, Any]], embeddings: List[List[float]]) -> None:
payload = {"ids": ids, "documents": documents, "metadatas": metadatas, "embeddings": embeddings}
async with httpx.AsyncClient(timeout=60.0) as client:
r = await client.post(
f"{self.base}/api/v1/collections/{self.collection}/upsert",
json=payload,
)
if r.status_code >= 400:
                # Fallback: in-memory index
for i, doc_id in enumerate(ids):
self._mem_docs.append(
{"id": doc_id, "document": documents[i], "metadata": metadatas[i], "embedding": embeddings[i]}
)
async def query(self, embedding: List[float], n_results: int = 5) -> List[Dict[str, Any]]:
async with httpx.AsyncClient(timeout=60.0) as client:
r = await client.post(
f"{self.base}/api/v1/collections/{self.collection}/query",
json={"query_embeddings": [embedding], "n_results": n_results},
)
if r.status_code >= 400:
scored = []
for d in self._mem_docs:
scored.append((cosine(embedding, d["embedding"]), d))
scored.sort(key=lambda x: x[0], reverse=True)
return [x[1] for x in scored[:n_results]]
data = r.json()
out = []
ids = (data.get("ids") or [[]])[0]
docs = (data.get("documents") or [[]])[0]
metas = (data.get("metadatas") or [[]])[0]
dists = (data.get("distances") or [[]])[0]
for i in range(len(ids)):
out.append(
{
"id": ids[i],
"document": docs[i],
"metadata": metas[i] if i < len(metas) else {},
"distance": dists[i] if i < len(dists) else None,
}
)
return out
@dataclass
class FileChunk:
repo_id: str
path: str
content: str
class IndexingService:
def __init__(self, chroma: ChromaClient) -> None:
self.chroma = chroma
def chunk_file(self, repo_id: str, path: str, content: str, max_chars: int = 800) -> List[FileChunk]:
chunks: List[FileChunk] = []
c = content or ""
for i in range(0, len(c), max_chars):
chunks.append(FileChunk(repo_id=repo_id, path=path, content=c[i : i + max_chars]))
return chunks or [FileChunk(repo_id=repo_id, path=path, content="")]
async def upsert_repo_files(self, repo_id: str, files: Dict[str, str]) -> Dict[str, Any]:
await METRICS.inc("index_jobs_total")
all_chunks: List[FileChunk] = []
for path, text in files.items():
all_chunks.extend(self.chunk_file(repo_id, path, text))
        # Derive ids from chunk content so re-upserts overwrite in place;
        # the built-in hash() is salted per process and is not stable.
        ids = [stable_id(repo_id, c.path, c.content) for c in all_chunks]
docs = [c.content for c in all_chunks]
metas = [{"repo_id": c.repo_id, "path": c.path} for c in all_chunks]
embs = embed_texts(docs)
await self.chroma.ensure_collection()
await self.chroma.upsert(ids, docs, metas, embs)
return {"repo_id": repo_id, "chunks": len(all_chunks)}
class RagService:
def __init__(self, chroma: ChromaClient) -> None:
self.chroma = chroma
async def retrieve(self, repo_id: str, query_code: str, top_k: int = 4) -> List[Dict[str, Any]]:
await METRICS.inc("rag_queries_total")
q_emb = embed_texts([query_code])[0]
hits = await self.chroma.query(q_emb, n_results=max(8, top_k * 4))
def rerank(hit: Dict[str, Any]) -> float:
            # Teaching-grade rerank: vector similarity plus a small keyword-overlap bonus
doc = hit.get("document") or ""
overlap = sum(1 for w in set(query_code.lower().split()) if w and w in doc.lower())
base = 1.0 - float(hit.get("distance") or 0.0)
return base + 0.01 * overlap
repo_hits = [h for h in hits if (h.get("metadata") or {}).get("repo_id") == repo_id]
repo_hits.sort(key=rerank, reverse=True)
return repo_hits[:top_k]
def build_context(self, hits: List[Dict[str, Any]], max_chars: int = 6000) -> str:
parts: List[str] = []
used = 0
for h in hits:
meta = h.get("metadata") or {}
header = f"FILE: {meta.get('path','?')}\n"
body = (h.get("document") or "")[:2000]
block = header + body + "\n---\n"
if used + len(block) > max_chars:
break
parts.append(block)
used += len(block)
return "\n".join(parts)
class FakeLLM:
async def review_stream(self, prompt: str) -> Tuple[str, Dict[str, int]]:
await asyncio.sleep(0.05)
        text = (
            "[CodeSentinel review (RAG-augmented)]\n"
            "1) Repository context: related snippets show this module already follows a similar pattern; keep it consistent.\n"
            "2) Security: no obvious hardcoded secrets, but consider externalizing configuration.\n"
            "3) Architecture: responsibilities are acceptable; if the function keeps growing, split out an Application Service.\n"
            f"\n---\nPromptLen={len(prompt)}\n"
        )
usage = {"prompt_tokens": max(1, len(prompt) // 4), "completion_tokens": max(1, len(text) // 4)}
return text, usage
class ReviewService:
def __init__(self, rag: RagService, llm: FakeLLM) -> None:
self.rag = rag
self.llm = llm
async def review(self, repo_id: str, code: str, trace_id: str) -> Dict[str, Any]:
await METRICS.inc("llm_calls_total")
hits = await self.rag.retrieve(repo_id, code, top_k=4)
ctx = self.rag.build_context(hits)
        prompt = (
            "You are CodeSentinel's chief architecture reviewer. Combine the RAG context into actionable suggestions.\n"
            f"Repository: {repo_id}\n"
            f"RAG context:\n{ctx}\n"
            f"Code under review:\n```\n{code}\n```\n"
        )
await TRACES.add({"trace_id": trace_id, "ts": time.time(), "event": "rag.hits", "n": len(hits)})
summary, usage = await self.llm.review_stream(prompt)
await TRACES.add(
{
"trace_id": trace_id,
"ts": time.time(),
"event": "llm.done",
"usage": usage,
"summary_preview": summary[:300],
}
)
return {"summary": summary, "usage": usage, "rag_hits": len(hits)}
app = FastAPI(title="CodeSentinel Module5 Capstone", version="1.0.0")
@app.middleware("http")
async def trace_mw(request: Request, call_next):
tid = request.headers.get("x-trace-id") or str(uuid.uuid4())
request.state.trace_id = tid
t0 = time.perf_counter()
resp = await call_next(request)
ms = (time.perf_counter() - t0) * 1000.0
if request.url.path.startswith("/reviews"):
await METRICS.observe_review_latency(ms)
resp.headers["x-trace-id"] = tid
return resp
@app.on_event("startup")
async def startup() -> None:
await init_db()
app.state.redis = redis.from_url(REDIS_URL, decode_responses=True)
app.state.chroma = ChromaClient(CHROMA_URL)
app.state.indexer = IndexingService(app.state.chroma)
app.state.rag = RagService(app.state.chroma)
app.state.reviewer = ReviewService(app.state.rag, FakeLLM())
@app.on_event("shutdown")
async def shutdown() -> None:
await app.state.redis.close()
class GitHookPayload(BaseModel):
repo_id: str
files: Dict[str, str] = Field(default_factory=dict, description="path->content")
@app.post("/hooks/git")
async def git_hook(payload: GitHookPayload) -> Dict[str, Any]:
job_id = str(uuid.uuid4())
async with SessionLocal() as session: # type: ignore[var-annotated]
session.add(IndexJob(id=job_id, repo_id=payload.repo_id, status="running", created_at=time.time()))
await session.commit()
info = await app.state.indexer.upsert_repo_files(payload.repo_id, payload.files)
async with SessionLocal() as session:
await session.execute(update(IndexJob).where(IndexJob.id == job_id).values(status="done"))
await session.commit()
await TRACES.add({"trace_id": job_id, "ts": time.time(), "event": "index.done", "info": info})
return {"job_id": job_id, **info}
class ReviewRequest(BaseModel):
repo_id: str
code: str
@app.post("/reviews")
async def create_review(req: ReviewRequest, request: Request) -> Dict[str, Any]:
await METRICS.inc("reviews_total")
trace_id = request.state.trace_id
rid = str(uuid.uuid4())
result = await app.state.reviewer.review(req.repo_id, req.code, trace_id=trace_id)
async with SessionLocal() as session:
session.add(
ReviewRow(
id=rid,
repo_id=req.repo_id,
trace_id=trace_id,
summary=result["summary"],
prompt_tokens=int(result["usage"]["prompt_tokens"]),
completion_tokens=int(result["usage"]["completion_tokens"]),
created_at=time.time(),
)
)
await session.commit()
return {"review_id": rid, "trace_id": trace_id, **result}
@app.get("/metrics")
async def metrics() -> Dict[str, Any]:
return await METRICS.snapshot()
@app.get("/traces/{trace_id}")
async def traces(trace_id: str) -> Dict[str, Any]:
return {"trace_id": trace_id, "events": await TRACES.by_trace(trace_id)}
@app.get("/health")
async def health() -> Dict[str, str]:
return {"status": "ok"}
if __name__ == "__main__":
uvicorn.run("main:app", host="0.0.0.0", port=int(os.getenv("PORT", "8080")))
Note: Chroma's HTTP API paths can differ slightly across versions; if integration fails, check the container logs and substitute the upsert/query paths documented for your image's exact version. The teaching code ships an in-memory fallback so the demo still runs even when the Chroma API doesn't match, but production must pin a definite version.
6. scripts/e2e_demo.py
import json
import sys
import urllib.request
def post_json(url: str, payload: dict) -> dict:
data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"}, method="POST")
with urllib.request.urlopen(req, timeout=60) as resp:
return json.loads(resp.read().decode("utf-8"))
def main() -> int:
base = sys.argv[1] if len(sys.argv) > 1 else "http://127.0.0.1:8080"
repo_id = "demo/repo"
files = {
"app/service.py": "class UserService:\n def create_user(self, name: str) -> int:\n return 1\n",
"app/api.py": "from fastapi import FastAPI\napp = FastAPI()\n",
}
print("== index ==")
print(post_json(base + "/hooks/git", {"repo_id": repo_id, "files": files}))
code = "def create_user(name):\n return 1\n"
print("== review ==")
r = post_json(base + "/reviews", {"repo_id": repo_id, "code": code})
print(r)
tid = r["trace_id"]
print("== traces ==")
with urllib.request.urlopen(base + f"/traces/{tid}", timeout=30) as resp:
print(resp.read().decode("utf-8"))
print("== metrics ==")
with urllib.request.urlopen(base + "/metrics", timeout=30) as resp:
print(resp.read().decode("utf-8"))
return 0
if __name__ == "__main__":
raise SystemExit(main())
Production Reality: Ten Things to Do After Compose
- Pin image versions: reference Chroma, Postgres, and Redis by digest or an explicit minor version to avoid API drift.
- Migrations: manage the reviews schema and its (repo_id, created_at) index with Alembic.
- Secret management: database passwords go into Docker secrets/KMS, never into git.
- Resource limits: set CPU/memory limits on app so embedding batches cannot OOM the host.
- Health checks: add a Compose healthcheck so app starts only after Postgres is ready.
- Exposure: expose only 443 through the ingress; never map Chroma/Redis/Postgres to the public network.
- Real embeddings: replace embed_texts with an asynchronous batched embedding service, plus caching.
- Real LLM: integrate the provider SDK and reuse lecture 35's budget and rate-limit components.
- Observability: swap METRICS.snapshot() for a Prometheus exporter and TRACES for OTLP.
- Load testing: drive /reviews with locust; tune top_k, context length, and worker count while watching p95 and the error budget.
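If locust is not at hand, a stdlib-only smoke load test against /reviews can look like this (hypothetical helpers; the p95 here is a coarse sorted-index percentile, adequate for smoke runs against a live instance):

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def p95(latencies_ms: List[float]) -> float:
    """Coarse p95 via a sorted index -- dependency-free, fine for smoke runs."""
    xs = sorted(latencies_ms)
    return xs[min(len(xs) - 1, int(0.95 * len(xs)))]

def hit_reviews(base_url: str) -> float:
    # One timed POST /reviews round trip, in milliseconds.
    payload = json.dumps({"repo_id": "demo/repo", "code": "def f(): pass\n"}).encode()
    req = urllib.request.Request(base_url + "/reviews", data=payload,
                                 headers={"Content-Type": "application/json"})
    t0 = time.perf_counter()
    urllib.request.urlopen(req, timeout=30).read()
    return (time.perf_counter() - t0) * 1000.0

def load_test(base_url: str, n: int = 50, concurrency: int = 8) -> Dict[str, float]:
    # Fire n requests with bounded concurrency and report the tail latency.
    with ThreadPoolExecutor(max_workers=concurrency) as ex:
        lats = list(ex.map(lambda _: hit_reviews(base_url), range(n)))
    return {"n": float(n), "p95_ms": p95(lats)}
```

Run `load_test("http://127.0.0.1:8080")` against the compose stack and compare p95 while varying top_k and the worker count; the numbers should track the /metrics endpoint's own review_latency_p95_ms.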
Lecture Summary (Mermaid mindmap)
mindmap
  root((Lecture 37: Module Capstone))
    Indexing
      webhook trigger
      chunk + embed
      Chroma persistence
    RAG
      simplified hybrid retrieval
      rerank
      context pre-trimming
    Review
      prompt assembly
      swappable LLM placeholder
      trace persisted to DB
    Deployment
      four-service compose
      volumes and networks
    Evolution
      worker queue
      real embeddings
      OTLP observability
Questions to Think About
- When a repository reaches millions of files, how should the incremental indexing strategy be layered (directory hashing, change detection, embedding cache)?
- When RAG hits the wrong file and the review misjudges, how do you surface "cited evidence" and "one-click feedback" to users at the product level?
- If the Chroma HTTP API and client versions disagree, would you pin versions or abstract a vector-store port?
Preview of the Next Lecture
Before moving into Module 6 or the course wrap-up, you will push CodeSentinel from single-machine compose to multi-environment releases: CI image builds, secret rotation, canary releases with rollback, and wiring this lecture's metrics/traces into a company-grade monitoring platform. If you are preparing a promotion defense, this lecture's deliverable is your best architecture case study: clear boundaries, a swappable provider adapter layer, and an end-to-end demonstrable loop.