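What this update adds
Model response latency tracking + basic metadata display
Supported:
- Show how long the current reply took
- Show the model used by the current session
- Show when the reply was produced
- Reserve a structure for token usage statistics to be added later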
1) Modify server/app.py
Add a meta field to ChatResponse
class ChatResponse(BaseModel):
    reply: str
    meta: Optional[dict] = None
Add imports
import time
from datetime import datetime
from typing import Optional  # for Optional[dict]; skip if already imported
Add timing and meta in /api/chat
start_time = time.perf_counter()  # monotonic clock, suited to measuring elapsed time
meta = None

try:
    client, config = create_client_by_model(req.model or DEFAULT_MODEL_NAME)
    completion = client.chat.completions.create(
        model=config["model"],
        messages=final_messages,
        # Note: `or` also replaces an explicit 0 with the default value.
        temperature=req.temperature or 0.7,
        top_p=req.top_p or 1,
        max_tokens=req.max_tokens or 1200,
    )
    reply = completion.choices[0].message.content or ""
    duration_ms = int((time.perf_counter() - start_time) * 1000)

    # Some providers may omit usage, so read it defensively.
    # getattr on None falls back to the default, so no extra check is needed.
    usage = getattr(completion, "usage", None)
    meta = {
        "provider": config["provider"],
        "model": req.model or DEFAULT_MODEL_NAME,
        "duration_ms": duration_ms,
        "reply_at": datetime.now().isoformat(),
        "usage": {
            "prompt_tokens": getattr(usage, "prompt_tokens", None),
            "completion_tokens": getattr(usage, "completion_tokens", None),
            "total_tokens": getattr(usage, "total_tokens", None),
        },
    }
except Exception as e:
    print("chat error:", e)
    reply = "AI service error, please try again later"
    duration_ms = int((time.perf_counter() - start_time) * 1000)
    # Keep the same shape on failure so the frontend can always read meta.
    meta = {
        "provider": None,
        "model": req.model or DEFAULT_MODEL_NAME,
        "duration_ms": duration_ms,
        "reply_at": datetime.now().isoformat(),
        "usage": {
            "prompt_tokens": None,
            "completion_tokens": None,
            "total_tokens": None,
        },
    }
Update the return statement
return ChatResponse(reply=reply, meta=meta)
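The same meta dictionary is built in the success branch, the error branch, and the streaming handler. One way to keep the three in sync is a small helper. The sketch below is only an illustration, not code from the repository: the name `build_meta` is hypothetical, and it swaps the naive `datetime.now()` used above for a timezone-aware timestamp with millisecond precision. A naive ISO string carries no offset, so `new Date(...)` in the browser reads it as the browser's local time, which can be off when server and browser sit in different time zones.

```python
import time
from datetime import datetime
from typing import Any, Optional


def build_meta(
    provider: Optional[str],
    model: str,
    start_time: float,
    usage: Any = None,
) -> dict:
    """Hypothetical helper: assemble the meta dict returned with every reply.

    `start_time` is expected to come from time.perf_counter().
    `usage` may be None (error path, or a provider that omits it).
    """
    duration_ms = int((time.perf_counter() - start_time) * 1000)
    return {
        "provider": provider,
        "model": model,
        "duration_ms": duration_ms,
        # Timezone-aware ISO 8601 string, e.g. 2024-01-01T12:00:00.123+08:00
        "reply_at": datetime.now().astimezone().isoformat(timespec="milliseconds"),
        "usage": {
            # getattr on None falls back to the default value.
            "prompt_tokens": getattr(usage, "prompt_tokens", None),
            "completion_tokens": getattr(usage, "completion_tokens", None),
            "total_tokens": getattr(usage, "total_tokens", None),
        },
    }
```

With such a helper, the success branch could call `build_meta(config["provider"], model_name, start_time, usage)` and the error branch `build_meta(None, model_name, start_time)`.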
Add meta to the done event of /api/chat/stream
def generate():
    full_reply = ""
    start_time = time.perf_counter()
    provider = None
    model_name = req.model or DEFAULT_MODEL_NAME

    try:
        client, config = create_client_by_model(model_name)
        provider = config["provider"]
        stream = client.chat.completions.create(
            model=config["model"],
            messages=final_messages,
            temperature=req.temperature or 0.7,
            top_p=req.top_p or 1,
            max_tokens=req.max_tokens or 1200,
            stream=True,
        )

        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            if delta:
                full_reply += delta
                yield f"data: {json.dumps({'type': 'chunk', 'content': delta}, ensure_ascii=False)}\n\n"
    except Exception as e:
        print("stream chat error:", e)
        yield f"data: {json.dumps({'type': 'error', 'content': 'AI service error, please try again later'}, ensure_ascii=False)}\n\n"
        return

    # Stop the clock here so the duration covers only the model reply,
    # not the memory extraction below.
    duration_ms = int((time.perf_counter() - start_time) * 1000)

    if req.session_id:
        latest_user_text = ""
        for item in reversed(messages):
            if item["role"] == "user":
                latest_user_text = item["content"]
                break

        print("stream session_id:", req.session_id)
        print("stream latest_user_text:", latest_user_text)

        if latest_user_text:
            new_memories = extract_user_memories(latest_user_text, model_name)
            print("stream new_memories:", new_memories)
            add_session_memories(req.session_id, new_memories)

    # Token usage is not collected in streaming mode yet; the keys are placeholders.
    meta = {
        "provider": provider,
        "model": model_name,
        "duration_ms": duration_ms,
        "reply_at": datetime.now().isoformat(),
        "usage": {
            "prompt_tokens": None,
            "completion_tokens": None,
            "total_tokens": None,
        },
    }

    yield f"data: {json.dumps({'type': 'done', 'content': full_reply, 'meta': meta}, ensure_ascii=False)}\n\n"
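In the streaming handler the usage fields are always None. With the OpenAI Chat Completions API, passing `stream_options={"include_usage": True}` together with `stream=True` asks the server for one extra final chunk that carries `usage` and has an empty `choices` list. Whether the provider behind `create_client_by_model` supports this option is an assumption to verify. The sketch below covers only the chunk-handling side under that assumption: the hypothetical `iter_deltas` helper skips chunks without choices (where `chunk.choices[0]` would raise an IndexError) and remembers the last usage object it saw.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_deltas(chunks: Iterable[Any], usage_box: Dict[str, Any]) -> Iterator[str]:
    """Hypothetical helper: yield non-empty text deltas from a chat stream.

    The last usage object found on any chunk is stored in usage_box["usage"],
    so the caller can read it once the stream is exhausted.
    """
    for chunk in chunks:
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            usage_box["usage"] = usage

        # A usage-only chunk has an empty choices list; nothing to emit.
        choices = getattr(chunk, "choices", None) or []
        if not choices:
            continue

        delta = getattr(choices[0].delta, "content", None) or ""
        if delta:
            yield delta
```

Inside `generate()`, the loop could then read `for delta in iter_deltas(stream, usage_box):`, and the done event could take its usage values from `usage_box["usage"]` instead of hard-coded None values.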
2) Modify web/src/utils/session.js
Add lastReplyMeta in createSession
maxTokens: 1200,
memoryEnabled: true,
lastReplyMeta: null,
pinned: false,
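Add the default value in the normalize step of loadSessions
// Defaults go first so that values already stored in item override them.
memoryEnabled: true,
lastReplyMeta: null,
pinned: false,
...item,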
3) Modify web/src/App.vue
Add a computed property
const currentReplyMeta = computed(() => {
  return currentSession.value?.lastReplyMeta || null
})
Save meta when the regular /api/chat request succeeds
updateCurrentSession(session => ({
  ...session,
  updatedAt: Date.now(),
  lastReplyMeta: res.data.meta || null,
  messages: [
    ...session.messages,
    {
      role: 'assistant',
      content: res.data.reply,
    },
  ],
}))
Update the streaming done logic
if (payload.type === 'done') {
  sessions.value = sortSessions(
    sessions.value.map(item =>
      item.id === currentSessionId.value
        ? {
            ...item,
            lastReplyMeta: payload.meta || null,
            updatedAt: Date.now(),
          }
        : item
    )
  )

  await fetchMemories()
}
4) Modify the template
The bar can go either before the chat box or above the input box; placing it above chat-box is recommended
<div v-if="currentReplyMeta" class="reply-meta-bar">
  <div class="reply-meta-item">
    <span class="reply-meta-label">Model</span>
    <span class="reply-meta-value">{{ currentReplyMeta.model || '-' }}</span>
  </div>
  <div class="reply-meta-item">
    <span class="reply-meta-label">Provider</span>
    <span class="reply-meta-value">{{ currentReplyMeta.provider || '-' }}</span>
  </div>
  <div class="reply-meta-item">
    <span class="reply-meta-label">Duration</span>
    <span class="reply-meta-value">{{ currentReplyMeta.duration_ms }} ms</span>
  </div>
  <div class="reply-meta-item">
    <span class="reply-meta-label">Time</span>
    <span class="reply-meta-value">
      {{ new Date(currentReplyMeta.reply_at).toLocaleString() }}
    </span>
  </div>
</div>
5) Add styles
.reply-meta-bar {
  margin-bottom: 12px;
  padding: 12px 14px;
  border: 1px solid #e5e7eb;
  border-radius: 12px;
  background: #f8fafc;
  display: flex;
  flex-wrap: wrap;
  gap: 12px 20px;
}

.reply-meta-item {
  display: flex;
  align-items: center;
  gap: 6px;
  min-width: 0;
}

.reply-meta-label {
  font-size: 12px;
  color: #6b7280;
}

.reply-meta-value {
  font-size: 13px;
  color: #111827;
  font-weight: 500;
  word-break: break-all;
}
6) How to verify
Nice!
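One way to check the result without the UI is to inspect the raw responses. The sketch below is a hedged example, not part of the repository: the hypothetical helpers `parse_sse_events` and `check_meta` parse a captured `/api/chat/stream` body in the `data: {...}` format emitted by the server code above and assert that a meta dict has the expected keys. The sample body is made up for illustration.

```python
import json
from typing import Iterator

EXPECTED_META_KEYS = {"provider", "model", "duration_ms", "reply_at", "usage"}
EXPECTED_USAGE_KEYS = {"prompt_tokens", "completion_tokens", "total_tokens"}


def parse_sse_events(body: str) -> Iterator[dict]:
    """Yield the JSON payload of every single-line `data: ...` event."""
    prefix = "data: "
    for line in body.splitlines():
        if line.startswith(prefix):
            yield json.loads(line[len(prefix):])


def check_meta(meta: dict) -> None:
    """Raise AssertionError if meta does not have the expected shape."""
    assert set(meta) == EXPECTED_META_KEYS, f"unexpected keys: {sorted(meta)}"
    assert isinstance(meta["duration_ms"], int) and meta["duration_ms"] >= 0
    assert set(meta["usage"]) == EXPECTED_USAGE_KEYS


# Made-up sample of what a captured stream body could look like.
sample_body = (
    'data: {"type": "chunk", "content": "Hi"}\n\n'
    'data: {"type": "done", "content": "Hi", "meta": {"provider": "demo", '
    '"model": "demo-model", "duration_ms": 812, "reply_at": "2024-01-01T12:00:00", '
    '"usage": {"prompt_tokens": null, "completion_tokens": null, "total_tokens": null}}}\n\n'
)

events = list(parse_sse_events(sample_body))
done = next(event for event in events if event["type"] == "done")
check_meta(done["meta"])
```

For the non-streaming endpoint, the same `check_meta` could be applied to the `meta` field of the JSON response from `/api/chat`.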
Code for this commit
For the full code, see the repository: github.com/huanhunmao/… A star 🌟🌟🌟 would be appreciated, thanks~