AI Chat, Step One: Streaming Output and How to Build the Typewriter Effect


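Implementing Streaming Chat Output on the Frontend: From Zero to Production-Ready

In AI chat, what users notice most is not the total response time but how long it takes for the first character to appear.
The main value of streaming output is that the reply is displayed while it is still being generated, which makes the response feel much faster.

This article uses real code from a Vue + Pinia project to explain how to implement stable streaming chat output on the frontend.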

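1. Define the goal first: what exactly are we building

A usable streaming chat frontend has to meet at least these 5 requirements:

After the request is sent, it keeps receiving incremental data from the backend (not the full text in one response).
Every chunk it receives is appended to the page.
It supports an end event (done) and an error event (error).
It supports passing back the conversation ID (new conversation -> backend creates it -> frontend takes over).
The UI does not stall, and the user really sees the text arrive piece by piece.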

2. Protocol: frontend and backend agree on the SSE format first

This implementation uses text/event-stream, and the backend pushes data line by line:

data: {"content": "你"}

data: {"content": "好"}

data: {"done": true, "conversation_id": "conv_abc123", "message_id": "msg_001"}

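The frontend only needs to hold on to one core idea: parse the stream line by line, pick out the lines that start with data: , parse the JSON, and dispatch on its fields.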
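As a minimal sketch of that dispatch step, the function below turns one line of the stream into a small event object. The function name parseSseDataLine and the event shape ({ type, ... }) are illustrative assumptions, not part of the original project; it also assumes each line carries only one of the error, done, or content fields.

```javascript
// Classify one line of the stream into a small event object.
// The event shape ({ type, ... }) is a hypothetical convention for this sketch.
function parseSseDataLine(line) {
  // Lines without a data field (blank separators, comments) are skipped.
  if (!line.startsWith("data:")) return { type: "skip" }

  const raw = line.slice(5).trim()
  if (!raw) return { type: "skip" }

  let data
  try {
    data = JSON.parse(raw)
  } catch {
    // Malformed payload: report it instead of throwing,
    // so one bad line does not kill the whole stream.
    return { type: "invalid", raw }
  }
  if (data === null || typeof data !== "object") return { type: "invalid", raw }

  if (data.error) return { type: "error", message: String(data.error) }
  if (data.done) {
    return {
      type: "done",
      conversationId: data.conversation_id ?? null,
      messageId: data.message_id ?? null,
    }
  }
  if (data.content !== undefined) return { type: "chunk", content: data.content }
  return { type: "skip" }
}
```

Returning a plain object instead of calling callbacks directly keeps the parser easy to unit test; the caller decides what to do with each event type.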

3. API layer: read the stream with fetch + ReadableStream

The core implementation looks like this:

export async function streamChat(message, history = [], { conversationId, onChunk, onDone, onError, signal } = {}) {
  const res = await fetch("/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, history, conversation_id: conversationId || null }),
    signal, // lets the caller abort the request
  })

  if (!res.ok) throw new Error(res.statusText || "Request failed")

  const reader = res.body.getReader()
  const decoder = new TextDecoder("utf-8")
  let buffer = ""   // decoded text that has not been split into complete lines yet
  let fullText = "" // everything received so far

  while (true) {
    const { done, value } = await reader.read()
    // stream: true keeps a multi-byte character intact when it is split across chunks
    if (value) buffer += decoder.decode(value, { stream: true })

    const lines = buffer.split("\n")
    buffer = lines.pop() || "" // the last piece may be an incomplete line; keep it for the next read

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue
      const raw = line.slice(6).trim()
      if (!raw) continue

      const data = JSON.parse(raw)
      if (data.error) {
        // business error reported inside the stream
        onError?.(data.error)
        return fullText
      }
      if (data.content !== undefined) {
        fullText += data.content
        onChunk?.(data.content, fullText)
      }
      if (data.done) {
        // end event; data also carries conversation_id and message_id
        onDone?.(fullText, data)
        return fullText
      }
    }

    // A final line without a trailing "\n" is still in buffer at this point;
    // the complete code at the end of the article parses it as well.
    if (done) break
  }

  onDone?.(fullText, {})
  return fullText
}

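Key questions about stream handling:

1. What reader.read() actually does

res.body is a ReadableStream.

Each call to reader.read() resolves to:

done: boolean

value: Uint8Array (a block of raw bytes)

How many bytes a block holds is up to the lower layers: you get whatever has arrived so far. That depends on:

how TCP segments the data

how the server flushes its output

whether nginx or another proxy in between buffers the response

the browser's internal buffering strategy

So several cases are common:

One read() may return only half a line: "data: {"cont"

The next read() returns the rest: "ent":"你好"}\n"

Or a single read() may return several lines at once:

"data: {...}\ndata: {...}\n"

2. Why maintain your own buffer + split("\n")

Because read() hands you arbitrary slices,

while what we want is the SSE stream line by line, each line of the form data: xxx,

so we have to maintain a string buffer ourselves:

On each read(): decode the block of bytes into a string and append it to buffer

buffer.split("\n") splits it into lines

The last line is likely incomplete, so it goes back into buffer

The other, complete lines go to for (const line of lines) for parsing

So the overall logic is:

The browser: byte stream → a series of Uint8Array blocks (read())

You: concatenate the blocks → cut out complete data: ... lines at \n → parse the JSON and fire callbacks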
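The buffering logic can be isolated from the network code so it is easy to test with hand-made chunks. The sketch below is one way to do that; createLineBuffer, push, and flush are hypothetical names introduced for illustration, and the sample chunks are made up.

```javascript
// A tiny line buffer: feed it decoded text, get back only complete lines.
function createLineBuffer() {
  let buffer = ""
  return {
    // Append decoded text and return every line that is now complete.
    push(text) {
      buffer += text
      const lines = buffer.split("\n")
      buffer = lines.pop() ?? "" // the last piece may be a partial line
      // Drop a trailing "\r" (CRLF streams) and the blank separator lines.
      return lines.map((l) => l.replace(/\r$/, "")).filter((l) => l !== "")
    },
    // Call once the stream has ended: returns a trailing line that had no newline.
    flush() {
      const rest = buffer.trim()
      buffer = ""
      return rest ? [rest] : []
    },
  }
}

// Simulated reads: a half line, then the rest plus more lines, then a tail without "\n".
const demoBuffer = createLineBuffer()
const afterFirstRead = demoBuffer.push('data: {"cont')  // nothing complete yet
const afterSecondRead = demoBuffer.push('ent":"hi"}\ndata: {"content":"!"}\n\ndata: {"do')
const afterThirdRead = demoBuffer.push('ne":true}')     // still no newline
const tailLines = demoBuffer.flush()                    // the final event
```

Here afterSecondRead holds the two complete data lines, and the done event only appears in tailLines, which is why the tail has to be parsed after the stream ends.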

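Four key points in this code

ReadableStream + getReader(): gives you an incremental byte stream instead of waiting for the complete response.
decoder.decode(value, { stream: true }): keeps multi-byte characters such as Chinese from being garbled when one character is split across chunks.
buffer + split("\n"): handles split and merged packets, so only complete lines are ever parsed.
fullText accumulation: the UI always renders the complete text so far, so it never depends on chunks being appended in the right order.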
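To see why the stream option matters, the sketch below splits the UTF-8 bytes of "你好" in the middle of the first character and decodes the two pieces both ways. The helper name decodeChunks is an assumption made for this demonstration.

```javascript
// "你好" is 6 bytes in UTF-8 (3 bytes per character).
const sampleBytes = new TextEncoder().encode("你好")
const firstPiece = sampleBytes.slice(0, 1) // only the first byte of "你"
const secondPiece = sampleBytes.slice(1)   // the remaining 5 bytes

// Decode a list of byte chunks, with or without streaming mode.
function decodeChunks(chunks, streamMode) {
  const decoder = new TextDecoder("utf-8")
  let text = ""
  for (const chunk of chunks) {
    text += streamMode
      ? decoder.decode(chunk, { stream: true }) // incomplete bytes are held back
      : decoder.decode(chunk)                   // incomplete bytes become U+FFFD
  }
  // In streaming mode, a final call without arguments flushes the decoder.
  if (streamMode) text += decoder.decode()
  return text
}

const streamedText = decodeChunks([firstPiece, secondPiece], true)
const naiveText = decodeChunks([firstPiece, secondPiece], false)
```

streamedText is the original string, while naiveText contains U+FFFD replacement characters where the split character used to be.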

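4. Store layer: rendering chunks smoothly on the page

Many projects look like they stream, but the UI actually updates all at once. The cause is usually the reactive update strategy.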

const assistantIdx = messages.value.length - 1

await streamChat(content, history, {
  conversationId: activeConversationId.value,
  onChunk: (chunk, fullText) => {
    const msg = messages.value[assistantIdx]
    if (msg) {
      // Replace the object to force a reactive update instead of only mutating msg.content
      messages.value[assistantIdx] = { ...msg, content: fullText }
    }
  },
  onDone: (fullText, metadata = {}) => {
    streaming.value = false
    const msg = messages.value[assistantIdx]
    if (msg) {
      messages.value[assistantIdx] = { ...msg, streaming: false, content: fullText }
    }

    // New conversation: pick up the conversation_id returned by the backend
    if (!activeConversationId.value && metadata.conversation_id) {
      activeConversationId.value = metadata.conversation_id
    }
  }
})

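Why replace the object instead of assigning the field

Under high-frequency chunk updates, assigning msg.content = fullText directly can leave the view updating unreliably in some setups, for example when the message objects are not deeply reactive or a child component only reacts to a new object reference.
Writing messages.value[idx] = { ...msg, content: fullText } changes the array element itself, which triggers Vue's reactive refresh more dependably.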

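5. UX tuning: make the stream actually look like a stream

One more detail matters a lot in this project: after each chunk callback, hand the main thread back to the browser for one frame. (This is the key to the typewriter animation.)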

const yieldToUI = () => new Promise((r) => requestAnimationFrame(r))

if (data.content !== undefined) {
  fullText += data.content
  onChunk?.(data.content, fullText)
  await yieldToUI()
}

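This trick mitigates the case where the backend does send the reply in pieces but the frontend processes them back to back and renders them as one batch. To the eye, the result looks much more like real typing.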
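One caveat worth considering: most browsers pause requestAnimationFrame callbacks in background tabs, so a stream that waits on a frame after every chunk can stall while the tab is hidden. A possible variant, sketched below, races the frame against a timer. The maxWaitMs parameter and its default of 100 ms are assumptions chosen for illustration, not values from the original project.

```javascript
// Yield to the browser for one frame, but never wait longer than maxWaitMs.
// Also works where requestAnimationFrame does not exist (for example in Node tests).
function yieldToUI(maxWaitMs = 100) {
  return new Promise((resolve) => {
    let timer = null
    let frame = null

    const finish = () => {
      // Whichever fires first cleans up the other one.
      if (timer !== null) clearTimeout(timer)
      if (frame !== null && typeof cancelAnimationFrame === "function") {
        cancelAnimationFrame(frame)
      }
      resolve()
    }

    timer = setTimeout(finish, maxWaitMs)
    if (typeof requestAnimationFrame === "function") {
      frame = requestAnimationFrame(finish)
    }
  })
}
```

In a visible tab the frame callback normally wins and the behavior matches the plain requestAnimationFrame version; the timer only takes over when no frame is being produced.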

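6. Error handling and edge cases

What gets overlooked most in streaming is everything off the happy path. Cover at least these branches:

res.ok === false: an HTTP failure goes straight to an error message.
An error field inside the SSE data: show the business error and end the stream.
Fill in the metadata when done arrives: conversation_id, message_id, tokens_used (later turns need them to reference the conversation history).
Last line without a newline: once done is true, parse what is left in buffer.
User stops generation (AbortController): abort the request and update the UI state.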
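One way to keep these branches from scattering across callbacks is to express them as a single state transition. The sketch below is a hedged illustration: applyStreamEvent, the event types (chunk, done, error, aborted), and the state fields are all hypothetical names, not the project's actual store.

```javascript
// Pure state transition for one assistant message.
// state: { content, streaming, conversationId, messageId, error }
function applyStreamEvent(state, event) {
  switch (event.type) {
    case "chunk":
      return { ...state, content: state.content + event.content }
    case "done":
      return {
        ...state,
        streaming: false,
        // Adopt the backend-created ID only if this was a new conversation.
        conversationId: state.conversationId ?? event.conversationId ?? null,
        messageId: event.messageId ?? null,
      }
    case "error":
      // Business error inside the stream, or an HTTP failure mapped to an event.
      return { ...state, streaming: false, error: event.message }
    case "aborted":
      // User pressed stop: keep what was generated so far.
      return { ...state, streaming: false }
    default:
      return state
  }
}

const initialState = { content: "", streaming: true, conversationId: null, messageId: null, error: null }
const finalState = [
  { type: "chunk", content: "你" },
  { type: "chunk", content: "好" },
  { type: "done", conversationId: "conv_abc123", messageId: "msg_001" },
].reduce(applyStreamEvent, initialState)
```

Because the function is pure, each branch in the list above can be covered by a plain unit test without a network or a component.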

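7. A reusable minimal template

To get this into a new project quickly, split it along these lines:

src/api/streamChat.js: only network stream reading and SSE parsing.
src/stores/chat.js: only message state, the placeholder message, incremental updates, and finalizing the message when the stream ends.
ChatInput.vue: the send-message / stop-generation buttons.
ChatMessageList.vue: renders the message list and the streaming state.

The result is a streaming chat frontend with clear responsibilities that is easy to maintain.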

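8. Common pitfalls (field notes)

Checking only for HTTP success and not handling business errors inside the SSE stream.
Rendering each chunk directly without buffered line parsing, which makes JSON parsing fail at random.
Using a streaming endpoint but never yielding a frame to the UI, so the user still sees the whole reply pop in at once.
Not writing back the conversation_id of a new conversation, so the next request loses its context.
Mixing history messages with the live stream when switching conversations, which pollutes state.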
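For the last pitfall, one possible guard is to give every stream a token and ignore callbacks from any stream that is no longer the current one. The sketch below assumes hypothetical names (createStreamGuard, begin, invalidate, isCurrent); the original project may solve this differently.

```javascript
// Hands out a token per stream; only the most recent token is "current".
function createStreamGuard() {
  let currentToken = 0
  return {
    // Call when a new stream starts.
    begin() {
      currentToken += 1
      const token = currentToken
      return { isCurrent: () => token === currentToken }
    },
    // Call when the user switches conversation without starting a new stream.
    invalidate() {
      currentToken += 1
    },
  }
}

// Usage sketch: the old stream keeps running, but its chunks are dropped.
const guard = createStreamGuard()
const renderedTexts = []
const oldStream = guard.begin()
const onOldChunk = (fullText) => {
  if (!oldStream.isCurrent()) return // stale stream: do not touch the new list
  renderedTexts.push(fullText)
}

onOldChunk("first reply")   // accepted
guard.invalidate()          // user switched conversation
onOldChunk("late chunk")    // ignored
```

Pairing this guard with AbortController is usually sensible: aborting stops the network work, while the token protects against callbacks that were already queued.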

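Why so many people "write streaming code but never see it stream"

1) No buffer, just `JSON.parse(chunk)`

This is almost guaranteed to blow up.
The network layer splits and merges packets, so a chunk does not map one-to-one to a complete data: event.

2) The tail is not handled

After split("\n"), the last piece stays in buffer. If the stream does not end with a newline, the final event is lost.
The if (done && buffer.trim()) check in the complete code at the end of this article exists to cover exactly this case.

3) The wrong UI update strategy

The backend is pushing, but if the frontend never lets the main thread breathe, the result looks like a single render.
Yielding one frame with requestAnimationFrame makes a very visible difference.

4) 7 pitfalls I hit in this project

Checking only res.ok and not handling the error field inside the SSE stream.
Parsing chunks directly, which causes random JSON errors.
Forgetting the tail left in buffer when done is true, so the last sentence is lost.
Not writing back conversation_id for a new conversation, so the next turn loses its context.
Mutating object fields directly under high-frequency updates, so the view sometimes refreshes unevenly.
Switching conversations mid-generation, so the old stream writes into the new conversation's list and state gets crossed.
No cancel logic: after the user clicks "stop", the UI stops but the request keeps running.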

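9. Summary

Implementing streaming output on the frontend is not as simple as "use SSE". The core is:

Read correctly (ReadableStream + line-level parsing)
Render reliably (reactive update strategy + frame-by-frame updates)
Close the state loop (placeholder, completion, errors, conversation linkage)

Get these three right and your chat page is not just usable; it feels close to a mature product.

Complete code: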

async function streamChat(message, history = [], { conversationId, onChunk, onDone, onError, signal } = {}) {
  const res = await fetch("/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message,
      history,
      conversation_id: conversationId || null,
    }),
    signal,
  })

  if (!res.ok) {
    const errMsg = res.statusText || "Request failed"
    onError?.(errMsg)
    throw new Error(errMsg)
  }

  const reader = res.body.getReader()
  const decoder = new TextDecoder("utf-8")
  let buffer = ""
  let fullText = ""

  const parseSSELine = (line) => {
    if (!line.startsWith("data: ")) return {}
    const raw = line.slice(6).trim()
    if (!raw) return {}

    if (raw === "[DONE]") {
      onDone?.(fullText, {})
      return { shouldReturn: true }
    }

    try {
      const data = JSON.parse(raw)
      if (data.error) {
        onError?.(data.error)
        return { shouldReturn: true }
      }
      if (data.content !== undefined) {
        fullText += data.content
        onChunk?.(data.content, fullText)
        return { needsYield: true }
      }
      if (data.done) {
        onDone?.(fullText, {
          conversation_id: data.conversation_id || null,
          message_id: data.message_id || null,
          tokens_used: data.tokens_used || null,
        })
        return { shouldReturn: true }
      }
      return {}
    } catch {
      return {}
    }
  }

  while (true) {
    const { done, value } = await reader.read()
    if (value) buffer += decoder.decode(value, { stream: true })

    const lines = buffer.split("\n")
    buffer = lines.pop() || ""

    for (const line of lines) {
      const result = parseSSELine(line)
      if (result.shouldReturn) return fullText
      if (result.needsYield) await new Promise((r) => requestAnimationFrame(r))
    }

    if (done && buffer.trim()) {
      const result = parseSSELine(buffer)
      if (result.shouldReturn) return fullText
      buffer = ""
    }

    if (done) break
  }

  onDone?.(fullText, {})
  return fullText
}

// Component script: state and handlers that drive streamChat
import { ref, nextTick } from "vue"
import { ElMessage } from "element-plus"

// Message list: { role: 'user' | 'assistant', content: string }
const messages = ref([])
const inputText = ref("")
const listRef = ref(null)
const loading = ref(false)
/** Show "AI is thinking…" if no first character has arrived after this delay */
const showThinkingHint = ref(false)
/** The first stream event has arrived (shows "connected / thinking" while the first content is empty) */
const streamConnected = ref(false)
let abortController = null
let thinkingTimer = null

const sendMessage = async () => {
  const text = inputText.value?.trim()
  if (!text || loading.value) return

  messages.value.push({ role: "user", content: text })
  inputText.value = ""
  loading.value = true
  showThinkingHint.value = false
  streamConnected.value = false
  if (thinkingTimer) clearTimeout(thinkingTimer)
  thinkingTimer = setTimeout(() => { showThinkingHint.value = true }, 1500)

  // Placeholder assistant message that the stream fills in
  messages.value.push({ role: "assistant", content: "" })
  const assistantIndex = messages.value.length - 1
  nextTick(() => scrollToBottom())

  // abortController lets the user interrupt an in-flight streamChat request
  abortController = new AbortController()
  // History excludes the two messages just pushed (the new user message and the placeholder)
  const history = messages.value.slice(0, -2).map(m => ({ role: m.role, content: m.content }))

  // try/catch captures exceptions thrown by the async request
  try {
    // Start the streaming conversation
    await streamChat(text, history, {
      // onChunk: called every time a new piece of content arrives
      onChunk(_, fullText) {
        if (thinkingTimer) clearTimeout(thinkingTimer) // content has arrived, so cancel the thinking-hint timer
        thinkingTimer = null // clear the timer handle
        showThinkingHint.value = false // stop showing the "AI is thinking" hint
        streamConnected.value = true  // mark the stream as connected
        messages.value[assistantIndex].content = fullText // update the assistant reply
        nextTick(() => scrollToBottom()) // after the next DOM update, scroll to the newest text
      },
      // onDone: called once the stream has finished
      onDone() {
        if (thinkingTimer) clearTimeout(thinkingTimer) // clean up the timer
        thinkingTimer = null // clear the timer handle
        showThinkingHint.value = false // hide the thinking hint
        streamConnected.value = false // mark the stream as disconnected
        loading.value = false // generation has finished
        nextTick(() => scrollToBottom()) // scroll to bottom
      },
      // onError: called when the stream reports an error
      onError(errMsg) {
        if (thinkingTimer) clearTimeout(thinkingTimer) // otherwise the hint could appear after the error
        thinkingTimer = null
        showThinkingHint.value = false
        streamConnected.value = false
        // If the assistant has not produced any content yet, show the error text instead
        if (messages.value[assistantIndex].content === "") {
          messages.value[assistantIndex].content = "[Request error: " + errMsg + "]"
        }
        loading.value = false // generation has ended
        nextTick(() => scrollToBottom()) // scroll to bottom
      },
      // Pass abortController.signal so the request can be aborted from outside
      signal: abortController.signal,
    })
  } catch (err) {
    // Exceptions thrown during the request end up here
    if (thinkingTimer) clearTimeout(thinkingTimer) // clear the timer
    thinkingTimer = null // clear the timer handle
    showThinkingHint.value = false // hide the thinking hint
    streamConnected.value = false // mark the stream as not connected
    // "AbortError" means the user stopped generation; stopGeneration already reset the state
    if (err?.name === "AbortError") return
    // Extract the error message, falling back to "Request failed"
    const msg = err?.message || "Request failed"
    // If the assistant has not produced any content yet, show the error text instead
    if (messages.value[assistantIndex].content === "") {
      messages.value[assistantIndex].content = "[Request error: " + msg + "]"
    }
    // Show the error in an ElMessage toast
    ElMessage.error(msg)
    loading.value = false // generation has ended
    nextTick(() => scrollToBottom()) // scroll to bottom
  }
}

const stopGeneration = () => {
  if (abortController) {
    abortController.abort()
    loading.value = false
    showThinkingHint.value = false
    streamConnected.value = false
    if (thinkingTimer) clearTimeout(thinkingTimer)
    thinkingTimer = null
  }
}

const scrollToBottom = () => {
  const el = listRef.value
  if (el) el.scrollTop = el.scrollHeight
}
