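Running the Qwen3.5 native multimodal model in the front end
An out-of-the-box, single-file, pure front-end large language model chat app. Based on Transformers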

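About two years ago I wrote an introduction to transformers.js on the front end, which could already run chat, image recognition, speech-to-text, text-to-speech and more: Transformers.js：Web 上的最新机器学习技术（1）
Back then the models were pretty weak, so I only played around with it. But it's the year 6202 now, Chinese open-source models are evolving like they're strapped to a rocket, and native multimodal models such as Qwen3.5 have come out. This morning a crazy idea popped up: could I run one in the browser? So I went to hf.co to check whether there was a 3.5 model I could use, and a quick search showed there really was: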
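Clicking in, I found there was even a demo, and a decent-looking one, which suggested it should run as-is.
So I had an AI write a UI to try it out. Here I used Gemini 3.1 Pro (哈基米3.1pro):
And here it is in action:
For OCR, the 0.8B model isn't very accurate; if you're interested, try the 4B.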
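(While writing this post I had the idea of adding a speed calculation. It doesn't seem to start timing from the moment the first token comes back, though, so it may not be very accurate; feel free to optimize it yourself.)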
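One way to tighten the timing, sketched under assumptions: skip the first `put()` (assumed to carry the prompt), start the clock when the first generated token arrives, and count only the tokens that come after it. `DecodeSpeedMeter` and its method names are hypothetical, not part of the generated page or of transformers.js; the clock is injectable so the logic can be checked without loading a model.

```javascript
// Hypothetical helper (not in the original page): measures decode speed
// starting from the first generated token instead of from the prompt.
class DecodeSpeedMeter {
  constructor(now = () => performance.now()) {
    this.now = now // injectable clock, returns milliseconds
    this.promptSeen = false // assumption: the first put() carries the prompt
    this.startTime = null // set when the first generated token arrives
    this.tokenCount = 0 // tokens counted after the first generated one
  }

  // Call once per streamer put().
  onPut(count = 1) {
    if (!this.promptSeen) {
      this.promptSeen = true // ignore the prompt entirely
      return
    }
    if (this.startTime === null) {
      // First generated token: start the clock, but do not count this token,
      // because the time it took includes prompt processing.
      this.startTime = this.now()
      return
    }
    this.tokenCount += count
  }

  // Tokens per second since the first generated token (0 if unknown).
  tokensPerSecond() {
    if (this.startTime === null || this.tokenCount === 0) return 0
    const elapsed = (this.now() - this.startTime) / 1000
    return elapsed > 0 ? this.tokenCount / elapsed : 0
  }
}
```

In the page's `DOMTextStreamer`, `meter.onPut()` could be called at the top of `put(value)` and `meter.tokensPerSecond()` read inside `on_finalized_text`; that wiring is a suggestion and has not been tested against the model.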
Here is the AI-generated HTML:
<!doctype html>
<html lang="zh-CN">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Qwen3.5-ONNX WebGPU Chat</title>
    <style>
      :root {
        --primary: #2563eb;
        --bg-sidebar: #f3f4f6;
        --bg-main: #ffffff;
        --border: #e5e7eb;
        --msg-user: #2563eb;
        --msg-ai: #f3f4f6;
      }

      * {
        box-sizing: border-box;
        margin: 0;
        padding: 0;
      }
      body {
        font-family:
          -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica,
          Arial, sans-serif;
        display: flex;
        height: 100vh;
        overflow: hidden;
        background: var(--bg-main);
      }

      /* Sidebar Styles */
      #sidebar {
        width: 260px;
        background: var(--bg-sidebar);
        border-right: 1px solid var(--border);
        display: flex;
        flex-direction: column;
      }
      #sidebar-header {
        padding: 13px 15px 12.5px 15px;
        border-bottom: 1px solid var(--border);
      }
      #new-chat-btn {
        width: 100%;
        padding: 10px;
        background: var(--primary);
        color: white;
        border: none;
        border-radius: 6px;
        cursor: pointer;
        font-size: 14px;
      }
      #new-chat-btn:hover {
        background: #1d4ed8;
      }
      #chat-list {
        flex: 1;
        overflow-y: auto;
        padding: 10px;
      }
      .chat-item {
        padding: 10px;
        margin-bottom: 5px;
        border-radius: 6px;
        cursor: pointer;
        display: flex;
        justify-content: space-between;
        align-items: center;
      }
      .chat-item:hover {
        background: #e5e7eb;
      }
      .chat-item.active {
        background: #dbeafe;
        color: #1e40af;
      }
      .chat-title {
        font-size: 14px;
        white-space: nowrap;
        overflow: hidden;
        text-overflow: ellipsis;
        flex: 1;
      }
      .delete-btn {
        color: #ef4444;
        font-size: 12px;
        border: none;
        background: none;
        cursor: pointer;
        padding: 0 5px;
        display: none;
      }
      .chat-item:hover .delete-btn {
        display: block;
      }

      /* Main Chat Area */
      #main {
        flex: 1;
        display: flex;
        flex-direction: column;
        position: relative;
      }
      #header {
        padding: 15px 20px;
        border-bottom: 1px solid var(--border);
        display: flex;
        justify-content: space-between;
        align-items: center;
        background: white;
        z-index: 10;
      }
      #model-controls {
        display: flex;
        gap: 10px;
        align-items: center;
      }
      select {
        padding: 8px;
        border-radius: 4px;
        border: 1px solid var(--border);
        font-size: 14px;
        min-width: 250px;
      }
      #load-model-btn {
        padding: 8px 15px;
        background: var(--primary);
        color: white;
        border: none;
        border-radius: 4px;
        cursor: pointer;
      }
      #load-model-btn:disabled {
        background: #9ca3af;
        cursor: not-allowed;
      }

      #chat-container {
        flex: 1;
        overflow-y: auto;
        padding: 20px;
        display: flex;
        flex-direction: column;
        gap: 15px;
        background: white;
      }
      .message {
        max-width: 80%;
        padding: 12px 16px;
        border-radius: 8px;
        line-height: 1.5;
        font-size: 15px;
        word-wrap: break-word;
      }
      .message img {
        max-width: 300px;
        border-radius: 6px;
        margin-bottom: 8px;
        display: block;
      }
      .user-msg {
        background: var(--msg-user);
        color: white;
        align-self: flex-end;
        border-bottom-right-radius: 0;
      }
      .ai-msg {
        background: var(--msg-ai);
        color: black;
        align-self: flex-start;
        border-bottom-left-radius: 0;
      }

      /* Token Speed Stats */
      .msg-stats {
        font-size: 12px;
        color: #6b7280;
        margin-top: 8px;
        font-family:
          ui-monospace, SFMono-Regular, Consolas, 'Courier New', monospace;
      }

      /* Input Area */
      #input-wrapper {
        padding: 15px 20px;
        border-top: 1px solid var(--border);
        background: white;
      }
      #preview-container {
        display: none;
        margin-bottom: 10px;
        position: relative;
        width: max-content;
      }
      #preview-img {
        max-height: 100px;
        border-radius: 6px;
        border: 1px solid var(--border);
      }
      #remove-img-btn {
        position: absolute;
        top: -8px;
        right: -8px;
        background: #ef4444;
        color: white;
        border: none;
        border-radius: 50%;
        width: 20px;
        height: 20px;
        cursor: pointer;
        display: flex;
        align-items: center;
        justify-content: center;
        font-size: 12px;
      }

      #input-form {
        display: flex;
        gap: 10px;
        align-items: flex-end;
      }
      #text-input {
        flex: 1;
        padding: 12px;
        border: 1px solid var(--border);
        border-radius: 6px;
        resize: none;
        min-height: 45px;
        max-height: 150px;
        font-family: inherit;
        font-size: 15px;
      }
      .icon-btn {
        background: none;
        border: 1px solid var(--border);
        border-radius: 6px;
        padding: 10px;
        cursor: pointer;
        color: #4b5563;
        display: flex;
        align-items: center;
        justify-content: center;
      }
      .icon-btn:hover {
        background: #f3f4f6;
      }
      #send-btn {
        background: var(--primary);
        color: white;
        border: none;
        padding: 0 20px;
        height: 45px;
        border-radius: 6px;
        cursor: pointer;
        font-weight: 500;
      }
      #send-btn:disabled {
        background: #9ca3af;
        cursor: not-allowed;
      }

      /* Loading Modal */
      #loading-modal {
        position: absolute;
        top: 0;
        left: 0;
        right: 0;
        bottom: 0;
        background: rgba(255, 255, 255, 0.9);
        z-index: 100;
        display: none;
        flex-direction: column;
        justify-content: center;
        align-items: center;
      }
      .modal-box {
        background: white;
        padding: 30px;
        border-radius: 12px;
        box-shadow: 0 4px 20px rgba(0, 0, 0, 0.1);
        width: 400px;
        text-align: center;
      }
      #progress-text {
        margin-bottom: 15px;
        font-weight: 500;
        color: #374151;
      }
      progress {
        width: 100%;
        height: 10px;
        border-radius: 5px;
        appearance: none;
      }
      progress::-webkit-progress-bar {
        background-color: #e5e7eb;
        border-radius: 5px;
      }
      progress::-webkit-progress-value {
        background-color: var(--primary);
        border-radius: 5px;
      }
    </style>
  </head>
  <body>
    <!-- Left: chat history -->
    <div id="sidebar">
      <div id="sidebar-header">
        <button id="new-chat-btn">+ New chat</button>
      </div>
      <div id="chat-list"></div>
    </div>

    <!-- Right: main interface -->
    <div id="main">
      <div id="header">
        <h3>Qwen WebGPU Chat</h3>
        <div id="model-controls">
          <select id="model-select">
            <option value="huggingworld/Qwen3.5-0.8B-ONNX">
              Qwen3.5-0.8B-ONNX
            </option>
            <option value="huggingworld/Qwen3.5-2B-ONNX">
              Qwen3.5-2B-ONNX
            </option>
            <option value="huggingworld/Qwen3.5-4B-ONNX">
              Qwen3.5-4B-ONNX
            </option>
          </select>
          <button id="load-model-btn">Load model</button>
        </div>
      </div>

      <div id="chat-container">
        <!-- Chat messages are inserted here -->
      </div>

      <div id="input-wrapper">
        <div id="preview-container">
          <img id="preview-img" src="" alt="preview" />
          <button id="remove-img-btn">✕</button>
        </div>
        <form id="input-form">
          <input
            type="file"
            id="file-input"
            accept="image/*"
            style="display: none"
          />
          <button
            type="button"
            class="icon-btn"
            id="attach-btn"
            title="Upload image"
          >
            📷
          </button>
          <textarea
            id="text-input"
            placeholder="Type a message (Shift + Enter for a new line)..."
            rows="1"
          ></textarea>
          <button type="submit" id="send-btn" disabled>Send</button>
        </form>
      </div>

      <!-- Model loading progress modal -->
      <div id="loading-modal">
        <div class="modal-box">
          <h3 style="margin-bottom: 15px">Loading model</h3>
          <div id="progress-text">Initializing...</div>
          <progress id="progress-bar" value="0" max="100"></progress>
          <p style="margin-top: 15px; font-size: 12px; color: #6b7280">
            The first load downloads the model files automatically; please be patient.
          </p>
        </div>
      </div>
    </div>

    <script type="module">
      import {
        AutoProcessor,
        Qwen3_5ForConditionalGeneration,
        RawImage,
        TextStreamer,
      } from 'https://unpkg.com/@huggingface/transformers@4.0.0-next.8/dist/transformers.min.js'

      // ============== Custom Streamer that updates the UI in real time and computes speed ==============
      class DOMTextStreamer extends TextStreamer {
        constructor(tokenizer, callback) {
          super(tokenizer, { skip_prompt: true, skip_special_tokens: true })
          this.callback = callback
          this.generatedText = ''

          // Variables used for speed statistics
          this.tokenCount = 0
          this.startTime = null
          this.first_put_done = false
          this.finalTps = 0
        }

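        put(value) {
          // The first put() normally carries the prompt, before generation starts
          if (!this.first_put_done) {
            this.first_put_done = true
            // Start the clock here. The intent is to measure pure decode speed,
            // but the interval may still include prompt processing (prefill)
            this.startTime = performance.now()
          } else {
            // Each later put() adds the newly generated tokens to the count
            let count =
              value.size !== undefined ? value.size : value.length || 1
            this.tokenCount += count
          }
          super.put(value)
        }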

        on_finalized_text(text, streamEnd) {
          this.generatedText += text

          // Compute the current speed (tokens/s)
          let tps = 0
          if (this.tokenCount > 0 && this.startTime) {
            const elapsed = (performance.now() - this.startTime) / 1000
            if (elapsed > 0) tps = (this.tokenCount / elapsed).toFixed(2)
          }
          this.finalTps = tps

          this.callback(this.generatedText, streamEnd, tps)
        }
      }

      // ============== IndexedDB wrapper ==============
      const DB_NAME = 'QwenChatDB'
      const STORE_NAME = 'chats'
      let db

      function initDB() {
        return new Promise((resolve, reject) => {
          const req = indexedDB.open(DB_NAME, 1)
          req.onupgradeneeded = (e) => {
            let db = e.target.result
            if (!db.objectStoreNames.contains(STORE_NAME)) {
              db.createObjectStore(STORE_NAME, { keyPath: 'id' })
            }
          }
          req.onsuccess = (e) => {
            db = e.target.result
            resolve()
          }
          req.onerror = (e) => reject(e)
        })
      }

      function saveChat(chat) {
        return new Promise((resolve) => {
          const tx = db.transaction(STORE_NAME, 'readwrite')
          tx.objectStore(STORE_NAME).put(chat)
          tx.oncomplete = () => resolve()
        })
      }

      function getAllChats() {
        return new Promise((resolve) => {
          const tx = db.transaction(STORE_NAME, 'readonly')
          const req = tx.objectStore(STORE_NAME).getAll()
          req.onsuccess = () => resolve(req.result || [])
        })
      }

      function deleteChat(id) {
        return new Promise((resolve) => {
          const tx = db.transaction(STORE_NAME, 'readwrite')
          tx.objectStore(STORE_NAME).delete(id)
          tx.oncomplete = () => resolve()
        })
      }

      // ============== Global state ==============
      let currentChatId = null
      let chatHistory = []
      let processor = null
      let model = null
      let isLoading = false
      let currentImageBase64 = null

      // ============== UI elements ==============
      const chatListEl = document.getElementById('chat-list')
      const chatContainerEl = document.getElementById('chat-container')
      const textInput = document.getElementById('text-input')
      const sendBtn = document.getElementById('send-btn')
      const loadModelBtn = document.getElementById('load-model-btn')
      const modelSelect = document.getElementById('model-select')

      const attachBtn = document.getElementById('attach-btn')
      const fileInput = document.getElementById('file-input')
      const previewContainer = document.getElementById('preview-container')
      const previewImg = document.getElementById('preview-img')
      const removeImgBtn = document.getElementById('remove-img-btn')

      const loadingModal = document.getElementById('loading-modal')
      const progressBar = document.getElementById('progress-bar')
      const progressText = document.getElementById('progress-text')

      // ============== Initialization ==============
      async function init() {
        await initDB()
        await loadSidebar()
        createNewChat()
      }

      // ============== Sidebar and history management ==============
      async function loadSidebar() {
        const chats = await getAllChats()
        chats.sort((a, b) => b.id - a.id)
        chatListEl.innerHTML = ''

        chats.forEach((chat) => {
          const div = document.createElement('div')
          div.className = `chat-item ${chat.id === currentChatId ? 'active' : ''}`
          div.onclick = () => selectChat(chat)

          const title = document.createElement('div')
          title.className = 'chat-title'
          title.innerText = chat.title || 'New chat'

          const delBtn = document.createElement('button')
          delBtn.className = 'delete-btn'
          delBtn.innerText = 'Delete'
          delBtn.onclick = async (e) => {
            e.stopPropagation()
            await deleteChat(chat.id)
            if (currentChatId === chat.id) createNewChat()
            loadSidebar()
          }

          div.appendChild(title)
          div.appendChild(delBtn)
          chatListEl.appendChild(div)
        })
      }

      function createNewChat() {
        currentChatId = Date.now()
        chatHistory = []
        chatContainerEl.innerHTML = ''
        loadSidebar()
      }

      async function selectChat(chat) {
        currentChatId = chat.id
        chatHistory = chat.messages || []
        chatContainerEl.innerHTML = ''

        // When reloading history, also show the previously recorded speed
        chatHistory.forEach((msg) => {
          const { statsDiv } = appendMessageUI(msg.role, msg.text, msg.image)
          if (msg.tps && statsDiv) {
            statsDiv.innerText = `⚡ Speed: ${msg.tps} tokens/s`
          }
        })
        loadSidebar()
      }

      document
        .getElementById('new-chat-btn')
        .addEventListener('click', createNewChat)

      // ============== Image upload preview ==============
      attachBtn.onclick = () => fileInput.click()

      fileInput.onchange = (e) => {
        const file = e.target.files[0]
        if (!file) return
        const reader = new FileReader()
        reader.onload = (event) => {
          currentImageBase64 = event.target.result
          previewImg.src = currentImageBase64
          previewContainer.style.display = 'block'
        }
        reader.readAsDataURL(file)
        fileInput.value = ''
      }

      removeImgBtn.onclick = () => {
        currentImageBase64 = null
        previewContainer.style.display = 'none'
        previewImg.src = ''
      }

      // ============== Load model ==============
      loadModelBtn.onclick = async () => {
        if (isLoading) return
        const model_id = modelSelect.value

        isLoading = true
        loadModelBtn.disabled = true
        modelSelect.disabled = true
        loadingModal.style.display = 'flex'

        const progressMap = {}

        try {
          const progressCallback = (info) => {
            if (info.status === 'progress') {
              progressMap[info.file] = {
                loaded: info.loaded,
                total: info.total,
              }
              let totalLoaded = 0,
                totalSize = 0
              for (let file in progressMap) {
                totalLoaded += progressMap[file].loaded
                totalSize += progressMap[file].total
              }
              const percent =
                totalSize > 0 ? (totalLoaded / totalSize) * 100 : 0
              progressBar.value = percent
              progressText.innerText = `Downloading... ${Math.round(percent)}%`
            } else if (info.status === 'ready') {
              progressText.innerText = `Loading into WebGPU... this may take a while`
            }
          }

          processor = await AutoProcessor.from_pretrained(model_id, {
            progress_callback: progressCallback,
          })

          model = await Qwen3_5ForConditionalGeneration.from_pretrained(
            model_id,
            {
              dtype: {
                embed_tokens: 'q4',
                vision_encoder: 'fp16',
                decoder_model_merged: 'q4',
              },
              device: 'webgpu',
              progress_callback: progressCallback,
            },
          )

          sendBtn.disabled = false
          loadModelBtn.innerText = 'Model loaded'
        } catch (error) {
          console.error(error)
          alert(
            'Failed to load the model. Make sure your browser supports WebGPU (Chrome 113+), or check the console for the specific error.\n' +
              error.message,
          )
          loadModelBtn.disabled = false
          modelSelect.disabled = false
        } finally {
          isLoading = false
          loadingModal.style.display = 'none'
        }
      }

      // ============== Message UI handling ==============
      // Returns an object containing textSpan and statsDiv so callers can update them in real time
      function appendMessageUI(role, text, imageBase64 = null, msgId = null) {
        const msgDiv = document.createElement('div')
        msgDiv.className = `message ${role === 'user' ? 'user-msg' : 'ai-msg'}`
        if (msgId) msgDiv.id = msgId

        if (imageBase64) {
          const img = document.createElement('img')
          img.src = imageBase64
          msgDiv.appendChild(img)
        }

        const textSpan = document.createElement('span')
        textSpan.innerText = text
        msgDiv.appendChild(textSpan)

        // For AI replies, add a dedicated container for showing the speed
        let statsDiv = null
        if (role === 'assistant') {
          statsDiv = document.createElement('div')
          statsDiv.className = 'msg-stats'
          msgDiv.appendChild(statsDiv)
        }

        chatContainerEl.appendChild(msgDiv)
        chatContainerEl.scrollTop = chatContainerEl.scrollHeight

        return { textSpan, statsDiv }
      }

      // ============== Sending messages and model inference ==============
      document
        .getElementById('input-form')
        .addEventListener('submit', async (e) => {
          e.preventDefault()
          const text = textInput.value.trim()
          if (!text && !currentImageBase64) return
          if (!model || !processor) {
            alert('Please load the model first!')
            return
          }

          const userText = text
          const userImg = currentImageBase64

          // Clear the input box
          textInput.value = ''
          removeImgBtn.click()

          // Show the user message and save it
          appendMessageUI('user', userText, userImg)
          chatHistory.push({ role: 'user', text: userText, image: userImg })
          updateDB()

          // Disable input
          sendBtn.disabled = true
          textInput.disabled = true

          // Prepare the AI message bubble
          const aiMsgId = `ai-msg-${Date.now()}`
          const { textSpan: aiTextSpan, statsDiv: aiStatsDiv } =
            appendMessageUI('assistant', 'Thinking...', null, aiMsgId)

          try {
            // 1. Build the conversation format (for apply_chat_template)
            const conversation = []
            const rawImages = []

            for (let msg of chatHistory) {
              if (msg.role === 'user') {
                const content = []
                if (msg.image) {
                  content.push({ type: 'image' })
                  const rImg = await RawImage.read(msg.image)
                  const resized = await rImg.resize(448, 448)
                  rawImages.push(resized)
                }
                content.push({ type: 'text', text: msg.text || '' })
                conversation.push({ role: 'user', content: content })
              } else {
                conversation.push({
                  role: 'assistant',
                  content: [{ type: 'text', text: msg.text }],
                })
              }
            }

            // 2. Call the processor
            const promptText = processor.apply_chat_template(conversation, {
              add_generation_prompt: true,
            })

            let inputs
            if (rawImages.length > 0) {
              inputs = await processor(
                promptText,
                rawImages.length === 1 ? rawImages[0] : rawImages,
              )
            } else {
              inputs = await processor(promptText)
            }

            // 3. Build a streamer to update the page dynamically and show the speed
            aiTextSpan.innerText = ''

            const streamer = new DOMTextStreamer(
              processor.tokenizer,
              (newText, isEnd, tps) => {
                aiTextSpan.innerText = newText
                if (tps > 0) {
                  aiStatsDiv.innerText = `⚡ Speed: ${tps} tokens/s`
                }
                chatContainerEl.scrollTop = chatContainerEl.scrollHeight
              },
            )

            // 4. Start generation
            await model.generate({
              ...inputs,
              max_new_tokens: 512,
              streamer: streamer,
            })

            // 5. Save the result (the speed is saved too, so it can be shown when reloading)
            chatHistory.push({
              role: 'assistant',
              text: aiTextSpan.innerText,
              tps: streamer.finalTps,
            })
            updateDB()
          } catch (err) {
            console.error('Generation error: ', err)
            aiTextSpan.innerText = 'Generation error: ' + err.message
          } finally {
            sendBtn.disabled = false
            textInput.disabled = false
            textInput.focus()
          }
        })

      // Bind the Enter shortcut
      textInput.addEventListener('keydown', (e) => {
        if (e.key === 'Enter' && !e.shiftKey) {
          e.preventDefault()
          document.getElementById('send-btn').click()
        }
      })

      async function updateDB() {
        // Automatically use the first message as the title
        const title =
          chatHistory.length > 0
            ? chatHistory[0].text.substring(0, 15) || 'Image chat'
            : 'New chat'

        await saveChat({
          id: currentChatId,
          title: title,
          messages: chatHistory,
        })
        loadSidebar()
      }

      // Start the app
      init()
    </script>
  </body>
</html>
I've also put a copy up on Vercel as a proxy; you can try it via this link: my-ai-gen-demo.vercel.app/qwen-webgpu…
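Before opening the page, it may help to confirm that the browser exposes WebGPU at all, since the page loads the model with `device: 'webgpu'` and its load-failure message points at WebGPU support. A minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` API; `hasWebGPUApi` and `canUseWebGPU` are hypothetical helper names, not part of the page above.

```javascript
// Hypothetical helpers (not in the original page) for a quick WebGPU check.

// Synchronous check: does this navigator-like object expose a `gpu` entry point?
function hasWebGPUApi(nav = globalThis.navigator) {
  return typeof nav === 'object' && nav !== null && 'gpu' in nav && nav.gpu != null
}

// Asynchronous check: the API can exist while no usable adapter is available,
// in which case requestAdapter() resolves to null.
async function canUseWebGPU(nav = globalThis.navigator) {
  if (!hasWebGPUApi(nav)) return false
  try {
    const adapter = await nav.gpu.requestAdapter()
    return adapter != null
  } catch {
    return false
  }
}

// Example (in a browser console):
// canUseWebGPU().then((ok) => console.log(ok ? 'WebGPU available' : 'WebGPU unavailable'))
```

If the check fails, the model will most likely not load on that browser, so it is worth trying a recent Chrome first.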