[Translation] Transformers.js v3, a major update to the front-end AI framework: WebGPU support, new models, and more!

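💰 Click in and you've already earned some knowledge! This article is translated from the official blog post by Xenova. Liking, bookmarking, and commenting will help it sink in!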

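After more than a year of development, we're excited to announce the release of 🤗 Transformers.js v3!

Highlights include:

WebGPU support (up to 100x faster than WASM!)
New quantization formats (dtypes)
120 supported architectures in total
25 new example projects and templates
Over 1200 pre-converted models on the Hugging Face Hub
Node.js (ESM + CJS), Deno, and Bun compatibility
A new home on GitHub and NPM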

Installation

You can install Transformers.js v3 from npm with the following command:

npm i @huggingface/transformers

Then, import the library:

import { pipeline } from "@huggingface/transformers";

Alternatively, load it from a CDN:

import { pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0";

For more information, see the documentation.

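WebGPU support

WebGPU is a new web standard for accelerated graphics and compute. The API enables web developers to use the underlying system's GPU to carry out high-performance computations directly in the browser. WebGPU is the successor to WebGL and provides better performance because it interacts more directly with modern GPUs. Lastly, it supports general-purpose GPU computation, which makes it well suited to machine learning!

As of October 2024, global WebGPU support is around 70% (according to caniuse.com), meaning some users may not be able to use the API.

If the following demos do not work in your browser, you may need to enable WebGPU through a browser feature flag:

Firefox: the dom.webgpu.enabled flag (see here).

Safari: the WebGPU flag (see here).

Older Chromium browsers (on Windows, macOS, Linux): the enable-unsafe-webgpu flag (see here).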
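Since not every browser exposes WebGPU, it can help to pick the device at runtime. The following is a minimal sketch, not part of Transformers.js: `pickDevice` is a hypothetical helper, and falling back to the "wasm" device is an assumption about what your application wants when WebGPU is missing.

```javascript
// Hypothetical helper (not a Transformers.js API): choose a device string
// based on whether the WebGPU API is usable in the current environment.
async function pickDevice(nav = globalThis.navigator) {
  // `navigator.gpu` is only defined where the WebGPU API is exposed.
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "WebGPU not available".
    return "wasm";
  }
}
```

The returned string could then be passed as the device option when loading a model, for example `{ device: await pickDevice() }`.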

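Using WebGPU in Transformers.js v3

Thanks to our collaboration with ONNX Runtime Web, enabling WebGPU acceleration is as simple as setting device: 'webgpu' when loading a model. Let's look at some examples!

Example: Compute text embeddings on WebGPU (demo)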

import { pipeline } from "@huggingface/transformers";

// Create a feature-extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  { device: "webgpu" },
);

// Compute embeddings
const texts = ["Hello world!", "This is an example sentence."];
const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
console.log(embeddings.tolist());
// [
//   [-0.016986183822155, 0.03228696808218956, -0.0013630966423079371, ... ],
//   [0.09050482511520386, 0.07207386940717697, 0.05762749910354614, ... ],
// ]
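A common next step is comparing two embeddings. The sketch below is an illustration rather than part of the library: because the example above requests normalize: true, the vectors should have unit length, so their dot product equals their cosine similarity. The two short vectors are made-up stand-ins for real embedding rows.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// General cosine similarity; also works for vectors that are not normalized.
function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Made-up unit-length vectors standing in for two rows of embeddings.tolist().
const left = [0.6, 0.8];
const right = [0.8, 0.6];

// For unit-length vectors the two values agree (up to floating-point error).
console.log(dot(left, right), cosineSimilarity(left, right));
```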

Example: Perform automatic speech recognition with OpenAI Whisper on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create automatic speech recognition pipeline
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
  { device: "webgpu" },
);

// Transcribe audio from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }

Example: Perform image classification with MobileNetV4 on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create image classification pipeline
const classifier = await pipeline(
  "image-classification",
  "onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
  { device: "webgpu" },
);

// Classify an image from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg";
const output = await classifier(url);
console.log(output);
// [
//   { label: 'tiger, Panthera tigris', score: 0.6149784922599792 },
//   { label: 'tiger cat', score: 0.30281734466552734 },
//   { label: 'tabby, tabby cat', score: 0.0019135422771796584 },
//   { label: 'lynx, catamount', score: 0.0012161266058683395 },
//   { label: 'Egyptian cat', score: 0.0011465961579233408 }
// ]
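The classifier returns an array of label/score objects, so post-processing is ordinary array work. As a small sketch, the snippet below keeps only predictions above a score threshold. It reuses the output printed above as literal data; `confidentLabels` and the 0.1 threshold are illustrative choices, not library features.

```javascript
// Output copied from the example above, used here as plain data.
const predictions = [
  { label: "tiger, Panthera tigris", score: 0.6149784922599792 },
  { label: "tiger cat", score: 0.30281734466552734 },
  { label: "tabby, tabby cat", score: 0.0019135422771796584 },
  { label: "lynx, catamount", score: 0.0012161266058683395 },
  { label: "Egyptian cat", score: 0.0011465961579233408 },
];

// Hypothetical helper: return the labels whose score is at least minScore.
function confidentLabels(preds, minScore) {
  return preds.filter((p) => p.score >= minScore).map((p) => p.label);
}

console.log(confidentLabels(predictions, 0.1));
// [ 'tiger, Panthera tigris', 'tiger cat' ]
```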

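New quantization formats (dtypes)

Before Transformers.js v3, we used the quantized option to choose between the quantized (q8) and full-precision (fp32) variant of a model by setting it to true or false, respectively. Now, the dtype parameter lets you select from a much larger list.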
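When migrating v2 code, the old boolean maps onto the new option as described above: true corresponds to q8 and false to fp32. Here is a minimal sketch of that mapping. `migrateOptions` is a hypothetical helper, and leaving the options untouched when quantized is absent is a simplifying assumption.

```javascript
// Map the legacy boolean to its dtype equivalent, as stated in the text.
function legacyQuantizedToDtype(quantized) {
  return quantized ? "q8" : "fp32";
}

// Hypothetical helper: rewrite a v2-style options object into a v3-style one.
function migrateOptions(options) {
  const { quantized, ...rest } = options;
  // Nothing to translate, or an explicit dtype already takes precedence.
  if (quantized === undefined || rest.dtype !== undefined) return rest;
  return { ...rest, dtype: legacyQuantizedToDtype(quantized) };
}

console.log(migrateOptions({ quantized: false, device: "webgpu" }));
// { device: 'webgpu', dtype: 'fp32' }
```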

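The list of available quantizations depends on the model, but some common ones are: full-precision (fp32), half-precision (fp16), 8-bit (q8, int8, uint8), and 4-bit (q4, bnb4, q4f16).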

(e.g., mixedbread-ai/mxbai-embed-xsmall-v1)
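To get a feel for why the dtype matters, here is a back-of-the-envelope estimate of weight storage. It is only a rough sketch under stated assumptions: it multiplies the parameter count by the nominal bits per weight and ignores quantization metadata (scales, zero points) and any tensors kept at higher precision, so real file sizes will differ. The 500 million parameter count is a round illustrative number, and q4f16 is left out because it mixes precisions.

```javascript
// Nominal bits per weight for the common dtypes listed above (assumption:
// overhead such as per-block scales is ignored).
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, int8: 8, uint8: 8, q4: 4, bnb4: 4 };

// Rough weight size in megabytes (1 MB = 1e6 bytes).
function estimateWeightMegabytes(numParams, dtype) {
  const bits = BITS_PER_WEIGHT[dtype];
  if (bits === undefined) throw new Error(`Unknown dtype: ${dtype}`);
  return (numParams * bits) / 8 / 1e6;
}

for (const dtype of ["fp32", "fp16", "q8", "q4"]) {
  console.log(dtype, estimateWeightMegabytes(500e6, dtype), "MB");
}
// fp32 2000 MB
// fp16 1000 MB
// q8 500 MB
// q4 250 MB
```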

Basic usage

Example: Run Qwen2.5-0.5B-Instruct in 4-bit quantization (demo)

import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { dtype: "q4", device: "webgpu" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a funny joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
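The last line of the example reads the reply with `.at(-1)`, which suggests that generated_text holds the conversation with the model's message appended at the end. The sketch below shows that access pattern on mock data. The `mockOutput` object and its placeholder reply are made up for illustration, and `lastReply` is a hypothetical helper.

```javascript
// Mock data shaped like the output accessed in the example above (assumption).
const mockOutput = [
  {
    generated_text: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me a funny joke." },
      { role: "assistant", content: "(model reply goes here)" },
    ],
  },
];

// Hypothetical helper: return the final assistant message, or null if the
// last message was not produced by the assistant.
function lastReply(output) {
  const messages = output[0].generated_text;
  const last = messages.at(-1);
  return last && last.role === "assistant" ? last.content : null;
}

console.log(lastReply(mockOutput));
// (model reply goes here)
```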

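Per-module dtypes

Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially those of the encoder. For this reason, we added the ability to select per-module dtypes, which is done by providing a mapping from module name to dtype.

Example: Run Florence-2 on WebGPU (demo)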

import { Florence2ForConditionalGeneration } from "@huggingface/transformers";

const model = await Florence2ForConditionalGeneration.from_pretrained(
  "onnx-community/Florence-2-base-ft",
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);

Full code:

import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from "@huggingface/transformers";

// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
  skip_special_tokens: false,
})[0];

// Post-process the generated text
const result = processor.post_process_generation(
  generated_text,
  task,
  image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
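The module-name-to-dtype mapping is a plain object, so it can also be assembled programmatically, for instance from a default plus a few overrides for sensitive modules. The following is a small sketch: `buildDtypeMap` is a hypothetical helper, and the module names are the ones used in the Florence-2 example above.

```javascript
// Hypothetical helper: assign defaultDtype to every module, except where an
// override is given.
function buildDtypeMap(moduleNames, defaultDtype, overrides = {}) {
  const map = {};
  for (const name of moduleNames) {
    map[name] = overrides[name] ?? defaultDtype;
  }
  return map;
}

// Reproduces the mapping from the Florence-2 example: q4 by default, with the
// embedding and vision modules kept at fp16.
const florenceDtypes = buildDtypeMap(
  ["embed_tokens", "vision_encoder", "encoder_model", "decoder_model_merged"],
  "q4",
  { embed_tokens: "fp16", vision_encoder: "fp16" },
);
console.log(florenceDtypes);
```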

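120 supported architectures

This release increases the total number of supported architectures to 120 (see the full list), covering a wide range of input modalities and tasks. These include: Phi-3, Gemma & Gemma 2, LLaVa, Moondream, Florence-2, MusicGen, Sapiens, Depth Pro, PyAnnote, and RT-DETR.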

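Example projects and templates

As part of the release, we've published 25 new example projects and templates, primarily focused on showcasing WebGPU support! These include demos such as Phi-3.5 WebGPU and Whisper WebGPU.

We're in the process of moving all our example projects and demos to github.com/huggingface…, so stay tuned for updates on this!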

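Over 1200 pre-converted models

As of today's release, the community has converted over 1200 models to be compatible with Transformers.js! You can find the full list of available models here.

If you'd like to convert your own models or fine-tunes, you can use our conversion script as follows: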

python -m scripts.convert --quantize --model_id <model_name_or_path>

After uploading the generated files to the Hugging Face Hub, remember to add the transformers.js tag so others can easily find and use your model!

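Node.js (ESM + CJS), Deno, and Bun compatibility

Transformers.js v3 is now compatible with the three most popular server-side JavaScript runtimes:

Runtime	Description	Examples
Node.js	A widely used JavaScript runtime built on Chrome's V8. It has a large ecosystem and supports a wide range of libraries and frameworks.	ESM example / CJS example
Deno	A modern runtime for JavaScript and TypeScript with stronger security. It supports ES modules and has experimental WebGPU support.	Deno example
Bun	A fast JavaScript runtime optimized for performance. It has a built-in bundler, transpiler, and package manager.	Bun example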
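Code that runs across these runtimes sometimes needs to know which one it is in, for logging or for choosing runtime-specific options. Here is a minimal sketch that relies on the `Deno` and `Bun` globals and on `process.versions.node`. `detectRuntime` is a hypothetical helper, and "other" simply means none of the three was recognized (for example, a browser).

```javascript
// Hypothetical helper: identify the current JavaScript runtime.
// Deno and Bun are checked first because both can expose a Node-compatible
// `process` object.
function detectRuntime(g = globalThis) {
  if (typeof g.Deno !== "undefined") return "deno";
  if (typeof g.Bun !== "undefined") return "bun";
  if (g.process?.versions?.node) return "node";
  return "other";
}

console.log(detectRuntime());
```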

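A new home on NPM and GitHub

Finally, we're delighted to announce that Transformers.js will now be published on NPM under the official Hugging Face organization as @huggingface/transformers (instead of @xenova/transformers, which was used for v1 and v2).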
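For existing projects, the package rename means import specifiers need updating from @xenova/transformers to @huggingface/transformers. The sketch below rewrites exact quoted specifiers in a source string. `migrateImports` is a hypothetical helper for illustration only, and it does not handle API changes between versions (such as the move from quantized to dtype).

```javascript
// Hypothetical helper: replace the old package name when it appears as an
// exact quoted module specifier (single or double quotes).
function migrateImports(source) {
  return source.replace(
    /(["'])@xenova\/transformers\1/g,
    "$1@huggingface/transformers$1",
  );
}

const before = 'import { pipeline } from "@xenova/transformers";';
console.log(migrateImports(before));
// import { pipeline } from "@huggingface/transformers";
```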

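On GitHub, we've also moved the repository to the official Hugging Face organization (github.com/huggingface…), which will be our new home from now on. Come say hi! We look forward to hearing your feedback, responding to your issues, and seeing your PRs there!

This is a significant milestone, and we're extremely grateful to the community for helping us achieve this long-term goal! None of this would be possible without all of you. Thank you! 🤗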

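📣 I'm Jax. After another year of sailing the ocean of Web technologies, I'm still a die-hard JavaScript fan, and the Web keeps bringing me so much fun. If you feel the same way, feel free to follow me or send me a message!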