How to Implement a Canvas Rendering Engine (7): WebGPU Rendering

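A friendly note

Please read this first 👉 Introduction
This article assumes some WebGPU knowledge. To learn the basics, see 👉 WebGPU Fundamentals
Source code on GitHub 👉 code
The code for this article lives on the feat/webgpu branch, so it is best to check out that branch while reading.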

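1. Preface

1.1 The relationship between WebGPU and WebGL

As the previous article showed, WebGL renders dramatically faster than Canvas2D. So what problem is WebGPU meant to solve? Nobody is better placed to answer than Google, as shown in the figure below:

For this rendering engine, the first point matters most: WebGPU is more 'modern' than WebGL. In essence and in usage, both WebGL and WebGPU are rasterization engines, or, put differently, linear interpolation engines, but WebGPU can exploit far more of what modern graphics cards offer. WebGPU also provides compute shaders, so the GPU is no longer limited to drawing graphics. That topic is outside the scope of this article; interested readers can explore it on their own.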

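1.2 WebGPU compatibility

Google released WebGPU in Chrome 113 in April 2023. At the time of writing (June 2024), it is still enabled by default only in Chrome and other Chromium-based browsers, whereas WebGL is supported by more than 95% of browsers. Widespread adoption of WebGPU is still some way off. Even so, its prospects are excellent: it is arguably the future of rendering on the web, and many WebGL-based engines, such as pixijs and threejs, have already started to support it.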

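2. Some changes in strategy

Because of WebGPU's particularities, the rendering engine needs a few adjustments.

2.1 Extracting the shared logic

As noted above, WebGL and WebGPU are both rasterization engines, so the programming model carries over. We turn each shape into a series of vertices, triangulate it to obtain vertex data, put that data into a TypedArray, and hand it to the GPU to draw. If you are not familiar with triangulation, see 👉 Vertexization and triangulation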

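Much of what WebGPU rendering needs has already been done for WebGL rendering, so we extract the shared part into a class of its own, named BatchRenderer. In the earlier articles, CanvasRenderer and WebGLRenderer both extended the Renderer base class, giving the following inheritance hierarchy:

Next, BatchRenderer will extend the Renderer base class, and WebGLRenderer and WebGPURenderer will extend BatchRenderer. CanvasRenderer still extends Renderer directly. The resulting hierarchy looks like this:

BatchRenderer implements the logic shared by WebGL and WebGPU rendering: building the vertex array and the index array, updating node positions, writing vertex data into an ArrayBuffer, and so on. Everything else is implemented separately in WebGLRenderer and WebGPURenderer: setting shared shader variables (uniforms) such as the projection matrix, uploading the ArrayBuffer to the GPU, issuing the WebGL or WebGPU draw commands, and so on.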
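To make the hierarchy concrete, here is a minimal sketch of how the four classes could relate. Only the class names and init() come from this series; render, buildArrays, upload, draw, and the log field are hypothetical names used for illustration, and the real project may split the work differently.

```typescript
// Hypothetical skeleton: only the class names and init() come from the article.
abstract class Renderer {
  readonly log: string[] = [] // records calls so the flow can be inspected
  abstract init(): Promise<void>
  abstract render(): void
}

class CanvasRenderer extends Renderer {
  async init(): Promise<void> {}
  render(): void {
    this.log.push('canvas2d draw')
  }
}

abstract class BatchRenderer extends Renderer {
  render(): void {
    this.buildArrays() // shared: fill the vertex and index arrays on the CPU
    this.upload() // backend-specific: copy the arrays to the GPU
    this.draw() // backend-specific: issue the draw commands
  }

  protected buildArrays(): void {
    this.log.push('build arrays')
  }

  protected abstract upload(): void
  protected abstract draw(): void
}

class WebGLRenderer extends BatchRenderer {
  async init(): Promise<void> {}
  protected upload(): void {
    this.log.push('webgl upload')
  }
  protected draw(): void {
    this.log.push('webgl draw')
  }
}

class WebGPURenderer extends BatchRenderer {
  async init(): Promise<void> {
    // The real renderer awaits requestAdapter() and requestDevice() here.
  }
  protected upload(): void {
    this.log.push('webgpu upload')
  }
  protected draw(): void {
    this.log.push('webgpu draw')
  }
}
```

The point of the template-method shape is that the CPU-side work is written once in BatchRenderer, while each GPU backend only fills in the two steps that actually touch its API.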

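2.2 The init function

WebGPU's APIs for obtaining the adapter and the device are asynchronous, and WebGPU can only be used once both steps have completed. Using WebGPURenderer therefore requires calling its init function first. init is an asynchronous function that returns a Promise, which resolves once the adapter, the device, and the rest of the setup are ready.

All of CanvasRenderer's and WebGLRenderer's APIs, however, are synchronous, so the three renderers would naturally be used differently. But as the introduction to this series explains, this engine switches seamlessly between WebGPU, WebGL, and Canvas2D, so from the user's point of view all three renderers must be used in the same way.

So WebGLRenderer and CanvasRenderer get an init function as well, not just WebGPURenderer. Users do not call init on these renderers directly; they call the init function on the Application class, which in turn calls the renderer's init. After new Application(options), rendering only starts once app.init() has been called, as shown below: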

const app = new Application({
  view,
  backgroundColor: 'pink',
  backgroundAlpha: 0.5,
  prefer: 'webGPU',
  width,
  height
})

app.init().then(() => {
  // application logic
})

Part of WebGPURenderer's init logic is shown below:

public async init() {
  await this.initDevice()

  // other logic...
}

private async initDevice() {
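  const adapter = await navigator.gpu.requestAdapter()

  // requestAdapter() resolves to null when no suitable adapter exists
  if (!adapter) {
    throw new Error('Failed to get a WebGPU adapter')
  }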

  const device = await adapter.requestDevice()

  this.gpu.configure({
    device,
    format: navigator.gpu.getPreferredCanvasFormat(),
    alphaMode: 'premultiplied'
  })

  this.device = device
}
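The idea of giving every renderer the same Promise-returning init can be sketched in isolation. SyncBackend, AsyncBackend, DemoApplication, and the ready flag below are hypothetical stand-ins, not the project's classes; the sketch only shows why callers can treat synchronous and asynchronous backends identically.

```typescript
// Hypothetical sketch: the class names below are illustrative only.
interface RendererLike {
  ready: boolean
  init(): Promise<void>
}

// Stands in for CanvasRenderer / WebGLRenderer: nothing to wait for,
// but init() still returns a Promise so callers can treat it uniformly.
class SyncBackend implements RendererLike {
  ready = false
  async init(): Promise<void> {
    this.ready = true
  }
}

// Stands in for WebGPURenderer: setup finishes asynchronously.
class AsyncBackend implements RendererLike {
  ready = false
  async init(): Promise<void> {
    await Promise.resolve() // placeholder for requestAdapter() / requestDevice()
    this.ready = true
  }
}

class DemoApplication {
  private renderer: RendererLike

  constructor(renderer: RendererLike) {
    this.renderer = renderer
  }

  // Callers always await the same method, whichever backend was chosen.
  init(): Promise<void> {
    return this.renderer.init()
  }
}

const syncBackend = new SyncBackend()
const asyncBackend = new AsyncBackend()
const pendingSync = new DemoApplication(syncBackend).init()
const pendingAsync = new DemoApplication(asyncBackend).init()
```

Right after these calls the synchronous backend is already set up while the asynchronous one is not, yet both calls hand back a Promise, so app.init().then(...) works the same way for every backend.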

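2.3 Removing the 65536-vertex limit

The vertex array is no longer capped at 65536 vertices. All vertices and all vertex indices now go into two large ArrayBuffers, one for each, and are uploaded to the GPU in one go; uploading in segments might not make full use of WebGPU's advantage when issuing many draw calls. Because the 65536 cap is gone, indices are no longer stored in a Uint16Array and are stored in a Uint32Array instead.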
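A quick illustration of why the index type has to change: a 16-bit index can only address vertices 0 to 65535, and a typed array silently wraps anything larger. The value 70000 is just an arbitrary example.

```typescript
// Typed arrays wrap out-of-range values modulo 2^bits instead of throwing.
const vertexIndex = 70000 // a vertex beyond the old 65536 cap

const asUint16 = new Uint16Array([vertexIndex])[0]
const asUint32 = new Uint32Array([vertexIndex])[0]

console.log(asUint16) // 4464, i.e. 70000 - 65536: points at the wrong vertex
console.log(asUint32) // 70000
```

This is also why the index buffer has to be bound with the 'uint32' index format when drawing.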

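3. Shared logic

This part covers the shared logic in the BatchRenderer class, which both WebGLRenderer and WebGPURenderer use.

3.1 Building the big arrays

The 'big arrays' are two large ArrayBuffers holding the vertex data and the index data respectively. On first load, or when elements are added to or removed from the stage, both arrays must be rebuilt in full; otherwise only the affected parts need updating.

3.1.1 Generating batches

Building the big arrays starts with a pre-order traversal of every element on the stage. Each element produces its batches through a buildBatches function. buildBatches is declared on the Container class; every renderable class extends Container and implements it. For example, the buildBatches function of the Graphics class looks like this: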

// Graphics.buildBatches
public buildBatches(batchRenderer: BatchRenderer) {
  this.startPoly()

  this.worldId = this.transform.worldId

  this.geometry.buildVerticesAndTriangulate()

  const batchParts = this.geometry.batchParts

  for (let i = 0; i < batchParts.length; i++) {
    const { style, vertexStart, vertexCount, indexStart, indexCount } =
      batchParts[i]

    const { color, alpha } = style

    const rgba = toRgbaLittleEndian(color, alpha * this.worldAlpha)

    const batch = batchPool.get(this.type) as GraphicsBatch
    batch.vertexCount = vertexCount
    batch.indexCount = indexCount
    batch.rgba = rgba
    batch.vertexOffset = vertexStart
    batch.indexOffset = indexStart
    batch.graphics = this

    this.batches[i] = batch
    batchRenderer.addBatch(this.batches[i])
  }

  this.batchCount = batchParts.length
}
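The pre-order traversal mentioned at the start of this section can be sketched on its own. SceneNode and collectPreOrder are hypothetical names; in the real engine the visit step would call buildBatches on each node rather than collect names.

```typescript
// Hypothetical scene node: just a name and an ordered list of children.
interface SceneNode {
  name: string
  children: SceneNode[]
}

// Pre-order: visit the node itself first, then its children from first to last.
function collectPreOrder(node: SceneNode, out: string[] = []): string[] {
  out.push(node.name) // the real engine would call node.buildBatches(...) here
  for (const child of node.children) {
    collectPreOrder(child, out)
  }
  return out
}

const demoStage: SceneNode = {
  name: 'stage',
  children: [
    {
      name: 'a',
      children: [
        { name: 'a1', children: [] },
        { name: 'a2', children: [] }
      ]
    },
    { name: 'b', children: [] }
  ]
}

const visitOrder = collectPreOrder(demoStage)
console.log(visitOrder) // ['stage', 'a', 'a1', 'a2', 'b']
```

Because batches are appended in visit order, a parent ends up before its children and earlier siblings before later ones, which matches the usual back-to-front drawing order of a display list.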

3.1.2 Growing the ArrayBuffer when capacity runs out

// BatchRenderer.resizeBufferIfNeeded
protected resizeBufferIfNeeded() {
  if (this.vertexCount * BYTES_PER_VERTEX > this.vertFloatView.byteLength) {
    const arrayBuffer = new ArrayBuffer(this.vertexCount * BYTES_PER_VERTEX)
    this.vertFloatView = new Float32Array(arrayBuffer)
    this.vertIntView = new Uint32Array(arrayBuffer)
  }

  if (this.indexCount > this.indexBuffer.length) {
    this.indexBuffer = new Uint32Array(this.indexCount)
  }
}
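vertFloatView and vertIntView are two views over the same ArrayBuffer, which is what lets positions be written as floats and colors as packed integers into one interleaved buffer. The sketch below assumes BYTES_PER_VERTEX is 12 (two float32 values plus one uint32) and uses a hypothetical packRgba in place of toRgbaLittleEndian, whose real implementation is not shown in this article. It also assumes a little-endian host.

```typescript
const BYTES_PER_VERTEX = 12 // assumed layout: x (f32), y (f32), packed rgba (u32)
const FLOATS_PER_VERTEX = BYTES_PER_VERTEX / 4

// Hypothetical stand-in for toRgbaLittleEndian: 8-bit channels packed so that
// a little-endian host stores the bytes in r, g, b, a order.
function packRgba(r: number, g: number, b: number, a: number): number {
  return ((a << 24) | (b << 16) | (g << 8) | r) >>> 0
}

const buffer = new ArrayBuffer(2 * BYTES_PER_VERTEX) // room for two vertices
const floatView = new Float32Array(buffer)
const intView = new Uint32Array(buffer)
const byteView = new Uint8Array(buffer)

// Write the second vertex: position through floatView, color through intView.
const base = 1 * FLOATS_PER_VERTEX
floatView[base] = 10
floatView[base + 1] = 20
intView[base + 2] = packRgba(255, 0, 0, 128)

// Both views see the same bytes: intView[base] holds the raw bits of 10.0.
console.log(intView[base].toString(16)) // 41200000
console.log(Array.from(byteView.slice(20, 24))) // [255, 0, 0, 128] on little-endian hosts
```

Since both views index in 4-byte units, the same element offset works for either one, which is why the pack functions divide BYTES_PER_VERTEX by 4 to get their step.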

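3.1.3 Writing data into the vertex array and the index array

This step writes each batch's vertex data and index data into the big arrays. Every batch records its own position in the big arrays, so it can write its data to the right place.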

// BatchRenderer.packData
protected packData() {
  for (let i = 0; i < this.batchesCount; i++) {
    const batch = this.batches[i]

    batch.packVertices(this.vertFloatView, this.vertIntView)
    batch.packIndices(this.indexBuffer)
  }
}

packVertices and packIndices write the vertex data and the index data into the big arrays, respectively:

// GraphicsBatch.packVertices
packVertices(floatView: Float32Array, intView: Uint32Array): void {
  const step = BYTES_PER_VERTEX / 4

  const vertices = this.graphics.geometry.vertices.data

  const offset = this.vertexOffset

  // The world transform is the same for every vertex, so read it once
  const { a, b, c, d, tx, ty } = this.graphics.worldTransform

  for (let i = 0; i < this.vertexCount; i++) {
    const x = vertices[(offset + i) * 2] // position.x
    const y = vertices[(offset + i) * 2 + 1] // position.y

    const realX = a * x + c * y + tx
    const realY = b * x + d * y + ty

    const vertPos = (this.vertexStart + i) * step

    floatView[vertPos] = realX
    floatView[vertPos + 1] = realY

    intView[vertPos + 2] = this.rgba // color
  }
}

// GraphicsBatch.packIndices
packIndices(int32: Uint32Array): void {
  const indices = this.graphics.geometry.indices.data

  const offset = this.indexOffset

  for (let i = 0; i < this.indexCount; i++) {
    int32[this.indexStart + i] = indices[i + offset] + this.vertexStart
  }
}
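The `+ this.vertexStart` in packIndices is the key step: each shape is triangulated with indices that start at 0, so they must be shifted by the shape's position in the big vertex array. A small worked example, with a hypothetical mergeIndices helper and made-up shapes:

```typescript
interface ShapePart {
  indices: number[] // local indices, starting at 0 for each shape
  vertexCount: number
}

// Concatenate per-shape indices into one array, shifting each shape's
// indices by the number of vertices written before it.
function mergeIndices(parts: ShapePart[]): Uint32Array {
  const total = parts.reduce((n, part) => n + part.indices.length, 0)
  const out = new Uint32Array(total)

  let vertexStart = 0
  let indexStart = 0

  for (const part of parts) {
    for (let i = 0; i < part.indices.length; i++) {
      out[indexStart + i] = part.indices[i] + vertexStart
    }
    vertexStart += part.vertexCount
    indexStart += part.indices.length
  }

  return out
}

const quad: ShapePart = { indices: [0, 1, 2, 0, 2, 3], vertexCount: 4 }
const triangle: ShapePart = { indices: [0, 1, 2], vertexCount: 3 }

const merged = mergeIndices([quad, triangle])
console.log(Array.from(merged)) // [0, 1, 2, 0, 2, 3, 4, 5, 6]
```

The triangle's local indices 0, 1, 2 become 4, 5, 6 because the quad's four vertices come first in the shared vertex array.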

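3.2 Updating the big arrays

If elements on the stage have only changed position, there is no need to rebuild the big arrays; updating the affected nodes in place is enough.

The code for updating the big arrays is as follows: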

// GraphicsBatch.updateVertices
updateVertices(floatView: Float32Array): void {
  // Each batch records its own position in the big arrays, so it can be updated in place

  const step = BYTES_PER_VERTEX / 4

  const vertices = this.graphics.geometry.vertices.data

  const offset = this.vertexOffset

  const { a, b, c, d, tx, ty } = this.graphics.worldTransform

  for (let i = 0; i < this.vertexCount; i++) {
    const x = vertices[(offset + i) * 2] // position.x
    const y = vertices[(offset + i) * 2 + 1] // position.y

    const vertPos = (this.vertexStart + i) * step

    floatView[vertPos] = a * x + c * y + tx
    floatView[vertPos + 1] = b * x + d * y + ty
  }
}
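How the renderer decides which batches to update is not shown here. One plausible approach, sketched below, is a version counter: buildBatches above copies transform.worldId, which suggests something similar, but the syncedWorldId field, the TrackedBatch shape, and updateDirtyBatches are my own assumptions for illustration. The sketch also assumes 3 four-byte slots per vertex.

```typescript
interface Matrix {
  a: number
  b: number
  c: number
  d: number
  tx: number
  ty: number
}

// Hypothetical batch record used only for this sketch.
interface TrackedBatch {
  local: number[] // local-space x, y pairs
  world: Matrix
  worldId: number // assumed to be bumped whenever `world` changes
  syncedWorldId: number // worldId at the time positions were last written
  vertexStart: number // position of this batch in the big vertex array
}

const FLOATS_PER_VERTEX = 3 // assumed: x, y, packed color

// Rewrite positions only for batches whose transform changed; returns how many were updated.
function updateDirtyBatches(batches: TrackedBatch[], floatView: Float32Array): number {
  let updated = 0

  for (const batch of batches) {
    if (batch.syncedWorldId === batch.worldId) continue // unchanged, skip

    const { a, b, c, d, tx, ty } = batch.world

    for (let i = 0; i < batch.local.length / 2; i++) {
      const x = batch.local[i * 2]
      const y = batch.local[i * 2 + 1]
      const pos = (batch.vertexStart + i) * FLOATS_PER_VERTEX

      floatView[pos] = a * x + c * y + tx
      floatView[pos + 1] = b * x + d * y + ty
    }

    batch.syncedWorldId = batch.worldId
    updated++
  }

  return updated
}
```

For example, with a transform that scales by 2 and translates by (10, 5), the local point (3, 4) lands at (2 * 3 + 10, 2 * 4 + 5) = (16, 13), and only that batch's slots in the big array are touched.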

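4. Renderer-specific logic

For operations such as initializing shaders, creating GPU buffers, and updating shader uniforms, WebGPU and WebGL expose different APIs, so this logic is implemented separately in WebGPURenderer and WebGLRenderer.

Because WebGPU's basic logic resembles WebGL's, some parts are not explained in detail.

The WebGL side was covered earlier, so only the WebGPU side is discussed here.

4.1 Initializing the shaders

First we need a vertex shader and a fragment shader.

WebGL's shading language is GLSL, while WebGPU uses a different language called WGSL. Its syntax is no longer C-like as GLSL's is; it resembles Rust instead. Not knowing Rust is perfectly fine: we do not implement anything complicated in the shaders, so a little syntax goes a long way.

WGSL and GLSL differ in some details, such as how uniform variables are handled, but the overall flow is the same: the vertex shader reads the vertex array to get each vertex's data, its outputs are interpolated and handed to the fragment shader, and the fragment shader colors the pixels.

The shader code is as follows:

Vertex shader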

@group(0) @binding(0) var<uniform> u_root_transform: mat3x3<f32>;
@group(0) @binding(1) var<uniform> u_projection_matrix: mat3x3<f32>;

struct VertOutput {
  @builtin(position) v_position: vec4<f32>,
  @location(0) v_color : vec4<f32>,
};

@vertex
fn main(
  @location(0) a_position: vec2<f32>,
  @location(1) a_color: vec4<f32>,
) -> VertOutput {
  let v_position = vec4<f32>((u_projection_matrix * u_root_transform * vec3<f32>(a_position, 1.0)).xy, 0.0, 1.0);

  let v_color = a_color;

  return VertOutput(v_position, v_color);
}

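For variable names I kept the convention from the earlier GLSL code: snake_case, with a u prefix for uniform variables, an a prefix for attribute variables, and a v prefix for varying variables.

The two uniform variables are like the ones in the earlier GLSL: two 3x3 matrices whose elements are all float32.

Fragment shader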

@fragment
fn main(
  @location(0) v_color : vec4<f32>,
) -> @location(0) vec4<f32> {
  return v_color;
}

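The fragment shader is still very simple: it uses whatever the vertex shader passes along.

What follows is a run of boilerplate: request a device, create a GPU buffer, create the render pipeline, initialize the bind group for the uniforms, and so on. The code is as follows: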

// WebGPURenderer.init
public async init() {
  await this.initDevice()

  this.initGpuBuffer()
  this.initRenderPassDescriptor()
  this.createPipeline()
  this.initUniformBindGroup()

  this.setRootTransform(1, 0, 0, 1, 0, 0)

  this.setProjectionMatrix()
}

The details of these functions are in the project source and are not covered here.
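One detail worth spelling out is how the interleaved vertex data lines up with the shader's a_position and a_color inputs. The sketch below shows a vertex buffer layout that would fit, written as a plain object so it runs without a GPU. It assumes a 12-byte vertex and that the packed color is read with the 'unorm8x4' format, which turns four bytes into a vec4<f32> in the 0 to 1 range; I have not checked that the project's createPipeline uses exactly these values.

```typescript
const BYTES_PER_VERTEX = 12 // assumed: a 2-float position plus a packed 4-byte color

// Plain-object version of a GPUVertexBufferLayout; it would go into the
// `buffers` array of the vertex stage when creating the render pipeline.
const vertexBufferLayout = {
  arrayStride: BYTES_PER_VERTEX,
  attributes: [
    { shaderLocation: 0, offset: 0, format: 'float32x2' }, // a_position: two f32 values
    { shaderLocation: 1, offset: 8, format: 'unorm8x4' } // a_color: four bytes, each mapped to 0..1
  ]
} as const

// Byte sizes of the two vertex formats used above.
const FORMAT_SIZE: Record<string, number> = { float32x2: 8, unorm8x4: 4 }

// Sanity check: the attributes should tile the stride exactly, with no gaps.
function attributesFillStride(layout: typeof vertexBufferLayout): boolean {
  let end = 0
  for (const attr of layout.attributes) {
    if (attr.offset !== end) return false
    end += FORMAT_SIZE[attr.format]
  }
  return end === layout.arrayStride
}

console.log(attributesFillStride(vertexBufferLayout)) // true
```

This would explain why the CPU side writes the color as a single uint32 while the shader declares a_color as vec4<f32>: the conversion happens when the vertex is fetched.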

4.2 Setting the projection matrix and the stage transform matrix

First, the code:

// Projection matrix
// WebGPURenderer.setProjectionMatrix
protected setProjectionMatrix(): void {
  const width = this.canvasEle.width
  const height = this.canvasEle.height

  const scaleX = (1 / width) * 2
  const scaleY = (1 / height) * 2

  this.device.queue.writeBuffer(
    this.projectionMatBuffer,
    0,
    new Float32Array([
      scaleX, 0,       0, 0, // first column of the matrix
      0,      -scaleY, 0, 0, // second column of the matrix
      -1,     1,       1, 0  // third column of the matrix
    ])
  )
}

// Stage transform matrix
// WebGPURenderer.setRootTransform
protected setRootTransform(
  a: number,
  b: number,
  c: number,
  d: number,
  tx: number,
  ty: number
): void {
  this.device.queue.writeBuffer(
    this.stageMatBuffer,
    0,
    new Float32Array([
      a,  b,  0, 0, // first column of the matrix
      c,  d,  0, 0, // second column of the matrix
      tx, ty, 1, 0  // third column of the matrix
    ])
  )
}
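As a sanity check on the projection matrix, the sketch below multiplies it by a few pixel coordinates by hand. projectToClipSpace is a hypothetical helper written for this check, and the 512x256 canvas size is arbitrary (chosen so the arithmetic is exact).

```typescript
// Applies the column-major 3x3 projection matrix to the point (x, y, 1).
function projectToClipSpace(width: number, height: number, x: number, y: number): [number, number] {
  const scaleX = (1 / width) * 2
  const scaleY = (1 / height) * 2

  // Columns: (scaleX, 0, 0), (0, -scaleY, 0), (-1, 1, 1)
  const clipX = scaleX * x + 0 * y + -1
  const clipY = 0 * x + -scaleY * y + 1

  return [clipX, clipY]
}

console.log(projectToClipSpace(512, 256, 0, 0)) // [-1, 1]: top-left pixel maps to top-left of clip space
console.log(projectToClipSpace(512, 256, 512, 256)) // [1, -1]: bottom-right pixel maps to bottom-right
console.log(projectToClipSpace(512, 256, 256, 128)) // [0, 0]: canvas center maps to the origin
```

So the matrix rescales pixel coordinates to the -1 to 1 range and flips the y axis, since canvas y grows downward while clip-space y grows upward.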

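This follows the same logic as in WebGL: it just writes some data into a uniform variable. WebGPU has a major pitfall here, though. The memory layout of uniform variables differs from WebGL's: WebGPU has what are called alignment requirements (see 👉 WebGPU memory layout), which require data to be aligned in memory. Each type has an alignment size, and a value must occupy space according to that alignment even when its own data is smaller, so some of the space is 'wasted'.

Take the projection matrix above. It is a uniform variable holding a 3x3 float32 matrix, and its actual layout in memory looks like this:

Although a 3x3 matrix only needs 9 float32 values, it actually occupies 3x4, that is 12, float32 slots, because each column is padded to 4 floats. So we cannot write 9 float32 values into the uniform; we must write 12. The last float32 of every group of four is 'wasted': WebGPU never uses it, so any value will do. I wrote 0 above, but 100 or 10000 would work just as well. This slot exists purely to fill the gap in memory; without it, WebGPU would read the wrong numbers.

Fortunately, vertex buffers do not have this memory layout problem, so we can write data into them without padding.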
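If you would rather keep matrices as 9 tightly packed values on the CPU side, the padding can be added in one place. padMat3 below is a hypothetical helper, not something from the project; it copies a column-major 3x3 matrix into the 12-float layout that a mat3x3<f32> uniform expects.

```typescript
// Expand a tightly packed, column-major 3x3 matrix (9 floats) into the
// padded layout of a mat3x3<f32> uniform: 3 columns of 4 floats (48 bytes).
function padMat3(m: ArrayLike<number>): Float32Array {
  const out = new Float32Array(12) // padding slots stay 0

  for (let col = 0; col < 3; col++) {
    for (let row = 0; row < 3; row++) {
      out[col * 4 + row] = m[col * 3 + row]
    }
  }

  return out
}

const padded = padMat3([1, 2, 3, 4, 5, 6, 7, 8, 9])
console.log(Array.from(padded)) // [1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9, 0]
console.log(padded.byteLength) // 48
```

The result can be passed directly to device.queue.writeBuffer, and the uniform buffer itself needs to be at least 48 bytes for each such matrix.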

4.3 Drawing

For drawing, WebGPU uses a command buffer model. The process has more steps and is longer than the WebGL equivalent. The code is as follows:

// WebGPURenderer.draw
protected draw(): void {
  const {
    device,
    renderPassDescriptor,
    gpuVertexBuffer,
    gpuIndexBuffer,
    gpu,
    uniformBindGroup,
    indexCount,
    pipeline
  } = this

  const commandEncoder = device.createCommandEncoder()
  
  renderPassDescriptor.colorAttachments[0].view = gpu
    .getCurrentTexture()
    .createView()

  const renderPass = commandEncoder.beginRenderPass(renderPassDescriptor)

  renderPass.setPipeline(pipeline)

  renderPass.setVertexBuffer(0, gpuVertexBuffer)
  renderPass.setIndexBuffer(gpuIndexBuffer, 'uint32')

  renderPass.setBindGroup(0, uniformBindGroup)

  renderPass.drawIndexed(indexCount)

  renderPass.end()

  const commandBuffer = commandEncoder.finish()

  device.queue.submit([commandBuffer])
}
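draw assumes the GPU buffers already hold the latest vertex and index data. How the project uploads that data is not shown in this article, so the following is only a sketch of how a changed range of vertices could be copied with queue.writeBuffer. QueueLike is a minimal structural type standing in for GPUQueue so the example runs without a device, and uploadVertexRange and the 3-slot vertex size are assumptions.

```typescript
// Minimal structural type mirroring the GPUQueue.writeBuffer overload used here,
// so the sketch runs without a real GPU device.
interface QueueLike<B> {
  writeBuffer(buffer: B, bufferOffset: number, data: Float32Array, dataOffset: number, size: number): void
}

const FLOATS_PER_VERTEX = 3 // assumed: x, y, packed color

// Copy only the vertices [vertexStart, vertexStart + vertexCount) to the GPU buffer.
function uploadVertexRange<B>(
  queue: QueueLike<B>,
  gpuBuffer: B,
  floatView: Float32Array,
  vertexStart: number,
  vertexCount: number
): void {
  const elementOffset = vertexStart * FLOATS_PER_VERTEX
  const elementCount = vertexCount * FLOATS_PER_VERTEX

  queue.writeBuffer(
    gpuBuffer,
    elementOffset * Float32Array.BYTES_PER_ELEMENT, // destination offset, in bytes
    floatView,
    elementOffset, // source offset, in elements when data is a typed array
    elementCount // size, in elements when data is a typed array
  )
}
```

The easy mistake is mixing units: the destination offset is in bytes, while the source offset and size are counted in elements when a typed array is passed. A full rebuild would simply upload the whole view from offset 0.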

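5. Conclusion

WebGPU is new and more 'modern', but its core logic is the same as WebGL's. If you already know WebGL well, picking up WebGPU should be easy, and I strongly recommend learning it.

Thanks for reading 🙏. If you found this article useful, please give it a like 👍; your encouragement means a lot to the author ❤️.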