Linear-Sampling Optimization for Gaussian Blur


Gaussian blur is one of the most widely used post-processing effects in graphics rendering, powering depth of field, bloom, frosted glass, and similar visuals. An efficient implementation rests on two key optimizations:

  1. 2D convolution → separable convolution: exploit the separability of the Gaussian function to cut per-pixel cost from O(n²) to O(2n)
  2. Separable convolution → linear sampling: exploit GPU texture filtering to halve the number of samples again

This article walks through the math behind Gaussian blur and the linear-sampling optimization, with a JavaScript simulation and a complete WebGL example.

The Math Behind Gaussian Blur

To understand linear sampling, we first need to pin down what Gaussian blur actually does. In short, blurring lets a pixel be influenced by its neighbors; the key question is: how much should each neighbor contribute?

Starting from the problem: why a Gaussian?

Suppose we want to blur one pixel. The simplest approach is to average it with its neighbors: with 8 surrounding pixels, add up all 9 and divide by 9. This is box blur, and it looks harsh. Intuitively, should an object 1 meter away influence you as much as one 10 meters away?

The core idea of Gaussian blur: the closer a neighbor, the larger its influence; the farther, the smaller; and the falloff is smooth. The Gaussian function fits this requirement exactly.

The one-dimensional Gaussian function is:

function gaussian1D(x, sigma) {
  return (1 / (Math.sqrt(2 * Math.PI) * sigma)) * Math.exp(-(x * x) / (2 * sigma * sigma));
}

Here x is the offset from the center and sigma is the standard deviation. The function peaks at x = 0 (the center) and decays exponentially as |x| grows.

For a 2D image we need the two-dimensional Gaussian:

function gaussian2D(x, y, sigma) {
  return (1 / (2 * Math.PI * sigma * sigma)) * Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
}

This function has a crucial property: the 2D Gaussian factors into the product of two 1D Gaussians, G(x, y) = G(x) * G(y). This is called separability, and we will see later how it unlocks a major performance win.
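
Using the two functions defined above, a quick numerical check confirms the identity at every integer offset in a 7×7 neighborhood:

```javascript
// Numerically check separability: G(x, y) === G(x) * G(y)
function gaussian1D(x, sigma) {
  return (1 / (Math.sqrt(2 * Math.PI) * sigma)) * Math.exp(-(x * x) / (2 * sigma * sigma));
}
function gaussian2D(x, y, sigma) {
  return (1 / (2 * Math.PI * sigma * sigma)) * Math.exp(-(x * x + y * y) / (2 * sigma * sigma));
}

// largest deviation over a 7x7 neighborhood with sigma = 1.5
let maxDiff = 0;
for (let x = -3; x <= 3; x++) {
  for (let y = -3; y <= 3; y++) {
    const diff = Math.abs(gaussian2D(x, y, 1.5) - gaussian1D(x, 1.5) * gaussian1D(y, 1.5));
    maxDiff = Math.max(maxDiff, diff);
  }
}
console.log(maxDiff); // tiny: the identity holds to machine precision
```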

From function to array: generating the convolution kernel

With the Gaussian function in hand, the next question is: how do we apply it on a discrete pixel grid?

An image is not a continuous mathematical plane but a grid of discrete pixels. We have to "discretize" the continuous Gaussian into a concrete array of weights; this array is the convolution kernel.

The kernel size is typically 2 * radius + 1. With radius = 3, the kernel has 7 entries, covering the center pixel plus 3 pixels on each side:

function generateGaussianKernel(radius, sigma) {
  const kernelSize = 2 * radius + 1;
  const kernel = new Array(kernelSize);
  let sum = 0;

  // compute the weight at each position
  for (let i = 0; i < kernelSize; i++) {
    const x = i - radius; // offset: -radius to +radius
    kernel[i] = gaussian1D(x, sigma);
    sum += kernel[i];
  }

  // normalize so all weights sum to 1
  for (let i = 0; i < kernelSize; i++) {
    kernel[i] /= sum;
  }

  return kernel;
}

// generate a kernel with radius 3
const kernel = generateGaussianKernel(3, 1.5);
console.log(kernel);
// rounded, illustrative weights (reused in the examples below):
// [0.015, 0.094, 0.235, 0.312, 0.235, 0.094, 0.015]
// the center weight 0.312 is largest, falling off symmetrically toward the edges

Two steps here deserve explanation.

**First, why loop over every position?** Because the image is discrete, we must compute a concrete weight for each integer offset.

**Second, why normalize?** The last step divides every weight by their total, guaranteeing sum(weights) = 1. What happens without normalization? If the weights summed to 1.2, every blurred pixel value would be multiplied by 1.2 and the whole image would brighten; with a sum of 0.8 it would darken. Normalization keeps the overall brightness unchanged by the blur.
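
A tiny sketch makes the brightness argument concrete: applying un-normalized weights (here summing to 2) to a constant-gray row scales the result by the weight sum, while the normalized weights preserve it.

```javascript
// Un-normalized weights summing to 2 double the brightness of a constant image
const weights = [0.5, 1.0, 0.5]; // sum = 2
const pixels = [100, 100, 100];  // constant gray

const blurred = pixels.reduce((acc, p, i) => acc + p * weights[i], 0);
console.log(blurred); // 200: the image got brighter

// After normalization the brightness is preserved
const sum = weights.reduce((a, w) => a + w, 0);
const normalized = weights.map((w) => w / sum); // [0.25, 0.5, 0.25]
const blurredNorm = pixels.reduce((acc, p, i) => acc + p * normalized[i], 0);
console.log(blurredNorm); // 100
```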

The standard deviation: controlling blur strength

The sigma parameter introduced earlier is the key control for blur strength. It determines how "fat" the Gaussian distribution is:

  • Small sigma: a steep curve, weight concentrated near the center → small blur range, subtle effect
  • Large sigma: a flat curve, substantial weight far from the center → large blur range, pronounced effect

| sigma | Effect | Typical uses |
| --- | --- | --- |
| 0.5 - 1.0 | slight blur | anti-aliasing, denoising |
| 1.5 - 3.0 | moderate blur | general-purpose blur |
| 3.0+ | heavy blur | background bokeh, artistic filters |

In practice, radius and sigma usually follow a rule of thumb:

radius = Math.ceil(sigma * 3);

Why 3x? Because in a Gaussian distribution, the range within 3 * sigma of the center covers 99.7% of the total mass. Weights beyond that are small enough to ignore; increasing radius further just wastes computation without visibly improving the result.
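
A quick numerical check (a sketch using simple Riemann-sum integration) reproduces the 99.7% figure:

```javascript
function gaussian1D(x, sigma) {
  return (1 / (Math.sqrt(2 * Math.PI) * sigma)) * Math.exp(-(x * x) / (2 * sigma * sigma));
}

// Integrate the density over [-3*sigma, +3*sigma] with a fine step
const sigma = 1.5;
const step = 0.001;
let coverage = 0;
for (let x = -3 * sigma; x <= 3 * sigma; x += step) {
  coverage += gaussian1D(x, sigma) * step;
}
console.log(coverage.toFixed(4)); // ≈ 0.9973
```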

A Traditional Gaussian Blur Implementation

With a kernel in hand, the next step is applying it to the image. The most direct approach is a 2D convolution, but it performs poorly. Exploiting the separability of the Gaussian, we can split the 2D convolution into two 1D passes and slash the amount of computation.

2D convolution: the most direct but slowest method

Suppose we have a 5x5 2D kernel (radius = 2). Blurring one pixel means a weighted sum over the surrounding 25 pixels:

function blur2D(imageData, x, y, kernel2D, radius) {
  let r = 0,
    g = 0,
    b = 0;
  const width = imageData.width;
  const data = imageData.data;

  // visit every pixel covered by the kernel
  for (let ky = -radius; ky <= radius; ky++) {
    for (let kx = -radius; kx <= radius; kx++) {
      const px = x + kx;
      const py = y + ky;

      // pixel index in the RGBA data array
      const idx = (py * width + px) * 4;

      // kernel weight for this offset
      const weight = kernel2D[ky + radius][kx + radius];

      // weighted accumulation
      r += data[idx] * weight;
      g += data[idx + 1] * weight;
      b += data[idx + 2] * weight;
    }
  }

  return [r, g, b];
}

The problem is obvious: for a kernel of radius r, every pixel needs (2r + 1)² samples. At r = 10 that is 441 samples per pixel! For a 1920x1080 image the total cost is astronomical.

Separable convolution: the key performance breakthrough

Remember the separability of the Gaussian? The identity G(x, y) = G(x) * G(y) lets us split the 2D convolution into two 1D passes:

  1. First pass (horizontal): apply the 1D kernel along every row
  2. Second pass (vertical): apply the 1D kernel along every column of the first pass's output

Why is this faster? Let's do the arithmetic:

  • 2D convolution: (2r + 1)² samples per pixel
  • Separable convolution: (2r + 1) samples in each of the two 1D passes, 2 * (2r + 1) in total

A comparison:

| Radius r | 2D samples | Separable samples | Speedup |
| --- | --- | --- | --- |
| 3 | 49 | 14 | 3.5x |
| 5 | 121 | 22 | 5.5x |
| 10 | 441 | 42 | 10.5x |
| 20 | 1681 | 82 | 20.5x |

The larger the radius, the bigger the advantage of separable convolution, which is why real applications almost never use a raw 2D convolution.
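
The two cost formulas can be written down directly; the helper below reproduces the table values:

```javascript
// Samples per pixel for a kernel of the given radius
const samples2D = (r) => (2 * r + 1) ** 2;
const samplesSeparable = (r) => 2 * (2 * r + 1);

for (const r of [3, 5, 10, 20]) {
  const speedup = (samples2D(r) / samplesSeparable(r)).toFixed(1);
  console.log(`r=${r}: ${samples2D(r)} vs ${samplesSeparable(r)} (${speedup}x)`);
}
```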

Implementation: a horizontal pass and a vertical pass

Here is the complete separable implementation:

// horizontal blur pass
function blurHorizontal(imageData, kernel, radius) {
  const width = imageData.width;
  const height = imageData.height;
  const data = imageData.data;
  const output = new Uint8ClampedArray(data.length);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let r = 0,
        g = 0,
        b = 0,
        a = 0;

      // weighted sum over horizontal neighbors
      for (let kx = -radius; kx <= radius; kx++) {
        const px = Math.min(Math.max(x + kx, 0), width - 1); // clamp to the image edge
        const idx = (y * width + px) * 4;
        const weight = kernel[kx + radius];

        r += data[idx] * weight;
        g += data[idx + 1] * weight;
        b += data[idx + 2] * weight;
        a += data[idx + 3] * weight;
      }

      const outIdx = (y * width + x) * 4;
      output[outIdx] = r;
      output[outIdx + 1] = g;
      output[outIdx + 2] = b;
      output[outIdx + 3] = a;
    }
  }

  return new ImageData(output, width, height);
}

// vertical blur pass
function blurVertical(imageData, kernel, radius) {
  const width = imageData.width;
  const height = imageData.height;
  const data = imageData.data;
  const output = new Uint8ClampedArray(data.length);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let r = 0,
        g = 0,
        b = 0,
        a = 0;

      // weighted sum over vertical neighbors
      for (let ky = -radius; ky <= radius; ky++) {
        const py = Math.min(Math.max(y + ky, 0), height - 1); // clamp to the image edge
        const idx = (py * width + x) * 4;
        const weight = kernel[ky + radius];

        r += data[idx] * weight;
        g += data[idx + 1] * weight;
        b += data[idx + 2] * weight;
        a += data[idx + 3] * weight;
      }

      const outIdx = (y * width + x) * 4;
      output[outIdx] = r;
      output[outIdx + 1] = g;
      output[outIdx + 2] = b;
      output[outIdx + 3] = a;
    }
  }

  return new ImageData(output, width, height);
}

// full Gaussian blur: horizontal pass, then vertical pass
function gaussianBlur(imageData, radius, sigma) {
  const kernel = generateGaussianKernel(radius, sigma);
  const temp = blurHorizontal(imageData, kernel, radius);
  return blurVertical(temp, kernel, radius);
}

Note the boundary handling: Math.min(Math.max(x + kx, 0), width - 1) clamps the sample position to the image, reusing the edge pixel whenever the kernel reaches past the border.

The limits of separable convolution

Separable convolution is a big improvement, but each pixel still needs 2 * (2r + 1) samples. For large radii (say r = 20, which means 82 samples) that is still a problem, especially in real-time rendering.

Which brings us to the heart of this article: linear-sampling optimization. By exploiting GPU texture interpolation, we can cut the sample count roughly in half again at almost no loss of quality.

The Linear-Sampling Optimization

Separable convolution already bought us an order of magnitude. Can we go faster? Linear sampling leans on the GPU's hardware texture filtering to merge each pair of adjacent samples into one, halving the sample count once more.

GPU linear texture filtering

When a GPU reads a texture at a fractional coordinate (say 2.3), it interpolates linearly between the two nearest texels automatically:

// sampling position 2.3
result = texel[2] * 0.7 + texel[3] * 0.3

The weights 0.7 and 0.3 come automatically from the fractional part (0.3). Crucially, this interpolation is done in hardware and is essentially free.
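
In JavaScript the same filtering rule is easy to simulate; the helper below reads an array at a fractional index the way linear texture filtering does (a sketch, ignoring edge clamping):

```javascript
// Simulate GPU linear filtering: read texels at a fractional position
function sampleLinear(texels, pos) {
  const i = Math.floor(pos);
  const frac = pos - i;
  return texels[i] * (1 - frac) + texels[i + 1] * frac;
}

const texels = [10, 20, 30, 40];
console.log(sampleLinear(texels, 2.3)); // ≈ 33 (30 * 0.7 + 40 * 0.3)
```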

The core idea: let the GPU do the weighting

Traditionally we sample two adjacent pixels separately and weight each:

// traditional: two samples
color = texture(pos + 1) * weight1 + texture(pos + 2) * weight2;

If we can find an offset at which the GPU's automatic interpolation yields exactly the weighted combination we want, one sample can be saved:

// optimized: one sample
color = texture(pos + offset) * combinedWeight;

The derivation: computing the new offset and weight

Suppose we want to merge two adjacent taps:

  • position k, with weight w1
  • position k+1, with weight w2

The traditional result is:

result = pixel[k] * w1 + pixel[k+1] * w2

Now sample once at position k + offset (where 0 < offset < 1); the GPU interpolates automatically:

sample = pixel[k] * (1 - offset) + pixel[k+1] * offset

For sample * newWeight to reproduce the original result, we need:

pixel[k] * w1 + pixel[k+1] * w2 = (pixel[k] * (1 - offset) + pixel[k+1] * offset) * newWeight

Expanding the right-hand side:

pixel[k] * w1 + pixel[k+1] * w2 = pixel[k] * (1 - offset) * newWeight + pixel[k+1] * offset * newWeight

Matching coefficients gives:

w1 = (1 - offset) * newWeight
w2 = offset * newWeight

Solving these two equations:

newWeight = w1 + w2
offset = w2 / (w1 + w2)

Conclusion:

  • new weight = sum of the two original weights
  • new offset = the second weight's share of that sum
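
The derivation is easy to verify numerically. The helper below (a sketch; `combineTaps` is a name introduced here) merges two taps and checks, on arbitrary pixel values, that one interpolated sample reproduces the two-tap sum:

```javascript
// Merge two adjacent taps (position k with weight w1, position k+1 with w2)
// into a single tap at a fractional position
function combineTaps(k, w1, w2) {
  return { offset: k + w2 / (w1 + w2), weight: w1 + w2 };
}

// Check on arbitrary pixel values that one interpolated sample
// reproduces the two-tap weighted sum
const pix = [42, 7]; // pixel[k], pixel[k+1]
const w1 = 0.235, w2 = 0.312;

const twoTap = pix[0] * w1 + pix[1] * w2;

const { offset, weight } = combineTaps(0, w1, w2);
// with k = 0, the fractional part is the offset itself
const oneTap = (pix[0] * (1 - offset) + pix[1] * offset) * weight;

console.log(twoTap, oneTap); // identical up to floating-point rounding
```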

A concrete example: optimizing a 7-tap kernel

Take the radius=3 Gaussian kernel from earlier:

const kernel = [0.015, 0.094, 0.235, 0.312, 0.235, 0.094, 0.015];
//           offset: -3     -2     -1      0     +1     +2     +3

We merge adjacent taps pairwise, left to right:

Pair 1: offsets -3 and -2

w1 = 0.015, w2 = 0.094
newWeight = 0.015 + 0.094 = 0.109
offset = -3 + 0.094 / 0.109 = -3 + 0.862 = -2.138

Pair 2: offsets -1 and 0

w1 = 0.235, w2 = 0.312
newWeight = 0.235 + 0.312 = 0.547
offset = -1 + 0.312 / 0.547 = -1 + 0.570 = -0.430

Pair 3: offsets +1 and +2

w1 = 0.235, w2 = 0.094
newWeight = 0.235 + 0.094 = 0.329
offset = +1 + 0.094 / 0.329 = +1 + 0.286 = +1.286

Remainder: offset +3 (kept as a single tap)

offset = +3.0;
weight = 0.015;
The optimized sampling scheme:

[
  { offset: -2.138, weight: 0.109 },
  { offset: -0.43, weight: 0.547 },
  { offset: +1.286, weight: 0.329 },
  { offset: +3.0, weight: 0.015 },
];

The total weight checks out: 0.109 + 0.547 + 0.329 + 0.015 = 1.0

From 7 samples down to 4!

A pixel-level walkthrough

Now verify with concrete pixel values:

// one row of pixels (single channel, for simplicity)
const pixels = [10, 20, 30, 50, 70, 80, 90];
// index:            0   1   2   3   4   5   6
// relative offset: -3  -2  -1   0  +1  +2  +3

Traditional method (7 samples):

result = pixel[-3] * 0.015
       + pixel[-2] * 0.094
       + pixel[-1] * 0.235
       + pixel[0] * 0.312
       + pixel[+1] * 0.235
       + pixel[+2] * 0.094
       + pixel[+3] * 0.015

       = 10 * 0.015 + 20 * 0.094 + 30 * 0.235 + 50 * 0.312
       + 70 * 0.235 + 80 * 0.094 + 90 * 0.015

       = 0.15 + 1.88 + 7.05 + 15.6 + 16.45 + 7.52 + 1.35
       = 50.0

Linear-sampling optimization (4 samples):

// sample 1: offset=-2.138
// interpolates between relative offsets -3 and -2
// -2.138 = -3 + 0.862, fractional part 0.862
sample1 = pixel[-3] * (1 - 0.862) + pixel[-2] * 0.862
        = 10 * 0.138 + 20 * 0.862
        = 1.38 + 17.24 = 18.62

// sample 2: offset=-0.430
// interpolates between relative offsets -1 and 0
// -0.430 = -1 + 0.570, fractional part 0.570
sample2 = pixel[-1] * (1 - 0.570) + pixel[0] * 0.570
        = 30 * 0.430 + 50 * 0.570
        = 12.9 + 28.5 = 41.4

// sample 3: offset=+1.286
// interpolates between relative offsets +1 and +2
// +1.286 = +1 + 0.286, fractional part 0.286
sample3 = pixel[+1] * (1 - 0.286) + pixel[+2] * 0.286
        = 70 * 0.714 + 80 * 0.286
        = 49.98 + 22.88 = 72.86

// sample 4: offset=+3.0
// relative offset +3 is an integer position, no interpolation needed
sample4 = pixel[+3] = 90

// apply the new weights
result = sample1 * 0.109
       + sample2 * 0.547
       + sample3 * 0.329
       + sample4 * 0.015

       = 18.62 * 0.109 + 41.4 * 0.547 + 72.86 * 0.329 + 90 * 0.015
       = 2.03 + 22.65 + 23.97 + 1.35
       = 50.0

The results agree exactly! This shows that linear-sampling optimization is mathematically exact.
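
The walkthrough above can be run end to end; the snippet below recomputes both paths on the same row (using the rounded example kernel) and confirms they agree:

```javascript
const kernel = [0.015, 0.094, 0.235, 0.312, 0.235, 0.094, 0.015];
const pixels = [10, 20, 30, 50, 70, 80, 90]; // relative offsets -3..+3
const taps = [
  { offset: -3 + 0.094 / 0.109, weight: 0.109 },
  { offset: -1 + 0.312 / 0.547, weight: 0.547 },
  { offset: 1 + 0.094 / 0.329, weight: 0.329 },
  { offset: 3, weight: 0.015 },
];

// traditional: 7 weighted samples
const traditional = kernel.reduce((acc, w, i) => acc + pixels[i] * w, 0);

// optimized: 4 samples with simulated linear filtering
const optimized = taps.reduce((acc, { offset, weight }) => {
  const pos = offset + 3; // shift relative offset to an array index
  const i = Math.floor(pos);
  const frac = pos - i;
  // guard the upper neighbor for the integer tap at the end of the row
  const sample = pixels[i] * (1 - frac) + (pixels[i + 1] ?? pixels[i]) * frac;
  return acc + sample * weight;
}, 0);

console.log(traditional.toFixed(2), optimized.toFixed(2)); // both ≈ 50.00
```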

Implementation

Converting a kernel into the optimized sampling scheme:

function optimizeKernel(kernel) {
  const radius = Math.floor(kernel.length / 2);
  const optimized = [];

  // merge adjacent taps pairwise
  for (let i = 0; i < kernel.length; i += 2) {
    if (i + 1 < kernel.length) {
      const w1 = kernel[i];
      const w2 = kernel[i + 1];
      const wSum = w1 + w2;

      // offset relative to the center
      const baseOffset = i - radius;
      const offset = baseOffset + w2 / wSum;

      optimized.push({ offset, weight: wSum });
    } else {
      // odd count: keep the last tap on its own
      optimized.push({
        offset: i - radius,
        weight: kernel[i],
      });
    }
  }

  return optimized;
}
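
Fed the 7-tap example kernel, the function reproduces the scheme derived by hand (restated below so the snippet runs standalone):

```javascript
function optimizeKernel(kernel) {
  const radius = Math.floor(kernel.length / 2);
  const optimized = [];
  for (let i = 0; i < kernel.length; i += 2) {
    if (i + 1 < kernel.length) {
      // merge adjacent taps pairwise
      const wSum = kernel[i] + kernel[i + 1];
      optimized.push({ offset: i - radius + kernel[i + 1] / wSum, weight: wSum });
    } else {
      // odd count: keep the last tap on its own
      optimized.push({ offset: i - radius, weight: kernel[i] });
    }
  }
  return optimized;
}

const taps = optimizeKernel([0.015, 0.094, 0.235, 0.312, 0.235, 0.094, 0.015]);
console.log(taps.length); // 4: down from 7 samples
console.log(taps.map((t) => t.weight.toFixed(3))); // [ '0.109', '0.547', '0.329', '0.015' ]
```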

Performance gains

| Radius | Traditional samples | Linear sampling | Speedup |
| --- | --- | --- | --- |
| 3 | 7 | 4 | 1.75x |
| 5 | 11 | 6 | 1.83x |
| 10 | 21 | 11 | 1.91x |
| 20 | 41 | 21 | 1.95x |

Combined with separable convolution, the total gain from raw 2D convolution to the final linear-sampling version:

| Radius | 2D convolution | Final optimized | Total speedup |
| --- | --- | --- | --- |
| 3 | 49 | 8 | 6.1x |
| 10 | 441 | 22 | 20x |
| 20 | 1681 | 42 | 40x |

Core Algorithm Functions

Before the complete demo, here are the key functions.

Generating the Gaussian kernel

Both the traditional and the optimized paths start from the same Gaussian kernel:

function generateGaussianKernel(radius, sigma) {
  const kernelSize = 2 * radius + 1;
  const kernel = new Array(kernelSize);
  let sum = 0;

  // Gaussian weight at each position (the constant factor is dropped; normalization absorbs it)
  for (let i = 0; i < kernelSize; i++) {
    const x = i - radius;
    kernel[i] = Math.exp(-(x * x) / (2 * sigma * sigma));
    sum += kernel[i];
  }

  // normalize so the weights sum to 1
  for (let i = 0; i < kernelSize; i++) {
    kernel[i] /= sum;
  }

  return kernel;
}

Traditional separable convolution (horizontal pass)

The traditional pass performs a direct weighted sum with the kernel:

function blurHorizontal(imageData, kernel, radius) {
  const { width, height, data } = imageData;
  const output = new Uint8ClampedArray(data.length);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let r = 0,
        g = 0,
        b = 0,
        a = 0;

      // visit all 2*radius+1 taps
      for (let kx = -radius; kx <= radius; kx++) {
        const px = Math.min(Math.max(x + kx, 0), width - 1);
        const idx = (y * width + px) * 4;
        const weight = kernel[kx + radius];

        r += data[idx] * weight;
        g += data[idx + 1] * weight;
        b += data[idx + 2] * weight;
        a += data[idx + 3] * weight;
      }

      const outIdx = (y * width + x) * 4;
      output[outIdx] = r;
      output[outIdx + 1] = g;
      output[outIdx + 2] = b;
      output[outIdx + 3] = a;
    }
  }

  return new ImageData(output, width, height);
}

The vertical pass is identical except that it samples along the column.

Converting to the optimized kernel

Convert the traditional kernel into the optimized sampling scheme:

function optimizeKernel(kernel) {
  const radius = Math.floor(kernel.length / 2);
  const optimized = [];

  // merge adjacent taps pairwise
  for (let i = 0; i < kernel.length; i += 2) {
    if (i + 1 < kernel.length) {
      const w1 = kernel[i];
      const w2 = kernel[i + 1];
      const wSum = w1 + w2;
      const baseOffset = i - radius;
      // new fractional offset (exploits GPU linear filtering)
      const offset = baseOffset + w2 / wSum;

      optimized.push({ offset, weight: wSum });
    } else {
      // odd count: keep the last tap on its own
      optimized.push({
        offset: i - radius,
        weight: kernel[i],
      });
    }
  }

  return optimized;
}

Linear-sampling blur (horizontal pass)

Apply the optimized sampling scheme, simulating GPU linear interpolation on the CPU:

function blurHorizontalOptimized(imageData, optimized) {
  const { width, height, data } = imageData;
  const output = new Uint8ClampedArray(data.length);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let r = 0,
        g = 0,
        b = 0,
        a = 0;

      // visit the optimized taps (about half as many)
      for (const { offset, weight } of optimized) {
        // simulate GPU linear interpolation
        const samplePos = x + offset;
        const px1 = Math.floor(samplePos);
        const px2 = Math.ceil(samplePos);
        const frac = samplePos - px1;

        const px1Clamped = Math.min(Math.max(px1, 0), width - 1);
        const px2Clamped = Math.min(Math.max(px2, 0), width - 1);

        const idx1 = (y * width + px1Clamped) * 4;
        const idx2 = (y * width + px2Clamped) * 4;

        // interpolate between the two neighboring pixels
        const sr = data[idx1] * (1 - frac) + data[idx2] * frac;
        const sg = data[idx1 + 1] * (1 - frac) + data[idx2 + 1] * frac;
        const sb = data[idx1 + 2] * (1 - frac) + data[idx2 + 2] * frac;
        const sa = data[idx1 + 3] * (1 - frac) + data[idx2 + 3] * frac;

        r += sr * weight;
        g += sg * weight;
        b += sb * weight;
        a += sa * weight;
      }

      const outIdx = (y * width + x) * 4;
      output[outIdx] = r;
      output[outIdx + 1] = g;
      output[outIdx + 2] = b;
      output[outIdx + 3] = a;
    }
  }

  return new ImageData(output, width, height);
}

WebGL Shader Comparison

Traditional shader

precision mediump float;
uniform sampler2D u_texture;
uniform float u_texelSize;
uniform float u_kernel[63];  // fixed-size array (GLSL needs a compile-time bound)
uniform int u_radius;
varying vec2 v_texCoord;

void main() {
  vec4 color = vec4(0.0);
  // visit every tap
  for (int i = -31; i <= 31; i++) {
    int absI = i < 0 ? -i : i;
    if (absI > u_radius) continue;
    float weight = u_kernel[i + 31];
    vec2 offset = vec2(float(i) * u_texelSize, 0.0);
    color += texture2D(u_texture, v_texCoord + offset) * weight;
  }
  gl_FragColor = color;
}

Optimized shader

precision mediump float;
uniform sampler2D u_texture;
uniform float u_texelSize;
uniform float u_offsets[32];  // fixed-size arrays
uniform float u_weights[32];
uniform int u_sampleCount;
varying vec2 v_texCoord;

void main() {
  vec4 color = vec4(0.0);
  // visit the optimized taps (about half as many)
  for (int i = 0; i < 32; i++) {
    if (i >= u_sampleCount) break;
    float offset = u_offsets[i];
    float weight = u_weights[i];
    vec2 tc = vec2(v_texCoord.x + offset * u_texelSize, v_texCoord.y);
    // the GPU interpolates automatically at the fractional offset
    color += texture2D(u_texture, tc) * weight;
  }
  gl_FragColor = color;
}

The key difference: the optimized shader samples at non-integer offsets, so texture2D performs hardware linear interpolation automatically; no manual interpolation is needed.

Complete Example

The full demo below compares the performance of four Gaussian blur implementations:

  • Canvas 2D traditional: CPU separable convolution
  • Canvas 2D optimized: CPU linear-sampling implementation
  • WebGL traditional: GPU separable convolution
  • WebGL optimized: GPU linear-sampling implementation
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Gaussian Blur Comparison: Canvas 2D vs WebGL</title>
    <style>
      * {
        margin: 0;
        padding: 0;
        box-sizing: border-box;
      }
      body {
        font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
        background: #f5f5f5;
        padding: 20px;
      }
      .container {
        max-width: 1600px;
        margin: 0 auto;
        background: white;
        padding: 30px;
        border-radius: 8px;
        box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
      }
      h1 {
        margin-bottom: 10px;
        color: #333;
      }
      .description {
        color: #666;
        margin-bottom: 30px;
        line-height: 1.6;
      }
      .controls {
        display: grid;
        grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
        gap: 20px;
        margin-bottom: 30px;
        padding: 20px;
        background: #f9f9f9;
        border-radius: 6px;
      }
      .control-group {
        display: flex;
        flex-direction: column;
        gap: 8px;
      }
      label {
        font-weight: 500;
        color: #555;
        font-size: 14px;
      }
      input[type="file"] {
        padding: 8px;
        border: 2px dashed #ddd;
        border-radius: 4px;
        cursor: pointer;
      }
      input[type="range"] {
        width: 100%;
      }
      .value-display {
        font-size: 16px;
        font-weight: bold;
        color: #007aff;
      }
      button {
        padding: 12px 24px;
        background: #007aff;
        color: white;
        border: none;
        border-radius: 6px;
        font-size: 16px;
        font-weight: 500;
        cursor: pointer;
        transition: background 0.2s;
      }
      button:hover {
        background: #0051d5;
      }
      button:disabled {
        background: #ccc;
        cursor: not-allowed;
      }
      .results {
        display: grid;
        grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
        gap: 20px;
        margin-top: 30px;
      }
      .result-item {
        text-align: center;
      }
      .result-item h3 {
        margin-bottom: 10px;
        color: #333;
        font-size: 16px;
      }
      canvas {
        max-width: 100%;
        border: 1px solid #ddd;
        border-radius: 4px;
        display: block;
        margin: 0 auto;
      }
      .stats {
        margin-top: 10px;
        padding: 10px;
        background: #f9f9f9;
        border-radius: 4px;
        font-size: 13px;
        color: #666;
        text-align: left;
      }
      .stats-highlight {
        color: #007aff;
        font-weight: bold;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>Gaussian Blur Comparison: Canvas 2D vs WebGL</h1>
      <p class="description">
        Upload an image, adjust the parameters, and compare the performance of the CPU (Canvas 2D) and GPU (WebGL) implementations.
        Each backend comes in two versions: traditional separable convolution and linear-sampling optimization.
      </p>

      <div class="controls">
        <div class="control-group">
          <label for="imageUpload">上传图片:</label>
          <input type="file" id="imageUpload" accept="image/*" />
        </div>
        <div class="control-group">
          <label for="sigmaRange"> Standard deviation (sigma):<span class="value-display" id="sigmaValue">2.0</span> </label>
          <input type="range" id="sigmaRange" min="0.5" max="10" step="0.5" value="2.0" />
        </div>
        <div class="control-group">
          <label for="radiusRange"> Radius:<span class="value-display" id="radiusValue">6</span> </label>
          <input type="range" id="radiusRange" min="1" max="20" step="1" value="6" />
        </div>
        <div class="control-group" style="align-self: end;">
          <button id="applyButton" disabled>Apply blur</button>
        </div>
      </div>

      <div class="results">
        <div class="result-item">
          <h3>原图</h3>
          <canvas id="originalCanvas"></canvas>
        </div>
        <div class="result-item">
          <h3>Canvas 2D traditional</h3>
          <canvas id="canvas2dTraditional"></canvas>
          <div class="stats">
            <div>Samples: <span class="stats-highlight" id="c2dTradSamples">-</span></div>
            <div>Time: <span class="stats-highlight" id="c2dTradTime">-</span></div>
          </div>
        </div>
        <div class="result-item">
          <h3>Canvas 2D optimized</h3>
          <canvas id="canvas2dOptimized"></canvas>
          <div class="stats">
            <div>Samples: <span class="stats-highlight" id="c2dOptSamples">-</span></div>
            <div>Time: <span class="stats-highlight" id="c2dOptTime">-</span></div>
            <div>Speedup: <span class="stats-highlight" id="c2dSpeedup">-</span></div>
          </div>
        </div>
        <div class="result-item">
          <h3>WebGL traditional</h3>
          <canvas id="webglTraditional"></canvas>
          <div class="stats">
            <div>Samples: <span class="stats-highlight" id="webglTradSamples">-</span></div>
            <div>Time: <span class="stats-highlight" id="webglTradTime">-</span></div>
          </div>
        </div>
        <div class="result-item">
          <h3>WebGL optimized</h3>
          <canvas id="webglOptimized"></canvas>
          <div class="stats">
            <div>Samples: <span class="stats-highlight" id="webglOptSamples">-</span></div>
            <div>Time: <span class="stats-highlight" id="webglOptTime">-</span></div>
            <div>Speedup: <span class="stats-highlight" id="webglSpeedup">-</span></div>
          </div>
        </div>
      </div>
    </div>

    <script>
      // ==================== Shared helpers ====================

      function generateGaussianKernel(radius, sigma) {
        const kernelSize = 2 * radius + 1;
        const kernel = new Array(kernelSize);
        let sum = 0;
        for (let i = 0; i < kernelSize; i++) {
          const x = i - radius;
          kernel[i] = Math.exp(-(x * x) / (2 * sigma * sigma));
          sum += kernel[i];
        }
        for (let i = 0; i < kernelSize; i++) {
          kernel[i] /= sum;
        }
        return kernel;
      }

      function optimizeKernel(kernel) {
        const radius = Math.floor(kernel.length / 2);
        const optimized = [];
        for (let i = 0; i < kernel.length; i += 2) {
          if (i + 1 < kernel.length) {
            const w1 = kernel[i];
            const w2 = kernel[i + 1];
            const wSum = w1 + w2;
            const baseOffset = i - radius;
            const offset = baseOffset + w2 / wSum;
            optimized.push({ offset, weight: wSum });
          } else {
            optimized.push({ offset: i - radius, weight: kernel[i] });
          }
        }
        return optimized;
      }

      // ==================== Canvas 2D implementation ====================

      function blurHorizontal(imageData, kernel, radius) {
        const { width, height, data } = imageData;
        const output = new Uint8ClampedArray(data.length);
        for (let y = 0; y < height; y++) {
          for (let x = 0; x < width; x++) {
            let r = 0,
              g = 0,
              b = 0,
              a = 0;
            for (let kx = -radius; kx <= radius; kx++) {
              const px = Math.min(Math.max(x + kx, 0), width - 1);
              const idx = (y * width + px) * 4;
              const weight = kernel[kx + radius];
              r += data[idx] * weight;
              g += data[idx + 1] * weight;
              b += data[idx + 2] * weight;
              a += data[idx + 3] * weight;
            }
            const outIdx = (y * width + x) * 4;
            output[outIdx] = r;
            output[outIdx + 1] = g;
            output[outIdx + 2] = b;
            output[outIdx + 3] = a;
          }
        }
        return new ImageData(output, width, height);
      }

      function blurVertical(imageData, kernel, radius) {
        const { width, height, data } = imageData;
        const output = new Uint8ClampedArray(data.length);
        for (let y = 0; y < height; y++) {
          for (let x = 0; x < width; x++) {
            let r = 0,
              g = 0,
              b = 0,
              a = 0;
            for (let ky = -radius; ky <= radius; ky++) {
              const py = Math.min(Math.max(y + ky, 0), height - 1);
              const idx = (py * width + x) * 4;
              const weight = kernel[ky + radius];
              r += data[idx] * weight;
              g += data[idx + 1] * weight;
              b += data[idx + 2] * weight;
              a += data[idx + 3] * weight;
            }
            const outIdx = (y * width + x) * 4;
            output[outIdx] = r;
            output[outIdx + 1] = g;
            output[outIdx + 2] = b;
            output[outIdx + 3] = a;
          }
        }
        return new ImageData(output, width, height);
      }

      function blurHorizontalOptimized(imageData, optimized) {
        const { width, height, data } = imageData;
        const output = new Uint8ClampedArray(data.length);
        for (let y = 0; y < height; y++) {
          for (let x = 0; x < width; x++) {
            let r = 0,
              g = 0,
              b = 0,
              a = 0;
            for (const { offset, weight } of optimized) {
              const samplePos = x + offset;
              const px1 = Math.floor(samplePos);
              const px2 = Math.ceil(samplePos);
              const frac = samplePos - px1;
              const px1Clamped = Math.min(Math.max(px1, 0), width - 1);
              const px2Clamped = Math.min(Math.max(px2, 0), width - 1);
              const idx1 = (y * width + px1Clamped) * 4;
              const idx2 = (y * width + px2Clamped) * 4;
              const sr = data[idx1] * (1 - frac) + data[idx2] * frac;
              const sg = data[idx1 + 1] * (1 - frac) + data[idx2 + 1] * frac;
              const sb = data[idx1 + 2] * (1 - frac) + data[idx2 + 2] * frac;
              const sa = data[idx1 + 3] * (1 - frac) + data[idx2 + 3] * frac;
              r += sr * weight;
              g += sg * weight;
              b += sb * weight;
              a += sa * weight;
            }
            const outIdx = (y * width + x) * 4;
            output[outIdx] = r;
            output[outIdx + 1] = g;
            output[outIdx + 2] = b;
            output[outIdx + 3] = a;
          }
        }
        return new ImageData(output, width, height);
      }

      function blurVerticalOptimized(imageData, optimized) {
        const { width, height, data } = imageData;
        const output = new Uint8ClampedArray(data.length);
        for (let y = 0; y < height; y++) {
          for (let x = 0; x < width; x++) {
            let r = 0,
              g = 0,
              b = 0,
              a = 0;
            for (const { offset, weight } of optimized) {
              const samplePos = y + offset;
              const py1 = Math.floor(samplePos);
              const py2 = Math.ceil(samplePos);
              const frac = samplePos - py1;
              const py1Clamped = Math.min(Math.max(py1, 0), height - 1);
              const py2Clamped = Math.min(Math.max(py2, 0), height - 1);
              const idx1 = (py1Clamped * width + x) * 4;
              const idx2 = (py2Clamped * width + x) * 4;
              const sr = data[idx1] * (1 - frac) + data[idx2] * frac;
              const sg = data[idx1 + 1] * (1 - frac) + data[idx2 + 1] * frac;
              const sb = data[idx1 + 2] * (1 - frac) + data[idx2 + 2] * frac;
              const sa = data[idx1 + 3] * (1 - frac) + data[idx2 + 3] * frac;
              r += sr * weight;
              g += sg * weight;
              b += sb * weight;
              a += sa * weight;
            }
            const outIdx = (y * width + x) * 4;
            output[outIdx] = r;
            output[outIdx + 1] = g;
            output[outIdx + 2] = b;
            output[outIdx + 3] = a;
          }
        }
        return new ImageData(output, width, height);
      }

      // ==================== WebGL implementation ====================

      const vertexShaderSource = `
      attribute vec2 a_position;
      attribute vec2 a_texCoord;
      varying vec2 v_texCoord;
      void main() {
        gl_Position = vec4(a_position, 0.0, 1.0);
        v_texCoord = a_texCoord;
      }
    `;

      const traditionalHorizontalShader = `
      precision mediump float;
      uniform sampler2D u_texture;
      uniform float u_texelSize;
      uniform float u_kernel[63];
      uniform int u_radius;
      varying vec2 v_texCoord;
      void main() {
        vec4 color = vec4(0.0);
        for (int i = -31; i <= 31; i++) {
          int absI = i < 0 ? -i : i;
          if (absI > u_radius) continue;
          float weight = u_kernel[i + 31];
          vec2 offset = vec2(float(i) * u_texelSize, 0.0);
          color += texture2D(u_texture, v_texCoord + offset) * weight;
        }
        gl_FragColor = color;
      }
    `;

      const traditionalVerticalShader = `
      precision mediump float;
      uniform sampler2D u_texture;
      uniform float u_texelSize;
      uniform float u_kernel[63];
      uniform int u_radius;
      varying vec2 v_texCoord;
      void main() {
        vec4 color = vec4(0.0);
        for (int i = -31; i <= 31; i++) {
          int absI = i < 0 ? -i : i;
          if (absI > u_radius) continue;
          float weight = u_kernel[i + 31];
          vec2 offset = vec2(0.0, float(i) * u_texelSize);
          color += texture2D(u_texture, v_texCoord + offset) * weight;
        }
        gl_FragColor = color;
      }
    `;

      const optimizedHorizontalShader = `
      precision mediump float;
      uniform sampler2D u_texture;
      uniform float u_texelSize;
      uniform float u_offsets[32];
      uniform float u_weights[32];
      uniform int u_sampleCount;
      varying vec2 v_texCoord;
      void main() {
        vec4 color = vec4(0.0);
        for (int i = 0; i < 32; i++) {
          if (i >= u_sampleCount) break;
          float offset = u_offsets[i];
          float weight = u_weights[i];
          vec2 tc = vec2(v_texCoord.x + offset * u_texelSize, v_texCoord.y);
          color += texture2D(u_texture, tc) * weight;
        }
        gl_FragColor = color;
      }
    `;

      const optimizedVerticalShader = `
      precision mediump float;
      uniform sampler2D u_texture;
      uniform float u_texelSize;
      uniform float u_offsets[32];
      uniform float u_weights[32];
      uniform int u_sampleCount;
      varying vec2 v_texCoord;
      void main() {
        vec4 color = vec4(0.0);
        for (int i = 0; i < 32; i++) {
          if (i >= u_sampleCount) break;
          float offset = u_offsets[i];
          float weight = u_weights[i];
          vec2 tc = vec2(v_texCoord.x, v_texCoord.y + offset * u_texelSize);
          color += texture2D(u_texture, tc) * weight;
        }
        gl_FragColor = color;
      }
    `;

      function createShader(gl, type, source) {
        const shader = gl.createShader(type);
        gl.shaderSource(shader, source);
        gl.compileShader(shader);
        if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
          console.error("Shader error:", gl.getShaderInfoLog(shader));
          gl.deleteShader(shader);
          return null;
        }
        return shader;
      }

      function createProgram(gl, vs, fs) {
        const program = gl.createProgram();
        gl.attachShader(program, vs);
        gl.attachShader(program, fs);
        gl.linkProgram(program);
        if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
          console.error("Program error:", gl.getProgramInfoLog(program));
          gl.deleteProgram(program);
          return null;
        }
        return program;
      }

      function setupQuad(gl) {
        const positions = new Float32Array([-1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1, 1]);
        const texCoords = new Float32Array([0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]);
        const positionBuffer = gl.createBuffer();
        gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
        gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
        const texCoordBuffer = gl.createBuffer();
        gl.bindBuffer(gl.ARRAY_BUFFER, texCoordBuffer);
        gl.bufferData(gl.ARRAY_BUFFER, texCoords, gl.STATIC_DRAW);
        return { positionBuffer, texCoordBuffer };
      }

      function loadTexture(gl, image) {
        const texture = gl.createTexture();
        gl.bindTexture(gl.TEXTURE_2D, texture);
        gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
        return texture;
      }

      function createFramebuffer(gl, w, h) {
        // 创建一个以空纹理为颜色附件的 FBO,作为水平 pass 的中间渲染目标
        const fb = gl.createFramebuffer();
        const tex = gl.createTexture();
        gl.bindTexture(gl.TEXTURE_2D, tex);
        gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, w, h, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
        // 中间纹理同样需要 LINEAR 过滤,垂直 pass 的线性采样才能生效
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
        gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
        gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
        gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
        if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
          console.error("Framebuffer incomplete");
        }
        return { framebuffer: fb, texture: tex };
      }

      class WebGLBlur {
        constructor(canvas) {
          this.canvas = canvas;
          this.gl = canvas.getContext("webgl", { preserveDrawingBuffer: true });
          if (!this.gl) throw new Error("WebGL not supported");
          this.setupPrograms();
          this.quad = setupQuad(this.gl);
        }

        setupPrograms() {
          const gl = this.gl;
          const vs = createShader(gl, gl.VERTEX_SHADER, vertexShaderSource);
          const tradH = createShader(gl, gl.FRAGMENT_SHADER, traditionalHorizontalShader);
          const tradV = createShader(gl, gl.FRAGMENT_SHADER, traditionalVerticalShader);
          const optH = createShader(gl, gl.FRAGMENT_SHADER, optimizedHorizontalShader);
          const optV = createShader(gl, gl.FRAGMENT_SHADER, optimizedVerticalShader);
          this.traditionalHProgram = createProgram(gl, vs, tradH);
          this.traditionalVProgram = createProgram(gl, vs, tradV);
          this.optimizedHProgram = createProgram(gl, vs, optH);
          this.optimizedVProgram = createProgram(gl, vs, optV);
        }

        applyTraditional(texture, w, h, radius, sigma) {
          const gl = this.gl;
          const kernel = generateGaussianKernel(radius, sigma);
          const fb = createFramebuffer(gl, w, h);
          gl.bindFramebuffer(gl.FRAMEBUFFER, fb.framebuffer);
          gl.viewport(0, 0, w, h);
          this.renderPass(this.traditionalHProgram, texture, w, h, kernel, radius, true);
          gl.bindFramebuffer(gl.FRAMEBUFFER, null);
          gl.viewport(0, 0, w, h);
          this.renderPass(this.traditionalVProgram, fb.texture, w, h, kernel, radius, false);
          gl.deleteFramebuffer(fb.framebuffer);
          gl.deleteTexture(fb.texture);
        }

        applyOptimized(texture, w, h, radius, sigma) {
          const gl = this.gl;
          const kernel = generateGaussianKernel(radius, sigma);
          const optimized = optimizeKernel(kernel);
          const fb = createFramebuffer(gl, w, h);
          gl.bindFramebuffer(gl.FRAMEBUFFER, fb.framebuffer);
          gl.viewport(0, 0, w, h);
          this.renderPassOptimized(this.optimizedHProgram, texture, w, h, optimized, true);
          gl.bindFramebuffer(gl.FRAMEBUFFER, null);
          gl.viewport(0, 0, w, h);
          this.renderPassOptimized(this.optimizedVProgram, fb.texture, w, h, optimized, false);
          gl.deleteFramebuffer(fb.framebuffer);
          gl.deleteTexture(fb.texture);
        }

        renderPass(program, texture, w, h, kernel, radius, horizontal) {
          const gl = this.gl;
          gl.useProgram(program);
          const posLoc = gl.getAttribLocation(program, "a_position");
          const texLoc = gl.getAttribLocation(program, "a_texCoord");
          gl.bindBuffer(gl.ARRAY_BUFFER, this.quad.positionBuffer);
          gl.enableVertexAttribArray(posLoc);
          gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, 0, 0);
          gl.bindBuffer(gl.ARRAY_BUFFER, this.quad.texCoordBuffer);
          gl.enableVertexAttribArray(texLoc);
          gl.vertexAttribPointer(texLoc, 2, gl.FLOAT, false, 0, 0);
          gl.activeTexture(gl.TEXTURE0);
          gl.bindTexture(gl.TEXTURE_2D, texture);
          gl.uniform1i(gl.getUniformLocation(program, "u_texture"), 0);
          const texelSize = horizontal ? 1.0 / w : 1.0 / h;
          gl.uniform1f(gl.getUniformLocation(program, "u_texelSize"), texelSize);
          gl.uniform1i(gl.getUniformLocation(program, "u_radius"), radius);
          // GLSL 的 uniform 数组长度必须在编译期固定,这里按最大半径 31 预留
          // 63(= 2 * 31 + 1)个槽位,并把实际卷积核居中放置:中心权重落在下标 31,
          // 着色器用 u_kernel[31 + offset] 取权重
          const paddedKernel = new Float32Array(63);
          for (let i = 0; i < kernel.length; i++) {
            paddedKernel[i + 31 - radius] = kernel[i];
          }
          gl.uniform1fv(gl.getUniformLocation(program, "u_kernel"), paddedKernel);
          gl.drawArrays(gl.TRIANGLES, 0, 6);
        }

        renderPassOptimized(program, texture, w, h, optimized, horizontal) {
          const gl = this.gl;
          gl.useProgram(program);
          const posLoc = gl.getAttribLocation(program, "a_position");
          const texLoc = gl.getAttribLocation(program, "a_texCoord");
          gl.bindBuffer(gl.ARRAY_BUFFER, this.quad.positionBuffer);
          gl.enableVertexAttribArray(posLoc);
          gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, 0, 0);
          gl.bindBuffer(gl.ARRAY_BUFFER, this.quad.texCoordBuffer);
          gl.enableVertexAttribArray(texLoc);
          gl.vertexAttribPointer(texLoc, 2, gl.FLOAT, false, 0, 0);
          gl.activeTexture(gl.TEXTURE0);
          gl.bindTexture(gl.TEXTURE_2D, texture);
          gl.uniform1i(gl.getUniformLocation(program, "u_texture"), 0);
          const texelSize = horizontal ? 1.0 / w : 1.0 / h;
          gl.uniform1f(gl.getUniformLocation(program, "u_texelSize"), texelSize);
          gl.uniform1i(gl.getUniformLocation(program, "u_sampleCount"), optimized.length);
          const offsets = new Float32Array(32);
          const weights = new Float32Array(32);
          for (let i = 0; i < optimized.length; i++) {
            offsets[i] = optimized[i].offset;
            weights[i] = optimized[i].weight;
          }
          gl.uniform1fv(gl.getUniformLocation(program, "u_offsets"), offsets);
          gl.uniform1fv(gl.getUniformLocation(program, "u_weights"), weights);
          gl.drawArrays(gl.TRIANGLES, 0, 6);
        }
      }

      // ==================== UI 逻辑 ====================

      let currentImage = null;
      let webglTradBlur = null;
      let webglOptBlur = null;

      document.getElementById("sigmaRange").addEventListener("input", (e) => {
        document.getElementById("sigmaValue").textContent = e.target.value;
      });

      document.getElementById("radiusRange").addEventListener("input", (e) => {
        document.getElementById("radiusValue").textContent = e.target.value;
      });

      document.getElementById("imageUpload").addEventListener("change", (e) => {
        const file = e.target.files[0];
        if (!file) return;
        const reader = new FileReader();
        reader.onload = (event) => {
          const img = new Image();
          img.onload = () => {
            currentImage = img;
            displayOriginalImage(img);
            document.getElementById("applyButton").disabled = false;
          };
          img.src = event.target.result;
        };
        reader.readAsDataURL(file);
      });

      function displayOriginalImage(img) {
        const maxSize = 300;
        let w = img.width;
        let h = img.height;
        if (w > maxSize || h > maxSize) {
          const ratio = Math.min(maxSize / w, maxSize / h);
          w = Math.floor(w * ratio);
          h = Math.floor(h * ratio);
        }

        const canvases = [
          "originalCanvas",
          "canvas2dTraditional",
          "canvas2dOptimized",
          "webglTraditional",
          "webglOptimized",
        ];
        canvases.forEach((id) => {
          const c = document.getElementById(id);
          c.width = w;
          c.height = h;
        });

        const ctx = document.getElementById("originalCanvas").getContext("2d");
        ctx.drawImage(img, 0, 0, w, h);

        // WebGLBlur 只需创建一次:canvas.getContext 返回同一个上下文,
        // 每次换图都 new 会在该上下文中不断累积着色器程序
        if (!webglTradBlur) {
          webglTradBlur = new WebGLBlur(document.getElementById("webglTraditional"));
          webglOptBlur = new WebGLBlur(document.getElementById("webglOptimized"));
        }
      }

      document.getElementById("applyButton").addEventListener("click", () => {
        if (!currentImage) return;
        const sigma = parseFloat(document.getElementById("sigmaRange").value);
        const radius = parseInt(document.getElementById("radiusRange").value);
        const btn = document.getElementById("applyButton");
        btn.disabled = true;
        btn.textContent = "处理中...";
        setTimeout(() => {
          processBlur(sigma, radius);
          btn.disabled = false;
          btn.textContent = "应用模糊";
        }, 10);
      });

      function processBlur(sigma, radius) {
        const originalCanvas = document.getElementById("originalCanvas");
        const w = originalCanvas.width;
        const h = originalCanvas.height;
        const ctx = originalCanvas.getContext("2d");
        const imageData = ctx.getImageData(0, 0, w, h);
        const kernel = generateGaussianKernel(radius, sigma);
        const optimized = optimizeKernel(kernel);

        // Canvas 2D 传统
        const t1 = performance.now();
        const temp1 = blurHorizontal(imageData, kernel, radius);
        const result1 = blurVertical(temp1, kernel, radius);
        const t2 = performance.now();
        const ctx1 = document.getElementById("canvas2dTraditional").getContext("2d");
        ctx1.putImageData(result1, 0, 0);

        // Canvas 2D 优化
        const t3 = performance.now();
        const temp2 = blurHorizontalOptimized(imageData, optimized);
        const result2 = blurVerticalOptimized(temp2, optimized);
        const t4 = performance.now();
        const ctx2 = document.getElementById("canvas2dOptimized").getContext("2d");
        ctx2.putImageData(result2, 0, 0);

        // WebGL 传统(gl.finish() 阻塞到 GPU 执行完毕,使 CPU 计时能覆盖 GPU 耗时)
        const tradTexture = loadTexture(webglTradBlur.gl, originalCanvas);
        const t5 = performance.now();
        webglTradBlur.applyTraditional(tradTexture, w, h, radius, sigma);
        webglTradBlur.gl.finish();
        const t6 = performance.now();
        webglTradBlur.gl.deleteTexture(tradTexture);

        // WebGL 优化
        const optTexture = loadTexture(webglOptBlur.gl, originalCanvas);
        const t7 = performance.now();
        webglOptBlur.applyOptimized(optTexture, w, h, radius, sigma);
        webglOptBlur.gl.finish();
        const t8 = performance.now();
        webglOptBlur.gl.deleteTexture(optTexture);

        // 统计信息:每像素的纹理采样次数(水平 + 垂直两个 pass)
        const kernelSize = 2 * radius + 1;
        const tradSamples = kernelSize * 2;
        // 线性采样把相邻两个纹素合并为一次采样,每个 pass 从 2r+1 次降到 r+1 次
        const optSamples = Math.ceil(kernelSize / 2) * 2;

        document.getElementById("c2dTradSamples").textContent = tradSamples;
        document.getElementById("c2dTradTime").textContent = (t2 - t1).toFixed(2) + " ms";
        document.getElementById("c2dOptSamples").textContent = optSamples;
        document.getElementById("c2dOptTime").textContent = (t4 - t3).toFixed(2) + " ms";
        document.getElementById("c2dSpeedup").textContent = ((t2 - t1) / (t4 - t3)).toFixed(2) + "x";

        document.getElementById("webglTradSamples").textContent = tradSamples;
        document.getElementById("webglTradTime").textContent = (t6 - t5).toFixed(2) + " ms";
        document.getElementById("webglOptSamples").textContent = optSamples;
        document.getElementById("webglOptTime").textContent = (t8 - t7).toFixed(2) + " ms";
        document.getElementById("webglSpeedup").textContent = ((t6 - t5) / (t8 - t7)).toFixed(2) + "x";
      }
    </script>
  </body>
</html>