Depth Precision Analysis

Original article: Depth Precision Visualized, by Nathan Reed, posted Jul 15 2015 at 03:54PM

This post presents the original English text alongside its Chinese translation.

Depth precision is a pain in the ass that every graphics programmer has to struggle with sooner or later. Many articles and papers have been written on the topic, and a variety of different depth buffer formats and setups are found across different games, engines, and devices.

Depth的精度是图形开发者迟早要面对的痛点。很多文章讨论过这个问题，不同的游戏，引擎，设备中也有多种不同的Depth缓存格式和设置。

Because of the way it interacts with perspective projection, GPU hardware depth mapping is a little recondite and studying the equations may not make things immediately obvious. To get an intuition for how it works, it's helpful to draw some pictures.

由于它与透视投影相互作用的方式，GPU硬件的深度映射变得有点深奥，单纯阅读公式并不能一目了然。为了直观地了解它的工作原理，本文会借助一些图表。

This article has three main parts. In the first part, I try to provide some motivation for nonlinear depth mapping. Second, I present some diagrams to help understand how nonlinear depth mapping works in different situations, intuitively and visually. The third part is a discussion and reproduction of the main results of Tightening the Precision of Perspective Rendering by Paul Upchurch and Mathieu Desbrun (2012), concerning the effects of floating-point roundoff error on depth precision.

本文主要有三个部分。在第一部分中，我尝试解释一些非线性深度映射的基本原理。其次，我绘制了一些图表，以帮助直观理解非线性深度映射如何在不同情况下工作。第三部分是Paul Upchurch和Mathieu Desbrun（2012）关于提高透视渲染精度的主要结果的讨论和实现，其中涉及浮点取整误差对深度精度的影响。

Why 1/z

为什么是 1/z

GPU hardware depth buffers don't typically store a linear representation of the distance an object lies in front of the camera, contrary to what one might naïvely expect when encountering this for the first time. Instead, the depth buffer stores a value proportional to the reciprocal of world-space depth. I want to briefly motivate this convention.

GPU硬件深度缓冲区通常不会存储物体位于摄像机前方的距离的线性表示，这与人们的直觉相反。事实上，深度缓冲区存储的值与世界空间深度的倒数成比例。我想简要地解释下为什么是这样。

In this article, I'll use d to represent the value stored in the depth buffer (in [0, 1]), and z to represent world-space depth, i.e. distance along the view axis, in world units such as meters. In general, the relationship between them is of the form

在本文中，我将使用d来表示深度缓冲区（[0，1]）中存储的值，以z表示世界空间深度，即沿视图轴的距离，使用世界坐标的单位比如“米”。一般来说，它们之间的关系是这样的

$$d = a \frac{1}{z} + b$$

where a,b are constants related to the near and far plane settings. In other words, d is always some linear remapping of 1/z.

其中a，b是与近平面和远平面设置有关的常量。换句话说，d总是1 / z的线性重映射。
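To make the constants concrete, here is a minimal sketch that solves for a and b from the near and far planes. It assumes the D3D-style convention discussed later in the article (near maps to 0 and far maps to 1, or the reverse); the function names `depth_coeffs` and `depth_from_z` are made up for this illustration.

```python
def depth_coeffs(near, far, reversed_z=False):
    """Return (a, b) such that d = a / z + b.

    Standard mapping: near -> 0, far -> 1.
    Reversed mapping: near -> 1, far -> 0.
    """
    if reversed_z:
        a = near * far / (far - near)
        b = -near / (far - near)
    else:
        a = near * far / (near - far)
        b = far / (far - near)
    return a, b


def depth_from_z(z, a, b):
    """Map world-space depth z to the stored depth value d."""
    return a / z + b


if __name__ == "__main__":
    near, far = 0.1, 100.0  # example plane distances, chosen arbitrarily
    for reversed_z in (False, True):
        a, b = depth_coeffs(near, far, reversed_z)
        print(reversed_z, depth_from_z(near, a, b), depth_from_z(far, a, b))
```

Both variants come from solving the two endpoint conditions for a and b; only the choice of which plane maps to 0 changes.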

On the face of it, you can imagine taking d to be any function of z you like. So why this particular choice? There are two main reasons.

从表面上看，你可以想象得到d可以是你喜欢的z的任何函数。那么为什么取1 / z呢？主要有两个原因。

First, 1/z fits naturally into the framework of perspective projections. This is the most general class of transformation that is guaranteed to preserve straight lines—which makes it convenient for hardware rasterization, since straight edges of triangles stay straight in screen space. We can generate linear remappings of 1/z by taking advantage of the perspective divide that the hardware already performs:

首先，1 / z自然适合透视投影的框架。这是保证直线保持直线的最普通的变换类型 - 这使硬件光栅化变得很方便，因为三角形的直边在屏幕空间中保持直线。我们可以利用硬件已经执行的角度划分来生成1 / z的线性重新映射：
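The equation that illustrated this step did not survive in this copy. The following is a reconstruction from the surrounding definitions, not the original figure: it assumes the constants b and a sit in the third row of the projection matrix, the fourth row copies z into w, and the first two rows (left unspecified here) handle x and y.

$$
\begin{bmatrix}
\cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot \\
0 & 0 & b & a \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} \cdot \\ \cdot \\ bz + a \\ z \end{bmatrix}
\quad\xrightarrow{\text{divide by } w = z}\quad
d = \frac{bz + a}{z} = a\,\frac{1}{z} + b
$$

The matrix itself only ever computes something linear in z; the 1/z comes entirely from the divide by w.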

The real power in this approach, of course, is that the projection matrix can be multiplied with other matrices, allowing you to combine many transformation stages together in one.

当然，这种方法的真正威力在于投影矩阵可以与其他矩阵相乘，从而允许将多个转换阶段合并为一个。

The second reason is that 1/z is linear in screen space, as noted by Emil Persson. So it's easy to interpolate d across a triangle while rasterizing, and things like hierarchical Z-buffers, early Z-culling, and depth buffer compression are all a lot easier to do.

第二个原因是，Emil Persson指出1 / z在屏幕空间是线性的。所以在栅格化过程中很容易在三角形中插入d，而像分层Z缓冲区，早期Z-culling和深度缓冲区压缩等事情要容易得多。
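A small numerical check of the screen-space linearity claim, as a sketch rather than a proof. It works in a 2D slice (x, z) of view space, projects with x/z, and uses hypothetical constants a = -0.1 and b = 1 (near = 0.1 with an infinite far plane). Interpolating d linearly along the projected edge reproduces the exact value, while doing the same with z does not.

```python
def project(point, a, b):
    """Project a view-space point (x, z) to (screen x, depth d)."""
    x, z = point
    return x / z, a / z + b


def lerp(p, q, t):
    """Linear interpolation between two tuples."""
    return tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))


def compare_at(p0, p1, t, a, b):
    """Compare exact depth with screen-space interpolation at parameter t."""
    s0, s1 = project(p0, a, b), project(p1, a, b)
    point = lerp(p0, p1, t)            # a point on the edge in view space
    xs, d_exact = project(point, a, b)
    # Where did that point land along the projected edge?
    t_screen = (xs - s0[0]) / (s1[0] - s0[0])
    d_lerp = s0[1] + t_screen * (s1[1] - s0[1])   # what a rasterizer would do
    z_lerp = p0[1] + t_screen * (p1[1] - p0[1])   # naive lerp of z, for contrast
    return d_exact, d_lerp, point[1], z_lerp


if __name__ == "__main__":
    a, b = -0.1, 1.0                     # hypothetical mapping constants
    p0, p1 = (-1.0, 2.0), (3.0, 10.0)    # edge endpoints as (x, z)
    for i in range(1, 4):
        print(compare_at(p0, p1, i / 4, a, b))
```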

Graphing Depth Maps

绘制深度贴图

Equations are hard; let's look at some pictures!

公式很难，让我们先从图表开始！

The way to read these graphs is left to right, then down to the bottom. Start with d, plotted on the left axis. Because d can be an arbitrary linear remapping of 1/z, we can place 0 and 1 wherever we wish on this axis. The tick marks indicate distinct depth buffer values. For illustrative purposes, I'm simulating a 4-bit normalized integer depth buffer, so there are 16 evenly-spaced tick marks.

阅读这些图表的方法是从左到右，然后上到下。先从d开始，把d绘制在左侧的轴上。因为d可以是1 / z的任意线性重映射，所以我们可以将0和1放在我们希望的位置上。刻度线表示不同的深度缓冲值。为了便于说明，我模拟了一个4位标准化整数深度缓冲区，所以有16个均匀间隔的刻度标记。

Trace the tick marks horizontally to where they hit the 1/z curve, then down to the bottom axis. That's where the distinct values fall in the world-space depth range.

将刻度线水平跟踪到1 / z曲线的位置，然后下到底部轴。这就是不同的值对应的在世界空间中的深度。

The graph above shows the “standard”, vanilla depth mapping used in D3D and similar APIs. You can immediately see how the 1/z curve leads to bunching up values close to the near plane, and the values close to the far plane are quite spread out.

上图显示了D3D和类似API中使用的“标准”，香草（？）深度映射。您可以立即看到1 / z曲线如何导致靠近近平面的数据聚集，并且靠近远平面的数值相当分散。
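Since the graph is not reproduced in this copy, the same information can be recovered numerically. This sketch inverts the standard mapping for each of the 16 codes of a 4-bit normalized integer buffer; the near and far values (1 and 10) are arbitrary choices for illustration, not necessarily those used in the original graph.

```python
def z_from_depth(d, near, far):
    """Invert the standard mapping d = a / z + b (near -> 0, far -> 1)."""
    a = near * far / (near - far)
    b = far / (far - near)
    return a / (d - b)


def tick_depths(near, far, bits=4):
    """World-space depth of every code of a normalized integer depth buffer."""
    levels = 2 ** bits
    return [z_from_depth(i / (levels - 1), near, far) for i in range(levels)]


if __name__ == "__main__":
    zs = tick_depths(near=1.0, far=10.0)
    for lo, hi in zip(zs, zs[1:]):
        # The gap between neighbouring codes grows rapidly toward the far plane.
        print(f"z = {lo:7.4f}   gap to next code = {hi - lo:.4f}")
```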

It's also easy to see why the near plane has such a profound effect on depth precision. Pulling in the near plane will make the d range skyrocket up toward the asymptote of the 1/z curve, leading to an even more lopsided distribution of values:

这也很容易看出为什么近平面对深度精度有如此大的影响。拉近近平面会使d的范围上升到1 / z曲线的渐近线，导致更加不平衡的数值分布：

Similarly, it's easy to see in this context why pushing the far plane all the way out to infinity doesn't have that much effect. It just means extending the d range slightly down to 1/z=0:

同样，在这种情况下很容易看到为什么将远平面推到无穷远不会产生太大的影响。这只是意味着将d范围稍微降低到1 / z = 0：
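One way to put numbers on both observations is to ask at which world-space depth the standard mapping reaches d = 0.5, that is, where half of all depth codes have already been spent. The sketch below is an illustration with arbitrary plane distances; `z_at_depth` is a helper invented here.

```python
import math


def depth_coeffs(near, far):
    """Standard mapping constants (near -> 0, far -> 1); far may be infinite."""
    if math.isinf(far):
        return -near, 1.0  # limit of the finite formulas as far -> infinity
    return near * far / (near - far), far / (far - near)


def z_at_depth(d, near, far):
    """World-space depth that maps to the stored value d."""
    a, b = depth_coeffs(near, far)
    return a / (d - b)


if __name__ == "__main__":
    for near, far in [(1.0, 10.0), (0.1, 10.0), (1.0, math.inf)]:
        print(f"near={near}, far={far}: d = 0.5 at z = {z_at_depth(0.5, near, far):.4f}")
```

With these inputs the halfway point sits at roughly twice the near distance in every case: pulling the near plane in by 10x drags it in by about 10x, while pushing the far plane from 10 to infinity only moves it from about 1.82 to 2.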

What about floating-point depth? The following graph adds tick marks corresponding to a simulated float format with 3 exponent bits and 3 mantissa bits:

浮点深度又有什么影响呢？下面的图表添加了与具有3个指数位和3个尾数位的模拟浮点格式相对应的刻度标记：

There are now 40 distinct values in [0, 1]—quite a bit more than the 16 values previously, but most of them are uselessly bunched up at the near plane where we didn't really need more precision.

现在[0,1]中有40个不同的值比之前的16个值多一点，但其中大多数无用地集中在近平面，在那里我们并不需要更高的精度。

A now-widely-known trick is to reverse the depth range, mapping the near plane to d=1 and the far plane to d=0:

一个广为人知的技巧是颠倒深度范围，将近平面映射到d = 1，将远平面映射到d = 0：

Much better! Now the quasi-logarithmic distribution of floating-point somewhat cancels the 1/z nonlinearity, giving us similar precision at the near plane to an integer depth buffer, and vastly improved precision everywhere else. The precision worsens only very slowly as you move farther out.

好多了！现在，浮点的准对数分布略微抵消了1 / z非线性，使得我们在近平面处获得与整数深度缓冲区类似的精度，并且在其它地方大大提高了精度。当您移出更远时，精度只会非常缓慢地恶化。
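The effect can be reproduced with a toy float format. The sketch below enumerates the values in [0, 1] of a hypothetical 3-exponent-bit, 3-mantissa-bit format and compares the largest world-space gap between neighbouring codes under the standard and reversed mappings. The exact layout (denormals at the bottom, both endpoints included) is an assumption, so the number of values it produces is not meant to match the 40 quoted above, and near = 1, far = 10 are arbitrary.

```python
def toy_float_ticks(exp_bits=3, man_bits=3):
    """All values in [0, 1] of a small hypothetical float format."""
    steps = 2 ** man_bits
    min_exp = -(2 ** exp_bits - 2)
    ticks = {0.0, 1.0}
    for m in range(steps):                      # denormals below 2**min_exp
        ticks.add(m / steps * 2.0 ** min_exp)
    for e in range(min_exp, 0):                 # normal binades below 1.0
        for m in range(steps):
            ticks.add((1 + m / steps) * 2.0 ** e)
    return sorted(ticks)


def z_from_depth(d, near, far, reversed_z):
    """Invert d = a / z + b for the standard or the reversed mapping."""
    if reversed_z:
        a, b = near * far / (far - near), -near / (far - near)
    else:
        a, b = near * far / (near - far), far / (far - near)
    return a / (d - b)


def max_z_gap(ticks, near, far, reversed_z):
    """Largest world-space distance between two neighbouring depth codes."""
    zs = sorted(z_from_depth(d, near, far, reversed_z) for d in ticks)
    return max(hi - lo for lo, hi in zip(zs, zs[1:]))


if __name__ == "__main__":
    ticks = toy_float_ticks()
    print("distinct values:", len(ticks))
    print("worst gap, standard:", max_z_gap(ticks, 1.0, 10.0, False))
    print("worst gap, reversed:", max_z_gap(ticks, 1.0, 10.0, True))
```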

The reversed-Z trick has probably been independently reinvented several times, but goes at least as far back as a SIGGRAPH ’99 paper by Eugene Lapidous and Guofang Jiao (no open-access link available, unfortunately). It was more recently re-popularized in blog posts by Matt Pettineo and Brano Kemen, and by Emil Persson's Creating Vast Game Worlds SIGGRAPH 2012 talk.

颠倒过来的技巧可能在历史上已经被独立重新发掘了几次，但至少可以追溯到Eugene Lapidous和Guofang Jiao（不幸的是，没有可用的开放式访问链接）的SIGGRAPH'99论文。它最近在Matt Pettineo和Brano Kemen的博客文章以及Emil Persson的创建Vast Game Worlds SIGGRAPH 2012谈话中被重新推广。

All the previous diagrams assumed [0, 1] as the post-projection depth range, which is the D3D convention. What about OpenGL?

所有前面的图都假定[0，1]作为投影后深度范围，这是D3D惯例。那么OpenGL呢？

OpenGL by default assumes a [-1, 1] post-projection depth range. This doesn't make a difference for integer formats, but with floating-point, all the precision is stuck uselessly in the middle. (The value gets mapped into [0, 1] for storage in the depth buffer later, but that doesn't help, since the initial mapping to [-1, 1] has already destroyed all the precision in the far half of the range.) And by symmetry, the reversed-Z trick will not do anything here.

默认情况下，OpenGL假定[-1,1]投影后深度范围。这对整数格式没有什么影响，但是对于浮点格式，所有精度都会在中间无用地集中。（该值被映射到[0,1]以便稍后存储在深度缓冲区中，但这没有帮助，因为初始映射到[-1,1]已经破坏了范围远半部分的所有精度。）通过对称，逆Z技巧在这里不会有任何作用。
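The following sketch illustrates the quantization side of this argument only. It evaluates the mappings in double precision and then rounds the result to float32 (via the standard `struct` module), ignoring arithmetic error in the transform itself. The scene parameters (near = 0.1, far = 10000, 100 depths spaced 0.1 apart around z = 5000) are arbitrary choices for illustration.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ndc_gl(z, near, far):
    """Default OpenGL convention: near -> -1, far -> +1."""
    return (far + near) / (far - near) - 2 * far * near / ((far - near) * z)


def ndc_reversed01(z, near, far):
    """Reversed-Z in [0, 1]: near -> 1, far -> 0."""
    return near * (far - z) / ((far - near) * z)


def distinct_f32(values):
    """Number of distinct values left after rounding to float32."""
    return len({f32(v) for v in values})


if __name__ == "__main__":
    near, far = 0.1, 10000.0
    zs = [5000.0 + 0.1 * i for i in range(100)]
    print("OpenGL [-1, 1]: ", distinct_f32(ndc_gl(z, near, far) for z in zs))
    print("reversed [0, 1]:", distinct_f32(ndc_reversed01(z, near, far) for z in zs))
```

In the [-1, 1] convention these depths land just below 1, where float32 values are about 6e-8 apart, so they collapse onto a handful of codes. In the reversed [0, 1] convention they land near 1e-5, where float32 is far finer, and they stay distinct.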

Fortunately, in desktop OpenGL you can fix this with the widely-supported ARB_clip_control extension (now also core in OpenGL 4.5 as glClipControl). Unfortunately, in GL ES you're out of luck.

幸运的是，在桌面OpenGL中，您可以使用广泛支持的ARB_clip_control扩展（现在也作为glClipControl在OpenGL 4.5中作为核心）来解决此问题。不幸的是，在GL ES中，你运气不好。

The Effects of Roundoff Error

取整误差的影响

The 1/z mapping and the choice of float versus integer depth buffer are a big part of the precision story, but not all of it. Even if you have enough depth precision to represent the scene you're trying to render, it's easy to end up with your precision controlled by error in the arithmetic of the vertex transformation process.

1 / z映射和浮点数与整数的选择是整个精确度故事的重要组成部分，但不是全部。即使您具有足够的深度精度来表示要渲染的场景，也很容易导致精度由顶点转换过程的算术中的误差控制。

As mentioned earlier, Upchurch and Desbrun studied this and came up with two main recommendations to minimize roundoff error:

如前所述，Upchurch和Desbrun对此进行了研究，并提出了两项主要建议以最大限度地减少舍入误差：

Use an infinite far plane.

Keep the projection matrix separate from other matrices, and apply it in a separate operation in the vertex shader, rather than composing it into the view matrix.

使用无限远的far plane。
保持投影矩阵与其他矩阵分开，并在顶点着色器中单独操作，而不是将其组合到视图矩阵中。
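As a rough illustration of what the two recommendations mean in code, the sketch below builds an infinite-far-plane projection and applies it either as a separate step or precomposed with a view matrix, rounding to float32 after every operation. The view matrix, the test point, and the identity x/y scaling are all hypothetical, and this is not the analysis from the paper; it only shows the two evaluation orders side by side.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mat_vec(m, v):
    """4x4 matrix times 4-vector, rounding to float32 after every operation."""
    out = []
    for row in m:
        acc = 0.0
        for m_ij, v_j in zip(row, v):
            acc = f32(acc + f32(m_ij * v_j))
        out.append(acc)
    return out


def mat_mul(a, b):
    """4x4 matrix product a @ b in simulated float32 (column by column)."""
    columns = [mat_vec(a, list(col)) for col in zip(*b)]
    return [list(row) for row in zip(*columns)]


def infinite_far_projection(near):
    """Standard [0, 1] mapping with far -> infinity: d = 1 - near / z."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, f32(-near)],
            [0.0, 0.0, 1.0, 0.0]]


def example_view(angle, tx, ty, tz):
    """Hypothetical view matrix: rotation about the y axis plus a translation."""
    c, s = f32(math.cos(angle)), f32(math.sin(angle))
    return [[c, 0.0, s, f32(tx)],
            [0.0, 1.0, 0.0, f32(ty)],
            [-s, 0.0, c, f32(tz)],
            [0.0, 0.0, 0.0, 1.0]]


def depth_separate(proj, view, p):
    """Apply the view matrix first, then the projection as its own step."""
    clip = mat_vec(proj, mat_vec(view, p))
    return f32(clip[2] / clip[3])


def depth_precomposed(proj, view, p):
    """Multiply the matrices together first, then transform the point once."""
    clip = mat_vec(mat_mul(proj, view), p)
    return f32(clip[2] / clip[3])


if __name__ == "__main__":
    proj = infinite_far_projection(0.1)
    view = example_view(0.3, 1.0, 2.0, 3.0)
    p = [10.0, 5.0, 200.0, 1.0]
    print(depth_separate(proj, view, p), depth_precomposed(proj, view, p))
```

The two results agree to within a few float32 ulps for a single point; the difference only becomes visible statistically, when many closely spaced depths are compared.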

Upchurch and Desbrun came up with these recommendations through an analytical technique, based on treating roundoff errors as small random perturbations introduced at each arithmetic operation, and keeping track of them to first order through the transformation process. I decided to check the results using direct simulation.

Upchurch和Desbrun通过分析技术提出了这些建议，该分析技术基于将舍入误差视为每次算术操作引入的小随机扰动，并通过转换过程对它们进行第一次跟踪。我决定使用直接模拟来检查结果。

My source code is here (Python 3.4 with numpy). It works by generating a sequence of random points, ordered by depth, spaced either linearly or logarithmically between the near and far planes. Then it passes the points through view and projection matrices and the perspective divide, using 32-bit float precision throughout, and optionally quantizes the final result to 24-bit integer. Finally, it runs through the sequence and counts how many times two adjacent points (which originally had distinct depths) have either become indistinguishable because they mapped to the same depth value, or have actually swapped order. In other words, it measures the rate at which depth comparison errors occur, which corresponds to issues like Z-fighting, under different scenarios.

我的源代码是Python 3.4和numpy。它的工作原理是生成一系列随机点，按深度排序，在近平面和远平面之间以线性或对数间隔排列。然后，它通过视图和投影矩阵以及透视分割来传递点，使用32位浮点精度，并可选地将最终结果量化为24位整数。最后，它遍历序列并计算两个相邻点（最初具有不同深度）已经变得不可分割，因为它们映射到相同的深度值，或者实际上已经交换了顺序。换句话说，它测量在不同情景下发生深度比较错误的速率 - 例如Z-fighting等问题。
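The idea of the experiment can be sketched in a few lines. This is not the author's script: it is a heavily simplified stand-in that uses plain Python instead of numpy, skips the view matrix and the integer quantization step, uses evenly spaced rather than random depths, and only pushes the depth through the z and w rows of the projection. Float32 arithmetic is simulated by rounding after every operation.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", x))[0]


def count_errors(near, far, count, reversed_z):
    """Return (indistinguishable, swapped) counts for adjacent depth pairs."""
    if reversed_z:
        a, b = near * far / (far - near), -near / (far - near)
    else:
        a, b = near * far / (near - far), far / (far - near)
    a, b = f32(a), f32(b)

    depths = []
    for i in range(count):
        z = f32(near + (far - near) * i / (count - 1))
        clip_z = f32(f32(b * z) + a)      # z row of the projection: b*z + a
        depths.append(f32(clip_z / z))    # perspective divide by w = z

    # Input depths increase, so d should increase (standard) or decrease (reversed).
    sign = -1.0 if reversed_z else 1.0
    indist = swap = 0
    for d0, d1 in zip(depths, depths[1:]):
        if d0 == d1:
            indist += 1
        elif sign * (d1 - d0) < 0:
            swap += 1
    return indist, swap


if __name__ == "__main__":
    for reversed_z in (False, True):
        print("reversed" if reversed_z else "standard",
              count_errors(0.1, 10000.0, 10000, reversed_z))
```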

Here are the results obtained for near = 0.1, far = 10K, with 10K linearly spaced depths. (I tried logarithmic depth spacing and other near/far ratios as well, and while the detailed numbers varied, the general trends in the results were the same.)

这里是 near= 0.1，far= 10K，10K线性间隔深度获得的结果。（我尝试了对数深度间距和其它near/far比率，虽然详细数字各不相同，但结果的总体趋势是相同的。）

In the table, “indist” means indistinguishable (two nearby depths mapped to the same final depth buffer value), and “swap” means that two nearby depths swapped order.

在表格中，“indist”表示无法区分（两个附近深度映射到相同的最终深度缓冲区值），“swap”表示两个附近深度交换顺序。

Apologies for not graphing these, but there are too many dimensions to make it easy to graph! In any case, looking at the numbers, a few general results are clear.

抱歉没有绘图展示，但是无论如何，从数字来看，一些一般规律还是显而易见的。

There is no difference between float and integer depth buffers in most setups. The arithmetic error swamps the quantization error. In part this is because float32 and int24 have almost the same-sized ulp in [0.5, 1] (because float32 has a 23-bit mantissa), so there actually is almost no additional quantization error over the vast majority of the depth range.

In many cases, separating the view and projection matrices (following Upchurch and Desbrun’s recommendation) does make some improvement. While it doesn't lower the overall error rate, it does seem to turn swaps into indistinguishables, which is a step in the right direction.

An infinite far plane makes only a minuscule difference in error rates. Upchurch and Desbrun predicted a 25% reduction in absolute numerical error, but it doesn't seem to translate into a reduced rate of comparison errors.

在大多数设置中，float和integer深度缓冲区之间没有区别。算术错误吞噬量化误差。部分原因是因为float32和int24在[0.5,1]中具有几乎相同大小的ulp（因为float32具有23位尾数），所以在绝大多数深度范围内实际上几乎没有额外的量化误差。
在许多情况下，分离视图和投影矩阵（遵循Upchurch和Desbrun的建议）确实有所改进。虽然它不会降低整体错误率，但它似乎将掉期转换为不可区分，这是向正确方向迈出的一步。
无限远平面在错误率方面只会造成极小的差异。 Upchurch和Desbrun预测绝对数值误差减少25％，但似乎并没有将比较误差率降低。
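The ulp comparison in the first point is easy to verify. This sketch measures the float32 spacing directly by stepping to the next bit pattern, and compares it with the step of a 24-bit normalized integer; `f32_ulp` is a helper written for this illustration and assumes a positive, finite input.

```python
import struct


def f32_ulp(x):
    """Gap between positive float32 x and the next representable float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here


if __name__ == "__main__":
    int24_step = 1.0 / (2 ** 24 - 1)
    print("int24 step:          ", int24_step)
    print("float32 ulp at 0.75: ", f32_ulp(0.75))   # inside [0.5, 1)
    print("float32 ulp at 0.01: ", f32_ulp(0.01))   # far finer close to 0
```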

The above points are practically irrelevant, though, because the real result that matters here is: the reversed-Z mapping is basically magic. Check it out:

上述几点看起来并无作用，但是，这里真正重要的结论是：reversed-Z映射几乎是魔法。让我们一探究竟：

Reversed-Z with a float depth buffer gives a zero error rate in this test. Now, of course you can make it generate some errors if you keep tightening the spacing of the input depth values. Still, reversed-Z with float is ridiculously more accurate than any of the other options.

Reversed-Z with an integer depth buffer is as good as any of the other integer options.

Reversed-Z erases the distinctions between precomposed versus separate view/projection matrices, and finite versus infinite far planes. In other words, with reversed-Z you can compose your projection matrix with other matrices, and you can use whichever far plane you like, without affecting precision at all.

使用浮点深度缓冲区反转Z可以在此测试中给出零错误率。现在，如果继续收紧输入深度值的间距，当然可以使其产生一些错误。不过，与浮点数精度相反的reversed-Z比其他任何方式都更可靠。
用整数深度缓冲区reversed-Z与其他任何整数选项一样好。
Reversed-Z消除了预分解视图/投影矩阵与有限与无限远平面之间的区别。换句话说，使用逆Z可以用其他矩阵组成投影矩阵，并且可以使用任何你喜欢的平面，而不会影响精度。
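For reference, a minimal sketch of the simplest reversed-Z setup: with the far plane at infinity the reversed constants reduce to a = near and b = 0, so d = near / z. The matrix below assumes the article's convention that z is the positive distance in front of the camera and leaves x/y scaling as identity (field of view and aspect ratio omitted); the function names are made up for this example.

```python
def reversed_infinite_projection(near):
    """Reversed-Z projection with the far plane at infinity: d = near / z."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, near],  # clip z = near
        [0.0, 0.0, 1.0, 0.0],   # clip w = view-space z
    ]


def project_depth(matrix, point):
    """Transform a view-space point (x, y, z, 1) and return d = clip z / clip w."""
    clip = [sum(m * p for m, p in zip(row, point)) for row in matrix]
    return clip[2] / clip[3]


if __name__ == "__main__":
    proj = reversed_infinite_projection(0.1)
    for z in (0.1, 1.0, 100.0, 1e6):
        print(z, project_depth(proj, (0.0, 0.0, z, 1.0)))
```

With a reversed range, nearer surfaces have larger d, so the depth test has to be flipped to "greater" and the depth buffer cleared to 0 instead of 1.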

I think the conclusion here is clear. In any perspective projection situation, just use a floating-point depth buffer with reversed-Z! And if you can't use a floating-point depth buffer, you should still use reversed-Z. It isn't a panacea for all precision woes, especially if you're building an open-world environment that contains extreme depth ranges. But it's a great start.

我认为这里的结论很清楚。在任何透视投影情况下，只需使用带有反转Z的浮点深度缓冲区！如果你不能使用浮点深度缓冲区，你仍然应该使用反转Z. 对于所有精确的灾难，这不是万能的，特别是如果你正在构建一个包含极端深度范围的开放世界的环境。但这是一个很好的开始。

Nathan is a Graphics Programmer, currently working at NVIDIA on the DevTech software team. You can read more on his blog here.

Nathan是一名图形编程人员，目前在NVIDIA公司的DevTech软件团队工作。你可以在他的博客上阅读更多内容。

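(Translator's note: When developing post-processing effects such as SSAO on WebGL, a high-precision depth texture is required. This is where depth precision problems tend to surface, especially on lower-precision platforms and devices such as iOS, and the visible result is noise patterns in the rendered image. In real scenes the camera's near and far planes are often not fixed, so precision cannot be improved simply by tuning the camera parameters. While looking for a solution I went through a number of references. This article covers the topic in considerable detail, but I could not find a Chinese translation, so I translated it on this blog for developers in China. The translation is based mainly on Google Translate, with awkward passages fixed by hand. Since my own ability is limited, the text is presented in parallel Chinese and English so that translation errors do not get in the way of reading.)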