7.20 Low Precision Shader Support In D3D
7.20 D3D 中的低精度着色器支持
7.20.1 Overview
7.20.1 概述
This feature adds support for 10-bit (2.8 fixed point) and 16-bit float precision, and in some cases limited integer arithmetic, to Shader Model 2.0+.
增加了对 10 位(2.8 定点)和 16 位精度浮点数的支持,在某些情况下,将有限的整数算术应用到着色器模型2.0及以上版本。
Note: "2.8 fixed point" is a fixed-point number representation in which a value is split into an integer part and a fractional part; "2.8" means 2 integer bits and 8 fractional bits.
· min16float - minimum 16-bit floating point value.
min16float - 最小 16 位浮点值。
· min10float - minimum 10-bit floating point value.
min10float - 最小 10 位浮点值。
· min16int - minimum 16-bit signed integer.
min16int - 最小 16 位有符号整数。
· min12int - minimum 12-bit signed integer.
min12int - 最小 12 位有符号整数。
· min16uint - minimum 16-bit unsigned integer.
min16uint - 最小 16 位无符号整数。
Shader<->memory I/O operations are unchanged for simplicity, e.g. shader constants continue to be defined as 32-bit per component.
为简单起见,Shader<->memory I/O 操作保持不变,例如,着色器常量继续被定义为每个组件32位。
Implementations are allowed to execute low precision operations at higher precision. So 10-bit arithmetic could be done at 10-bits or more (say 32-bit) precision.
允许实现以更高的精度执行低精度操作。因此,10 位算术可以以 10 位或更高(比如 32 位)的精度完成。
min10float a = 0.2;
min10float b = 1.1;
min10float c = a + b; // approximately 1.3
may be implemented at 32-bit precision as:
float a = 0.2;
float b = 1.1;
float c = a + b; // approximately 1.3
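To make the difference between the two executions above concrete, here is a minimal C++ sketch that emulates a device holding values in exactly 10 bits. The helper names `quantize_2_8` and `add_min10` are hypothetical, and round-to-nearest is an assumption of the sketch; the text does not mandate a rounding mode for 2.8 fixed point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: snap a float onto a signed 2.8 fixed-point grid,
// i.e. multiples of 1/256 in [-2, 2). Round-to-nearest is an assumption.
inline float quantize_2_8(float x) {
    const float lo = -2.0f;
    const float hi = 2.0f - 1.0f / 256.0f;  // largest representable value
    const float clamped = std::min(std::max(x, lo), hi);
    return std::round(clamped * 256.0f) / 256.0f;
}

// a + b as a device with exactly 10 bits of storage might evaluate it:
// both inputs and the result are snapped to the 2.8 grid.
inline float add_min10(float a, float b) {
    return quantize_2_8(quantize_2_8(a) + quantize_2_8(b));
}
```

Under these assumptions `add_min10(0.2f, 1.1f)` lands on a multiple of 1/256 close to 1.3, while a device running the same shader at 32 bits returns the float sum. A shader has to tolerate either result.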
7.20.1.1 Design Goals / Assumptions
7.20.1.1 设计目标/假设
1. Enable D3D applications to take advantage of hardware that implements low precision shader arithmetic
1. 使 D3D 应用程序能够利用硬件实现低精度着色器运算。
2. Shaders authored for low precision work unmodified on hardware that operates at higher precision
2. 在硬件上,低精度着色器的编写通常不需要进行修改,即可在支持更高精度的硬件上运行。
· Application does not have to author multiple versions of a shader, but has to be careful that the shader will operate at variable precision as low as the minimum precision it chooses
· 应用程序不必创作多个版本的着色器,但必须注意着色器将操作在可变精度低至它选择的最小精度
3. Shaders authored for low precision can trivially be cleaned up by the runtime to be in a format that old drivers understand
3. 为低精度编写的着色器可以很容易地由运行时清理,使其采用旧驱动程序可以理解的格式
4. Minimal driver work to either support low precision processing or not support it
4. 最小的驱动程序工作,要么支持低精度处理,要么不支持它
· E.g. Drivers can compile shaders once when they are initially submitted
· 例如,驱动程序可以在最初提交着色器时编译一次
· Ideally, Constant Buffers also don’t require any special processing by drivers to account for the contents being referenced at various precisions (IHV can choose to build downconverting hardware for this)
· 理想情况下,常量缓冲区也不需要驱动程序进行任何特殊处理来考虑以各种精度引用的内容 (IHV 可以选择为此生成下变频硬件)
· Drivers that don’t support the feature can simply ignore the precision hints.
· 不支持该功能的驱动程序可以简单地忽略精度提示。
· To understand the precision level a given shader instruction in the bytecode can operate (including converting precisions on operands if necessary), drivers will not have to do any complex far reaching analysis – just looking at the current instruction should be informative enough, possibly with the help of shader declarations.
· 要了解字节码中给定着色器指令可以操作的精度级别(包括在必要时转换操作数上的精度),驱动程序将不必进行任何复杂的深远分析 - 只需查看当前指令就应该有足够的信息,可能需要着色器声明的帮助。
struct VertexIn
{
float3 PosL : POSITION;
float4 Color : COLOR;
};
struct VertexOut
{
float4 PosH : SV_POSITION;
float4 Color : COLOR;
};
min10float4 a = min10float4(1.0,2.0,3.0,4.0);
VertexOut VS(VertexIn vin)
{
VertexOut vout;
vout.PosH = a;
return vout;
}
// cbuffer $Globals
// {
// min10float4 a; // Offset: 0 Size: 16
// = 0x3f800000 0x40000000 0x40400000 0x40800000
// }
vs_5_0
dcl_globalFlags refactoringAllowed | enableMinimumPrecision
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output_siv o0.xyzw, position
mov o0.xyzw, cb0[0].xyzw {min2_8f as def32}
ret
5. Application codebase does not need to change at all to use low precision shaders
5. 应用程序代码库不需要更改即可使用低精度着色器
· Shaders can be dropped in with no other codebase change
· 着色器可以放入其中,而无需更改其他代码库
struct PSInput
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};
PSInput VSMain(float4 position : POSITION, float4 color : COLOR)
{
    PSInput result;
    result.position = position;
    result.color = color;
    return result;
}
Changed to a low precision shader:
struct PSInput
{
    min16float4 position : SV_POSITION;
    min16float4 color : COLOR;
};
PSInput VSMain(min16float4 position : POSITION, min16float4 color : COLOR)
{
    PSInput result;
    result.position = position;
    result.color = color;
    return result;
}
6. Low precision support is added to all interesting shader models (2.x-5.0) as opposed to limiting it to the bottom end or adding a new shader model.
6. 低精度支持被添加到所有有趣的着色器模型 (2.x-5.0) 中,而不是将其限制在底端或添加新的着色器模型。
· Applications don’t have to make a choice between choosing low precision vs using other features if the hardware supports it all.
· 如果硬件支持所有功能,则应用程序不必在选择低精度和使用其他功能之间做出选择。
· Similarly, hardware vendors implementing any shader level can choose to exploit low precision (independent decisions).
· 同样,实现任何着色器级别的硬件供应商都可以选择利用低精度(独立决策)。
7. Data format for the various low precisions is well defined, though it is not directly visible to applications
7. 各种低精度的数据格式已明确定义,但对应用程序不直接可见
· During shader execution, implementations can use equal or any amount of additional precision.
· 在着色器执行期间,实现可以使用相等或任意数量的额外精度。
7.20.2 Precision Levels
7.20.2 精度级别
The new 10 and 16 bit precision levels for shaders are inspired by their existence in some real hardware and their presence in OpenGL ES. (8 bit was considered but cut due to its limitations versus the value it seemed to provide at the time).
| | Default Precision | Min 10-bit fixed point (2.8) | Min 16-bit int / float | 32-bit int/float | 64-bit float |
|---|---|---|---|---|---|
| Executing at higher precision allowed? | - | Y | Y | N | N |
| Shader Constants | - | N | N | Y | Y |
| SM 2.x | VS: fp32 / int23; PS: fp24 (s16e7) / int16 | opt | opt | N | N |
| SM 3.0 | fp32 | N | N | Y | N |
| SM 4.x | fp32 / int32 | opt | opt | Y | opt |
| SM 5.0 | fp32 / int32 | opt | opt | Y | opt |
| Float range | - | [-2, 2) | [-2^14, 2^14] | Full IEEE 754 | Full IEEE 754 |
| Float magnitude range | - | 2^-8 ... 2 | On SM 4+, includes INF/NaN | Full IEEE 754 | Full IEEE 754 |
| Int range | - | - | (-2^11, 2^11); full range signed and unsigned on SM 4+ | full | - |
7.20.2.1 10-bit min precision level
7.20.2.1 10位最小精度级别
This is a 2.8 fixed point value, though the fixed point semantics may not be identical to the general fixed point semantics defined in the D3D10+ specs. Following the D3D10+ fixed point semantics is recommended for future hardware that may choose to implement the 10-bit precision level.
这是一个 2.8 定点值,但定点语义可能与 D3D10+ 规范中定义的一般定点语义不同。对于可能选择实现 10 位精度级别的未来硬件,建议遵循 D3D10+ 定点语义。
In Direct3D 10, the following types are modifiers to the float type:
在 Direct3D 10 中,以下类型是浮点类型的修饰符:
· snorm float - IEEE 32-bit signed-normalized float in range -1 to 1 inclusive.
snorm float - IEEE 32 位有符号规范化浮点数,范围为 -1 到 1(含)。
· unorm float - IEEE 32-bit unsigned-normalized float in range 0 to 1 inclusive.
unorm float - IEEE 32 位无符号规范化浮点数,范围为 0 到 1(含)。
8-bit UNORM data is invertible when passed through 10-bit min-precision storage. For example: Suppose UNORM 8-bit data that is point sampled from the texture format DXGI_FORMAT_R8G8B8A8_UNORM gets read into a shader and is stored and passed around in the 10-bit representation. If that data is subsequently written unchanged out to a UNORM 8-bit output (such as a DXGI_FORMAT_R8G8B8A8_UNORM rendertarget), the output UNORM value matches the input UNORM value. This guarantee does not (cannot) apply for other formats passing through 10-bit, such as 8-bit UNORM_SRGB or higher precision UNORM values like 16-bit UNORM.
当通过10位最小精度存储传递时,8位UNORM数据是可逆的。例如:假设从纹理格式DXGI_FORMAT_R8G8B8A8_UNORM中点采样的UNORM 8位数据被读入着色器,并以10位表示形式存储和传递。如果该数据随后不变地写入到UNORM 8位输出(例如,DXGI_FORMAT_R8G8B8A8_UNORM渲染目标),则输出的UNORM值将与输入的UNORM值匹配。这个保证不适用于通过10位传递的其他格式,例如8位的UNORM_SRGB或更高精度的UNORM值,如16位的UNORM。
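The invertibility claim can be checked exhaustively, since there are only 256 input codes. The following C++ sketch is a model, not the hardware path: `store_2_8` is a hypothetical stand-in for 10-bit storage, and both it and the float-to-UNORM step are assumed to round to nearest.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// UNORM8 -> float, as a point-sampled R8G8B8A8_UNORM channel would be read.
inline double unorm8_to_float(std::uint8_t v) { return v / 255.0; }

// Hypothetical 2.8 fixed-point storage: nearest multiple of 1/256.
// Round-to-nearest is an assumption; inputs here never leave [0, 1].
inline double store_2_8(double x) { return std::round(x * 256.0) / 256.0; }

// float -> UNORM8 with round-to-nearest (assumed output conversion).
inline std::uint8_t float_to_unorm8(double x) {
    return static_cast<std::uint8_t>(std::lround(x * 255.0));
}

// True when every 8-bit code survives the trip through 2.8 storage.
inline bool unorm8_round_trips() {
    for (int v = 0; v <= 255; ++v) {
        const std::uint8_t in = static_cast<std::uint8_t>(v);
        if (float_to_unorm8(store_2_8(unorm8_to_float(in))) != in) return false;
    }
    return true;
}
```

The round trip works because the storage error is at most 1/512, which scales to less than half an 8-bit step when multiplied by 255. If the storage step truncated instead of rounding, the error bound would double and the guarantee would no longer follow from this argument.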
From the shader's point of view, the 10-bit min-precision level appears as a float value with at minimum [-2,2) range.
从着色器的角度来看,10位最小精度级别表现为一个浮点值,其范围至少为[-2,2)。
Hardware that supports 10-bit precision must also support 16-bit precision.
支持 10 位精度的硬件也必须支持 16 位精度。
7.20.2.2 16-bit min-precision level
7.20.2.2 16位最小精度级别
7.20.2.2.1 float16
7.20.2.2.1 浮点数16
For float values, this is float16 as defined in the D3D10+ specs. The exception is that for Shader Model 2, the max. exponent encoding (normally defining NaN/INF) is unused (undefined).
对于浮点值,这是 D3D10+ 规范中定义的浮点 16。例外情况是,对于着色器模型 2,最大指数编码(通常定义 NaN/INF)未使用(未定义)。
Conversion from float32 (e.g. from shader constants) to float16 may or may not flush float16 denorm to 0, and round to zero is used, per D3D spec for high to low precision float. Float16 arithmetic operations within the shader may or may not flush float16 denorm to 0, and may either round to nearest even or truncate to a representable number. Out of range values in conversion from float32 or arithmetic may produce +/-MAX_FLOAT16 or +/- INF.
从float32(例如,来自着色器常量)到float16的转换可能会也可能不会将float16 denorm为0,并且根据D3D规范,用于高精度到低精度浮点数的转换时,将使用向零舍入。着色器内的float16算术运算可能会也可能不会将float16 denorm为0,并且可能向最近的偶数舍入或截断为可表示的数字。从float32转换或算术运算中得到的超出范围的值可能会产生+/-MAX_FLOAT16或+/- INF。
Note: denorms (denormalized numbers) are the very small IEEE 754 values close to zero.
16-bit integer min-precision is available as well in HLSL. For Shader Model 2, this is constrained to be representable as integral floats (1.0f, 2.0f, etc.) in a float16 encoding. In the shader bytecode these appear simply as float16, so native integer operations are not available. (It may not be worth bothering to expose this constrained form of int16 for SM 2/3.)
在HLSL中,也提供了16位整数的最小精度。对于着色器模型2,这被限制为能够以float16编码表示为整数浮点数(1.0f,2.0f等)。在着色器字节码中,这些只是以float16的形式出现,因此不可用原生的整数运算。(对于SM 2/3,可能没有必要去暴露这种受限制的int16形式。)
7.20.2.3 int16/uint16
For shader model 4+, native integer ops can be used on 16-bit min-precision values; however, applications must beware that the device could choose to simply use larger-than-16-bit (e.g. 32 bit) integer ops without any clamping to maintain the illusion that there are not more than 16 bits present.
对于着色器模型4+,可以在16位最小精度值上使用本地整数操作,但是应用程序必须注意,设备可能会选择简单地使用大于16位(例如32位)的整数操作,而不进行任何截断,以维持不存在超过16位的假象。
Note: in short, this warns developers to be careful with integer operations in shaders, because the device may silently use a wider integer width without clamping.
Shader Constants feeding 16-bit shader arithmetic are always fp32 encoded for Shader Model 2. For Shader Models 4+, Shader Constants feeding 16-bit in the shader are specified as float32 or UINT32/INT32 as appropriate (i.e. unchanged from the way constants feed into float32 arithmetic).
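The hazard is easiest to see with wraparound. This C++ sketch models two hypothetical devices running the same min16uint addition; the function names are illustrative only and are not part of any D3D interface.

```cpp
#include <cassert>
#include <cstdint>

// Device that really keeps min16uint values in 16-bit registers:
// results wrap modulo 2^16.
inline std::uint32_t add_on_16bit_hw(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint16_t>(a + b);
}

// Device that runs min16uint at 32 bits with no clamping.
inline std::uint32_t add_on_32bit_hw(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Portable pattern: mask explicitly when the algorithm relies on wraparound,
// so both kinds of device agree.
inline std::uint32_t add_wrapping16(std::uint32_t a, std::uint32_t b) {
    return (a + b) & 0xFFFFu;
}
```

For inputs 65535 and 1 the first device yields 0 and the second yields 65536. A shader that depends on 16-bit overflow behavior therefore needs the explicit mask, or it should use a full 32-bit type.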
为Shader Model 2,输入16位着色器运算的着色器常量总是以fp32编码。对于Shader Models 4+,根据具体情况而定,在着色器中输16位的着色器常量被指定为float32或UINT32/INT32(即,与常量输入到float32运算的方式相同)。
7.20.3 Low Precision Shader Bytecode
7.20.3 低精度着色器字节码
7.20.3.1 D3D9
A new MIN_PRECISION enum is added to the source and dest parameter token, definition below. This specifies the minimum precision level for the entire operation – implementations can use equal or greater precision.
一个新的MIN_PRECISION枚举被添加到源和目的参数标记中,定义如下。这指定了整个操作的最小精度级别 - 实现可以使用相同或更高的精度。
This new enum co-exists with the PARTIALPRECISION flag that is already in the same dest parameter token – see the comment below.
这个新的枚举与已位于同一目的参数标记中的 PARTIALPRECISION 标志共存 - 请参阅下面的注释。
7.20.3.1.1 Token Format
7.20.3.1.1 Token 格式
// Source or dest token bits [15:14]:
#define D3D11_SB_OPERAND_MIN_PRECISION_MASK 0x0001C000
#define D3D11_SB_OPERAND_MIN_PRECISION_SHIFT 14

typedef enum _D3DSHADER_MIN_PRECISION
{
    D3DMP_DEFAULT = 0, // Default precision for the shader model
    D3DMP_16 = 1, // Min 16 bit per component
    D3DMP_2_8 = 2, // Min 10 bits (2.8) per component
} D3DSHADER_MIN_PRECISION;

// When MIN_PRECISION is nonzero on a dest token, the dest modifier
// D3DSPDM_PARTIALPRECISION must also be set for consistency
//
// If D3DSPDM_PARTIALPRECISION is set but
// D3DSHADER_MIN_PRECISION is D3DMP_DEFAULT(0),
// it is equivalent to D3DSPDM_PARTIALPRECISION + D3DMP_16
// (partial PARTIALPRECISION existed before MIN_PRECISION was
// added, so this defines how the two can coexist without changing
// meaning for old shaders)
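The coexistence rule in the comment above can be written as a small decision function. This is a sketch of how a driver might resolve the two fields once they have been extracted from a dest token; the helper names are hypothetical, and extraction of the bits themselves is left out.

```cpp
#include <cassert>

// Values mirror the D3DSHADER_MIN_PRECISION enum quoted above.
enum MinPrecision { MP_DEFAULT = 0, MP_16 = 1, MP_2_8 = 2 };

// Hypothetical helper: effective minimum precision of a dest token, given
// its PARTIALPRECISION modifier and its MIN_PRECISION field.
inline MinPrecision effective_min_precision(bool partialPrecision, MinPrecision mp) {
    if (mp != MP_DEFAULT) return mp;  // an explicit level is used as-is
    return partialPrecision ? MP_16   // legacy modifier alone means min 16 bit
                            : MP_DEFAULT;
}

// Consistency rule: a nonzero MIN_PRECISION requires PARTIALPRECISION.
inline bool dest_token_is_consistent(bool partialPrecision, MinPrecision mp) {
    return mp == MP_DEFAULT || partialPrecision;
}
```

With this mapping, a shader compiled before MIN_PRECISION existed keeps its old meaning: its PARTIALPRECISION destinations resolve to the 16-bit minimum.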
7.20.3.1.2 Usage Cases
7.20.3.1.2 用例
The src/dest token for instructions in PS/VS 2.x can use the MIN_PRECISION enum in the following circumstances:
在以下情况下,PS/VS 2.x中的指令的src/dest token可以使用MIN_PRECISION枚举:
1. Any shader instruction with an output (e.g. arithmetic, texture fetch instructions)
1. 任何带有输出的着色器指令(例如算术、纹理获取指令)
2. PS 2.x input texcoord (t#) declarations (allowing lower precision interpolation)
2. PS 2.x 输入texcoord(t#) 声明(允许较低精度的插值)
· Does not apply to PS 2.x input color (v#) declarations, as these were already by definition called out to be as low as 8 bit
· 不适用于 PS 2.x 输入颜色 (v#) 声明,因为根据定义,这些声明已被列为低至 8 位)
3. PS 3.0 input attribute (v#) declarations (allowing lower precision interpolation)
3. PS 3.0 输入属性 (v#) 声明(允许较低精度的插值)
4. Constant references (discussed in more detail in 7.20.3.5)
4. 常量引用(此处 (7.20.3.5) 详细讨论)
5. (Shader Model 3.0 is not affected since the D3D11 runtime does not expose it)
5. (着色器模型 3.0 不受影响,因为 D3D11 运行时不会公开它)
7.20.3.1.3 Interpreting Minimum Precision
7.20.3.1.3 解释最小精度
See here (7.20.3.4); this is common across D3D9 and D3D10+.
看这里 (7.20.3.4) ; 这在 D3D9 和 D3D10+ 中很常见。
7.20.3.2 D3D10+
A new MIN_PRECISION enum is added to the dest parameter token, definition below. This specifies the minimum precision level for the entire operation – implementations can use equal or greater precision.
一个新的MIN_PRECISION枚举被添加到目的参数标记中,定义如下。这指定了整个操作的最小精度级别 - 实现可以使用相同或更高的精度。
The encoding distinguishes type (e.g. float vs. sint vs. uint), in addition to precision level, to disambiguate instructions like “mov” that don’t already imply a type. This makes a difference when there is a size change involved in the instruction. E.g. moving a 32 bit float to a min. 16 bit float is a different task for hardware than moving a 32 bit uint to a min. 16 bit uint. This type distinction is not needed for the D3D9 shader bytecode because all arithmetic is “float” there.
编码还区分类型(例如 float 和 sint 和 uint),除了精度级别之外,为了消除歧义指令诸如“mov”,这些指令尚未暗示类型。当指令中涉及大小更改时,这会有所不同。例如,对于硬件来说,将 32 位浮点数移动到最小 16 位浮点数与将 32 位 uint 移动到最小 16 位 uint 的任务是不同的。D3D9 着色器字节码不需要此类型区分,因为在那里所有运算都是“浮点”。
7.20.3.2.1 Token Format
7.20.3.2.1 Token格式
// Min precision specifier for source/dest operands. This
// fits in the extended operand token field. Implementations are free to
// execute at higher precision than the min – details spec’d elsewhere.
// This is part of the opcode specific control range.
typedef enum D3D11_SB_OPERAND_MIN_PRECISION
{
    D3D11_SB_OPERAND_MIN_PRECISION_DEFAULT = 0, // Default precision
                                                // for the shader model
    D3D11_SB_OPERAND_MIN_PRECISION_FLOAT_16 = 1, // Min 16 bit/component float
    D3D11_SB_OPERAND_MIN_PRECISION_FLOAT_2_8 = 2, // Min 10(2.8)bit/comp. float
    D3D11_SB_OPERAND_MIN_PRECISION_SINT_16 = 4, // Min 16 bit/comp. signed integer
    D3D11_SB_OPERAND_MIN_PRECISION_UINT_16 = 5, // Min 16 bit/comp. unsigned integer
} D3D11_SB_OPERAND_MIN_PRECISION;
#define D3D11_SB_OPERAND_MIN_PRECISION_MASK 0x0001C000
#define D3D11_SB_OPERAND_MIN_PRECISION_SHIFT 14

// DECODER MACRO: For an OperandToken1 that can specify
// a minimum precision for execution, find out what it is.
#define DECODE_D3D11_SB_OPERAND_MIN_PRECISION(OperandToken1) ((D3D11_SB_OPERAND_MIN_PRECISION)(((OperandToken1)&D3D11_SB_OPERAND_MIN_PRECISION_MASK)>>D3D11_SB_OPERAND_MIN_PRECISION_SHIFT))

// ENCODER MACRO: Encode minimum precision for execution
// into the extended operand token, OperandToken1
#define ENCODE_D3D11_SB_OPERAND_MIN_PRECISION(MinPrecision) (((MinPrecision)<<D3D11_SB_OPERAND_MIN_PRECISION_SHIFT)&D3D11_SB_OPERAND_MIN_PRECISION_MASK)

// ----------------------------------------------------------------------------
// Global Flags Declaration
//
// OpcodeToken0:
//
// ... snip ...
// [16:16] Enable minimum-precision data types
// ... snip ...
//
// OpcodeToken0 is followed by no operands.
//
// ----------------------------------------------------------------------------
... snip ...
#define D3D11_1_SB_GLOBAL_FLAG_ENABLE_MINIMUM_PRECISION (1<<16)
... snip ...

// DECODER MACRO: Get global flags
#define DECODE_D3D10_SB_GLOBAL_FLAGS(OpcodeToken0) ((OpcodeToken0)&D3D10_SB_GLOBAL_FLAGS_MASK)

// ENCODER MACRO: Encode global flags
#define ENCODE_D3D10_SB_GLOBAL_FLAGS(Flags) ((Flags)&D3D10_SB_GLOBAL_FLAGS_MASK)
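As a quick check of the bit layout, the macros above can be mirrored with typed C++ functions. This is a sketch only: the function and constant names are hypothetical, while the numeric values (mask 0x0001C000, shift 14, global flag bit 16, and the enum values) are copied from the listing.

```cpp
#include <cassert>
#include <cstdint>

enum class MinPrecision : std::uint32_t {
    Default = 0, Float16 = 1, Float2_8 = 2, Sint16 = 4, Uint16 = 5
};

constexpr std::uint32_t kMinPrecisionMask  = 0x0001C000u;  // bits [16:14]
constexpr std::uint32_t kMinPrecisionShift = 14;
constexpr std::uint32_t kGlobalFlagEnableMinPrecision = 1u << 16;

// Mirrors ENCODE_D3D11_SB_OPERAND_MIN_PRECISION.
constexpr std::uint32_t encode_min_precision(MinPrecision p) {
    return (static_cast<std::uint32_t>(p) << kMinPrecisionShift) & kMinPrecisionMask;
}

// Mirrors DECODE_D3D11_SB_OPERAND_MIN_PRECISION.
constexpr MinPrecision decode_min_precision(std::uint32_t operandToken1) {
    return static_cast<MinPrecision>((operandToken1 & kMinPrecisionMask) >> kMinPrecisionShift);
}

// True when a dcl_globalFlags opcode token enables minimum-precision types.
constexpr bool min_precision_enabled(std::uint32_t opcodeToken0) {
    return (opcodeToken0 & kGlobalFlagEnableMinPrecision) != 0;
}
```

The field is three bits wide, which is what leaves room for the type distinction (values 4 and 5 for signed and unsigned integers) on top of the two float levels.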
7.20.3.3 Usage Cases
7.20.3.3 用例
The dest and source operand tokens in SM 4.0+ can use the MIN_PRECISION enum in the following circumstances:
SM 4.0+ 中的 dest 和 source 操作数tokens可以在以下情况下使用 MIN_PRECISION 枚举:
1. Any instruction that returns values to the shader
1. 向着色器返回值的任何指令
· e.g. mul
· 例如:mul
2. Any memory fetch (incl texture sampling) or data move
2. 任何内存提取(包括纹理采样)或数据移动
3. Type conversion instructions
3. 类型转换指令
· Those involving doubles, e.g. ftod or dtof only allow precision lowering on the float32 side of the operation.
· 对于涉及到双精度浮点数的操作,例如 ftod 或 dtof,只允许在 float32 的一侧降低精度。
· Other conversions, such as f32tof16, allow precision lowering on either side of the operation.
· 其他转换,如 f32tof16,允许在操作的任一侧精确降低。
4. Exceptions (precision lowering not allowed)
4.例外情况(不允许降低精度)
· double precision arithmetic
· 双精度运算
· atomic operations
· 原子操作
· load/store to non-Typed Unordered Access Views (Typed UAVs ok, since that involves format conv.)
· 加载/存储到非类型化的无序访问视图(类型化的UAVs可以,因为这涉及格式转换。)
· Geometry Shader stream output
· 几何着色器流输出
5. Input and output attribute declarations
5. 输入和输出属性声明
· At VS input, the input data types continue to be defined externally (Input Layout), but MIN_PRECISION can still be part of the shader input declaration, indicating how the shader will expect to see the data after it has been read in (post format conversion).
· 在 VS 输入时,输入数据类型继续在外部定义 (输入布局) ,但MIN_PRECISION仍然可以是着色器输入声明的一部分,指示着色器在读入数据后将如何查看数据(格式转换后)。
7.20.3.4 Interpreting Precision (same for D3D9 and D3D10+)
7.20.3.4 解释精度(D3D9 和 D3D10+ 相同)
1. Source operands arrive stored at the (minimum) precision indicated on the operand. If no minimum precision is specified (default), the operand precision is 32-bit.
1. 源操作数按操作数上指示的(最小)精度存储。如果没有指定最小精度,(默认值)则操作数精度为 32 位。
2. The precision specified on the output operand determines the minimum storage needed for the output as well as the minimum precision for the operation.
2. 在输出操作数上指定的精度决定了输出所需的最小存储以及操作的最小精度。
3. Mixing precisions across operands and the instruction is valid, but should be rare. Drivers may need to expand format changes into separate individual type conversions to the instruction’s precision unless the conversion is supported natively.
3. 在操作数和指令之间混合精度是有效的,但应该很少见。驱动程序可能需要将格式更改扩展到单独的单个类型转换,以达到指令的精度,除非本机支持转换。
4. Precisions on the index value in dynamic indexing scenarios or other addressing (such as texture coordinates for a texture fetch) just follow the precision indicated on the value, unaffected by the instruction precision.
4. 在动态索引方案或其他寻址(例如纹理提取的纹理坐标)中,索引值的精度仅遵循值上指示的精度,不受指令精度的影响。
The same applies for condition parameters in conditional instructions (like movc).
这同样适用于条件指令(如 movc)中的条件参数。
movc[_sat] dest[.mask], src0[.swizzle], [-]src1[_abs][.swizzle], [-]src2[_abs][.swizzle]
// per component: if src0 is nonzero, dest = src1, else dest = src2
5. See below(7.20.3.5) for a discussion about shader constants.
5. 有关着色器常量的讨论,请参阅下文 (7.20.3.5) 。
7.20.3.5 Shader Constants
7.20.3.5 着色器常量
Shader constants are defined at full 32-bit per component. New hardware implementing low precision is encouraged to design efficient downconversion support upon constant access; otherwise some driver work or extra conversion instructions will need to be added by the driver into shaders that read 32-bit per component constants into lower precision shader operations.
着色器常量被定义每个分量完整的32 位。鼓励实现低精度的新硬件在常量访问时设计高效的下转换支持,否则驱动程序需要进行一些工作或者驱动程序需要在读取32位每组件常量到低精度着色器操作的着色器中添加额外的转换指令。
Alternative approaches were considered where low precision constants are exposed all the way to the application (freeing driver/hardware from having to convert constants), but the added complexity in the programming model vs the benefit didn’t hold up at least at this time.
考虑了一些替代方法,其中低精度常数被暴露给应用程序(使驱动程序/硬件无需转换常数),但是相对于收益,编程模型中增加的复杂性至少在目前并不成立。
7.20.3.6 Referencing Shader Constants within Shaders
7.20.3.6 在着色器中引用着色器常量
When referencing a shader constant from a low precision instruction, if the constant value is out of the range of the instruction’s precision level, the value read is undefined. For constant values within range of a low precision instruction reference, the precision of the value may still get quantized down from full 32 bits.
当从低精度指令中引用着色器常量时,如果常量值超出了指令精度级别的范围,读取的值是未定义的。对于低精度指令参考范围内的常量值,该值的精度仍可能从完整的 32 位向下量化。
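Because an out-of-range constant reads back as undefined, a tool that fills constant buffers might want to validate values before upload. The C++ sketch below is one hypothetical way to do that. It assumes the float ranges [-2, 2) for 2.8 fixed point and [-2^14, 2^14] for the 16-bit float level; `constant_in_range` and the local enum are illustrative names, not D3D API.

```cpp
#include <cassert>
#include <cmath>

enum class MinPrecision { Default, Float16, Float2_8 };

// Hypothetical check: can a 32-bit constant be read by an instruction at the
// given min-precision level without hitting the "undefined value" case?
// Ranges are assumptions of this sketch: [-2, 2) and [-2^14, 2^14].
inline bool constant_in_range(float value, MinPrecision p) {
    if (p == MinPrecision::Default) return true;   // read at full precision
    if (std::isnan(value)) return false;           // treat NaN as out of range
    if (p == MinPrecision::Float2_8) return value >= -2.0f && value < 2.0f;
    return std::fabs(value) <= 16384.0f;           // Float16 level
}
```

Passing the check does not mean the value is preserved exactly: an in-range constant may still be quantized when it is read at the lower precision.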
Shader constants referenced in shader source operands will be marked at the precision they are to be referenced at, even though they come down the API/DDI at 32-bit per component.
在着色器源操作数中引用的着色器常量将被标记为它们应被引用的精度,即使它们通过API/DDI以每个组件32位的形式传递。
1. The constant buffer precision indicated on reference may be different than the precision of a given instruction, since multiple instructions in the shader at different precisions may read the same constant.
1. 常量缓冲区精度在引用上的显示可能与给定指令的精度不同,因为着色器中不同精度的多个指令可能会读取相同的常量。
2. The HLSL compiler guarantees that all accesses of a given constant are marked with the same precision, indicating how much storage is needed for them regardless of what precision operations that reference them are using.
2. HLSL 编译器保证给定常量的所有访问都以相同的精度进行标记,从而指示它们需要多少存储空间,而不管引用它们使用的精度操作如何。
3. Implementations that may need to downconvert constants ahead of shader invocation (likely not ideal) can easily determine the required precision/storage for constants within a shader just by observing how they are tagged on first reference in the shader.
3. 可能需要在着色器调用之前降低常量的实现(可能并非理想),只需通过观察着色器中首次引用时如何标记它们,就可以轻松确定常量所需的精度/存储。
4. In cases of dynamic indexing of constants, there is no way to know which parts of a constant buffer will be referenced at what precision ahead of time. Adding declarations that indicate this information was not deemed worth it at this time.
4. 在常量动态索引的情况下,无法提前知道常量缓冲区的哪些部分将以什么精度被引用。添加声明此信息的声明目前被认为不值得。
7.20.3.7 Component Swizzling
7.20.3.7 组件置换
Low precision data is referenced by component in masks and swizzles (xyzw), just like default precision data. It is as though the registers do have a smaller number of bits (for hardware that supports lower precision). This is unlike the way double precision is mapped, where xy contains one double and zw contains another. Low precision doesn’t yield sub-fields within .x for example.
低精度数据在掩码和swizzles -xyzw中通过组件引用,就像默认精度数据一样。这就好像寄存器确实具有较少的位数(适用于支持较低精度的硬件)。这与双精度的映射方式不同,其中 xy 包含一个双精度,zw 包含另一个双精度。例如,低精度不会在 .x 中生成子字段。
Note: a component is 32 bits and a register has 4 components, for 128 bits in total.
The HLSL compiler will not generate code that mixes precisions in different components of any xyzw register (mostly for simplicity, even though this may not matter for hardware).
HLSL 编译器不会生成在任何 xyzw 寄存器的不同组件中混合精度的代码(主要是为了简单起见,即使这对硬件可能无关紧要)。
7.20.3.8 Low Precision Shader Limits
7.20.3.8 低精度着色器限制
The use of min / low precision specifiers never increases the maximum amount of resources available to a shader (such as limits on inputs, outputs or temp storage), since the shader must always be able to function on hardware that does not operate at low precision.
使用最小/低精度说明符永远不会增加着色器可用的最大资源量(例如对输入、输出或临时存储的限制),因为着色器必须始终能够在不以低精度运行的硬件上运行。
7.20.4 Feature Exposure
7.20.4 功能曝光
In the D3D system, HLSL shaders are compiled independent of any given device – e.g. they should typically be compiled offline. This compilation step produces device-agnostic bytecode, apart from the choice of shader target, e.g. vs_4_0.
在 D3D 系统中,HLSL 着色器的编译独立于任何给定设备,例如,它们通常应脱机编译。此编译步骤生成与设备无关的字节码,除了选择着色器目标(例如vs_4_0)。
The minimum precision facility described above can be optionally used within any 4_0+ shader, including 4_0_level_9_1 to 4_0_level_9_3. These shader targets are all available through the D3D11 runtime, exposing D3D9+ hardware via Shader Model 2_x+. The D3D9 runtime will not expose the low precision modes; updating that runtime is out of scope.
上述最小精度工具可以选择在任何 4_0+ 着色器中使用,包括4_0_level_9_1到4_0_level9_3。这些着色器目标都可通过 D3D11 运行时使用,并通过着色器模型 2_x+ 公开 D3D9+ 硬件。D3D9 运行时不会公开低精度模式 - 更新该运行时超出了范围。
7.20.4.1 Discoverability
7.20.4.1 可发现性
There is a mechanism at the API to discover the precision levels supported by the current device. Note that in Windows 8 the OS did not allow drivers to expose only 10 bit without also exposing 16 bit, but subsequent operating systems relax that requirement (so an implementation may expose 10 bit min precision but not 16 bit min precision).
API 中有一种机制可以发现当前设备支持的精度级别。请注意,在 Windows 8 中,操作系统不允许驱动程序只公开 10 位而不公开 16 位,但后续操作系统放宽了该要求(因此实现可能会公开 10 位最小精度,但不会公开 16 位最小精度)。
Even though the hardware’s precision support is visible to applications, applications do not have to adjust their shaders for the hardware’s precision level given that by definition operations defined with a min precision run at higher precision on hardware that doesn’t support the min precision.
即使硬件的精度支持对应用程序可见,应用程序也不必针对硬件的精度级别调整其着色器,因为根据定义,使用最小精度定义的操作在不支持最小精度的硬件上以更高的精度运行。
It is fine for hardware to not support low precision processing at all, by simply reporting “DEFAULT” as its precision support. The reason it is called “DEFAULT” rather than some numerical precision is that, depending on the shader model, there may not be a standard value to express. E.g. the default precision in SM 2.x is fp24 (or greater) within the shader, even though there is no API visible fp24 format. If the device reports “DEFAULT” precision, all min-precision specifiers in shaders are ignored.
硬件完全不支持低精度处理是可以的 - 只需报告“DEFAULT”作为其精度支持即可。之所以将其称为“DEFAULT”而不是一些数值精度,是因为取决于着色器模型,可能没有标准值可以表示。例如,SM 2.x 中的默认精度在着色器中是 fp24(或更高),即使没有 API 可见的 fp24 格式。如果设备报告“DEFAULT”精度,则忽略着色器中的所有最小精度说明符。
D3D9 devices are permitted to report a min-precision level that is lower for the Pixel Shader than for the Vertex Shader (all reported via the Windows Next D3D9 DDI). D3D10+ devices can only report a single min-precision level that applies to all shader stages (reported via the Windows Next D3D11.1 DDI), since it does not seem to make sense to single out the VS any more.
允许 D3D9 设备报告像素着色器的最小精度级别低于顶点着色器(全部通过 Windows Next D3D9 DDI 报告)。D3D10+设备只能报告适用于所有着色器阶段的单个最小精度级别(通过Windows Next D3D11.1 DDI报告)——因为似乎不再有必要单独指定顶点着色器(VS)的精度了。
Note that if the application uses Feature Level 9_x on D3D10+ hardware, the D3D9 DDIs are still used, so the min-precision levels can be reported differently there between VS and PS, as mentioned for D3D9, even though via the D3D11.1 DDI only a single precision can be reported.
请注意,如果应用程序在 D3D10+ 硬件上使用功能级别9_x,则仍使用 D3D9 DDI,因此 VS 和 PS 之间的最小精度级别可能会有所不同,如 D3D9 所述,即使通过 D3D11.1 DDI 只能报告单个精度。
Note: DDI stands for device driver interface.
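The reporting rules above can be modeled as a small bitmask. The C++ sketch below is illustrative only: the bit values 0x1 (10-bit) and 0x2 (16-bit) are an assumption modeled on the D3D11 `D3D11_SHADER_MIN_PRECISION_SUPPORT` flags as I recall them from d3d11.h, and `describe_support` is a hypothetical helper. A mask of zero corresponds to the “DEFAULT” case.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed flag values, modeled on D3D11_SHADER_MIN_PRECISION_SUPPORT.
constexpr std::uint32_t kMinPrecision10Bit = 0x1;
constexpr std::uint32_t kMinPrecision16Bit = 0x2;

// Hypothetical helper: turn a reported support mask into readable text.
// An empty mask means the device ignores all min-precision specifiers.
inline std::string describe_support(std::uint32_t mask) {
    mask &= (kMinPrecision10Bit | kMinPrecision16Bit);  // ignore unknown bits
    if (mask == 0) return "DEFAULT";
    std::string s;
    if (mask & kMinPrecision10Bit) s += "10-bit";
    if (mask & kMinPrecision16Bit) s += s.empty() ? "16-bit" : " + 16-bit";
    return s;
}
```

An application would typically hold two such masks when running at Feature Level 9_x (one for the pixel shader, one for the other stages) and one otherwise. Either way the result is informational, since shaders do not need to change with it.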
7.20.4.2 Shader Management
7.20.4.2 着色器管理
Regardless of the min precision level supported by a given device, it is always valid to use a shader that was compiled using any combination of the low precision levels on it. For example, if a device’s min precision level is 32-bit, it is fine to use a shader compiled with some variables that have a min precision of 10 bit. The device is free to implement the low precision operations at any equal or higher precision level (including precision levels not available at the API).
无论给定设备支持的最小精度级别是什么,使用在其上编译的任何低精度级别组合的着色器始终是有效的。例如,如果设备的最小精度级别为 32 位,可以使用着色器编译一些最小精度为 10 位的变量。该设备可以自由地实现低精度操作在一些同等或更高的精度级别(包括 API 上不可用的精度级别)。
For old drivers (pre-D3D11.1 DDI) that are not aware of the low precision feature, the D3D runtime will patch the shader bytecode on shader creation to remove it. This preserves the intent of the shader, since it is valid for the device to execute operations tagged with a min precision level at a higher precision.
对于不知道低精度功能的旧驱动程序(D3D11.1 之前的 DDI),在着色器创建上,D3D 运行时将修补着色器字节码以将其删除。这保留了着色器的意图,因为设备执行用最小精度级别标记的操作在更高的精度下是有效的。
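The text does not describe how the runtime performs this patch. As a rough illustration of why it is cheap, the following C++ sketch clears the two places where the D3D10+ bytecode records min precision: the operand field (mask 0x0001C000) and the global flag (bit 16). The function names are hypothetical, and a real patcher would also need to walk the token stream and handle anything else that depends on the bytecode contents.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMinPrecisionMask = 0x0001C000u;           // operand field
constexpr std::uint32_t kGlobalFlagEnableMinPrecision = 1u << 16;  // dcl_globalFlags bit

// Hypothetical: reset an extended operand token's min-precision field to
// DEFAULT (0), leaving every other bit untouched.
constexpr std::uint32_t strip_operand_min_precision(std::uint32_t operandToken1) {
    return operandToken1 & ~kMinPrecisionMask;
}

// Hypothetical: clear the enable bit in the global-flags opcode token.
constexpr std::uint32_t strip_global_flag(std::uint32_t opcodeToken0) {
    return opcodeToken0 & ~kGlobalFlagEnableMinPrecision;
}
```

Since DEFAULT is encoded as zero, removing the hint is a pure bit clear and needs no knowledge of which precision level was originally requested.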
7.20.4.3 APIs/DDIs
7.20.4.3 API/DDI
An API is added for reporting device precision support; no other D3D11 API surface area changes apply.
用于报告设备精度支持的 API,不适用其他 D3D11 API 外围应用更改。
As far as other DDI additions, there is device precision reporting, the shader bytecode additions detailed earlier, and finally a variant of the existing shader stage I/O signature DDI:
就其他 DDI 的添加而言,有设备精度报告,前面详细介绍的着色器字节码添加,最后是现有着色器阶段 I/O 签名 DDI 的一个变体:
The I/O signature DDI includes MinPrecision in the signature entry. This shows up as D3D11_SB_INSTRUCTION_MIN_PRECISION_DEFAULT if the shader didn’t specify a min-precision:
I/O 签名 DDI 在条目中包括 MinPrecision。如果着色器未指定最小精度,则显示为D3D11_SB_INSTRUCTION_MIN_PRECISION_DEFAULT:
typedef struct D3D11_1DDIARG_SIGNATURE_ENTRY
{
    D3D10_SB_NAME SystemValue; // D3D10_SB_NAME_UNDEFINED if the particular entry doesn't have a system name.
    UINT Register;
    BYTE Mask; // (D3D10_SB_OPERAND_4_COMPONENT_MASK >> 4), meaning 4 LSBs are xyzw respectively
    D3D11_SB_INSTRUCTION_MIN_PRECISION MinPrecision;
} D3D11_1DDIARG_SIGNATURE_ENTRY;

typedef struct D3D11_1DDIARG_STAGE_IO_SIGNATURES
{
    D3D11_1DDIARG_SIGNATURE_ENTRY* pInputSignature;
    UINT NumInputSignatureEntries;
    D3D11_1DDIARG_SIGNATURE_ENTRY* pOutputSignature;
    UINT NumOutputSignatureEntries;
} D3D11_1DDIARG_STAGE_IO_SIGNATURES;
Motivation: Recall that this DDI exists to complement the shader creation DDIs by providing a more complete picture of the shader stage<->stage I/O layout than may be visible just from an individual shader’s bytecode.
动机:回想一下,此 DDI 的存在是为了补充着色器创建 DDI,通过提供比单个着色器的字节码可能可见的更完整的着色器阶段<->阶段I/O布局的图片。
For example sometimes an upstream stage provides data not consumed by a downstream shader, but it should be possible for a driver to compile a shader on its own without having to wait and see what other shaders it gets used with.
例如,有时上游阶段提供的数据并未被下游着色器使用,但驱动程序应该可以自行编译着色器,而不必等待并查看它会与哪些其他着色器一起使用。
MinPrecision is added in case that affects how the driver shader compiler would want to pack the inter-stage I/O data.
添加MinPrecision是为了考虑这可能会影响驱动程序着色器编译器如何打包阶段间I/O数据。
7.20.4.4 HLSL Exposure
7.20.4.4 HLSL暴露
Out of scope for this spec.
超出此规范的范围。