Common Shader Internals, Part 3


7.19 Subroutines / Interfaces

7.19.1 Overview

The programmable graphics pipeline has given software developers greatly enhanced flexibility and power. As a result, shader programming has evolved to the point where programmers need to combine multiple code building blocks (i.e. subroutines) on the fly. Current approaches generally cause the static creation of thousands of one-off shaders, each using a particular combination of subroutines to realize a specific effect. The use of flow control and looping can reduce the number of these precompiled combinations, but these techniques have a dramatic effect on the runtime performance of the shader code, and applications are still sensitive to the extra instructions and registers used in common shaders. Furthermore, since the shader programs are "kernels" or inner loops, any extra overhead for trying to reuse the same instruction stream to represent multiple combinations is more noticeable than in more traditional CPU code. The application developer has no way of knowing when it is safe, with regard to performance, to use flow control to mitigate code complexity. This leads to a different performance problem: dealing with thousands of shaders.

 

The goal of this feature is to allow applications to have a simple, expressive programming model that abstracts away this combinatoric complexity while still achieving the performance of the custom precompiled shaders. To achieve this goal, we move the complexity from the application level to the driver level, where hardware-specific knowledge can be utilized to reduce program size and complexity.

 

To satisfy the performance requirements of inner loop code, the overhead of calling conventions and lost optimizations needs to be addressed. Our method avoids the overhead by using a subroutine model that virtually "inlines" the functions that can be called. This is done by compiling code normally up to a call site, and then compiling all possible callees with the current state of the caller. The called functions are then optimized for the current register state by mapping inputs and outputs to their current register locations. While this approach increases overall program size, it avoids the cost of both parameter passing and stack save/restore, thereby avoiding the overhead of traditional function calls while preserving runtime flexibility.

 

The IL (Intermediate Language) ASM has code blocks that act and look like subroutines: they have defined in/out parameters, and their registers are all local (in/out/temp/scratch). Some global references remain: textures, constant buffers, and samplers. The main difference from normal subroutines is that each location that can call a subroutine has a declaration describing the call destinations that are possible.

 

The set of functions to call when executing a given shader program can be changed between draw calls when calling SetShader. When binding the shader program to the pipeline, the list of functions to use is specified. Selecting the set of functions to use between draw calls allows the driver to recalculate the hardware requirements for a specified set of functions. Calculating the true number of registers required for a given "specialization" of a shader provides the combined flexibility of choice at runtime and the performance of a specialized shader.
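As a rough illustration of why rebinding lets the driver recompute hardware requirements, the sketch below estimates the register count of one specialization. It is only a model under stated assumptions: the function name `RegistersForSpecialization` is hypothetical, the caller is assumed to hold the same number of live registers at every call site, and each bound callee is assumed to place its locals directly after the caller's registers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: register requirement of one "specialization", i.e. the
// shader with exactly one callee bound at each call site.
// callerRegisters   - registers the caller has live at each call site
//                     (assumed constant across call sites for simplicity).
// boundCalleeLocals - local register count of the callee bound at each site.
int RegistersForSpecialization(int callerRegisters,
                               const std::vector<int>& boundCalleeLocals) {
    assert(callerRegisters >= 0);
    int peak = callerRegisters;
    for (int locals : boundCalleeLocals) {
        assert(locals >= 0);
        // Callee locals are allocated after the caller's registers.
        peak = std::max(peak, callerRegisters + locals);
    }
    return peak;
}
```

With a caller using 3 registers, binding a callee that needs 1 local yields 4 registers, while binding one that needs 4 locals yields 7. A shader that had to provision for every possible callee would always pay for the larger figure.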

 

7.19.2 Differences from 'Real' Subroutines

The primary difference of this approach from "real" subroutines is that at runtime no calling convention is used. Each time a function could be called, a version of the function is emitted to match the caller’s register and other state. Since a new version of the callee is emitted for each location in the caller code that the function is called from, all optimizations used when inlining apply, except that callee code must remain functionally separate from caller code.


 

Take an example: the main function has an fcall (22.7.19) instruction, and that fcall instruction has two function implementations that could be called. When generating the microcode for the program to execute, the code is generated up to the fcall routine, and the current state of the registers and other shader state is saved in "StateBeforeCall". Then code is generated for the first function that can be called, starting with the current state of register allocation, scratch registers, etc. Next, the current state is restored to StateBeforeCall and the code for the second function is generated. Finally, the current state is restored to StateBeforeCall again, the impacts of the outputs of the fcall are applied to the current state, and code generation continues after the fcall.
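The save/emit/restore sequence described above can be sketched in a few lines. This is a toy model, not driver code: `CodegenState`, `Callee`, `EmittedBody`, `CallSiteResult` and `EmitCallSite` are all hypothetical names, the compiler state is reduced to a single "next free register" counter, and applying the fcall's outputs to the caller state is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical compiler state, reduced to the next free register index.
struct CodegenState {
    int nextFreeRegister = 0;
};

// Hypothetical callee description: a name and its local register count.
struct Callee {
    std::string name;
    int localRegisters;
};

// One specialized body emitted for one call site.
struct EmittedBody {
    std::string name;
    int firstLocalRegister;  // locals start right after the caller's registers
};

struct CallSiteResult {
    std::vector<EmittedBody> bodies;
    int peakRegisters;  // highest register count reached by any callee
};

// Emits one body per possible callee of a single fcall site. Every callee is
// compiled from the same saved state, and the state is restored afterwards.
CallSiteResult EmitCallSite(CodegenState& state,
                            const std::vector<Callee>& callees) {
    const CodegenState stateBeforeCall = state;  // "StateBeforeCall"
    CallSiteResult result{{}, stateBeforeCall.nextFreeRegister};
    for (const Callee& callee : callees) {
        assert(callee.localRegisters >= 0);
        state = stateBeforeCall;  // each callee starts from the caller's state
        result.bodies.push_back({callee.name, state.nextFreeRegister});
        state.nextFreeRegister += callee.localRegisters;
        if (state.nextFreeRegister > result.peakRegisters) {
            result.peakRegisters = state.nextFreeRegister;
        }
    }
    state = stateBeforeCall;  // restore before continuing after the fcall
    return result;
}
```

Because every body starts from the same saved state, both callees in the two-implementation example would place their locals at the same register index, and no values need to be saved or restored at the call boundary.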

 

The IL imposes restrictions that make this possible: a version of each callee's microcode can be emitted using the caller's current register knowledge, with the callee's local registers allocated after the caller's registers, so that no data needs to be saved or restored when crossing the function boundary.

 

The downside compared with "real" subroutines is that the amount of code needed to represent the program can become quite large. No code sharing is done between multiple call sites. If code is larger than the code cache, and the miss latency is not hidden by some other mechanism, then "real" subroutines are very useful. Assuming that the code bloat is minimal (i.e. each function is only ever called from one location), performance will be better with the new method: there is no parameter passing overhead, inlining optimizations apply, and so on.
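To get a feel for how quickly the code size grows, the count of emitted bodies can be estimated from the interface declarations alone. The sketch below is an assumption-laden model: `InterfaceDecl` and `CountFunctionBodies` are hypothetical names, and it assumes that no body is shared between function tables, so every table contributes one body per call site.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one declaration of the form
//   dcl_interface fp#[arraySize][callSites] = { table, table, ... }
struct InterfaceDecl {
    int callSites;       // second [] of the declaration
    int functionTables;  // number of tables listed in the declaration
};

// Each function table holds one body per call site, so a declaration
// accounts for callSites * functionTables bodies (assuming no sharing).
int CountFunctionBodies(const std::vector<InterfaceDecl>& decls) {
    int total = 0;
    for (const InterfaceDecl& decl : decls) {
        assert(decl.callSites >= 0 && decl.functionTables >= 0);
        total += decl.callSites * decl.functionTables;
    }
    return total;
}
```

For the simple example in 7.19.5 (one call site, two tables) this gives 2 bodies, matching fb0 and fb1. For the complex example in 7.19.7 (two call sites with three material tables, plus three call sites with two light tables) it gives 12, matching fb0 through fb11.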

 

Another problem with the new method is that all destinations must be known at compile time. Because of the validation that is currently done, all calls need to be known. If that requirement is relaxed, "real" subroutines are a better way of handling late-binding destinations.

 

HLSL requires that all texture and sampler parameters be rooted in some well-known global object so that the compiler can determine which texture or sampler index to use for a particular texture or sampler variable throughout the entire program. As fcalls constitute a late-binding boundary, the compiler cannot easily track parameter identity, and thus texture and sampler arguments to fcalls are not allowed. Note that when only concrete classes are used this isn't a problem. Additionally, texture and sampler members of classes should be allowed; this limitation only applies to parameters of interface methods that are used with full fcall dispatch.

 

Also see the related topics Uniform Indexing of Resources and Samplers (7.11) and the this[] register (22.7.20).

 

7.19.3 Subroutines: Non-goals

- Fast linking
  Not intended for improving compilation time.

- DLL support
  Not intended for reuse of microcode for standard libraries.

- Dynamic virtual functions
  Changes to the functions called occur between draw calls, at relatively low frequency.

 

7.19.4 Subroutines - Instruction Reference

1. dcl_function_body (Function Body Declaration)(22.3.49)

2. dcl_function_table (Function Table Declaration)(22.3.50)

3. dcl_interface/dcl_interface_dynamicindexed (Interface Declaration)(22.3.51)

4. fcall fp#[arrayIndex][callSite] (22.7.19)

5. "this" Register(22.7.20)

 

7.19.5 Simple Example

7.19.5.1 HLSL - Simple Example

    interface Light
    {
        float3 Calculate(float3 Position, float3 Normal);
    };

    class AmbientLight : Light
    {
        float3 Calculate(float3 Position, float3 Normal)
        {
            return AmbientValue;
        }

        float3 AmbientValue;
    };

    class DirectionalLight : Light
    {
        float3 Calculate(float3 Position, float3 Normal)
        {
            float3 LightDir = normalize(Position - LightPosition);
            float LightContrib = saturate( dot( Normal, -LightDir) );
            return LightColor * LightContrib;
        }

        float3 LightPosition;
        float3 LightColor;
    };

    AmbientLight MyAmbient;
    DirectionalLight MyDirectional;

    float4 main (Light MyInstance, float3 CurPos: CurPosition,
                 float3 Normal : Normal) : SV_Target
    {
        float4 Ret;
        Ret.xyz = MyInstance.Calculate(CurPos, Normal);
        Ret.w = 1.0;

        return Ret;
    }

 

7.19.5.2 IL - Simple Example

Register:       this[]

Stage(s):       All(22.1.1)

Description:    Register that refers to 'this' data.

Operation:      'this' data associated with interface object instances is set at the API
                when any given shader is bound to the pipeline.  There are at most 253 slots
                for 'this' data.  The number was chosen to put a bound on the size of the
                DDI for passing the data to the driver.

                From the point of view of a shader, this data can be considered a 253-entry
                array of read-only registers, each with four 32-bit components.

                The 4 components of a this[] register contain:

                x: UINT32 index of the constant buffer that holds the instance data
                y: UINT32 base element offset of the instance data in the instance constant buffer
                z: UINT32 base texture index
                w: UINT32 base sampler index
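The four components can be pictured as a small struct, with instance members addressed as cb[x][y + member_offset]. The sketch below is only a CPU-side illustration of that addressing rule: `ThisPointer`, `ConstantBufferAddress`, `ResolveMember` and `SlotAt` are hypothetical names, not part of any D3D11 header.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of one this[] register (four 32-bit components).
struct ThisPointer {
    std::uint32_t constantBufferIndex;  // x: which constant buffer holds the data
    std::uint32_t baseElementOffset;    // y: base element offset in that buffer
    std::uint32_t baseTextureIndex;     // z: base texture index
    std::uint32_t baseSamplerIndex;     // w: base sampler index
};

// A constant buffer location in register (four-component) units.
struct ConstantBufferAddress {
    std::uint32_t buffer;
    std::uint32_t element;
};

constexpr std::uint32_t kMaxThisSlots = 253;  // upper bound stated in the text

// Bounds-checked access to one slot of the this[] array.
const ThisPointer& SlotAt(const std::array<ThisPointer, kMaxThisSlots>& table,
                          std::uint32_t slot) {
    assert(slot < kMaxThisSlots);
    return table[slot];
}

// Resolves a member to cb[x][y + memberOffset].
ConstantBufferAddress ResolveMember(const ThisPointer& self,
                                    std::uint32_t memberOffset) {
    return {self.constantBufferIndex, self.baseElementOffset + memberOffset};
}
```

In the simple example, MyDirectional lives in constant buffer 0 at element offset 1, so LightPosition (member offset 0) resolves to cb0[1] and LightColor (member offset 1) to cb0[2], which corresponds to the byte offsets 16 and 32 listed in the buffer definitions.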

//
// Generated by Microsoft (R) HLSL Shader Compiler 10.1
//
//
// Buffer Definitions:
//
// cbuffer $Globals
// {
//
//   struct AmbientLight
//   {
//
//       float3 AmbientValue;           // Offset:    0
//
//   } MyAmbient;                       // Offset:    0 Size:    12
//
//   struct DirectionalLight
//   {
//
//       float3 LightPosition;          // Offset:   16
//       float3 LightColor;             // Offset:   32
//
//   } MyDirectional;                   // Offset:   16 Size:    28
//
// }
//
// interfaces $ThisPointer
// {
//
//   interface Light MyInstance;        // Offset:    0 Size:     1
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim      HLSL Bind  Count
// ------------------------------ ---------- ------- ----------- -------------- ------
// $Globals                          cbuffer      NA          NA            cb0      1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// CurPosition              0   xyz         0     NONE   float   xyz
// Normal                   0   xyz         1     NONE   float   xyz
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target                0   xyzw        0   TARGET   float   xyzw
//
//
// Available Class Types:
//
// Name                             ID CB Stride Texture Sampler
// ------------------------------ ---- --------- ------- -------
// DirectionalLight                  0         2       0       0
// AmbientLight                      1         1       0       0
//
// Available Class Instances:
//
// Name                        Type CB CB Offset Texture Sampler
// --------------------------- ---- -- --------- ------- -------
// MyAmbient                      1  0         0       -       -
// MyDirectional                  0  0         1       -       -
//
// Interface slots, 1 total:
//
//             Slots
// +----------+---------+---------------------------------------
// | Type ID  |   0     |0    1
// | Table ID |         |0    1
// +----------+---------+---------------------------------------
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[3], immediateIndexed

    // Function table for AmbientLight.
    dcl_function_body fb0
    dcl_function_table ft0 = { fb0 }

    // Function table for DirectionalLight.
    dcl_function_body fb1
    dcl_function_table ft1 = { fb1 }

    // main's MyInstance parameter.
    dcl_interface fp0[1][1] = { ft0, ft1 };

    // main shader code
    // call AmbientLight or DirectionalLight based on function pointer bound
    //float4 main (Light MyInstance, float3 CurPos: CurPosition,
    //             float3 Normal : Normal) : SV_Target
    //{
    //        float4 Ret;
    //        Ret.xyz = MyInstance.Calculate(CurPos, Normal);
    //        Ret.w = 1.0;
    //        return Ret;
    //}
    fcall fp0[0][0]
    mov o0.xyz, r0.xyzx
    mov o0.w, l(1.000000)
    ret

    // AmbientLight::Calculate
    //float3 Calculate(float3 Position, float3 Normal)
    //{
    //  return AmbientValue;
    //}
    label fb0
    mov r0.w, this[0].y  // base element offset of the instance data in the instance constant buffer
    mov r1.x, this[0].x  // index of the constant buffer that holds the instance data
    mov r0.xyz, cb[r1.x + 0][r0.w + 0].xyzx
    ret

    // DirectionalLight::Calculate
    //float3 Calculate(float3 Position, float3 Normal)
    //{
    //  float3 LightDir = normalize(Position - LightPosition);
    //  float LightContrib = saturate( dot( Normal, -LightDir) );
    //  return LightColor * LightContrib;
    //}
    label fb1
    mov r0.w, this[0].y
    mov r1.xyz, this[0].xyxx
    add r1.yzw, v0.xxyz, -cb[r1.z + 0][r1.y + 0].xxyz
    dp3 r2.x, r1.yzwy, r1.yzwy
    rsq r2.x, r2.x  // dest = 1.0f / sqrt(src0)
    mul r1.yzw, r1.yyzw, r2.xxxx
    dp3_sat r1.y, v1.xyzx, -r1.yzwy
    mul r1.xyz, r1.yyyy, cb[r1.x + 0][r0.w + 1].xyzx
    mov r0.xyz, r1.xyzx
    ret

 

7.19.5.3 API - Simple Example

    // Create the shader
    //    and specify the class linkage to load class instance info into.
    pDevice->CreatePixelShader(pShaderCode, pMyClassLinkage, &pMyPS);

    // Get a handle to the MyDirectional and MyAmbient class instances
    //    from the class linkage.
    // The zero is an array index for when the variable is an array.
    pMyClassLinkage->GetClassInstance(L"MyDirectional", 0, &pMyDirectionalLight);
    pMyClassLinkage->GetClassInstance(L"MyAmbient", 0, &pMyAmbientLight);

    while (true)
    {
        // Select either the MyDirectional or the MyAmbient class instance.
        if (DirectionalLighting)
            pDevice->PSSetShader(pMyPS, &pMyDirectionalLight, 1);
        else
            pDevice->PSSetShader(pMyPS, &pMyAmbientLight, 1);

        RenderScene();
    }

 

7.19.6 Runtime API for Interfaces

7.19.6.1 Overview

The programming model for subroutines is an interface-driven model. The interface provides the definition of the function tables that can be switched between efficiently. A level of data abstraction is also present to allow for swapping of both data and function pointers during SetShader calls. At SetShader time, an array of class instances is specified, corresponding to the interfaces that are used by the shader. The shader reflection system specifies information for each entry in the required interface array. A runtime reflection API is required so that class instances can be specified in a way that the runtime can efficiently map to function pointers for the driver calls to consume. The runtime API does not need to be complex; it only needs a method of providing handles to class instances.

 

The runtime API has only one goal: provide a handle to SetShader that can be efficiently used to tell the driver which functions should be executed for a given shader bind. To achieve this goal, a collection of class information is required if the class instance handles are to be shared across multiple shaders, for example between all shaders within an effect. When a shader is created, an ID3D11ClassLinkage is passed as a new parameter that specifies where to add the class metadata. If the same class linkage is specified for two shaders, then the same class instance handles are used when binding either shader. The collection of class metadata could be global to a given device, but that could become cumbersome when mixing large collections of shaders (e.g. when keeping one middleware solution separate from another).
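The sharing behaviour described above can be modelled as a per-linkage name-to-handle table. The sketch below is a deliberately simplified stand-in and does not reflect how the D3D11 runtime stores class metadata: `ClassLinkageSketch`, `RegisterInstance` and the integer `ClassInstanceHandle` are hypothetical, chosen only to show that shaders created against the same linkage see the same handles, while separate linkages stay independent.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical opaque handle; a real runtime would hand out an interface pointer.
using ClassInstanceHandle = int;

class ClassLinkageSketch {
public:
    // Called for every class instance found in a shader created against this
    // linkage. A name that is already known keeps its existing handle.
    void RegisterInstance(const std::string& name) {
        assert(!name.empty());
        if (handles_.find(name) == handles_.end()) {
            const ClassInstanceHandle next =
                static_cast<ClassInstanceHandle>(handles_.size());
            handles_[name] = next;
        }
    }

    // Returns the handle for a named instance, or -1 if it is unknown.
    ClassInstanceHandle GetClassInstance(const std::string& name) const {
        const auto it = handles_.find(name);
        return it == handles_.end() ? -1 : it->second;
    }

private:
    std::map<std::string, ClassInstanceHandle> handles_;
};
```

Registering "MyAmbient" and "MyDirectional" for one shader and "MyDirectional" again for a second shader leaves a single handle for "MyDirectional", which is the property that lets one handle be passed to SetShader for either shader. A second linkage object, such as one owned by a separate middleware package, starts from an empty table.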

 

7.19.6.2 Prototype of changes

    interface ID3D11ClassLinkage : IUnknown
    {
        // PRIMARY FUNCTION - get a reference to an instance of a class
        //    that exists in a shader.  The common scenario is to refer to
        //    variables declared in shaders, which means that a reference is
        //    acquired with this function and then passed in on SetShader
        HRESULT GetClassInstance(
            WCHAR *pszClassInstanceName,
            UINT uInstanceIndex,
            ID3D11ClassInstance **pClassInstance);

        //  Create a class instance reference that is the combination of a class
        //    type and the location of the data to use for the class instance
        //      - not the common scenario, but useful in case the data location
        //        for a class is dynamic or not known until runtime
        HRESULT CreateClassInstance(
            WCHAR *pszClassTypeName,
            UINT ConstantBufferOffset,
            UINT ConstantVectorOffset,
            UINT TextureOffset,
            UINT SamplerOffset,
            ID3D11ClassInstance **pClassInstance);
    }

    //  Specifying the calls in "10 speak".  Use the following as an example
    //    of how one could retrofit D3D10 and then put that into the D3D11 API
    //    i.e. ignoring split of Creates off of device, new stages, etc.
    interface ID3D11Device
    {
        [ … Existing calls … ]

        //  Shader create calls take a parameter to specify the class linkage
        //     to append the class symbol information from the shader into.
        //     This is a NON-OPTIONAL parameter.  A shader is unusable without
        //     the function table information being used (assuming it has any)
        HRESULT CreateVertexShader(
            void *pShaderBytecode,
            SIZE_T BytecodeLength,
            ID3D11ClassLinkage *pClassLinkage,
            ID3D11VertexShader **ppVertexShader);

        HRESULT CreateGeometryShader(
            void *pShaderBytecode,
            SIZE_T BytecodeLength,
            ID3D11ClassLinkage *pClassLinkage,
            ID3D11GeometryShader **ppGeometryShader);

        HRESULT CreatePixelShader(
            void *pShaderBytecode,
            SIZE_T BytecodeLength,
            ID3D11ClassLinkage *pClassLinkage,
            ID3D11PixelShader **ppPixelShader);

        // Not shown: Similar to above for Hull Shader, Domain Shader and Compute Shader

        HRESULT CreateClassLinkage(
            ID3D11ClassLinkage **ppClassLinkage);

        //  Shader bind calls take an extra array to specify the function tables
        //      to use until the next bind shader call
        void VSSetShader(
            ID3D11VertexShader *pShader,
            ID3D11ClassInstance **ppClassInstances,
            UINT NumInstances);

        void GSSetShader(
            ID3D11GeometryShader *pShader,
            ID3D11ClassInstance **ppClassInstances,
            UINT NumInstances);

        void PSSetShader(
            ID3D11PixelShader *pShader,
            ID3D11ClassInstance **ppClassInstances,
            UINT NumInstances);

        // Not shown: Similar to above for Hull Shader, Domain Shader and Compute Shader
    }

 

7.19.7 Complex Example

7.19.7.1 HLSL - Complex Example

    interface Light
    {
        float3 Calculate(float3 Position, float3 Normal);
    };

    class AmbientLight : Light
    {
        float3 m_AmbientValue;

        float3 Calculate(float3 Position, float3 Normal)
        {
            return m_AmbientValue;
        }
    };

    class DirectionalLight : Light
    {
        float3 m_LightDir;
        float3 m_LightColor;

        float3 Calculate(float3 Position, float3 Normal)
        {
            float LightContrib = saturate( dot( Normal, -m_LightDir) );
            return m_LightColor * LightContrib;
        }
    };

    uint g_NumLights;
    uint g_LightsInUse[4];
    Light g_Lights[9];

    float3 AccumulateLighting(float3 Position, float3 Normal)
    {
        float3 Color = 0;

        for (uint i = 0; i < g_NumLights; i++)
        {
            Color += g_Lights[g_LightsInUse[i]].Calculate(Position, Normal);
        }

        return Color;
    }

    interface Material
    {
        void Perturb(in out float3 Position, in out float3 Normal, in out float2 TexCoord);
        float3 CalculateLitColor(float3 Position, float3 Normal, float2 TexCoord);
    };

    class FlatMaterial : Material
    {
        float3 m_Color;

        void Perturb(in out float3 Position, in out float3 Normal, in out float2 TexCoord)
        {
        }
        float3 CalculateLitColor(float3 Position, float3 Normal, float2 TexCoord)
        {
            return m_Color * AccumulateLighting(Position, Normal);
        }
    };

    class TexturedMaterial : Material
    {
        float3 m_Color;
        Texture2D m_Tex;
        sampler m_Sampler;

        void Perturb(in out float3 Position, in out float3 Normal, in out float2 TexCoord)
        {
        }
        float3 CalculateLitColor(float3 Position, float3 Normal, float2 TexCoord)
        {
            float3 Color = m_Color;

            Color *= m_Tex.Sample(m_Sampler, TexCoord) * 0.1234;

            Color *= AccumulateLighting(Position, Normal);

            return Color;
        }
    };

    class StrangeMaterial : Material
    {
        void Perturb(in out float3 Position, in out float3 Normal, in out float2 TexCoord)
        {
            Position += Normal * 0.1;
        }
        float3 CalculateLitColor(float3 Position, float3 Normal, float2 TexCoord)
        {
            return AccumulateLighting(Position, Normal);
        }
    };

    float TestValueFromLight(Light Obj, float3 Position, float3 Normal)
    {
        float3 Calc = Obj.Calculate(Position, Normal);
        return saturate(Calc.x + Calc.y + Calc.z);
    }

    AmbientLight g_Ambient0;
    DirectionalLight g_DirLight0;
    DirectionalLight g_DirLight1;
    DirectionalLight g_DirLight2;
    DirectionalLight g_DirLight3;
    DirectionalLight g_DirLight4;
    DirectionalLight g_DirLight5;
    DirectionalLight g_DirLight6;
    DirectionalLight g_DirLight7;

    FlatMaterial g_FlatMat0;
    TexturedMaterial g_TexMat0;
    StrangeMaterial g_StrangeMat0;

    float4 main (
        Material MyMaterial,
        float3 CurPos: CurPosition,
        float3 Normal : Normal,
        float2 TexCoord : TexCoord0) : SV_Target
    {
        float4 Ret;

        if (TestValueFromLight(g_DirLight0, CurPos, Normal) > 0.5)
        {
            MyMaterial.Perturb(CurPos, Normal, TexCoord);
        }

        Ret.xyz = MyMaterial.CalculateLitColor(CurPos, Normal, TexCoord);
        Ret.w = 1;

        return Ret;
    }

7.19.7.2 IL - Complex Example

   // This pointers are a four-element vector with indices for    // which constant buffer holds the instance data (.x element),    // the base offset of the instance data in the instance constant    // buffer, the base texture index and the base sampler index.    // Basic instance members will therefore be referenced with    // cb[r0.x][r0.y + member_offset].    // This pointers can be in arrays so the first [] index    // can also have a register to indicate array access.    //     //    // For this example assume that globals are put in cbuffers    // in the following order.  Entries are offset:size in    // register (four-component) units.    //    // cb0:    //     0:1 - g_NumLights.    //     1:4 - g_LightsInUse.    //     5:1 - g_Ambient0.    //     6:2 - g_DirLight0.    //     8:2 - g_DirLight1.    //    10:2 - g_DirLight2.    //    12:2 - g_DirLight3.    //    14:2 - g_DirLight4.    //    16:2 - g_DirLight5.    //    18:2 - g_DirLight6.    //    20:2 - g_DirLight7.    //    22:1 - g_FlatMat0.    //    23:1 - g_TexMat0.    //    // g_StrangeMat0 takes no space.    //    // interfaces:    //     0:1 - MyMaterial.    //     1:9 - g_Lights.    //    // textures:    //     0:1 - g_TexMat0.    //    // samplers:    //     0:1 - g_TexMat0.    //    // The this pointers for the concrete objects would then be:    // g_Ambient0:    { 0,  5, -, - }    // g_DirLight0:   { 0,  6, -, - }    // g_DirLight1:   { 0,  8, -, - }    // g_DirLight2:   { 0, 10, -, - }    // g_DirLight3:   { 0, 12, -, - }    // g_DirLight4:   { 0, 14, -, - }    // g_DirLight5:   { 0, 16, -, - }    // g_DirLight6:   { 0, 18, -, - }    // g_DirLight7:   { 0, 20, -, - }    // g_FlatMat0:    { 0, 22, -, - }    // g_TexMat0:     { 0, 23, 0, 0 }    // g_StrangeMat0: { -,  -, -, - }    //     //    // Function bodies are declared explicitly so    // that it’s known in advance which bodies exist    // and how many bodies there are overall.    
    //
    dcl_function_body fb0
    dcl_function_body fb1
    dcl_function_body fb2
    dcl_function_body fb3
    dcl_function_body fb4
    dcl_function_body fb5
    dcl_function_body fb6
    dcl_function_body fb7
    dcl_function_body fb8
    dcl_function_body fb9
    dcl_function_body fb10
    dcl_function_body fb11

    //
    // Function tables work similarly to vtables for C++ except
    // that a table has an entry per call site for an interface
    // instead of per method.
    //

    // Function table for AmbientLight.
    // One call site in AccumulateLighting multiplied by three calls of
    // AccumulateLighting from CalculateLitColor.
    dcl_function_table ft0 { fb3, fb6, fb9 }

    // Function table for DirectionalLight.
    // One call site in AccumulateLighting multiplied by three calls of
    // AccumulateLighting from CalculateLitColor.
    dcl_function_table ft1 { fb4, fb7, fb10 }

    // Function table for FlatMaterial.
    // One call to Perturb in main and one call to CalculateLitColor in main.
    dcl_function_table ft2 { fb0, fb5 }

    // Function table for TexturedMaterial.
    // One call to Perturb in main and one call to CalculateLitColor in main.
    dcl_function_table ft3 { fb1, fb8 }

    // Function table for StrangeMaterial.
    // One call to Perturb in main and one call to CalculateLitColor in main.
    dcl_function_table ft4 { fb2, fb11 }

    //
    // Function table pointers.  Each of these needs to be bound before
    // the shader is usable.  The idea is that binding gives
    // a reference to one of the function tables above so that
    // the method slots can be filled in.
    // The compiler will not generate pointers for unreferenced objects.
    //
    // A function table pointer has a full set of method slots to
    // avoid the extra level of indirection that a C++
    // pointer-to-pointer-to-vtable representation would require
    // (that would also require that this pointers be 5-tuples).
    // In the HLSL virtual inlining model it's always known what
    // global variable/input is used for a call so we can set up
    // tables per root object.
    //
    // Function pointer decls indicate which function tables are
    // legal to use with them.  This also allows derivation of
    // method correlation information.
    //
    // The first [] of an interface decl is the array size.
    // If dynamic indexing is used the decl will indicate
    // that, as shown below.  An array of interface pointers can
    // also be indexed statically; arrays of interface pointers
    // are not required to use dynamic indexing.
    //
    // Numbering of interface pointers takes array size into
    // account, so the first pointer after a four entry
    // array fp6[4][1] would be fp10.
    //
    // The second [] of an interface decl is the number
    // of call sites, which must match the number of bodies in
    // each table referenced in the decl.
    //

    // main's MyMaterial parameter.
    dcl_interface fp0[1][2] = { ft2, ft3, ft4 };

    // g_Lights entries.
    dcl_interface_dynamicindexed fp1[9][3] = { ft0, ft1 };

    // main routine.

    // TestValueFromLight is a regular routine and is inlined.
    // The Calculate reference inside of it is passed the concrete
    // instance DirLight0 so it is devirtualized and inlined.
    dp3_sat r0.x, v1.xyzx, -cb0[6].xyzx
    mul r0.yz, r0.xxxx, cb0[7].xxyx
    add r0.y, r0.z, r0.y
    mad_sat r0.x, cb0[7].z, r0.x, r0.y

    // The return of TestValueFromLight is tested.
    lt r0.x, l(0.500000), r0.x
    if_nz r0.x
      // The call to Perturb is a full fcall.
      fcall fp0[0][0]
      mov r2.xyz, r0.xyzx
      mov r0.x, r0.w
      mov r0.y, r1.x
    else
      mov r2.xyz, v1.xyzx
      mov r0.xy, v2.xyxx
    endif

    // The call to CalculateLitColor is a full fcall.
    fcall fp0[0][1]

    mov o0.xyz, r1.xyzx
    mov o0.w, l(1.000000)
    ret

    //
    // Function bodies.
    //

    // FlatMaterial version of main's call to Perturb.
    label fb0
    mov r0.xyz, v1.xyzx
    mov r0.w, v2.y
    mov r1.x, v2.x
    ret

    // TexturedMaterial version of main's call to Perturb.
    label fb1
    mov r0.xyz, v1.xyzx
    mov r0.w, v2.x
    mov r1.x, v2.y
    ret

    // StrangeMaterial version of main's call to Perturb.
    // NOTE: Position is not used later so the compiler has killed
    // the update to Position from this body.
    label fb2
    mov r0.xyz, v1.xyzx
    mov r0.w, v2.x
    mov r1.x, v2.y
    ret

    // AmbientLight version of FlatMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    // NOTE: the Calculate bodies all look superficially
    // identical but all are different.  In one case
    // the array index is r1 and the return value is r4,
    // in one case the array index is r1 and the return value
    // is r5 and in the last case the array index is in r0
    // and the return is in r5.  Bodies are not interchangeable.
    label fb3
    // Array index is r1, return is r4.
    mov r2.w, this[r1.w + 1].y
    mov r1.w, this[r1.w + 1].x
    mov r4.xyz, cb[r1.w + 0][r2.w + 0].xyzx
    ret

    // DirectionalLight version of FlatMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    label fb4
    // Array index is r1, return is r4.
    mov r2.w, this[r1.w + 1].y
    mov r3.w, this[r1.w + 1].x
    mov r4.w, this[r1.w + 1].y
    mov r5.x, this[r1.w + 1].x
    dp3_sat r4.w, r2.xyzx, -cb[r5.x + 0][r4.w + 0].xyzx
    mul r5.xyz, r4.wwww, cb[r3.w + 0][r2.w + 1].xyzx
    mov r4.xyz, r5.xyzx
    ret

    // FlatMaterial version of main's call to CalculateLitColor.
    label fb5

    // AccumulateLighting is inlined.
    mov r3.xyz, l(0,0,0,0)
    mov r0.w, l(0)

    loop
      // g_NumLights is cb0[0].
      uge r1.w, r0.w, cb0[0].x
      breakc_nz r1.w

      // Get g_Lights[g_LightsInUse[i]].
      // g_LightsInUse is cb0[1-4].
      // g_Lights is cb0[5-13].
      mov r1.w, cb0[r0.w + 1].x

      // Call Calculate.  Array index is r1.
      fcall fp1[r1.w + 0][0]

      // Return is expected in r4.
      mov r0.xyz, r4.xyzx
      add r3.xyz, r3.xyzx, r0.xyzx
      iadd r0.w, r0.w, l(1)
    endloop

    // Multiply times color.
    mov r0.xy, this[0].yxyy
    mul r0.xyz, r3.xyzx, cb[r0.y + 0][r0.x + 0].xyzx
    mov r1.xyz, r0.xyzx
    ret

    // AmbientLight version of TexturedMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    label fb6
    // Array index is r1, return is r5.
    mov r2.w, this[r1.w + 1].y
    mov r1.w, this[r1.w + 1].x
    mov r5.xyz, cb[r1.w + 0][r2.w + 0].xyzx
    ret

    // DirectionalLight version of TexturedMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    label fb7
    // Array index is r1, return is r5.
    mov r2.w, this[r1.w + 1].y
    mov r3.w, this[r1.w + 1].x
    mov r4.w, this[r1.w + 1].y
    mov r5.w, this[r1.w + 1].x
    dp3_sat r4.w, r2.xyzx, -cb[r5.w + 0][r4.w + 0].xyzx
    mul r6.xyz, r4.wwww, cb[r3.w + 0][r2.w + 1].xyzx
    mov r5.xyz, r6.xyzx
    ret

    // TexturedMaterial version of main's call to CalculateLitColor.
    label fb8

    // Texture sample.
    mov r4.xy, this[0].zw
    sample r0.xyz, v2.xy, t[r4.x].xyz, s[r4.y]
    mul r0.xyz, r0.xyzx, l(0.123400, 0.123400, 0.123400, 0.000000)

    // m_Color multiplied by texture sample.
    mov r0.w, this[0].y
    mov r1.w, this[0].x
    mul r0.xyz, r0.xyzx, cb[r1.w + 0][r0.w + 0].xyzx

    // AccumulateLighting is inlined.
    mov r4.xyz, l(0,0,0,0)
    mov r0.w, l(0)
    loop
      // g_NumLights is cb0[0].
      uge r1.w, r0.w, cb0[0].x
      breakc_nz r1.w

      // Get g_Lights[g_LightsInUse[i]].
      // g_LightsInUse is cb0[1-4].
      // g_Lights is cb0[5-13].
      mov r1.w, cb0[r0.w + 1].x

      // Call Calculate.  Array index is in r1.
      fcall fp1[r1.w + 0][1]

      // Return is expected in r5.
      mov r3.xyz, r5.xyzx
      add r4.xyz, r4.xyzx, r3.xyzx
      iadd r0.w, r0.w, l(1)
    endloop

    // Multiply accumulated color times texture color.
    mul r0.xyz, r0.xyzx, r4.xyzx
    mov r1.xyz, r0.xyzx
    ret

    // AmbientLight version of StrangeMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    label fb9
    // Array index is r0, return is r5.
    mov r1.w, this[r0.w + 1].y
    mov r0.w, this[r0.w + 1].x
    mov r5.xyz, cb[r0.w + 0][r1.w + 0].xyzx
    ret

    // DirectionalLight version of StrangeMaterial.CalculateLitColor-calls-
    // AccumulateLighting's call to Calculate.
    label fb10
    // Array index is r0, return is r5.
    mov r1.w, this[r0.w + 1].y
    mov r2.w, this[r0.w + 1].x
    mov r3.w, this[r0.w + 1].y
    mov r4.w, this[r0.w + 1].x
    dp3_sat r3.w, r2.xyzx, -cb[r4.w + 0][r3.w + 0].xyzx
    mul r6.xyz, r3.wwww, cb[r2.w + 0][r1.w + 1].xyzx
    mov r5.xyz, r6.xyzx
    ret

    // StrangeMaterial version of main's call to CalculateLitColor.
    label fb11

    // AccumulateLighting is inlined.
    mov r4.xyz, l(0,0,0,0)
    mov r0.z, l(0)

    loop
      // g_NumLights is cb0[0].x.
      uge r0.w, r0.z, cb0[0].x
      breakc_nz r0.w

      // Get g_Lights[g_LightsInUse[i]].
      // g_LightsInUse is cb0[1-4].
      // g_Lights is cb0[5-13].
      mov r0.w, cb0[r0.z + 1].x

      // Call Calculate.  Array index is in r0.
      fcall fp1[r0.w + 0][2]

      // Return is in r5.
      mov r3.xyz, r5.xyzx
      add r4.xyz, r4.xyzx, r3.xyzx
      iadd r0.z, r0.z, l(1)
    endloop
    mov r1.xyz, r4.xyzx
    ret
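
The per-call-site dispatch described in the assembly comments can be modelled on the CPU. The following is a minimal C++ sketch under stated assumptions, not driver or runtime code: the names `BodyId`, `FunctionTable`, `InterfacePointer`, `Bind`, `Call` and `NextPointerIndex` are hypothetical and introduced only for illustration, and a function body `fbN` is represented by the integer `N`. The table contents are copied from the `dcl_function_table` lines of the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical CPU-side model of the declarations in the listing.
// A function body "fbN" is represented by its number N.
using BodyId = int;

// A function table has one body per call site, not one per method.
using FunctionTable = std::vector<BodyId>;

// ft0..ft4, copied from the dcl_function_table lines.
const std::vector<FunctionTable>& Tables() {
    static const std::vector<FunctionTable> tables = {
        {3, 6, 9},   // ft0: AmbientLight
        {4, 7, 10},  // ft1: DirectionalLight
        {0, 5},      // ft2: FlatMaterial
        {1, 8},      // ft3: TexturedMaterial
        {2, 11},     // ft4: StrangeMaterial
    };
    return tables;
}

// One interface pointer slot, such as fp0[0] or one entry of fp1[9].
struct InterfacePointer {
    std::size_t callSites;         // second [] of the interface decl
    std::vector<int> legalTables;  // tables listed in the decl
    int boundTable;                // -1 until Bind succeeds

    InterfacePointer(std::size_t n, std::vector<int> legal)
        : callSites(n), legalTables(std::move(legal)), boundTable(-1) {}
};

// Binding selects one of the tables that the decl marks as legal.
void Bind(InterfacePointer& fp, int table) {
    bool legal = false;
    for (int t : fp.legalTables) {
        if (t == table) legal = true;
    }
    if (!legal) throw std::invalid_argument("table not legal for this slot");
    // The decl requires each legal table to have one body per call site.
    assert(Tables().at(static_cast<std::size_t>(table)).size() == fp.callSites);
    fp.boundTable = table;
}

// Models "fcall fpN[i][callSite]": returns the body that would run.
BodyId Call(const InterfacePointer& fp, std::size_t callSite) {
    if (fp.boundTable < 0) throw std::logic_error("interface pointer not bound");
    return Tables().at(static_cast<std::size_t>(fp.boundTable)).at(callSite);
}

// Pointer numbering accounts for array size: fp6[4][1] is followed by fp10.
int NextPointerIndex(int first, int arraySize) {
    return first + arraySize;
}
```

In this model, binding `fp0` to `ft3` (TexturedMaterial) makes `Call(fp0, 1)` return 8, which corresponds to `fb8`, the TexturedMaterial version of main's call to CalculateLitColor. Binding `fp0` to `ft0` is rejected, because the `dcl_interface fp0` line lists only `ft2`, `ft3` and `ft4`.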

 

7.19.7.3 API - Complex Example

7.19.7.3 API - 复杂示例

    // Create a class library to hold class instance data.
    pDevice->CreateClassLinkage(&pMyClassTable);

    // Create the shader and supply the class library so that the shader's
    // class instance data is added to it.
    pDevice->CreatePixelShader(pMyCompiledPixelShader, pMyClassTable, &pMyPS);

    // Use reflection to find where data should be stored in the interface array.
    NumInterfaces = pMyPSReflection->GetNumInterfaces();
    pMyLightsVar = pMyPSReflection->GetVariableByName("g_Lights");
    iLightOffset = pMyLightsVar->GetInterfaceSlot(0);
    pMyMaterialVar = pMyPSReflection->GetVariableByName("$MyMaterial");
    iMatOffset = pMyMaterialVar->GetInterfaceSlot(0);

    // Use the class library to get references to all class instances
    // needed in the shader.
    pMyClassTable->GetClassInstance("g_Ambient0", 0, &pAmbient0);
    pMyClassTable->GetClassInstance("g_DirLight0", 0, &pDirLight[0]);
    pMyClassTable->GetClassInstance("g_DirLight1", 0, &pDirLight[1]);
    pMyClassTable->GetClassInstance("g_DirLight2", 0, &pDirLight[2]);
    pMyClassTable->GetClassInstance("g_DirLight3", 0, &pDirLight[3]);
    pMyClassTable->GetClassInstance("g_DirLight4", 0, &pDirLight[4]);
    pMyClassTable->GetClassInstance("g_DirLight5", 0, &pDirLight[5]);
    pMyClassTable->GetClassInstance("g_DirLight6", 0, &pDirLight[6]);
    pMyClassTable->GetClassInstance("g_DirLight7", 0, &pDirLight[7]);
    pMyClassTable->GetClassInstance("g_FlatMat0", 0, &pFlatMat0);
    pMyClassTable->GetClassInstance("g_TexMat0", 0, &pTexMat0);
    pMyClassTable->GetClassInstance("g_StrangeMat0", 0, &pStrangeMat0);

    // Set the lights in the array. The lights themselves do not change;
    // only the indices into them (g_LightsInUse) do.
    pMyInterfaceArray[iLightOffset] = pAmbient0;
    for (unsigned int i = 0; i < 8; i++)
    {
        pMyInterfaceArray[iLightOffset + i + 1] = pDirLight[i];
    }

    while (true)
    {
        if (bFlatSunlightOnly)
        {
            // Set g_NumLights to 1 in constant buffer.
            // Set g_LightsInUse[0] to 1 in constant buffer.
            pMyInterfaceArray[iMatOffset] = pFlatMat0;
        }
        else if (bStrangeMaterials)
        {
            // Set g_NumLights and fill out g_LightsInUse.
            pMyInterfaceArray[iMatOffset] = pStrangeMat0;
        }
        else
        {
            // Set g_NumLights and fill out g_LightsInUse.
            pMyInterfaceArray[iMatOffset] = pTexMat0;
        }

        // Set the pixel shader and its interfaces; they remain bound
        // until the next bind call.
        pDevice->PSSetShader(pMyPS, pMyInterfaceArray, NumInterfaces);

        // Use the shader that was just bound to draw something.
        RenderScene();
    }
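
The slot-filling logic of the API example can be tried without a graphics device. The sketch below is a self-contained C++ mock under stated assumptions: it is not the Direct3D API, class instances are represented by their names as plain strings, and `ClassInstance`, `MaterialMode`, `kNumDirLights` and `BuildInterfaceArray` are hypothetical names introduced here. The slot offsets are passed in as parameters because the real code obtains them from reflection.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a class-instance handle: just the instance name.
using ClassInstance = std::string;

// Mirrors the three branches of the render loop in the API example.
enum class MaterialMode { FlatSunlightOnly, Strange, Textured };

constexpr std::size_t kNumDirLights = 8;  // g_DirLight0 .. g_DirLight7

// Builds the array that would accompany the shader at bind time.
std::vector<ClassInstance> BuildInterfaceArray(std::size_t numInterfaces,
                                               std::size_t matOffset,
                                               std::size_t lightOffset,
                                               MaterialMode mode) {
    assert(matOffset < numInterfaces);
    assert(lightOffset + 1 + kNumDirLights <= numInterfaces);

    std::vector<ClassInstance> slots(numInterfaces);

    // Lights are written once. Which lights are active is selected
    // through g_NumLights and g_LightsInUse in the constant buffer.
    slots[lightOffset] = "g_Ambient0";
    for (std::size_t i = 0; i < kNumDirLights; ++i) {
        slots[lightOffset + 1 + i] = "g_DirLight" + std::to_string(i);
    }

    // Only the material slot changes between draws.
    switch (mode) {
        case MaterialMode::FlatSunlightOnly:
            slots[matOffset] = "g_FlatMat0";
            break;
        case MaterialMode::Strange:
            slots[matOffset] = "g_StrangeMat0";
            break;
        case MaterialMode::Textured:
            slots[matOffset] = "g_TexMat0";
            break;
    }
    return slots;
}
```

Calling `BuildInterfaceArray(10, 0, 1, MaterialMode::Strange)` uses the interface layout given in the assembly comments (MyMaterial at slot 0, g_Lights at slots 1 through 9) and returns an array whose slot 0 names `g_StrangeMat0`. Slots 1 through 9 are the same for every mode.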