www.tomshardware.com/features/in…
- The new approach also enables programmers to specify that certain threads are used in a certain manner through an expansion of the PowerThrottling API, which allows developers to assign a QoS attribute to their threads. Additionally, a new EcoQoS classification tags threads that respond best on the efficiency cores to ensure they are prioritized to execute on the E-Cores.
- Microsoft says that the Edge browser and 'various' Windows 11 components now take advantage of the EcoQoS classification system.
- Additionally, Thread Director can also detect the instruction mix (scalar/vector) used in any given thread at nanosecond granularity, and then communicate that data to the Windows 11 scheduler so the thread can be steered to the correct execution core, be that a high-performance P-Core or an efficient E-Core. Typically, vector/AI workloads will be prioritized to performance cores while scalar instructions and background tasks are moved to efficiency cores.
www.phoronix.com/news/Intel-…
- The Linux driver for Intel HFI was submitted today as part of the thermal updates for Linux 5.18.
- Each capability is given as a unit-less quantity in the range [0-255].
www.intel.com/content/dam…
- Thread specific hardware support is enumerated via the CPUID instruction and enabled by the OS via writing to configuration MSRs.
- Developers can also opt in threads to run on Efficient cores at efficient frequencies to optimize power and performance. This action is done by using QoS APIs to define the PowerThrottling of the thread.
- Hard processor affinities can disrupt OS decisions. Although affinity might be used to guide threads toward performance or efficiency, performance and efficiency are dynamic capabilities rather than fixed properties of a core type, so hard affinities may not yield the desired results.
- WINDOWS CORE PARKING ENGINE has a two-level setting.
- PERFORMANCE STATE CONTROL ENGINE: Intel® Speed Shift Technology (i.e., HWP) is leveraged by Windows to indicate performance constraints on threads that need to run at efficient performance levels, or tune energy performance preference on different slider positions to meet system wide performance vs. battery life goals.
- E-cores are more performant than the SMT sibling of a busy core.
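
The last point implies a placement order that is easy to get wrong, so here is a minimal sketch of it. It is not how Windows or Thread Director actually pick a core: `LogicalCpu`, `placement_rank`, and `pick_cpu` are hypothetical names, and putting a fully idle P-core first is my own assumption. Only the "E-core before the SMT sibling of a busy core" ordering comes from the note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalCpu:
    cpu_id: int      # logical processor number
    core_id: int     # physical core this logical processor belongs to
    kind: str        # "P" or "E"
    busy: bool = False


def placement_rank(cpu, cpus):
    """Lower rank means a more attractive target (hypothetical ordering)."""
    sibling_busy = any(
        other.core_id == cpu.core_id and other.cpu_id != cpu.cpu_id and other.busy
        for other in cpus
    )
    if cpu.kind == "P" and not sibling_busy:
        return 0  # assumption: a fully idle P-core is the first choice
    if cpu.kind == "E":
        return 1  # from the notes: an E-core beats the SMT sibling of a busy core
    return 2      # SMT sibling of a busy P-core comes last


def pick_cpu(cpus):
    """Return the best idle logical CPU, or None if everything is busy."""
    idle = [c for c in cpus if not c.busy]
    if not idle:
        return None
    return min(idle, key=lambda c: (placement_rank(c, cpus), c.cpu_id))


if __name__ == "__main__":
    cpus = [
        LogicalCpu(0, 0, "P", busy=True),  # P-core 0, first SMT thread is busy
        LogicalCpu(1, 0, "P"),             # its idle SMT sibling
        LogicalCpu(2, 1, "E"),             # an idle E-core
    ]
    print(pick_cpu(cpus))
```

With one SMT thread of the P-core busy, the sketch picks the E-core (logical CPU 2) rather than the idle sibling.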
www.intel.com/content/www…
- Each P-core has its own private L2 cache; every four E-cores share one L2 cache.
- AVX-512 is disabled by default on P-cores and is not supported on E-cores.
- In a hybrid CPU, the old assumptions no longer hold true, and developers must fully enumerate the available logical processors on a system to determine the power and performance characteristics of each logical processor.
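- GetLogicalProcessorInformation and GetSystemCpuSetInformation allow you to fully enumerate logical processors.
- GetLogicalProcessorInformation reports shared resources, including which logical processors are SMT siblings on a P-core and which share a cache.
- GetSystemCpuSetInformation is the more useful of the two: it reports group information and the EfficiencyClass value, but not the cache topology. It can be combined with the function above.
- Binning logical processors: group together the logical processors that belong to the same class or that share a cache.
- Only Windows 11 supports Thread Director.
- Thread scheduling should be driven top-down, with the app giving the OS more scheduling information, rather than bottom-up.
- The purposes of setting thread priority: (1) tell the OS how often to schedule the thread; (2) get it placed on a stronger core.
- Thread affinity settings, APIs ordered from weakest to strongest:
- SetThreadIdealProcessor() # best effort, not guaranteed
- SetThreadPriority() # best effort, not guaranteed; can set the scheduling frequency and the matching time slice length, and dynamic priority can be used for boosting
- SetThreadInformation() # sets the thread's power control to save power and let it run on E-cores; it can also control cache priority
- SetThreadSelectedCpuSets() # sets the set of CPUs the thread may be scheduled on
- SetThreadAffinityMask() # a hard constraint that overrides both Windows and ITD decisions
- It is recommended to set affinity once at the start and not change it midway.
- When setting thread policy, take (forced) context switches into account.
- Logical topologies can be used to build a table for better planning of thread execution.
- The best scenario is two thread pools, one for P-cores and one for E-cores.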
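
To make the binning and two-pool idea concrete, here is a rough sketch in Python. The topology is hand-written sample data, not the output of GetSystemCpuSetInformation, and `SAMPLE_TOPOLOGY`, `bin_by`, and `make_pools` are hypothetical names. It assumes that a higher efficiency class value marks the more performant core type. The pools are only sized per bin: actually steering their threads to P-cores or E-cores would need the Windows calls listed above, which this sketch does not make.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical enumeration result: (logical processor id, efficiency class, L2 cache id).
SAMPLE_TOPOLOGY = [
    (0, 1, 0), (1, 1, 0),                        # P-core with two SMT threads, private L2
    (2, 1, 1), (3, 1, 1),                        # second P-core
    (4, 0, 2), (5, 0, 2), (6, 0, 2), (7, 0, 2),  # four E-cores sharing one L2
]


def bin_by(topology, key_index):
    """Group logical processor ids by one field (1 = efficiency class, 2 = L2 id)."""
    bins = defaultdict(list)
    for entry in topology:
        bins[entry[key_index]].append(entry[0])
    return dict(bins)


def make_pools(topology):
    """Build one pool sized for the top class and one for everything else."""
    by_class = bin_by(topology, 1)
    top_class = max(by_class)  # assumption: highest class value = P-cores
    p_cpus = by_class[top_class]
    e_cpus = [cpu for cls, cpus in by_class.items() if cls != top_class for cpu in cpus]
    p_pool = ThreadPoolExecutor(max_workers=len(p_cpus), thread_name_prefix="pcore")
    e_pool = ThreadPoolExecutor(max_workers=max(1, len(e_cpus)), thread_name_prefix="ecore")
    return p_pool, e_pool, p_cpus, e_cpus


if __name__ == "__main__":
    print(bin_by(SAMPLE_TOPOLOGY, 2))  # bins by shared L2 cache
    p_pool, e_pool, p_cpus, e_cpus = make_pools(SAMPLE_TOPOLOGY)
    latency_sensitive = p_pool.submit(sum, range(1000))  # foreground-style work
    background = e_pool.submit(sum, range(10))           # background-style work
    print(latency_sensitive.result(), background.result())
    p_pool.shutdown()
    e_pool.shutdown()
```

Keeping the bins as plain lists of logical processor ids means the same table can later be handed to whichever affinity or CPU-set API is chosen.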
lore.kernel.org/lkml/202209…
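- Workloads are divided into classes according to their IPC: the higher the IPC, the higher the class.
- The load balancer detects how heavily a task uses advanced instructions and places tasks that use many of them on the higher-IPC cores.
- The hardware is responsible for partitioning (defining) the classes of tasks. (Hardware is free to partition its instruction set into an arbitrary number of classes. It must provide a mechanism to identify the class of the currently running task and inform the kernel about the performance of each class of task on each type of CPU.)
- While a task is running, the Linux kernel reads the class assigned by the hardware and records it in the current task's structure.
- A new task starts in the default class; the hardware monitor classifies it as it runs.
- A task that sleeps all the time may never get classified.
- "Asymmetric" refers to IPC differing across core types.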
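
The classification flow in these notes can be sketched as a toy model. This is not the kernel patch set's code: `Task`, `on_tick`, `p_core_gain`, and `assign` are hypothetical names, and the `PERF` table holds made-up numbers standing in for the per-class, per-CPU-type performance the hardware would report.

```python
from dataclasses import dataclass

DEFAULT_CLASS = 0

# Hypothetical performance of each task class on each CPU type.
PERF = {
    0: {"P": 10, "E": 10},  # class 0: no IPC benefit from a P-core
    1: {"P": 15, "E": 10},
    2: {"P": 20, "E": 10},  # class 2: largest benefit from a P-core
}


@dataclass
class Task:
    name: str
    ipc_class: int = DEFAULT_CLASS  # new tasks start in the default class
    sleeping: bool = False


def on_tick(task, hw_class):
    """Record the hardware-reported class; a sleeping task never runs, so it keeps its old class."""
    if not task.sleeping:
        task.ipc_class = hw_class


def p_core_gain(task):
    """How much this task's class gains from a P-core relative to an E-core."""
    row = PERF[task.ipc_class]
    return row["P"] / row["E"]


def assign(tasks, n_p_cores):
    """Give the limited P-cores to the tasks whose class benefits most."""
    ranked = sorted(tasks, key=p_core_gain, reverse=True)
    return {t.name: ("P" if i < n_p_cores else "E") for i, t in enumerate(ranked)}


if __name__ == "__main__":
    a, b, c = Task("a"), Task("b"), Task("c", sleeping=True)
    on_tick(a, 2)  # hardware saw heavy use of advanced instructions
    on_tick(b, 0)
    on_tick(c, 2)  # ignored: c is asleep and stays in the default class
    print(assign([a, b, c], n_p_cores=1))
```

With a single P-core available, the sketch sends task `a` there and leaves `b` and the never-classified `c` on E-cores.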
www.youtube.com/watch?v=M8M…
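- A standalone E-core performs better than the SMT sibling of a busy core, yet ITMT in 4.10 always preferred P-cores when placing tasks. That policy (4.10) was poor and was fixed in 5.16.
- P-cores not only run at a higher frequency than E-cores, they also have higher IPC (P-core/E-core = 1.27).
- The HFI performance score is computed as ratio * GHz, for example (1.27 * 4.4), (1.0 * 3.0). The E-core case was not spelled out.
- ITD is finer-grained: the P/E gap varies a lot between instructions, so P/E is not a constant (SSE: 1.27, AVX2: 1.5, VNNI: 2.0, PAUSE: 1.0). The score is computed per instruction set.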
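
As a worked example of the ratio * GHz scoring, here is a small sketch using the figures from these notes. It assumes the (1.0 * 3.0) pair describes an E-core at 3.0 GHz, which the notes do not confirm, and it leaves the scores unscaled rather than mapping them into the range hardware actually reports. `hfi_score` and `per_class_scores` are hypothetical names.

```python
P_GHZ = 4.4  # P-core frequency from the notes
E_GHZ = 3.0  # assumption: E-core frequency, read off the (1.0 * 3.0) pair

# P-core/E-core IPC ratio per instruction class, as listed in the notes.
ITD_IPC_RATIO = {"SSE": 1.27, "AVX2": 1.5, "VNNI": 2.0, "PAUSE": 1.0}


def hfi_score(ipc_ratio, ghz):
    """Single-class HFI-style score: relative IPC times frequency."""
    return ipc_ratio * ghz


def per_class_scores(p_ghz=P_GHZ, e_ghz=E_GHZ):
    """ITD-style table: one (P-core score, E-core score) pair per instruction class."""
    return {
        cls: (hfi_score(ratio, p_ghz), hfi_score(1.0, e_ghz))
        for cls, ratio in ITD_IPC_RATIO.items()
    }


if __name__ == "__main__":
    print(round(hfi_score(1.27, P_GHZ), 3), hfi_score(1.0, E_GHZ))
    for cls, (p, e) in per_class_scores().items():
        print(f"{cls}: P={p:.2f} E={e:.2f} P/E={p / e:.2f}")
```

Under these assumptions the P-core advantage ranges from about 1.47x for PAUSE (frequency only) to about 2.93x for VNNI, which is the reason a single per-core score is not enough.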
www.intel.com/content/www…
HARDWARE FEEDBACK INTERFACE AND INTEL® THREAD DIRECTOR
The Intel Thread Director table structure extends the Hardware Feedback Interface table structure without breaking backward compatibility. The Hardware Feedback Interface can be viewed as having two capabilities and a single class.
cpuid
- input : EAX,ECX(sometimes)
- output: EAX,EBX,ECX,EDX
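- CPUID with EAX=06H queries power management features: bit 19 of EAX indicates whether HFI is supported, and EDX indicates whether performance reporting and energy efficiency reporting are supported. In addition, bit 23 indicates whether Thread Director is supported.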
- From 3-218 Vol. 2A
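
The bit tests described above can be sketched as a pure function over register values. The code does not execute CPUID itself; it only decodes values obtained elsewhere. EAX bits 19 and 23 come from the notes. Reading performance reporting from EDX bit 0 and energy efficiency reporting from EDX bit 1 is my assumption about the layout and should be checked against the SDM page cited above. `decode_cpuid_06h` is a hypothetical helper name.

```python
def decode_cpuid_06h(eax, edx):
    """Decode the HFI-related flags from CPUID leaf 06H register values."""
    return {
        "hfi": bool((eax >> 19) & 1),              # EAX bit 19: HFI supported
        "thread_director": bool((eax >> 23) & 1),  # EAX bit 23: Thread Director supported
        "perf_reporting": bool(edx & 1),           # assumption: EDX bit 0
        "energy_eff_reporting": bool((edx >> 1) & 1),  # assumption: EDX bit 1
    }


if __name__ == "__main__":
    # Made-up register values for illustration, not read from a real CPU.
    sample_eax = (1 << 19) | (1 << 23)
    sample_edx = 0b11
    print(decode_cpuid_06h(sample_eax, sample_edx))
```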
SMP: symmetric multiprocessing. SMT: simultaneous multithreading.
HFI
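- Check for HFI support via CPUID with EAX=06H.
- Create an HFI table and write its physical address to the MSR (MSR_IA32_HW_FEEDBACK_PTR).
- Enable HFI by writing to the MSR (MSR_IA32_HW_FEEDBACK_CONFIG).
- When a status update notification arrives, copy the table to a local buffer, then clear bit 26 of the MSR (MSR_IA32_PACKAGE_THERM_STATUS) so the hardware can continue updating. Repeat.
- The OS sets its scheduling policy according to the values in the table.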
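
The enable and update handshake can be sketched as a toy simulation. Nothing here touches real MSRs: `FakeHfiHardware` is a hypothetical stand-in that keeps the three MSRs in a dictionary, the table rows are made-up (performance, energy efficiency) pairs, and the real bit layouts of the PTR and CONFIG registers are not modeled. The only detail taken from the notes is that the OS clears bit 26 of the status MSR after copying the table so the hardware can post the next update.

```python
PTR = "MSR_IA32_HW_FEEDBACK_PTR"
CONFIG = "MSR_IA32_HW_FEEDBACK_CONFIG"
THERM_STATUS = "MSR_IA32_PACKAGE_THERM_STATUS"
HFI_STATUS_BIT = 26


class FakeHfiHardware:
    """Stand-in for the MSRs and the shared HFI table (no real hardware access)."""

    def __init__(self):
        self.msr = {PTR: 0, CONFIG: 0, THERM_STATUS: 0}
        self.table = {}

    def hardware_update(self, new_table):
        """Post a new table unless the previous update is still unacknowledged."""
        if (self.msr[THERM_STATUS] >> HFI_STATUS_BIT) & 1:
            return False
        self.table = dict(new_table)
        self.msr[THERM_STATUS] |= 1 << HFI_STATUS_BIT
        return True


def os_enable(hw, table_phys_addr):
    """Steps 2 and 3: publish the table address, then enable the feature."""
    hw.msr[PTR] = table_phys_addr
    hw.msr[CONFIG] = 1  # simplification: a bare "enabled" marker, not the real bit layout


def os_handle_update(hw):
    """Step 4: copy the table locally, then clear bit 26 to acknowledge it."""
    if not (hw.msr[THERM_STATUS] >> HFI_STATUS_BIT) & 1:
        return None
    local_copy = dict(hw.table)
    hw.msr[THERM_STATUS] &= ~(1 << HFI_STATUS_BIT)
    return local_copy


if __name__ == "__main__":
    hw = FakeHfiHardware()
    os_enable(hw, 0x1000)
    hw.hardware_update({0: (255, 100), 4: (140, 255)})  # cpu -> (perf, energy efficiency)
    print(os_handle_update(hw))
```

Copying before acknowledging matters in this model: once the bit is cleared, the fake hardware is free to overwrite the shared table.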
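Intel Power Management
- HWP: Hardware-Controlled Performance States (= Speed Shift): adjusts to a better voltage and frequency for the current working configuration (power policy).
- HFI/ITD: used for thread scheduling, steering threads to suitable CPUs.
- Intel Turbo Boost Technology: automatic overclocking.
- ITMT: Intel Turbo Boost Max Technology 3.0: identifies the cores with the best silicon quality (favored core ratio) and applies a different ratio to cores of different quality, rather than one uniform ratio.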