NumPy Source Code Analysis (53)
.\numpy\numpy\_core\src\common\mem_overlap.c
/*
Solving memory overlap integer programs and bounded Diophantine equations with
positive coefficients.
Asking whether two strided arrays `a` and `b` overlap is equivalent to
asking whether there is a solution to the following problem::
sum(stride_a[i] * x_a[i] for i in range(ndim_a))
-
sum(stride_b[i] * x_b[i] for i in range(ndim_b))
==
base_b - base_a
0 <= x_a[i] < shape_a[i]
0 <= x_b[i] < shape_b[i]
for some integer x_a, x_b. Itemsize needs to be considered as an additional
dimension with stride 1 and size itemsize.
Negative strides can be changed to positive (and vice versa) by changing
variables x[i] -> shape[i] - 1 - x[i], and zero strides can be dropped, so
that the problem can be recast into a bounded Diophantine equation with
positive coefficients::
sum(a[i] * x[i] for i in range(n)) == b
a[i] > 0
0 <= x[i] <= ub[i]
This problem is NP-hard --- runtime of algorithms grows exponentially with
increasing ndim.
*Algorithm description*
A straightforward algorithm that excludes infeasible solutions using GCD-based
pruning is outlined in Ref. [1]. It is implemented below. A number of other
algorithms exist in the literature; however, this one seems to have
performance satisfactory for the present purpose.
The idea is that an equation::
a_1 x_1 + a_2 x_2 + ... + a_n x_n = b
0 <= x_i <= ub_i, i = 1...n
implies::
a_2' x_2' + a_3 x_3 + ... + a_n x_n = b
0 <= x_i <= ub_i, i = 3...n
0 <= x_2' <= c_1 ub_1 + c_2 ub_2
with a_2' = gcd(a_1, a_2) and x_2' = c_1 x_1 + c_2 x_2 with c_1 = (a_1/a_2'),
and c_2 = (a_2/a_2'). This procedure can be repeated to obtain::
a_{n-1}' x_{n-1}' + a_n x_n = b
0 <= x_{n-1}' <= ub_{n-1}'
0 <= x_n <= ub_n
Now, one can enumerate all candidate solutions for x_n. For each, one can use
the previous-level equation to enumerate potential solutions for x_{n-1}, with
transformed right-hand side b -> b - a_n x_n. And so forth, until after n-1
nested for loops we either arrive at a candidate solution for x_1 (in which
case we have found one solution to the problem), or find that the equations do
not allow any solutions either for x_1 or one of the intermediate x_i (in
which case we have proved there is no solution for the upper-level candidates
chosen). If no solution is found for any candidate x_n, we have proved the
problem is infeasible --- which for the memory overlap problem means there is
no overlap.
*Performance*
Some common ndarray cases are easy for the algorithm:
- Two arrays whose memory ranges do not overlap.
These will be excluded by the bounds on x_n, with max_work=1. We also add
this check as a fast path, to avoid computing GCDs needlessly, as this can
take some time.
- Arrays produced by continuous slicing of a continuous parent array (no
*/
/*
*Integer overflows*
The algorithm is written in fixed-width integers, and can terminate with
failure if integer overflow is detected (the implementation catches all
cases). Potential failure modes:
- Array extent sum(stride*(shape-1)) is too large (for int64).
- Minimal solutions to a_i x_i + a_j x_j == b are too large,
in some of the intermediate equations.
We do this part of the computation in 128-bit integers.
In general, overflows are expected only if array size is close to
NPY_INT64_MAX, requiring ~exabyte size arrays, which is usually not possible.
References
----------
.. [1] P. Ramachandran, ''Use of Extended Euclidean Algorithm in Solving
a System of Linear Diophantine Equations with Bounded Variables''.
Algorithmic Number Theory, Lecture Notes in Computer Science **4076**,
182-192 (2006). doi:10.1007/11792086_14
.. [2] Cornuejols, Urbaniak, Weismantel, and Wolsey,
''Decomposition of integer programs and of generating sets.'',
Lecture Notes in Computer Science 1284, 92-103 (1997).
.. [3] K. Aardal, A.K. Lenstra,
''Hard equality constrained integer knapsacks'',
Lecture Notes in Computer Science 2337, 350-366 (2002).
*/
/*
Copyright (c) 2015 Pauli Virtanen
All rights reserved.
Licensed under 3-clause BSD license, see LICENSE.txt.
*/
/* Target the current NumPy API version and disable deprecated APIs */
#define NPY_NO_DEPRECATED_API NPY_API_VERSION
/* Make '#' argument formats use Py_ssize_t lengths; must precede Python.h */
#define PY_SSIZE_T_CLEAN
#include <Python.h>
/* NumPy array object API */
#include "numpy/ndarrayobject.h"
/* Declarations for the memory-overlap solver */
#include "mem_overlap.h"
/* Helpers for 128-bit extended-integer arithmetic */
#include "npy_extint128.h"
/* Standard library headers */
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
/* Maximum of two values (arguments may be evaluated twice) */
#define MAX(a, b) (((a) >= (b)) ? (a) : (b))
/* Minimum of two values (arguments may be evaluated twice) */
#define MIN(a, b) (((a) <= (b)) ? (a) : (b))
/**
* Euclid's algorithm for the greatest common divisor (GCD).
*
* Solves gamma*a1 + epsilon*a2 == gcd(a1, a2),
* with |gamma| < |a2|/gcd and |epsilon| < |a1|/gcd.
*/
static void
euclid(npy_int64 a1, npy_int64 a2, npy_int64 *a_gcd, npy_int64 *gamma, npy_int64 *epsilon)
{
npy_int64 gamma1, gamma2, epsilon1, epsilon2, r;
assert(a1 > 0);
assert(a2 > 0);
gamma1 = 1;
gamma2 = 0;
epsilon1 = 0;
epsilon2 = 1;
/* The numbers remain bounded by |a1|, |a2| during the iterations, so no integer overflows */
while (1) {
if (a2 > 0) {
r = a1/a2;
a1 -= r*a2;
gamma1 -= r*gamma2;
epsilon1 -= r*epsilon2;
}
else {
*a_gcd = a1;
*gamma = gamma1;
*epsilon = epsilon1;
break;
}
if (a1 > 0) {
r = a2/a1;
a2 -= r*a1;
gamma2 -= r*gamma1;
epsilon2 -= r*epsilon1;
}
else {
*a_gcd = a2;
*gamma = gamma2;
*epsilon = epsilon2;
break;
}
}
}
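The identity that `euclid` maintains can be checked in isolation. The following is a minimal sketch, not NumPy code: `ext_gcd` is a hypothetical standalone helper that uses plain `int64_t` and the textbook single-loop form of the extended Euclidean algorithm rather than the alternating form above. It assumes positive inputs small enough that no intermediate product overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of NumPy): returns g = gcd(a, b) and
 * stores Bezout coefficients with (*x)*a + (*y)*b == g.
 * Assumes a > 0 and b > 0. */
static int64_t
ext_gcd(int64_t a, int64_t b, int64_t *x, int64_t *y)
{
    /* Invariants: x0*a_in + y0*b_in == a and x1*a_in + y1*b_in == b */
    int64_t x0 = 1, y0 = 0;
    int64_t x1 = 0, y1 = 1;

    assert(a > 0 && b > 0);
    while (b != 0) {
        int64_t q = a / b;
        int64_t t;
        t = a - q * b;   a = b;   b = t;
        t = x0 - q * x1; x0 = x1; x1 = t;
        t = y0 - q * y1; y0 = y1; y1 = t;
    }
    *x = x0;
    *y = y0;
    return a;
}
```

Calling `ext_gcd(12, 18, &x, &y)` returns 6 with coefficients that satisfy `12*x + 18*y == 6`; the coefficients themselves are not unique, so callers should rely only on the identity.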
/**
* Precompute GCD and bounds transformations
*/
static int
diophantine_precompute(unsigned int n,
diophantine_term_t *E,
diophantine_term_t *Ep,
npy_int64 *Gamma, npy_int64 *Epsilon)
{
npy_int64 a_gcd, gamma, epsilon, c1, c2;
unsigned int j;
char overflow = 0;
assert(n >= 2);
/* Reduce the first two terms: GCD with Bezout coefficients gamma, epsilon */
euclid(E[0].a, E[1].a, &a_gcd, &gamma, &epsilon);
Ep[0].a = a_gcd;
Gamma[0] = gamma;
Epsilon[0] = epsilon;
if (n > 2) {
c1 = E[0].a / a_gcd;
c2 = E[1].a / a_gcd;
/* Ep[0].ub = E[0].ub * c1 + E[1].ub * c2, with overflow detection */
Ep[0].ub = safe_add(safe_mul(E[0].ub, c1, &overflow),
safe_mul(E[1].ub, c2, &overflow), &overflow);
if (overflow) {
return 1; /* integer overflow */
}
}
/* Fold each remaining term into the running GCD */
for (j = 2; j < n; ++j) {
euclid(Ep[j-2].a, E[j].a, &a_gcd, &gamma, &epsilon);
Ep[j-1].a = a_gcd;
Gamma[j-1] = gamma;
Epsilon[j-1] = epsilon;
if (j < n - 1) {
c1 = Ep[j-2].a / a_gcd;
c2 = E[j].a / a_gcd;
/* Ep[j-1].ub = c1 * Ep[j-2].ub + c2 * E[j].ub, with overflow detection */
Ep[j-1].ub = safe_add(safe_mul(c1, Ep[j-2].ub, &overflow),
safe_mul(c2, E[j].ub, &overflow), &overflow);
if (overflow) {
return 1; /* integer overflow */
}
}
}
return 0; /* success: all terms processed without overflow */
}
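One step of the reduction performed by `diophantine_precompute` can be written out on its own. The sketch below is illustrative only: `gcd64` and `reduce_pair` are hypothetical names, and unlike the source it performs no overflow checks, so it is suitable for small inputs only.

```c
#include <assert.h>
#include <stdint.h>

/* Plain Euclid; assumes a > 0 and b > 0. */
static int64_t
gcd64(int64_t a, int64_t b)
{
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: merge the terms a1*x1 (0 <= x1 <= ub1) and
 * a2*x2 (0 <= x2 <= ub2) into a single term g*y with 0 <= y <= *ub_out,
 * where g = gcd(a1, a2) and y = (a1/g)*x1 + (a2/g)*x2.
 * No overflow checks: small inputs only. */
static int64_t
reduce_pair(int64_t a1, int64_t ub1, int64_t a2, int64_t ub2,
            int64_t *ub_out)
{
    int64_t g;

    assert(a1 > 0 && a2 > 0);
    g = gcd64(a1, a2);
    *ub_out = (a1 / g) * ub1 + (a2 / g) * ub2;
    return g;
}
```

For `4*x1 + 6*x2` with `x1 <= 3` and `x2 <= 2`, the merged term is `2*y` with `y <= 12`, which matches the largest reachable sum `4*3 + 6*2 == 24`. The merged problem is a relaxation: not every `y` in range corresponds to a valid `(x1, x2)`, which is why the search still has to descend into the original variables.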
/**
* Depth-first bounded Euclid search
*/
static mem_overlap_t
diophantine_dfs(unsigned int n,
unsigned int v,
diophantine_term_t *E,
diophantine_term_t *Ep,
npy_int64 *Gamma, npy_int64 *Epsilon,
npy_int64 b,
Py_ssize_t max_work,
int require_ub_nontrivial,
npy_int64 *x,
Py_ssize_t *count)
{
npy_int64 a_gcd, gamma, epsilon, a1, u1, a2, u2, c, r, c1, c2, t, t_l, t_u, b2, x1, x2;
npy_extint128_t x10, x20, t_l1, t_l2, t_u1, t_u2;
mem_overlap_t res;
char overflow = 0;
if (max_work >= 0 && *count >= max_work) {
return MEM_OVERLAP_TOO_HARD;
}
/* Fetch precomputed values for the reduced problem */
if (v == 1) {
a1 = E[0].a;
u1 = E[0].ub;
}
else {
a1 = Ep[v-2].a;
u1 = Ep[v-2].ub;
}
a2 = E[v].a;
u2 = E[v].ub;
a_gcd = Ep[v-1].a;
gamma = Gamma[v-1];
epsilon = Epsilon[v-1];
/* Generate set of allowed solutions */
c = b / a_gcd;
r = b % a_gcd;
if (r != 0) {
++*count;
return MEM_OVERLAP_NO;
}
c1 = a2 / a_gcd;
c2 = a1 / a_gcd;
/*
The set to enumerate is:
x1 = gamma*c + c1*t
x2 = epsilon*c - c2*t
t integer
0 <= x1 <= u1
0 <= x2 <= u2
and we have c, c1, c2 >= 0
*/
x10 = mul_64_64(gamma, c);
x20 = mul_64_64(epsilon, c);
t_l1 = ceildiv_128_64(neg_128(x10), c1);
t_l2 = ceildiv_128_64(sub_128(x20, to_128(u2), &overflow), c2);
t_u1 = floordiv_128_64(sub_128(to_128(u1), x10, &overflow), c1);
t_u2 = floordiv_128_64(x20, c2);
if (overflow) {
return MEM_OVERLAP_OVERFLOW;
}
if (gt_128(t_l2, t_l1)) {
t_l1 = t_l2;
}
if (gt_128(t_u1, t_u2)) {
t_u1 = t_u2;
}
if (gt_128(t_l1, t_u1)) {
++*count;
return MEM_OVERLAP_NO;
}
t_l = to_64(t_l1, &overflow);
t_u = to_64(t_u1, &overflow);
x10 = add_128(x10, mul_64_64(c1, t_l), &overflow);
x20 = sub_128(x20, mul_64_64(c2, t_l), &overflow);
t_u = safe_sub(t_u, t_l, &overflow);
t_l = 0;
x1 = to_64(x10, &overflow);
x2 = to_64(x20, &overflow);
if (overflow) {
return MEM_OVERLAP_OVERFLOW;
}
/* The bounds t_l, t_u ensure the x computed below do not overflow */
if (v == 1) {
/* Base case: only x[0] and x[1] remain */
if (t_u >= t_l) {
/* The range of t is non-empty; take the solution at t = t_l */
x[0] = x1 + c1*t_l;
x[1] = x2 - c2*t_l;
/* Optionally reject the trivial solution x[j] == E[j].ub/2 for all j */
if (require_ub_nontrivial) {
unsigned int j;
int is_ub_trivial;
is_ub_trivial = 1;
/* Check whether every variable equals half of its upper bound */
for (j = 0; j < n; ++j) {
if (x[j] != E[j].ub/2) {
is_ub_trivial = 0;
break;
}
}
/* Ignore the trivial solution */
if (is_ub_trivial) {
++*count;
return MEM_OVERLAP_NO;
}
}
/* A feasible solution exists, so the memory overlaps */
return MEM_OVERLAP_YES;
}
/* Empty range: count the work done and report no overlap */
++*count;
return MEM_OVERLAP_NO;
}
else {
/* Recurse to all candidates */
for (t = t_l; t <= t_u; ++t) {
/* Candidate value of x[v] for this t */
x[v] = x2 - c2*t;
/* Right-hand side of the reduced equation: b2 = b - a2*x[v] */
b2 = safe_sub(b, safe_mul(a2, x[v], &overflow), &overflow);
if (overflow) {
return MEM_OVERLAP_OVERFLOW;
}
/* Solve the reduced problem in the remaining variables */
res = diophantine_dfs(n, v-1, E, Ep, Gamma, Epsilon,
b2, max_work, require_ub_nontrivial,
x, count);
/* Stop on a solution or on a failure (anything but MEM_OVERLAP_NO) */
if (res != MEM_OVERLAP_NO) {
return res;
}
}
/* No candidate led to a solution */
++*count;
return MEM_OVERLAP_NO;
}
}
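/**
* Solve bounded Diophantine equation
*
* The problem considered is:
* A[0] x[0] + A[1] x[1] + ... + A[n-1] x[n-1] == b
* 0 <= x[i] <= U[i]
* A[i] > 0
*
* Solve via depth-first Euclid's algorithm, as explained in [1].
*
* If require_ub_nontrivial!=0, look for solutions to the problem
* where b = A[0]*(U[0]/2) + ... + A[n-1]*(U[n-1]/2), but ignoring
* the trivial solution x[i] = U[i]/2. All U[i] must be divisible by 2.
* The value given for `b` is ignored in this case.
*/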
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_diophantine(unsigned int n, diophantine_term_t *E, npy_int64 b,
Py_ssize_t max_work, int require_ub_nontrivial, npy_int64 *x)
{
mem_overlap_t res;
unsigned int j;
/* Validate the coefficient and upper bound of every term */
for (j = 0; j < n; ++j) {
if (E[j].a <= 0) {
return MEM_OVERLAP_ERROR; /* coefficients must be positive */
}
else if (E[j].ub < 0) {
return MEM_OVERLAP_NO; /* a negative upper bound admits no solution */
}
}
/* Nontrivial-solution mode: derive b from the upper bounds */
if (require_ub_nontrivial) {
npy_int64 ub_sum = 0;
char overflow = 0;
/* Every upper bound must be even; accumulate b = sum(a * ub/2) */
for (j = 0; j < n; ++j) {
if (E[j].ub % 2 != 0) {
return MEM_OVERLAP_ERROR; /* an odd upper bound is invalid input */
}
ub_sum = safe_add(ub_sum,
safe_mul(E[j].a, E[j].ub/2, &overflow),
&overflow);
}
if (overflow) {
return MEM_OVERLAP_ERROR; /* integer overflow while computing b */
}
b = ub_sum; /* the caller's b is ignored in this mode */
}
/* A negative right-hand side is infeasible because every term is >= 0 */
if (b < 0) {
return MEM_OVERLAP_NO;
}
/* No variables */
if (n == 0) {
if (require_ub_nontrivial) {
/* Only trivial solution for 0-variable problem */
return MEM_OVERLAP_NO;
}
if (b == 0) {
return MEM_OVERLAP_YES; /* the empty sum equals 0 */
}
return MEM_OVERLAP_NO;
}
/* One variable */
else if (n == 1) {
if (require_ub_nontrivial) {
/* Only trivial solution for 1-variable problem */
return MEM_OVERLAP_NO;
}
if (b % E[0].a == 0) {
x[0] = b / E[0].a;
if (x[0] >= 0 && x[0] <= E[0].ub) {
return MEM_OVERLAP_YES; /* the only candidate is within bounds */
}
}
return MEM_OVERLAP_NO;
}
/* Two or more variables */
else {
Py_ssize_t count = 0;
diophantine_term_t *Ep = NULL;
npy_int64 *Epsilon = NULL, *Gamma = NULL;
/* Allocate scratch arrays for the precomputed reductions */
Ep = malloc(n * sizeof(diophantine_term_t));
Epsilon = malloc(n * sizeof(npy_int64));
Gamma = malloc(n * sizeof(npy_int64));
if (Ep == NULL || Epsilon == NULL || Gamma == NULL) {
res = MEM_OVERLAP_ERROR; /* allocation failure */
}
else if (diophantine_precompute(n, E, Ep, Gamma, Epsilon)) {
res = MEM_OVERLAP_OVERFLOW; /* overflow during precomputation */
}
else {
/* Depth-first search for a solution */
res = diophantine_dfs(n, n-1, E, Ep, Gamma, Epsilon, b, max_work,
require_ub_nontrivial, x, &count);
}
free(Ep); /* free(NULL) is a no-op, so a partial allocation is safe */
free(Gamma);
free(Epsilon);
return res;
}
}
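The two-variable step at the heart of the search above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the NumPy implementation: `solve_two_terms`, `floordiv64` and `ceildiv64` are hypothetical names, the arithmetic uses plain `int64_t` instead of the 128-bit helpers, and there are no overflow checks, so inputs must be small. It uses the same parametrization as `diophantine_dfs`: `x1 = gamma*c + c1*t`, `x2 = epsilon*c - c2*t`, with `t` restricted by the bounds on `x1` and `x2`.

```c
#include <assert.h>
#include <stdint.h>

/* Floor and ceiling division for b > 0 (C division truncates toward 0). */
static int64_t
floordiv64(int64_t a, int64_t b)
{
    int64_t q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

static int64_t
ceildiv64(int64_t a, int64_t b)
{
    int64_t q = a / b;
    return (a % b != 0 && a > 0) ? q + 1 : q;
}

/* Hypothetical helper: find x1, x2 with a1*x1 + a2*x2 == b,
 * 0 <= x1 <= u1 and 0 <= x2 <= u2. Returns 1 and stores a solution,
 * or returns 0 if none exists. Small inputs only (no overflow checks). */
static int
solve_two_terms(int64_t a1, int64_t u1, int64_t a2, int64_t u2, int64_t b,
                int64_t *x1, int64_t *x2)
{
    /* Extended Euclid; invariant: g0*a1 + e0*a2 == r0 */
    int64_t r0 = a1, r1 = a2, g0 = 1, g1 = 0, e0 = 0, e1 = 1;
    int64_t g, c, c1, c2, t_l, t_u, lo2, hi2;

    assert(a1 > 0 && a2 > 0 && u1 >= 0 && u2 >= 0 && b >= 0);
    while (r1 != 0) {
        int64_t q = r0 / r1, t;
        t = r0 - q * r1; r0 = r1; r1 = t;
        t = g0 - q * g1; g0 = g1; g1 = t;
        t = e0 - q * e1; e0 = e1; e1 = t;
    }
    g = r0;
    if (b % g != 0) {
        return 0;               /* GCD-based pruning */
    }
    c = b / g;
    c1 = a2 / g;
    c2 = a1 / g;

    /* Lower bound on t from x1 >= 0 and x2 <= u2 */
    t_l = ceildiv64(-g0 * c, c1);
    lo2 = ceildiv64(e0 * c - u2, c2);
    if (lo2 > t_l) {
        t_l = lo2;
    }
    /* Upper bound on t from x1 <= u1 and x2 >= 0 */
    t_u = floordiv64(u1 - g0 * c, c1);
    hi2 = floordiv64(e0 * c, c2);
    if (hi2 < t_u) {
        t_u = hi2;
    }
    if (t_l > t_u) {
        return 0;               /* empty range: infeasible */
    }
    *x1 = g0 * c + c1 * t_l;
    *x2 = e0 * c - c2 * t_l;
    return 1;
}
```

With `a1 = 4`, `u1 = 3`, `a2 = 6`, `u2 = 2`, the right-hand side 10 is reachable (`4*1 + 6*1`), while 2 and 11 are not: 2 fails the bounds on `t` and 11 fails the divisibility test.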
static int
diophantine_sort_A(const void *xp, const void *yp)
{
/* qsort comparison function: orders terms by coefficient `a`, largest first */
npy_int64 xa = ((diophantine_term_t*)xp)->a;
npy_int64 ya = ((diophantine_term_t*)yp)->a;
/* xa < ya: return 1 so that x sorts after y (descending order) */
if (xa < ya) {
return 1;
}
/* ya < xa: return -1 so that x sorts before y */
else if (ya < xa) {
return -1;
}
/* Equal coefficients */
else {
return 0;
}
}
/**
* Simplify Diophantine decision problem.
*
* Combine identical coefficients, remove unnecessary variables, and trim
* bounds.
*
* The feasible/infeasible decision result is retained.
*
* Returns: 0 (success), -1 (integer overflow).
*/
NPY_VISIBILITY_HIDDEN int
diophantine_simplify(unsigned int *n, diophantine_term_t *E, npy_int64 b)
{
unsigned int i, j, m;
char overflow = 0;
/* Skip obviously infeasible cases */
for (j = 0; j < *n; ++j) {
if (E[j].ub < 0) {
return 0;
}
}
if (b < 0) {
return 0;
}
/* Sort vs. coefficients */
qsort(E, *n, sizeof(diophantine_term_t), diophantine_sort_A);
/* Combine identical coefficients */
m = *n;
i = 0;
for (j = 1; j < m; ++j) {
if (E[i].a == E[j].a) {
E[i].ub = safe_add(E[i].ub, E[j].ub, &overflow);
--*n;
}
else {
++i;
if (i != j) {
E[i] = E[j];
}
}
}
/* Trim bounds and remove unnecessary variables */
m = *n;
i = 0;
for (j = 0; j < m; ++j) {
E[j].ub = MIN(E[j].ub, b / E[j].a);
if (E[j].ub == 0) {
/* If the problem is feasible at all, x[i]=0 */
--*n;
}
else {
if (i != j) {
E[i] = E[j];
}
++i;
}
}
if (overflow) {
return -1;
}
else {
return 0;
}
}
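The claim that simplification retains the feasible/infeasible result can be spot-checked on tiny inputs with an exhaustive search. The sketch below is an assumption-laden test aid, not NumPy code: `brute_force_feasible` and `trim_bounds` are hypothetical names, the search is exponential, and only the bound-trimming step `ub = min(ub, b / a)` is reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference oracle: returns 1 if
 * a[0]*x[0] + ... + a[n-1]*x[n-1] == b has a solution with
 * 0 <= x[i] <= ub[i], else 0. Exponential time; tiny inputs only. */
static int
brute_force_feasible(unsigned int n, const int64_t *a, const int64_t *ub,
                     int64_t b)
{
    int64_t x;

    if (b < 0) {
        return 0;
    }
    if (n == 0) {
        return b == 0;
    }
    assert(a[n-1] > 0);
    for (x = 0; x <= ub[n-1] && a[n-1] * x <= b; ++x) {
        if (brute_force_feasible(n - 1, a, ub, b - a[n-1] * x)) {
            return 1;
        }
    }
    return 0;
}

/* Trim bounds as the source does: ub[i] = min(ub[i], b / a[i]).
 * Assumes b >= 0 and a[i] > 0. */
static void
trim_bounds(unsigned int n, const int64_t *a, int64_t *ub, int64_t b)
{
    unsigned int i;

    for (i = 0; i < n; ++i) {
        int64_t cap = b / a[i];
        if (ub[i] > cap) {
            ub[i] = cap;
        }
    }
}
```

For coefficients `{4, 6}` with bounds `{100, 100}` and `b = 10`, trimming shrinks the bounds to `{2, 1}` and the oracle reports the problem as feasible both before and after.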
/**
* Gets a half-open range [start, end) of offsets from the data pointer
*/
NPY_VISIBILITY_HIDDEN void
offset_bounds_from_strides(const int itemsize, const int nd,
const npy_intp *dims, const npy_intp *strides,
npy_intp *lower_offset, npy_intp *upper_offset)
{
npy_intp max_axis_offset;
npy_intp lower = 0;
npy_intp upper = 0;
int i;
for (i = 0; i < nd; i++) {
if (dims[i] == 0) {
/* If the array size is zero, return an empty range */
*lower_offset = 0;
*upper_offset = 0;
return;
}
/* Expand either upwards or downwards depending on stride */
max_axis_offset = strides[i] * (dims[i] - 1);
if (max_axis_offset > 0) {
upper += max_axis_offset;
}
else {
lower += max_axis_offset;
}
}
/* Return a half-open range */
upper += itemsize;
*lower_offset = lower;
*upper_offset = upper;
}
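To see the half-open range on concrete numbers, the same computation can be reproduced without the NumPy types. This is a sketch under the assumption that `ptrdiff_t` stands in for `npy_intp`; `offset_bounds` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone version: computes the half-open byte range
 * [*lower, *upper) relative to the data pointer. */
static void
offset_bounds(ptrdiff_t itemsize, int nd, const ptrdiff_t *dims,
              const ptrdiff_t *strides, ptrdiff_t *lower, ptrdiff_t *upper)
{
    ptrdiff_t lo = 0, hi = 0;
    int i;

    assert(itemsize > 0);
    for (i = 0; i < nd; i++) {
        ptrdiff_t extent;
        if (dims[i] == 0) {
            /* Empty array: empty range */
            *lower = 0;
            *upper = 0;
            return;
        }
        /* Offset of the last element along this axis */
        extent = strides[i] * (dims[i] - 1);
        if (extent > 0) {
            hi += extent;
        }
        else {
            lo += extent;
        }
    }
    *lower = lo;
    *upper = hi + itemsize;     /* one past the last byte */
}
```

For an 8-byte element type with shape `(3, 4)` and strides `(-32, 8)`, the first axis extends 64 bytes below the data pointer and the second 24 bytes above it, so the range is `[-64, 32)`.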
/**
* Gets a half-open range [start, end) which contains the array data
*/
static void
get_array_memory_extents(PyArrayObject *arr,
npy_uintp *out_start, npy_uintp *out_end,
npy_uintp *num_bytes)
{
npy_intp low, upper;
int j;
offset_bounds_from_strides(PyArray_ITEMSIZE(arr), PyArray_NDIM(arr),
PyArray_DIMS(arr), PyArray_STRIDES(arr),
&low, &upper);
/* Address of the first byte of the array data */
*out_start = (npy_uintp)PyArray_DATA(arr) + (npy_uintp)low;
/* Address one past the last byte of the array data */
*out_end = (npy_uintp)PyArray_DATA(arr) + (npy_uintp)upper;
/* Total bytes occupied by the elements: itemsize times every dimension */
*num_bytes = PyArray_ITEMSIZE(arr);
for (j = 0; j < PyArray_NDIM(arr); ++j) {
*num_bytes *= PyArray_DIM(arr, j);
}
}
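/**
* Convert the strides of an array into Diophantine terms.
*
* Args:
* arr: the NumPy array
* terms: output array receiving the terms
* nterms: in/out count of terms written so far
* skip_empty: if nonzero, skip dimensions of size <= 1 or stride 0
*
* Returns:
* 0 on success, 1 on integer overflow
*
* Each dimension contributes one term with coefficient |stride| and upper
* bound shape - 1. Negative strides are negated; if the negated value is
* still negative, the negation overflowed and 1 is returned.
*/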
static int
strides_to_terms(PyArrayObject *arr, diophantine_term_t *terms,
unsigned int *nterms, int skip_empty)
{
int i;
for (i = 0; i < PyArray_NDIM(arr); ++i) {
if (skip_empty) {
if (PyArray_DIM(arr, i) <= 1 || PyArray_STRIDE(arr, i) == 0) {
continue;
}
}
terms[*nterms].a = PyArray_STRIDE(arr, i);
if (terms[*nterms].a < 0) {
terms[*nterms].a = -terms[*nterms].a;
}
if (terms[*nterms].a < 0) {
/* Integer overflow */
return 1;
}
terms[*nterms].ub = PyArray_DIM(arr, i) - 1;
++*nterms;
}
return 0;
}
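/**
* Determine whether two arrays share some memory.
*
* Returns: 0 (no shared memory), 1 (shared memory), or < 0 (failed to solve).
*
* Note that failures to solve can occur due to integer overflows, or due to
* the effort required to solve the problem exceeding max_work. The general
* problem is NP-hard and the worst-case runtime is exponential in the number
* of dimensions. max_work controls the amount of work done: exact
* (max_work == -1), only a simple memory extent check (max_work == 0), or
* an upper bound max_work > 0 on the number of solution candidates considered.
*/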
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_may_share_memory(PyArrayObject *a, PyArrayObject *b,
Py_ssize_t max_work)
{
npy_int64 rhs;
diophantine_term_t terms[2*NPY_MAXDIMS + 2];
npy_uintp start1 = 0, end1 = 0, size1 = 0;
npy_uintp start2 = 0, end2 = 0, size2 = 0;
npy_uintp uintp_rhs;
npy_int64 x[2*NPY_MAXDIMS + 2];
unsigned int nterms;
get_array_memory_extents(a, &start1, &end1, &size1);
get_array_memory_extents(b, &start2, &end2, &size2);
if (!(start1 < end2 && start2 < end1 && start1 < end1 && start2 < end2)) {
/* Memory extents don't overlap */
return MEM_OVERLAP_NO;
}
if (max_work == 0) {
/* Too much work required, give up */
return MEM_OVERLAP_TOO_HARD;
}
/* Convert problem to Diophantine equation form with positive coefficients.
The bounds computed by offset_bounds_from_strides correspond to
all-positive strides.
start1 + sum(abs(stride1)*x1)
== start2 + sum(abs(stride2)*x2)
== end1 - 1 - sum(abs(stride1)*x1')
== end2 - 1 - sum(abs(stride2)*x2')
<=>
sum(abs(stride1)*x1) + sum(abs(stride2)*x2')
== end2 - 1 - start1
OR
sum(abs(stride1)*x1') + sum(abs(stride2)*x2)
== end1 - 1 - start2
We pick the problem with the smaller RHS (both are non-negative due to
the extent check above).
*/
uintp_rhs = MIN(end2 - 1 - start1, end1 - 1 - start2);
if (uintp_rhs > NPY_MAX_INT64) {
/* Integer overflow */
return MEM_OVERLAP_OVERFLOW;
}
rhs = (npy_int64)uintp_rhs;
nterms = 0;
/* Convert the strides of `a` to terms; a nonzero return means overflow */
if (strides_to_terms(a, terms, &nterms, 1)) {
return MEM_OVERLAP_OVERFLOW;
}
/* Append the terms for the strides of `b` */
if (strides_to_terms(b, terms, &nterms, 1)) {
return MEM_OVERLAP_OVERFLOW;
}
/* Itemsize acts as an extra dimension with stride 1 and size itemsize */
if (PyArray_ITEMSIZE(a) > 1) {
terms[nterms].a = 1;
terms[nterms].ub = PyArray_ITEMSIZE(a) - 1;
++nterms;
}
if (PyArray_ITEMSIZE(b) > 1) {
terms[nterms].a = 1;
terms[nterms].ub = PyArray_ITEMSIZE(b) - 1;
++nterms;
}
/* Simplify, if possible */
if (diophantine_simplify(&nterms, terms, rhs)) {
/* Integer overflow */
return MEM_OVERLAP_OVERFLOW;
}
/* Solve */
return solve_diophantine(nterms, terms, rhs, max_work, 0, x);
}
/**
* Determine whether an array has internal overlap.
*
* Returns: 0 (no overlap), 1 (overlap), or < 0 (failed to solve).
*
* max_work and reasons for solver failures are as in solve_may_share_memory.
*/
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_may_have_internal_overlap(PyArrayObject *a, Py_ssize_t max_work)
{
/* Diophantine terms and solution vector */
diophantine_term_t terms[NPY_MAXDIMS+1];
npy_int64 x[NPY_MAXDIMS+1];
unsigned int i, j, nterms;
if (PyArray_ISCONTIGUOUS(a)) {
/* Quick case */
return MEM_OVERLAP_NO;
}
/* The internal overlap problem asks for two different index vectors
that map to the same address */
nterms = 0;
if (strides_to_terms(a, terms, &nterms, 0)) {
/* Integer overflow while converting strides */
return MEM_OVERLAP_OVERFLOW;
}
if (PyArray_ITEMSIZE(a) > 1) {
/* Itemsize acts as an extra dimension with stride 1 */
terms[nterms].a = 1;
terms[nterms].ub = PyArray_ITEMSIZE(a) - 1;
++nterms;
}
/* Get rid of zero coefficients and empty terms */
i = 0;
for (j = 0; j < nterms; ++j) {
if (terms[j].ub == 0) {
continue;
}
else if (terms[j].ub < 0) {
/* A dimension of size 0: the array is empty, so no overlap */
return MEM_OVERLAP_NO;
}
else if (terms[j].a == 0) {
/* Zero stride with more than one element: elements coincide */
return MEM_OVERLAP_YES;
}
if (i != j) {
terms[i] = terms[j];
}
++i;
}
nterms = i;
/* Double bounds to get the internal overlap problem */
for (j = 0; j < nterms; ++j) {
terms[j].ub *= 2;
}
/* Sort vs. coefficients; cannot call diophantine_simplify here, because it
may change the decision problem inequality part */
qsort(terms, nterms, sizeof(diophantine_term_t), diophantine_sort_A);
/* Solve */
return solve_diophantine(nterms, terms, -1, max_work, 1, x);
}
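What "internal overlap" means can be made concrete with an exhaustive check: two distinct index vectors that land on the same byte. The sketch below is a brute-force illustration for tiny arrays, not the method used above; `internal_overlap_bruteforce` is a hypothetical name, and it assumes non-negative strides, an itemsize of 1 and at most 8 dimensions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical brute-force check for tiny arrays with non-negative
 * strides and itemsize 1. Returns 1 if two distinct index vectors map to
 * the same byte offset, 0 if not, -1 on bad input or allocation failure. */
static int
internal_overlap_bruteforce(int nd, const int64_t *shape,
                            const int64_t *strides)
{
    int64_t extent = 0, idx[8] = {0};
    unsigned char *seen;
    int i, result = 0;

    if (nd < 0 || nd > 8) {
        return -1;
    }
    for (i = 0; i < nd; ++i) {
        assert(strides[i] >= 0);
        if (shape[i] == 0) {
            return 0;           /* empty array */
        }
        extent += strides[i] * (shape[i] - 1);
    }
    seen = calloc((size_t)extent + 1, 1);
    if (seen == NULL) {
        return -1;
    }
    for (;;) {
        int64_t off = 0;
        for (i = 0; i < nd; ++i) {
            off += strides[i] * idx[i];
        }
        if (seen[off]) {
            result = 1;         /* this byte was already visited */
            break;
        }
        seen[off] = 1;
        /* Advance the index vector like an odometer */
        for (i = nd - 1; i >= 0; --i) {
            if (++idx[i] < shape[i]) {
                break;
            }
            idx[i] = 0;
        }
        if (i < 0) {
            break;              /* wrapped around: all indices visited */
        }
    }
    free(seen);
    return result;
}
```

Shape `(3, 2)` with strides `(1, 2)` overlaps internally because indices `(0, 1)` and `(2, 0)` both map to offset 2, whereas shape `(2, 2)` with strides `(2, 1)` is an ordinary contiguous layout and does not.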
.\numpy\numpy\_core\src\common\mem_overlap.h
/* Bounds check only */
/* Exact solution */
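/* Result codes of the memory-overlap solver */
typedef enum {
MEM_OVERLAP_NO = 0, /* no overlap */
MEM_OVERLAP_YES = 1, /* overlap */
MEM_OVERLAP_TOO_HARD = -1, /* maximum work exceeded */
MEM_OVERLAP_OVERFLOW = -2, /* algorithm failed due to integer overflow */
MEM_OVERLAP_ERROR = -3 /* invalid input */
} mem_overlap_t;
/* One term a*x of a bounded Diophantine equation, with 0 <= x <= ub */
typedef struct {
npy_int64 a; /* coefficient */
npy_int64 ub; /* upper bound of the variable */
} diophantine_term_t;
/* Solve a bounded Diophantine equation */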
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_diophantine(unsigned int n, diophantine_term_t *E,
npy_int64 b, Py_ssize_t max_work, int require_nontrivial,
npy_int64 *x);
/* Simplify a Diophantine decision problem */
NPY_VISIBILITY_HIDDEN int
diophantine_simplify(unsigned int *n, diophantine_term_t *E, npy_int64 b);
/* Determine whether two arrays may share memory */
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_may_share_memory(PyArrayObject *a, PyArrayObject *b,
Py_ssize_t max_work);
/* Determine whether an array may overlap itself in memory */
NPY_VISIBILITY_HIDDEN mem_overlap_t
solve_may_have_internal_overlap(PyArrayObject *a, Py_ssize_t max_work);
/* Compute the half-open range of byte offsets implied by shape and strides */
NPY_VISIBILITY_HIDDEN void
offset_bounds_from_strides(const int itemsize, const int nd,
const npy_intp *dims, const npy_intp *strides,
npy_intp *lower_offset, npy_intp *upper_offset);
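A caller has to decide what to do with the three negative result codes. The sketch below shows one possible handling and is not taken from NumPy: the enum is copied from the header so the block stands alone, while `mem_overlap_describe` and `must_assume_overlap` are hypothetical helpers. The conservative policy (treat any failure as a possible overlap) is an assumption chosen for illustration.

```c
#include <assert.h>

/* Copied from mem_overlap.h so that the example is self-contained. */
typedef enum {
    MEM_OVERLAP_NO = 0,
    MEM_OVERLAP_YES = 1,
    MEM_OVERLAP_TOO_HARD = -1,
    MEM_OVERLAP_OVERFLOW = -2,
    MEM_OVERLAP_ERROR = -3
} mem_overlap_t;

/* Hypothetical helper: human-readable text for a result code. */
static const char *
mem_overlap_describe(mem_overlap_t res)
{
    switch (res) {
    case MEM_OVERLAP_NO:       return "no overlap";
    case MEM_OVERLAP_YES:      return "overlap";
    case MEM_OVERLAP_TOO_HARD: return "maximum work exceeded";
    case MEM_OVERLAP_OVERFLOW: return "integer overflow";
    case MEM_OVERLAP_ERROR:    return "invalid input";
    }
    return "unknown";
}

/* Hypothetical conservative policy: only a definite MEM_OVERLAP_NO lets
 * the caller skip a defensive copy; failures count as possible overlap. */
static int
must_assume_overlap(mem_overlap_t res)
{
    assert(res >= MEM_OVERLAP_ERROR && res <= MEM_OVERLAP_YES);
    return res != MEM_OVERLAP_NO;
}
```

Under this policy a solver that gives up with `MEM_OVERLAP_TOO_HARD` costs at most an unnecessary copy and never a wrong result.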
.\numpy\numpy\_core\src\common\meta.hpp
namespace np { namespace meta {
/// @addtogroup cpp_core_meta
/// @{
namespace details {
// Primary template: selects an integer type by byte size and signedness
template<int size, bool unsig>
struct IntBySize;
// Specialization for 1-byte integers: uint8_t if unsig, otherwise int8_t
template<bool unsig>
struct IntBySize<sizeof(uint8_t), unsig> {
using Type = typename std::conditional<
unsig, uint8_t, int8_t>::type;
};
// Specialization for 2-byte integers: uint16_t if unsig, otherwise int16_t
template<bool unsig>
struct IntBySize<sizeof(uint16_t), unsig> {
using Type = typename std::conditional<
unsig, uint16_t, int16_t>::type;
};
// Specialization for 4-byte integers: uint32_t if unsig, otherwise int32_t
template<bool unsig>
struct IntBySize<sizeof(uint32_t), unsig> {
using Type = typename std::conditional<
unsig, uint32_t, int32_t>::type;
};
// Specialization for 8-byte integers: uint64_t if unsig, otherwise int64_t
template<bool unsig>
struct IntBySize<sizeof(uint64_t), unsig> {
using Type = typename std::conditional<
unsig, uint64_t, int64_t>::type;
};
} // namespace details
/// Provides a safe conversion of any integer type to a fixed-width integer type.
template<typename T>
struct FixedWidth {
// TF_ is the fixed-width integer type with the same size and signedness as T
using TF_ = typename details::IntBySize<
sizeof(T), std::is_unsigned<T>::value
>::Type;
// Type is TF_ when T is an integral type; otherwise it is T itself
using Type = typename std::conditional<
std::is_integral<T>::value, TF_, T
>::type;
};
/// @} cpp_core_meta
}} // namespace np::meta
.\numpy\numpy\_core\src\common\npdef.hpp
/// @addtogroup cpp_core_defs
/// @{
/// Whether compiler supports C++20
// Defines NP_HAS_CPP20 as 1 if the compiler supports C++20, and as 0 otherwise
/// Wraps `__has_builtin`
// Uses __has_builtin to test whether the compiler provides a given builtin;
// if the compiler lacks __has_builtin, the builtin is reported as unavailable
/// @} cpp_core_defs
.\numpy\numpy\_core\src\common\npstd.hpp
namespace np {
/// @addtogroup cpp_core_types
/// @{
// Bring the fixed-width integer types and std::complex into namespace np
using std::uint8_t;
using std::int8_t;
using std::uint16_t;
using std::int16_t;
using std::uint32_t;
using std::int32_t;
using std::uint64_t;
using std::int64_t;
using std::uintptr_t;
using std::intptr_t;
using std::complex;
using std::uint_fast16_t;
using std::uint_fast32_t;
using SSize = Py_ssize_t; // signed size type used for sizes and indices
/** Guard for long double.
*
* The C implementation defines long double as double
* on MinGW to provide compatibility with MSVC to unify
* one behavior under Windows OS, which makes npy_longdouble
* not fit to be used with template specialization or overloading.
*
* This type will be set to `void` when `npy_longdouble` is not defined
* as `long double`.
*/
using LongDouble = typename std::conditional<
!std::is_same<npy_longdouble, long double>::value,
void, npy_longdouble
>::type;
/// @} cpp_core_types
} // namespace np
.\numpy\numpy\_core\src\common\npy_argparse.c
/**
* Define NPY_NO_DEPRECATED_API to use the latest NumPy API version.
* This prevents the use of deprecated API features.
*/
/**
* Define _MULTIARRAYMODULE to specify the multi-array module.
* This is used to indicate the module being compiled.
*/
/**
* Define PY_SSIZE_T_CLEAN before including Python.h so that '#' argument
* formats use Py_ssize_t rather than int for lengths.
*/
#define PY_SSIZE_T_CLEAN
/**
* Include Python.h to gain access to Python C API functions and definitions.
* This header file provides essential macros, types, and function declarations
* for extending Python with C or C++ code.
*/
#include <Python.h>
/**
* Include ndarraytypes.h to access NumPy's array and dtype definitions.
* This header file defines data structures and macros necessary for working
* with NumPy arrays and data types in C extension modules.
*/
/**
* Include npy_2_compat.h for backward compatibility with older NumPy versions.
* This ensures that the extension module remains compatible with NumPy's API
* across different versions of the library.
*/
#include "numpy/npy_2_compat.h"
/**
* Include npy_argparse.h to utilize argument parsing utilities provided by NumPy.
* This header file provides functions and macros for parsing and validating
* function arguments passed from Python to C extension modules.
*/
#include "npy_argparse.h"
/**
* Include npy_import.h for functions related to importing NumPy in C extension modules.
* This header file includes functions and macros that facilitate importing NumPy
* and ensuring compatibility across different configurations.
*/
#include "npy_import.h"
/**
* Include arrayfunction_override.h for overriding array functions in NumPy.
* This header file contains declarations and macros that enable overriding or
* extending built-in NumPy array functions with custom implementations.
*/
#include "arrayfunction_override.h"
/**
* Small wrapper converting Python integer to C int using PyLong_AsLong function.
*
* This function handles conversion of Python integers to C int, checking for
* overflow conditions and type errors.
*
* @param obj The Python object to convert (should be an integer)
* @param value Pointer to the output C int value
* @returns NPY_SUCCEED on success, NPY_FAIL on failure
*/
NPY_NO_EXPORT int
PyArray_PythonPyIntFromInt(PyObject *obj, int *value)
{
/* Python's behavior is to check explicitly for float types */
if (NPY_UNLIKELY(PyFloat_Check(obj))) {
PyErr_SetString(PyExc_TypeError,
"integer argument expected, got float");
return NPY_FAIL;
}
long result = PyLong_AsLong(obj);
if (NPY_UNLIKELY((result == -1) && PyErr_Occurred())) {
return NPY_FAIL;
}
if (NPY_UNLIKELY((result > INT_MAX) || (result < INT_MIN))) {
PyErr_SetString(PyExc_OverflowError,
"Python int too large to convert to C int");
return NPY_FAIL;
}
else {
*value = (int)result;
return NPY_SUCCEED;
}
}
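The range check in the function above does not depend on the Python C API and can be tried on its own. The following is a minimal sketch; `long_to_int_checked` is a hypothetical name, and it returns 1 or 0 instead of `NPY_SUCCEED` or `NPY_FAIL` and sets no Python exception.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store `value` in *out if it fits in a C int.
 * Returns 1 on success and 0 if the value is out of range. */
static int
long_to_int_checked(long value, int *out)
{
    assert(out != NULL);
    if (value > INT_MAX || value < INT_MIN) {
        return 0;
    }
    *out = (int)value;
    return 1;
}
```

On platforms where `long` and `int` have the same width (for example 64-bit Windows), the comparison can never fail, which is why the check is only meaningful where `LONG_MAX > INT_MAX`.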
/**
* Function type for argument converters: takes the Python object and a
* pointer to the C output location. Converters are passed as `convert *`.
*/
typedef int convert(PyObject *, void *);
/**
* Internal function to initialize keyword argument parsing for NumPy functions.
*
* This function performs several tasks:
* 1. Checks input consistency to detect coding errors, such as missing | after optional parameters.
* 2. Determines the number of positional-only arguments, total arguments, required arguments,
* and keyword arguments.
* 3. Interns all keyword argument strings to optimize parsing performance by using
* identity-based comparisons and reducing string creation overhead.
*
* @param funcname Name of the function being parsed, used mainly for error reporting.
* @param cache A cache object stored statically within the parsing function.
* @param va_orig Argument list passed to npy_parse_arguments.
* @return 0 on success, -1 on failure
*/
static int
initialize_keywords(const char *funcname,
_NpyArgParserCache *cache, va_list va_orig) {
va_list va;
int nargs = 0; // Total number of arguments
int nkwargs = 0; // Number of keyword arguments
int npositional_only = 0; // Number of positional-only arguments
int nrequired = 0; // Number of required arguments
int npositional = 0; // Number of positional arguments
char state = '\0'; // State variable for argument parsing
va_copy(va, va_orig);
while (1) {
/* Count length first: */
char *name = va_arg(va, char *);
convert *converter = va_arg(va, convert *);
void *data = va_arg(va, void *);
/* Check if this is the sentinel, only converter may be NULL */
if ((name == NULL) && (converter == NULL) && (data == NULL)) {
break;
}
/* A NULL name outside the sentinel is a coding error */
if (name == NULL) {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: name is NULL in %s() at "
"argument %d.", funcname, nargs);
va_end(va);
return -1;
}
/* A NULL data pointer is a coding error as well */
if (data == NULL) {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: data is NULL in %s() at "
"argument %d.", funcname, nargs);
va_end(va);
return -1;
}
nargs += 1;
/* A leading '|' marks an optional argument that may be passed by position */
if (*name == '|') {
/* Positional arguments cannot follow keyword-only ones */
if (state == '$') {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: positional argument `|` "
"after keyword only `$` one to %s() at argument %d.",
funcname, nargs);
va_end(va);
return -1;
}
state = '|';
name++; /* advance to actual name. */
npositional += 1;
}
/* A leading '$' marks a keyword-only argument */
else if (*name == '$') {
state = '$';
name++; /* advance to actual name. */
}
/* No prefix: a required argument */
else {
/* Required arguments must precede all '|' and '$' arguments */
if (state != '\0') {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: non-required argument after "
"required | or $ one to %s() at argument %d.",
funcname, nargs);
va_end(va);
return -1;
}
nrequired += 1;
npositional += 1;
}
/* An empty name marks a positional-only argument */
if (*name == '\0') {
npositional_only += 1;
/* Positional-only arguments cannot be keyword-only and must come first */
if (state == '$' || npositional_only != npositional) {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: non-kwarg marked with $ "
"to %s() at argument %d or positional only following "
"kwarg.", funcname, nargs);
va_end(va);
return -1;
}
}
else {
nkwargs += 1;
}
}
va_end(va);
/* If npositional is -1, treat all arguments as positional */
if (npositional == -1) {
npositional = nargs;
}
/* Reject functions with more than _NPY_MAX_KWARGS arguments */
if (nargs > _NPY_MAX_KWARGS) {
PyErr_Format(PyExc_SystemError,
"NumPy internal error: function %s() has %d arguments, but "
"the maximum is currently limited to %d for easier parsing; "
"it can be increased by modifying `_NPY_MAX_KWARGS`.",
funcname, nargs, _NPY_MAX_KWARGS);
return -1;
}
/*
* Store the argument counts in the cache for use by later calls.
*/
cache->nargs = nargs;
cache->npositional_only = npositional_only;
cache->npositional = npositional;
cache->nrequired = nrequired;
/*
* NULL kw_strings for easier cleanup (and NULL termination).
*/
memset(cache->kw_strings, 0, sizeof(PyObject *) * (nkwargs + 1));
/*
* Copy va_orig again for a second pass over the argument specs.
*/
va_copy(va, va_orig);
for (int i = 0; i < nargs; i++) {
/*
* Read the name and skip the converter and data pointers.
*/
char *name = va_arg(va, char *);
va_arg(va, convert *);
va_arg(va, void *);
if (*name == '|' || *name == '$') {
name++; /* ignore | and $ */
}
if (i >= npositional_only) {
int i_kwarg = i - npositional_only;
/*
* Positional-only arguments need no setup. Every other name is interned
* as a Python string and stored in kw_strings; on failure, jump to the
* error handler.
*/
cache->kw_strings[i_kwarg] = PyUnicode_InternFromString(name);
if (cache->kw_strings[i_kwarg] == NULL) {
va_end(va);
goto error;
}
}
}
/*
* End the second pass over the variable arguments.
*/
va_end(va);
return 0;
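The counting pass above implements a small specification language: a bare name is required, a leading `|` marks an optional argument, a leading `$` marks a keyword-only argument, and an empty name marks a positional-only argument. The sketch below reproduces only that counting logic without any Python dependency. `spec_counts` and `count_spec` are hypothetical names, and the spec list is assumed to be a NULL-terminated array of names rather than a `va_list` of (name, converter, data) triples.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int nargs, npositional, npositional_only, nrequired, nkwargs;
} spec_counts;

/* Hypothetical, Python-free sketch of the counting pass. `names` is a
 * NULL-terminated array. Returns 0 on success, -1 on an invalid order. */
static int
count_spec(const char *const *names, spec_counts *out)
{
    spec_counts c = {0, 0, 0, 0, 0};
    char state = '\0';

    assert(names != NULL && out != NULL);
    for (; *names != NULL; ++names) {
        const char *name = *names;
        c.nargs += 1;
        if (*name == '|') {
            if (state == '$') {
                return -1;      /* positional after keyword-only */
            }
            state = '|';
            name++;
            c.npositional += 1;
        }
        else if (*name == '$') {
            state = '$';
            name++;
        }
        else {
            if (state != '\0') {
                return -1;      /* required after optional */
            }
            c.nrequired += 1;
            c.npositional += 1;
        }
        if (*name == '\0') {
            c.npositional_only += 1;
            if (state == '$' || c.npositional_only != c.npositional) {
                return -1;      /* positional-only in the wrong place */
            }
        }
        else {
            c.nkwargs += 1;
        }
    }
    *out = c;
    return 0;
}
```

For the spec `{"", "|axis", "$out", NULL}` this yields three arguments in total, two that may be positional, one positional-only, one required, and two that can be passed by keyword.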
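/**
* Generic helper for argument parsing
*
* See the macro version for an example pattern of how to use this function.
*
* @param funcname Function name string
* @param cache Argument parser cache object
* @param args Arguments as passed by Python (METH_FASTCALL)
* @param len_args Number of positional arguments at the start of args
* @param kwnames Names of the keyword arguments
* @param ... List of arguments (see macro version), terminated by NULL, NULL, NULL: name, converter, value
* @return 0 on success, -1 on failure
*/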
NPY_NO_EXPORT int
_npy_parse_arguments(const char *funcname,
_NpyArgParserCache *cache,
PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames,
...)
{
// If the cache has not been initialized yet (npositional is still -1)
if (NPY_UNLIKELY(cache->npositional == -1)) {
va_list va;
va_start(va, kwnames);
// Initialize the cached keyword information
int res = initialize_keywords(funcname, cache, va);
va_end(va);
if (res < 0) {
return -1;
}
}
// More positional arguments were passed than the function accepts
if (NPY_UNLIKELY(len_args > cache->npositional)) {
// Raise the error for an incorrect number of positional arguments
return raise_incorrect_number_of_positional_args(
funcname, cache, len_args);
}
/* NOTE: Could remove the limit but too many kwargs are slow anyway. */
// Array holding every argument, indexed by parameter position
PyObject *all_arguments[NPY_MAXARGS];
// Copy the positional arguments that were passed into all_arguments
for (Py_ssize_t i = 0; i < len_args; i++) {
all_arguments[i] = args[i];
}
/* Without kwargs, do not iterate all converters. */
// Without keyword arguments, only the positional arguments need converting
int max_nargs = (int)len_args;
Py_ssize_t len_kwargs = 0;
// If keyword arguments were passed, process them first
if (NPY_LIKELY(kwnames != NULL)) {
// Number of keyword arguments
len_kwargs = PyTuple_GET_SIZE(kwnames);
// Any declared parameter may now be filled
max_nargs = cache->nargs;
// Initialize the slots not filled positionally to NULL
for (int i = len_args; i < cache->nargs; i++) {
all_arguments[i] = NULL;
}
// Iterate over the keyword arguments
for (Py_ssize_t i = 0; i < len_kwargs; i++) {
// Get the keyword name and its value
PyObject *key = PyTuple_GET_ITEM(kwnames, i);
PyObject *value = args[i + len_args];
PyObject *const *name;
/* Super-fast path, check object identity: */
// Scan the cached keyword strings for the very same object
for (name = cache->kw_strings; *name != NULL; name++) {
if (*name == key) {
break;
}
}
// No identical object was found
if (NPY_UNLIKELY(*name == NULL)) {
/* Slow fallback, if the identity check failed for some reason */
// Scan the cached keyword strings again, comparing for equality
for (name = cache->kw_strings; *name != NULL; name++) {
int eq = PyObject_RichCompareBool(*name, key, Py_EQ);
if (eq == -1) {
return -1;
}
else if (eq) {
break;
}
}
// Still no match
if (NPY_UNLIKELY(*name == NULL)) {
/* Invalid keyword argument. */
PyErr_Format(PyExc_TypeError,
"%s() got an unexpected keyword argument '%S'",
funcname, key);
return -1;
}
}
// Position of the parameter in the function's parameter list
Py_ssize_t param_pos = (
(name - cache->kw_strings) + cache->npositional_only);
/* The same parameter may already have been passed positionally */
// If that slot is already filled, raise an error
if (NPY_UNLIKELY(all_arguments[param_pos] != NULL)) {
PyErr_Format(PyExc_TypeError,
"argument for %s() given by name ('%S') and position "
"(position %zd)", funcname, key, param_pos);
return -1;
}
// Store the value in the slot for that parameter
all_arguments[param_pos] = value;
}
}
/*
* At this point each slot of `all_arguments` holds either NULL or an object.
* The number of positional plus keyword arguments cannot exceed the number
* of declared parameters, or the logic above would have raised an error.
*/
// Assert that positional plus keyword arguments fit the declared parameters
assert(len_args + len_kwargs <= cache->nargs);
/* `all_arguments` now holds either NULLs or the actual objects */
// Start walking the variable argument list again
va_list va;
va_start(va, kwnames);
// Walk the (name, converter, value) triples and convert each argument
for (int i = 0; i < max_nargs; i++) {
// Skip the parameter name
va_arg(va, char *);
// The next entry is the converter function pointer
convert *converter = va_arg(va, convert *);
// The next entry is the pointer to the output to fill
void *data = va_arg(va, void *);
// Nothing was passed for this parameter; move on to the next one
if (all_arguments[i] == NULL) {
continue;
}
// Holds the result of the conversion
int res;
// Without a converter, store the object itself (a borrowed reference)
if (converter == NULL) {
*((PyObject **) data) = all_arguments[i];
continue;
}
// Convert the argument into the output
res = converter(all_arguments[i], data);
// On success, continue with the next parameter
if (NPY_UNLIKELY(res == NPY_SUCCEED)) {
continue;
}
// On failure, jump to the failure label
else if (NPY_UNLIKELY(res == NPY_FAIL)) {
/* It is usually the user's responsibility to clean up. */
goto converting_failed;
}
// Converters that request cleanup are not supported: set an error and fail
else if (NPY_UNLIKELY(res == Py_CLEANUP_SUPPORTED)) {
/* TODO: Implementing cleanup if/when needed should not be hard */
PyErr_Format(PyExc_SystemError,
"converter cleanup of parameter %d to %s() not supported.",
i, funcname);
goto converting_failed;
}
// Any other result is unexpected
assert(0);
}
// Check that all required arguments were passed
/* Required arguments are typically not passed as keyword arguments */
if (NPY_UNLIKELY(len_args < cache->nrequired)) {
// If fewer slots can be filled than there are required parameters, one is missing
if (NPY_UNLIKELY(max_nargs < cache->nrequired)) {
raise_missing_argument(funcname, cache, max_nargs);
goto converting_failed;
}
// Check each required parameter and report the first one that is missing
for (int i = 0; i < cache->nrequired; i++) {
if (NPY_UNLIKELY(all_arguments[i] == NULL)) {
raise_missing_argument(funcname, cache, i);
goto converting_failed;
}
}
}
// Done with the variable argument list
va_end(va);
// Success
return 0;
converting_failed:
// End the variable argument list before reporting the failure
va_end(va);
// Return -1 to signal that parsing or conversion failed
return -1;
}
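The keyword lookup in `_npy_parse_arguments` tries pointer identity first and falls back to an equality comparison only when no identical object is found. The block below is a minimal standalone sketch of that two-pass pattern. It uses plain C strings in place of interned Python unicode objects, and `find_keyword_sketch` and its parameters are hypothetical names that do not exist in NumPy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the cached `kw_strings` list: a NULL-terminated
 * array of C strings instead of interned Python unicode objects.
 * Returns the index of `key`, or -1 if it names no known parameter.
 */
int
find_keyword_sketch(const char *const *names, const char *key)
{
    const char *const *name;

    /* Fast path: pointer identity, the analogue of `*name == key`. */
    for (name = names; *name != NULL; name++) {
        if (*name == key) {
            return (int)(name - names);
        }
    }
    /* Slow fallback: compare contents, the analogue of the Py_EQ comparison. */
    for (name = names; *name != NULL; name++) {
        if (strcmp(*name, key) == 0) {
            return (int)(name - names);
        }
    }
    return -1;
}
```

In the real function the returned offset is added to `npositional_only` to obtain the slot in `all_arguments`; the sketch stops at the index.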
.\numpy\numpy\_core\src\common\npy_argparse.h
/*
* This file defines macros to help with keyword argument parsing.
* This solves two issues as of now:
* 1. Python's C-API PyArg_* keyword argument parsers are slow, due to
* not caching the strings they use.
* 2. It allows the use of METH_FASTCALL (and `tp_vectorcall`)
* when available in Python, which removes a large chunk of overhead.
*
* Internally CPython achieves similar things by using a code generator,
* Argument Clinic. NumPy may well decide to use Argument Clinic or a different
* solution in the future.
*/
NPY_NO_EXPORT int
PyArray_PythonPyIntFromInt(PyObject *obj, int *value);
typedef struct {
int npositional;
int nargs;
int npositional_only;
int nrequired;
/* Null terminated list of keyword argument name strings */
PyObject *kw_strings[_NPY_MAX_KWARGS+1];
} _NpyArgParserCache;
/*
* The sole purpose of this macro is to hide the argument parsing cache.
* Since this cache must be static, this also removes a source of error.
*/
/**
* Macro to help with argument parsing.
*
* The pattern for using this macro is by defining the method as:
*
* @code
* static PyObject *
* my_method(PyObject *self,
* PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
* {
* NPY_PREPARE_ARGPARSER;
*
* PyObject *argument1, *argument3;
* int argument2 = -1;
* if (npy_parse_arguments("method", args, len_args, kwnames,
* "argument1", NULL, &argument1,
* "|argument2", &PyArray_PythonPyIntFromInt, &argument2,
* "$argument3", NULL, &argument3,
* NULL, NULL, NULL) < 0) {
* return NULL;
* }
* }
* @endcode
*
* The `NPY_PREPARE_ARGPARSER` macro sets up a static cache variable necessary
* to hold data for speeding up the parsing. `npy_parse_arguments` must be
* used in conjunction with the macro defined in the same scope.
* (No two `npy_parse_arguments` may share a single `NPY_PREPARE_ARGPARSER`.)
*
* @param funcname Name of the function using the argument parsing.
* @param args Python passed args (METH_FASTCALL)
* @param len_args Number of arguments (not flagged)
* @param kwnames Tuple as passed by METH_FASTCALL or NULL.
* @param ... List of arguments must be param1_name, param1_converter,
* *param1_outvalue, param2_name, ..., NULL, NULL, NULL.
* Where name is ``char *``, ``converter`` a python converter
* function or NULL and ``outvalue`` is the ``void *`` passed to
* the converter (holding the converted data or a borrowed
* reference if converter is NULL).
*
* @return Returns 0 on success and -1 on failure.
*/
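/*
* Declare _npy_parse_arguments, the function that parses the arguments.
* funcname: name of the calling function, used in error messages
* cache_ptr: pointer to the _NpyArgParserCache that persists parsed data
* args: pointer to the argument array passed to the function
* len_args: number of positional arguments in args
* kwnames: tuple of keyword argument names, or NULL
* ...: variable argument list of (name, converter, value) triples,
* terminated by NULL, NULL, NULL
* The declaration is marked NPY_GCC_NONNULL(1): the first argument must not be NULL.
*/
NPY_NO_EXPORT int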
_npy_parse_arguments(const char *funcname,
/* cache_ptr is a NULL initialized persistent storage for data */
_NpyArgParserCache *cache_ptr,
PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames,
/* va_list is NULL, NULL, NULL terminated: name, converter, value */
...) NPY_GCC_NONNULL(1);
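/*
* Define the npy_parse_arguments macro, which simplifies argument parsing.
* funcname: name of the calling function, used in error messages
* args: pointer to the argument array passed to the function
* len_args: number of positional arguments in args
* kwnames: tuple of keyword argument names, or NULL
* ...: variable argument list, forwarded to _npy_parse_arguments
* The macro calls _npy_parse_arguments with the hidden cache and forwards __VA_ARGS__.
*/
#define npy_parse_arguments(funcname, args, len_args, kwnames, ...) \
_npy_parse_arguments(funcname, &__argparse_cache, \
args, len_args, kwnames, __VA_ARGS__)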
/* End of the include guard that keeps the header from being included more than once */
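The header describes the variadic list as `(name, converter, value)` triples terminated by `NULL, NULL, NULL`, with `|` marking an optional parameter and `$` a keyword-only one. The following is a rough standalone sketch of how such a list can be walked and counted. It assumes, as in the usage example, that every optional or keyword-only name carries its own prefix. It omits the validation the real initializer performs, and `sketch_converter`, `sketch_counts` and `count_parameters_sketch` are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical converter signature, standing in for a Python converter. */
typedef int (sketch_converter)(void *obj, void *out);

typedef struct {
    int nargs;        /* all declared parameters */
    int nrequired;    /* names without a '|' or '$' prefix */
    int npositional;  /* names without a '$' prefix */
} sketch_counts;

/* Walk (name, converter, value) triples until the name is NULL. */
sketch_counts
count_parameters_sketch(const char *funcname, ...)
{
    sketch_counts counts = {0, 0, 0};
    va_list va;

    (void)funcname;  /* only needed as the anchor for va_start */
    va_start(va, funcname);
    for (;;) {
        const char *name = va_arg(va, const char *);
        (void)va_arg(va, sketch_converter *);
        (void)va_arg(va, void *);
        if (name == NULL) {
            break;  /* reached the NULL, NULL, NULL terminator */
        }
        counts.nargs += 1;
        if (*name == '$') {
            continue;  /* keyword-only: not counted as positional */
        }
        counts.npositional += 1;
        if (*name != '|') {
            counts.nrequired += 1;
        }
    }
    va_end(va);
    return counts;
}
```

The terminator has to be a full triple because the walker always reads three values per step before it inspects the name.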
.\numpy\numpy\_core\src\common\npy_binsearch.h
// Include guard: define __NPY_BINSEARCH_H__ if it is not defined yet, to avoid repeated inclusion
extern "C" {
// Declare the function type PyArray_BinSearchFunc, used for binary search
typedef void (PyArray_BinSearchFunc)(const char*, const char*, char*,
npy_intp, npy_intp,
npy_intp, npy_intp, npy_intp,
PyArrayObject*);
// Declare the function type PyArray_ArgBinSearchFunc, used for indirect binary search through a sorter (argsort) array; it returns an int status
typedef int (PyArray_ArgBinSearchFunc)(const char*, const char*,
const char*, char*,
npy_intp, npy_intp, npy_intp,
npy_intp, npy_intp, npy_intp,
PyArrayObject*);
// Get the binary search function suited to the given dtype and search side
NPY_NO_EXPORT PyArray_BinSearchFunc* get_binsearch_func(PyArray_Descr *dtype, NPY_SEARCHSIDE side);
// Get the indirect (sorter-based) binary search function suited to the given dtype and search side
NPY_NO_EXPORT PyArray_ArgBinSearchFunc* get_argbinsearch_func(PyArray_Descr *dtype, NPY_SEARCHSIDE side);
}
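The two function types above differ in whether the search runs directly on a sorted array or through a sorter index array, and the `side` argument selects the leftmost or rightmost insertion point. A minimal sketch of both variants for `double` data follows. It ignores strides, dtypes and error reporting, and the names `sketch_side`, `binsearch_sketch` and `argbinsearch_sketch` are hypothetical.

```c
#include <stddef.h>

typedef enum { SKETCH_SEARCHLEFT, SKETCH_SEARCHRIGHT } sketch_side;

/* Insertion index of `key` in the sorted array `arr` of length `n`. */
size_t
binsearch_sketch(const double *arr, size_t n, double key, sketch_side side)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* left: stop before equal elements; right: step past them */
        int key_goes_after = (side == SKETCH_SEARCHLEFT) ? (arr[mid] < key)
                                                         : (arr[mid] <= key);
        if (key_goes_after) {
            lo = mid + 1;
        }
        else {
            hi = mid;
        }
    }
    return lo;
}

/* Same search, but `arr` is unsorted and `sorter` lists its sorted order. */
size_t
argbinsearch_sketch(const double *arr, const size_t *sorter, size_t n,
                    double key, sketch_side side)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        double value = arr[sorter[mid]];
        int key_goes_after = (side == SKETCH_SEARCHLEFT) ? (value < key)
                                                         : (value <= key);
        if (key_goes_after) {
            lo = mid + 1;
        }
        else {
            hi = mid;
        }
    }
    return lo;
}
```

The indirect variant has to dereference user-supplied indices, which is one plausible reason the real `PyArray_ArgBinSearchFunc` returns an `int` while the direct variant returns `void`.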
.\numpy\numpy\_core\src\common\npy_cblas.h
/*
* This header provides numpy a consistent interface to CBLAS code. It is needed
* because not all providers of cblas provide cblas.h. For instance, MKL provides
* mkl_cblas.h and also typedefs the CBLAS_XXX enums.
*/
/* Allow the use in C++ code. */
extern "C"
{
/*
* Enumerated and derived types
*/
// Define enum for row-major and column-major order
enum CBLAS_ORDER {CblasRowMajor=101, CblasColMajor=102};
// Define enum for transpose operations
enum CBLAS_TRANSPOSE {CblasNoTrans=111, CblasTrans=112, CblasConjTrans=113};
// Define enum for upper and lower triangular matrices
enum CBLAS_UPLO {CblasUpper=121, CblasLower=122};
// Define enum for unit and non-unit diagonal matrices
enum CBLAS_DIAG {CblasNonUnit=131, CblasUnit=132};
// Define enum for left and right side matrices in operations
enum CBLAS_SIDE {CblasLeft=141, CblasRight=142};
// Check macOS version compatibility for Accelerate ILP64 support
// Define BLAS symbol suffix based on ILP64 support
// Concatenate and expand BLAS function names based on symbol conventions
/*
* Use either the OpenBLAS scheme with the `64_` suffix behind the Fortran
* compiler symbol mangling, or the MKL scheme (and upcoming
* reference-lapack
*/
/*
* Note that CBLAS doesn't include Fortran compiler symbol mangling, so ends up
* being the same in both schemes
*/
#define CBLAS_FUNC(name) BLAS_FUNC_EXPAND(name,BLAS_SYMBOL_PREFIX,,BLAS_SYMBOL_SUFFIX)
#ifdef HAVE_BLAS_ILP64
#define CBLAS_INT npy_int64
#define CBLAS_INT_MAX NPY_MAX_INT64
#else
#define CBLAS_INT int
#define CBLAS_INT_MAX INT_MAX
#endif
#define BLASNAME(name) CBLAS_FUNC(name)
#define BLASINT CBLAS_INT
#include "npy_cblas_base.h"
#undef BLASINT
#undef BLASNAME
/*
* Convert NumPy stride to BLAS stride. Returns 0 if conversion cannot be done
* (BLAS won't handle negative or zero strides the way we want).
*/
static inline CBLAS_INT
blas_stride(npy_intp stride, unsigned itemsize)
{
/*
* Should probably check pointer alignment also, but this may cause
* problems if we require complex to be 16 byte aligned.
*/
if (stride > 0 && (stride % itemsize) == 0) {
stride /= itemsize;
if (stride <= CBLAS_INT_MAX) {
return stride;
}
}
return 0;
}
/*
* Define a chunk size for CBLAS.
*
* The chunk size is the greatest power of two less than CBLAS_INT_MAX.
*/
}
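`blas_stride` converts a byte stride into an element stride and reports 0 when BLAS cannot be used, and the chunk size is described as the greatest power of two below `CBLAS_INT_MAX`. The sketch below reproduces both ideas outside NumPy. It assumes a 32-bit `int` BLAS interface, uses `ptrdiff_t` as a stand-in for `npy_intp`, and derives the chunk size as `max / 2 + 1`, which is one way to obtain that power of two when the maximum has the form 2^k - 1. All `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

typedef ptrdiff_t sketch_intp;   /* stand-in for npy_intp */
typedef int sketch_blas_int;     /* stand-in for CBLAS_INT without ILP64 */
#define SKETCH_BLAS_INT_MAX INT_MAX
/* Assumed formula: for a maximum of 2^k - 1 this yields 2^(k-1). */
#define SKETCH_CBLAS_CHUNK (SKETCH_BLAS_INT_MAX / 2 + 1)

/* Returns the stride in elements, or 0 if BLAS cannot handle it. */
sketch_blas_int
blas_stride_sketch(sketch_intp stride, unsigned itemsize)
{
    /* Reject zero, negative and non-multiple-of-itemsize strides. */
    if (stride > 0 && (stride % (sketch_intp)itemsize) == 0) {
        stride /= (sketch_intp)itemsize;
        if (stride <= SKETCH_BLAS_INT_MAX) {
            return (sketch_blas_int)stride;
        }
    }
    return 0;
}
```

Callers can treat a return value of 0 as the signal to fall back to a non-BLAS code path.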
.\numpy\numpy\_core\src\common\npy_cblas_base.h
/*
* This header provides numpy a consistent interface to CBLAS code. It is needed
* because not all providers of cblas provide cblas.h. For instance, MKL provides
* mkl_cblas.h and also typedefs the CBLAS_XXX enums.
*/
/*
* ===========================================================================
* Prototypes for level 1 BLAS functions (complex are recast as routines)
* ===========================================================================
*/
// Single precision dot product with extended precision accumulation
float BLASNAME(cblas_sdsdot)(const BLASINT N, const float alpha, const float *X,
const BLASINT incX, const float *Y, const BLASINT incY);
// Dot product of single precision vectors, accumulated and returned in double precision
double BLASNAME(cblas_dsdot)(const BLASINT N, const float *X, const BLASINT incX, const float *Y,
const BLASINT incY);
// Single precision dot product
float BLASNAME(cblas_sdot)(const BLASINT N, const float *X, const BLASINT incX,
const float *Y, const BLASINT incY);
// Double precision dot product
double BLASNAME(cblas_ddot)(const BLASINT N, const double *X, const BLASINT incX,
const double *Y, const BLASINT incY);
// Complex dot product (unconjugated)
void BLASNAME(cblas_cdotu_sub)(const BLASINT N, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *dotu);
// Complex dot product (conjugated)
void BLASNAME(cblas_cdotc_sub)(const BLASINT N, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *dotc);
// Double complex dot product (unconjugated)
void BLASNAME(cblas_zdotu_sub)(const BLASINT N, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *dotu);
// Double complex dot product (conjugated)
void BLASNAME(cblas_zdotc_sub)(const BLASINT N, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *dotc);
// Euclidean norm of a single precision vector
float BLASNAME(cblas_snrm2)(const BLASINT N, const float *X, const BLASINT incX);
// Sum of absolute values of a single precision vector
float BLASNAME(cblas_sasum)(const BLASINT N, const float *X, const BLASINT incX);
// Euclidean norm of a double precision vector
double BLASNAME(cblas_dnrm2)(const BLASINT N, const double *X, const BLASINT incX);
// Sum of absolute values of a double precision vector
double BLASNAME(cblas_dasum)(const BLASINT N, const double *X, const BLASINT incX);
// Euclidean norm of a single precision complex vector
float BLASNAME(cblas_scnrm2)(const BLASINT N, const void *X, const BLASINT incX);
// Sum of absolute values of a single precision complex vector
float BLASNAME(cblas_scasum)(const BLASINT N, const void *X, const BLASINT incX);
// Euclidean norm of a double precision complex vector
double BLASNAME(cblas_dznrm2)(const BLASINT N, const void *X, const BLASINT incX);
// Sum of absolute values of a double precision complex vector
double BLASNAME(cblas_dzasum)(const BLASINT N, const void *X, const BLASINT incX);
// Index of maximum absolute value of a single precision vector
CBLAS_INDEX BLASNAME(cblas_isamax)(const BLASINT N, const float *X, const BLASINT incX);
// Index of maximum absolute value of a double precision vector
CBLAS_INDEX BLASNAME(cblas_idamax)(const BLASINT N, const double *X, const BLASINT incX);
// Index of maximum absolute value of a single precision complex vector
CBLAS_INDEX BLASNAME(cblas_icamax)(const BLASINT N, const void *X, const BLASINT incX);
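The level 1 prototypes above all address their vectors through an element count `N` and an increment `incX`. As a reading aid, here is a plain C sketch of what the dot product and index-of-maximum routines compute for positive increments. It is not how any BLAS library implements them, the functions are hypothetical stand-ins, and the index returned by `isamax_sketch` is 0-based in this sketch.

```c
#include <stddef.h>

/* Single precision dot product over strided vectors (incx, incy > 0). */
float
sdot_sketch(int n, const float *x, int incx, const float *y, int incy)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += x[(ptrdiff_t)i * incx] * y[(ptrdiff_t)i * incy];
    }
    return acc;
}

/* Same inputs as sdot_sketch, but accumulated in double precision. */
double
dsdot_sketch(int n, const float *x, int incx, const float *y, int incy)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += (double)x[(ptrdiff_t)i * incx] * (double)y[(ptrdiff_t)i * incy];
    }
    return acc;
}

/* Index of the first element with the largest absolute value. */
int
isamax_sketch(int n, const float *x, int incx)
{
    int best = 0;
    float best_abs = -1.0f;
    for (int i = 0; i < n; i++) {
        float v = x[(ptrdiff_t)i * incx];
        float a = (v < 0.0f) ? -v : v;
        if (a > best_abs) {   /* strict comparison keeps the first maximum */
            best_abs = a;
            best = i;
        }
    }
    return best;
}
```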
/*
* ===========================================================================
* Prototypes for level 1 BLAS routines
* ===========================================================================
*/
/*
* Routines with standard 4 prefixes (s, d, c, z)
*/
/*
* Function prototype for cblas_izamax:
* Returns the index of the first element with maximum absolute value in X.
* Parameters:
* - N: Number of elements in X
* - X: Pointer to the array of elements (void pointer)
* - incX: Increment for indexing into X
* Returns:
* - CBLAS_INDEX: Index of the element with maximum absolute value
*/
CBLAS_INDEX BLASNAME(cblas_izamax)(const BLASINT N, const void *X, const BLASINT incX);
/*
* Function prototypes for standard BLAS level 1 routines:
* sswap, scopy, saxpy, dswap, dcopy, daxpy, cswap, ccopy, caxpy, zswap, zcopy, zaxpy
*/
/*
* Function prototype for cblas_sswap:
* Swaps elements between two arrays X and Y.
* Parameters:
* - N: Number of elements in X and Y
* - X: Pointer to the array X
* - incX: Increment for indexing into X
* - Y: Pointer to the array Y
* - incY: Increment for indexing into Y
*/
void BLASNAME(cblas_sswap)(const BLASINT N, float *X, const BLASINT incX,
float *Y, const BLASINT incY);
/*
* Function prototype for cblas_scopy:
* Copies elements from array X to array Y.
* Parameters:
* - N: Number of elements in X and Y
* - X: Pointer to the source array X
* - incX: Increment for indexing into X
* - Y: Pointer to the destination array Y
* - incY: Increment for indexing into Y
*/
void BLASNAME(cblas_scopy)(const BLASINT N, const float *X, const BLASINT incX,
float *Y, const BLASINT incY);
/*
* Function prototype for cblas_saxpy:
* Computes Y = alpha*X + Y.
* Parameters:
* - N: Number of elements in X and Y
* - alpha: Scalar alpha
* - X: Pointer to the array X
* - incX: Increment for indexing into X
* - Y: Pointer to the array Y
* - incY: Increment for indexing into Y
*/
void BLASNAME(cblas_saxpy)(const BLASINT N, const float alpha, const float *X,
const BLASINT incX, float *Y, const BLASINT incY);
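The swap and axpy comments above describe simple element-wise loops over strided vectors. The following reference sketch spells out those semantics for positive increments only. `sswap_sketch` and `saxpy_sketch` are hypothetical names, and the code is an illustration rather than the implementation any BLAS provider uses.

```c
#include <stddef.h>

/* Exchange the elements of X and Y (incx, incy > 0). */
void
sswap_sketch(int n, float *x, int incx, float *y, int incy)
{
    for (int i = 0; i < n; i++) {
        float tmp = x[(ptrdiff_t)i * incx];
        x[(ptrdiff_t)i * incx] = y[(ptrdiff_t)i * incy];
        y[(ptrdiff_t)i * incy] = tmp;
    }
}

/* Y = alpha * X + Y, element by element (incx, incy > 0). */
void
saxpy_sketch(int n, float alpha, const float *x, int incx,
             float *y, int incy)
{
    for (int i = 0; i < n; i++) {
        y[(ptrdiff_t)i * incy] += alpha * x[(ptrdiff_t)i * incx];
    }
}
```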
/*
* Similar function prototypes for double precision (d prefix), complex
* single precision (c prefix), and complex double precision (z prefix) routines
* cblas_dswap, cblas_dcopy, cblas_daxpy, cblas_cswap, cblas_ccopy, cblas_caxpy,
* cblas_zswap, cblas_zcopy, cblas_zaxpy.
*/
/*
* Function prototype for cblas_srotg:
* Constructs a Givens plane rotation matrix.
* Parameters:
* - a: Input/output parameter (see BLAS documentation)
* - b: Input/output parameter (see BLAS documentation)
* - c: Output parameter (see BLAS documentation)
* - s: Output parameter (see BLAS documentation)
*/
void BLASNAME(cblas_srotg)(float *a, float *b, float *c, float *s);
/*
* Function prototype for cblas_srotmg:
* Constructs modified Givens plane rotation matrix.
* Parameters:
* - d1: Input/output parameter (see BLAS documentation)
* - d2: Input/output parameter (see BLAS documentation)
* - b1: Input/output parameter (see BLAS documentation)
* - b2: Input parameter (see BLAS documentation)
* - P: Output parameter (see BLAS documentation)
*/
void BLASNAME(cblas_srotmg)(float *d1, float *d2, float *b1, const float b2, float *P);
/*
* Function prototype for cblas_srot:
* Applies a Givens rotation to vectors X and Y.
* Parameters:
* - N: Number of elements in X and Y
* - X: Pointer to the array X
* - incX: Increment for indexing into X
* - Y: Pointer to the array Y
* - incY: Increment for indexing into Y
* - c: Cosine of the angle of rotation
* - s: Sine of the angle of rotation
*/
void BLASNAME(cblas_srot)(const BLASINT N, float *X, const BLASINT incX,
float *Y, const BLASINT incY, const float c, const float s);
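Applying a Givens rotation replaces each pair `(x_i, y_i)` with `(c*x_i + s*y_i, c*y_i - s*x_i)`. A small reference sketch of that update, assuming positive increments and using the hypothetical name `srot_sketch`:

```c
#include <stddef.h>

/* Rotate each pair (x_i, y_i) in the plane by the given cosine and sine. */
void
srot_sketch(int n, float *x, int incx, float *y, int incy, float c, float s)
{
    for (int i = 0; i < n; i++) {
        float xi = x[(ptrdiff_t)i * incx];
        float yi = y[(ptrdiff_t)i * incy];
        x[(ptrdiff_t)i * incx] = c * xi + s * yi;
        y[(ptrdiff_t)i * incy] = c * yi - s * xi;
    }
}
```

With `c = 0` and `s = 1` the rotation maps `(x, y)` to `(y, -x)`, which makes a convenient sanity check.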
/*
* Function prototype for cblas_srotm:
* Applies modified Givens rotation to vectors X and Y.
* Parameters:
* - N: Number of elements in X and Y
* - X: Pointer to the array X
* - incX: Increment for indexing into X
* - Y: Pointer to the array Y
* - incY: Increment for indexing into Y
* - P: Pointer to the P array
*/
void BLASNAME(cblas_srotm)(const BLASINT N, float *X, const BLASINT incX,
float *Y, const BLASINT incY, const float *P);
/*
* Similar function prototypes for double precision (d prefix) routines:
* cblas_drotg, cblas_drotmg, cblas_drot.
*/
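/*
* Function prototype for cblas_drotm:
* Applies modified Givens rotation to vectors X and Y.
* Parameters:
* - N: Number of elements in X and Y
* - X: Pointer to the double precision array X
* - incX: Increment for indexing into X
* - Y: Pointer to the double precision array Y
* - incY: Increment for indexing into Y
* - P: Pointer to the double precision P array
*/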
void BLASNAME(cblas_drotm)(const BLASINT N, double *X, const BLASINT incX,
double *Y, const BLASINT incY, const double *P);
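/*
* Routines with S D C Z CS and ZD prefixes
*/
/*
* Function prototype for cblas_sscal:
* Scales the vector X by alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Single precision scale factor
* - X: Pointer to the single precision array X
* - incX: Increment for indexing into X
*/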
void BLASNAME(cblas_sscal)(const BLASINT N, const float alpha, float *X, const BLASINT incX);
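/*
* Function prototype for cblas_dscal:
* Scales the vector X by alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Double precision scale factor
* - X: Pointer to the double precision array X
* - incX: Increment for indexing into X
*/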
void BLASNAME(cblas_dscal)(const BLASINT N, const double alpha, double *X, const BLASINT incX);
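/*
* Function prototype for cblas_cscal:
* Scales the complex vector X by the complex scalar alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Pointer to the single precision complex scale factor
* - X: Pointer to the single precision complex array X
* - incX: Increment for indexing into X
*/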
void BLASNAME(cblas_cscal)(const BLASINT N, const void *alpha, void *X, const BLASINT incX);
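/*
* Function prototype for cblas_zscal:
* Scales the complex vector X by the complex scalar alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Pointer to the double precision complex scale factor
* - X: Pointer to the double precision complex array X
* - incX: Increment for indexing into X
*/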
void BLASNAME(cblas_zscal)(const BLASINT N, const void *alpha, void *X, const BLASINT incX);
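/*
* Function prototype for cblas_csscal:
* Scales the complex vector X by the real scalar alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Real single precision scale factor (imaginary part is zero)
* - X: Pointer to the single precision complex array X
* - incX: Increment for indexing into X
*/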
void BLASNAME(cblas_csscal)(const BLASINT N, const float alpha, void *X, const BLASINT incX);
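/*
* Function prototype for cblas_zdscal:
* Scales the complex vector X by the real scalar alpha.
* Parameters:
* - N: Number of elements in X
* - alpha: Real double precision scale factor (imaginary part is zero)
* - X: Pointer to the double precision complex array X
* - incX: Increment for indexing into X
*/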
void BLASNAME(cblas_zdscal)(const BLASINT N, const double alpha, void *X, const BLASINT incX);
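/*
* ===========================================================================
* Prototypes for level 2 BLAS
* ===========================================================================
*/
/*
* Routines with standard 4 prefixes (S, D, C, Z)
*/
/*
* Function prototype for cblas_sgemv:
* Computes Y = alpha*op(A)*X + beta*Y for a general matrix A.
* Parameters:
* - order: Storage order of the matrix (row-major or column-major)
* - TransA: Whether A is used as is, transposed, or conjugate-transposed
* - M: Number of rows of A
* - N: Number of columns of A
* - alpha: Single precision scale factor for op(A)*X
* - A: Pointer to the single precision matrix A
* - lda: Leading dimension of A
* - X: Pointer to the single precision array X
* - incX: Increment for indexing into X
* - beta: Single precision scale factor for Y
* - Y: Pointer to the single precision array Y
* - incY: Increment for indexing into Y
*/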
void BLASNAME(cblas_sgemv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const float alpha, const float *A, const BLASINT lda,
const float *X, const BLASINT incX, const float beta,
float *Y, const BLASINT incY);
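To make the roles of `lda`, `incX` and `incY` concrete, the sketch below computes `Y = alpha*A*X + beta*Y` for the row-major, non-transposed case only, with positive increments. `sgemv_rowmajor_sketch` is a hypothetical reference loop, not the routine a BLAS library provides, and it always reads `Y`, even when `beta` is zero.

```c
#include <stddef.h>

/* Row-major, no-transpose reference: A is m x n with row stride lda >= n. */
void
sgemv_rowmajor_sketch(int m, int n, float alpha, const float *a, int lda,
                      const float *x, int incx, float beta,
                      float *y, int incy)
{
    for (int i = 0; i < m; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++) {
            acc += a[(ptrdiff_t)i * lda + j] * x[(ptrdiff_t)j * incx];
        }
        y[(ptrdiff_t)i * incy] = alpha * acc + beta * y[(ptrdiff_t)i * incy];
    }
}
```

Because `lda` may exceed `n`, the same loop works on a sub-block of a larger matrix: the extra entries at the end of each row are simply skipped.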
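/*
* Function prototype for cblas_sgbmv:
* Computes Y = alpha*op(A)*X + beta*Y for a general banded matrix A.
* Parameters:
* - order: Storage order of the matrix (row-major or column-major)
* - TransA: Whether A is used as is, transposed, or conjugate-transposed
* - M: Number of rows of A
* - N: Number of columns of A
* - KL: Number of sub-diagonals of A
* - KU: Number of super-diagonals of A
* - alpha: Single precision scale factor for op(A)*X
* - A: Pointer to the single precision band matrix A
* - lda: Leading dimension of A
* - X: Pointer to the single precision array X
* - incX: Increment for indexing into X
* - beta: Single precision scale factor for Y
* - Y: Pointer to the single precision array Y
* - incY: Increment for indexing into Y
*/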
void BLASNAME(cblas_sgbmv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const BLASINT KL, const BLASINT KU, const float alpha,
const float *A, const BLASINT lda, const float *X,
const BLASINT incX, const float beta, float *Y, const BLASINT incY);
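/*
* Function prototype for cblas_strmv:
* Computes X = op(A)*X for a triangular matrix A.
* Parameters:
* - order: Storage order of the matrix (row-major or column-major)
* - Uplo: Whether the upper or lower triangle of A is used
* - TransA: Whether A is used as is, transposed, or conjugate-transposed
* - Diag: Whether A has a unit diagonal
* - N: Order of the matrix A
* - A: Pointer to the single precision matrix A
* - lda: Leading dimension of A
* - X: Pointer to the single precision array X
* - incX: Increment for indexing into X
*/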
void BLASNAME(cblas_strmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const float *A, const BLASINT lda,
float *X, const BLASINT incX);
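/*
* Function prototype for cblas_stbmv:
* Computes X = op(A)*X for a triangular banded matrix A.
* Parameters:
* - order: Storage order of the matrix (row-major or column-major)
* - Uplo: Whether the upper or lower triangle of A is used
* - TransA: Whether A is used as is, transposed, or conjugate-transposed
* - Diag: Whether A has a unit diagonal
* - N: Order of the matrix A
* - K: Number of sub- or super-diagonals of A
* - A: Pointer to the single precision band matrix A
* - lda: Leading dimension of A
* - X: Pointer to the single precision array X
* - incX: Increment for indexing into X
*/
void BLASNAME(cblas_stbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const float *A, const BLASINT lda,
float *X, const BLASINT incX);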
void BLASNAME(cblas_strsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const float *A, const BLASINT lda, float *X,
const BLASINT incX);
void BLASNAME(cblas_stbsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const float *A, const BLASINT lda,
float *X, const BLASINT incX);
void BLASNAME(cblas_stpsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const float *Ap, float *X, const BLASINT incX);
void BLASNAME(cblas_dgemv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const double alpha, const double *A, const BLASINT lda,
const double *X, const BLASINT incX, const double beta,
double *Y, const BLASINT incY);
void BLASNAME(cblas_dgbmv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const BLASINT KL, const BLASINT KU, const double alpha,
const double *A, const BLASINT lda, const double *X,
const BLASINT incX, const double beta, double *Y, const BLASINT incY);
void BLASNAME(cblas_dtrmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const double *A, const BLASINT lda,
double *X, const BLASINT incX);
void BLASNAME(cblas_dtbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const double *A, const BLASINT lda,
double *X, const BLASINT incX);
void BLASNAME(cblas_dtpmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const double *Ap, double *X, const BLASINT incX);
void BLASNAME(cblas_dtrsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const double *A, const BLASINT lda, double *X,
const BLASINT incX);
// Solve triangular banded system of equations with double precision
void BLASNAME(cblas_dtbsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const double *A, const BLASINT lda,
double *X, const BLASINT incX);
// Solve triangular packed system of equations with double precision
void BLASNAME(cblas_dtpsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const double *Ap, double *X, const BLASINT incX);
// Matrix-vector multiplication for complex numbers with single precision
void BLASNAME(cblas_cgemv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *X, const BLASINT incX, const void *beta,
void *Y, const BLASINT incY);
// General banded matrix-vector multiplication for complex numbers with single precision
void BLASNAME(cblas_cgbmv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const BLASINT KL, const BLASINT KU, const void *alpha,
const void *A, const BLASINT lda, const void *X,
const BLASINT incX, const void *beta, void *Y, const BLASINT incY);
// Triangular matrix-vector multiplication for complex numbers with single precision
void BLASNAME(cblas_ctrmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Triangular banded matrix-vector multiplication for complex numbers with single precision
void BLASNAME(cblas_ctbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Triangular packed matrix-vector multiplication for complex numbers with single precision
void BLASNAME(cblas_ctpmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *Ap, void *X, const BLASINT incX);
// Solve triangular system of equations for complex numbers with single precision
void BLASNAME(cblas_ctrsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *A, const BLASINT lda, void *X,
const BLASINT incX);
// Solve triangular banded system of equations for complex numbers with single precision
void BLASNAME(cblas_ctbsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Solve triangular packed system of equations for complex numbers with single precision
void BLASNAME(cblas_ctpsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *Ap, void *X, const BLASINT incX);
// Matrix-vector multiplication for complex numbers with double precision
void BLASNAME(cblas_zgemv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *X, const BLASINT incX, const void *beta,
void *Y, const BLASINT incY);
// General banded matrix-vector multiplication for complex numbers with double precision
void BLASNAME(cblas_zgbmv)(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA, const BLASINT M, const BLASINT N,
const BLASINT KL, const BLASINT KU, const void *alpha,
const void *A, const BLASINT lda, const void *X,
const BLASINT incX, const void *beta, void *Y, const BLASINT incY);
// Triangular matrix-vector multiplication for complex numbers with double precision
void BLASNAME(cblas_ztrmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Triangular banded matrix-vector multiplication for complex numbers with double precision
void BLASNAME(cblas_ztbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Triangular packed matrix-vector multiplication for complex numbers with double precision
void BLASNAME(cblas_ztpmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *Ap, void *X, const BLASINT incX);
// Solve triangular system of equations for complex numbers with double precision
void BLASNAME(cblas_ztrsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *A, const BLASINT lda, void *X,
const BLASINT incX);
// Solve triangular banded system of equations for complex numbers with double precision
void BLASNAME(cblas_ztbsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const BLASINT K, const void *A, const BLASINT lda,
void *X, const BLASINT incX);
// Solve triangular packed system of equations for complex numbers with double precision
void BLASNAME(cblas_ztpsv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_DIAG Diag,
const BLASINT N, const void *Ap, void *X, const BLASINT incX);
/*
* Routines with S and D prefixes only
*/
void BLASNAME(cblas_ssymv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *A,
const BLASINT lda, const float *X, const BLASINT incX,
const float beta, float *Y, const BLASINT incY);
void BLASNAME(cblas_ssbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const BLASINT K, const float alpha, const float *A,
const BLASINT lda, const float *X, const BLASINT incX,
const float beta, float *Y, const BLASINT incY);
void BLASNAME(cblas_sspmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *Ap,
const float *X, const BLASINT incX,
const float beta, float *Y, const BLASINT incY);
void BLASNAME(cblas_sger)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const float alpha, const float *X, const BLASINT incX,
const float *Y, const BLASINT incY, float *A, const BLASINT lda);
void BLASNAME(cblas_ssyr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *X,
const BLASINT incX, float *A, const BLASINT lda);
void BLASNAME(cblas_sspr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *X,
const BLASINT incX, float *Ap);
void BLASNAME(cblas_ssyr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *X,
const BLASINT incX, const float *Y, const BLASINT incY, float *A,
const BLASINT lda);
void BLASNAME(cblas_sspr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const float *X,
const BLASINT incX, const float *Y, const BLASINT incY, float *A);
void BLASNAME(cblas_dsymv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *A,
const BLASINT lda, const double *X, const BLASINT incX,
const double beta, double *Y, const BLASINT incY);
void BLASNAME(cblas_dsbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const BLASINT K, const double alpha, const double *A,
const BLASINT lda, const double *X, const BLASINT incX,
const double beta, double *Y, const BLASINT incY);
/*
* Function prototype for cblas_dspmv:
* Symmetric packed matrix-vector multiplication with double precision.
*/
void BLASNAME(cblas_dspmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *Ap,
const double *X, const BLASINT incX,
const double beta, double *Y, const BLASINT incY);
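The packed routines (`cblas_sspmv`, `cblas_dspmv`, and the other `*sp*`/`*hp*` variants) take only one triangle of the matrix, stored contiguously in `Ap`. The following is a minimal sketch of that layout, assuming column-major order, the upper triangle, and unit vector strides. The helper names `packed_upper_index` and `packed_symv_ref` are invented for this illustration and are not part of CBLAS or NumPy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of A(i, j) inside upper-packed,
 * column-major storage. Column j starts at offset j*(j+1)/2. */
static size_t packed_upper_index(size_t i, size_t j)
{
    if (i > j) {            /* use symmetry: A(i, j) == A(j, i) */
        size_t t = i;
        i = j;
        j = t;
    }
    return i + j * (j + 1) / 2;
}

/* Reference (unoptimized) y := alpha*A*x + beta*y for packed symmetric A. */
static void packed_symv_ref(size_t n, double alpha, const double *ap,
                            const double *x, double beta, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++) {
            acc += ap[packed_upper_index(i, j)] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}
```

For a 3-by-3 matrix the packed array holds six values in the order A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2), which is half the storage of the full matrix plus the diagonal.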
/*
* cblas_dger: general rank-1 update A := alpha * X * Y^T + A, where A is M-by-N.
*/
void BLASNAME(cblas_dger)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const double alpha, const double *X, const BLASINT incX,
const double *Y, const BLASINT incY, double *A, const BLASINT lda);
/*
* cblas_dsyr: symmetric rank-1 update A := alpha * X * X^T + A.
*/
void BLASNAME(cblas_dsyr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *X,
const BLASINT incX, double *A, const BLASINT lda);
/*
* cblas_dspr: symmetric rank-1 update A := alpha * X * X^T + A, with A in
* packed form (Ap).
*/
void BLASNAME(cblas_dspr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *X,
const BLASINT incX, double *Ap);
/*
* cblas_dsyr2: symmetric rank-2 update A := alpha * X * Y^T + alpha * Y * X^T + A.
*/
void BLASNAME(cblas_dsyr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *X,
const BLASINT incX, const double *Y, const BLASINT incY, double *A,
const BLASINT lda);
/*
* cblas_dspr2: symmetric rank-2 update A := alpha * X * Y^T + alpha * Y * X^T + A,
* with A in packed form.
*/
void BLASNAME(cblas_dspr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const double *X,
const BLASINT incX, const double *Y, const BLASINT incY, double *A);
/*
* Routines with C and Z prefixes only
*/
/*
* cblas_chemv: Y := alpha * A * X + beta * Y, where A is a single-precision
* complex Hermitian matrix.
*/
void BLASNAME(cblas_chemv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const void *alpha, const void *A,
const BLASINT lda, const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
/*
* cblas_chbmv: Y := alpha * A * X + beta * Y, where A is a Hermitian band
* matrix with K off-diagonals.
*/
void BLASNAME(cblas_chbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const BLASINT K, const void *alpha, const void *A,
const BLASINT lda, const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
/*
* cblas_chpmv: Y := alpha * A * X + beta * Y, where A is a Hermitian matrix
* in packed form (Ap).
*/
void BLASNAME(cblas_chpmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const void *alpha, const void *Ap,
const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
/*
* cblas_cgeru: unconjugated rank-1 update A := alpha * X * Y^T + A.
*/
void BLASNAME(cblas_cgeru)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
// cblas_cgerc: single-precision complex conjugated rank-1 update A := alpha * X * Y^H + A, where A is a complex matrix and X, Y are complex vectors
void BLASNAME(cblas_cgerc)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
// cblas_cher: single-precision complex Hermitian rank-1 update A := alpha * X * X^H + A, where A is Hermitian, X is a complex vector and alpha is real
void BLASNAME(cblas_cher)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const void *X, const BLASINT incX,
void *A, const BLASINT lda);
// cblas_chpr: the same Hermitian rank-1 update A := alpha * X * X^H + A, with A held in packed form
void BLASNAME(cblas_chpr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const float alpha, const void *X,
const BLASINT incX, void *A);
// cblas_cher2: single-precision complex Hermitian rank-2 update A := alpha * X * Y^H + conj(alpha) * Y * X^H + A,
// where A is Hermitian and X, Y are complex vectors
void BLASNAME(cblas_cher2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
// cblas_chpr2: the same Hermitian rank-2 update A := alpha * X * Y^H + conj(alpha) * Y * X^H + A,
// with A held in packed form (Ap)
void BLASNAME(cblas_chpr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *Ap);
// cblas_zhemv: double-precision complex Hermitian matrix-vector product Y := alpha * A * X + beta * Y, where A is Hermitian and X, Y are complex vectors
void BLASNAME(cblas_zhemv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const void *alpha, const void *A,
const BLASINT lda, const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
// cblas_zhbmv: double-precision complex Hermitian band matrix-vector product Y := alpha * A * X + beta * Y, where A is a Hermitian band matrix with K off-diagonals
void BLASNAME(cblas_zhbmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const BLASINT K, const void *alpha, const void *A,
const BLASINT lda, const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
// cblas_zhpmv: double-precision complex Hermitian matrix-vector product Y := alpha * A * X + beta * Y, with A held in packed form (Ap)
void BLASNAME(cblas_zhpmv)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const void *alpha, const void *Ap,
const void *X, const BLASINT incX,
const void *beta, void *Y, const BLASINT incY);
// cblas_zgeru: double-precision complex unconjugated rank-1 update A := alpha * X * Y^T + A, where A is a complex matrix and X, Y are complex vectors
void BLASNAME(cblas_zgeru)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
// cblas_zgerc: double-precision complex conjugated rank-1 update A := alpha * X * Y^H + A, where A is a complex matrix and X, Y are complex vectors
void BLASNAME(cblas_zgerc)(const enum CBLAS_ORDER order, const BLASINT M, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
// cblas_zher: double-precision complex Hermitian rank-1 update A := alpha * X * X^H + A, where A is Hermitian, X is a complex vector and alpha is real
void BLASNAME(cblas_zher)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const void *X, const BLASINT incX,
void *A, const BLASINT lda);
/*
* cblas_zhpr: Hermitian rank-1 update A := alpha * X * X^H + A of a
* double-precision complex Hermitian matrix held in packed form (the last
* argument). alpha is a real scalar.
*/
void BLASNAME(cblas_zhpr)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo,
const BLASINT N, const double alpha, const void *X,
const BLASINT incX, void *A);
/*
* cblas_zher2: Hermitian rank-2 update
* A := alpha * X * Y^H + conj(alpha) * Y * X^H + A, with A stored in full form.
*/
void BLASNAME(cblas_zher2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *A, const BLASINT lda);
/*
* cblas_zhpr2: the same rank-2 update as cblas_zher2, with the Hermitian
* matrix held in packed form in Ap.
*/
void BLASNAME(cblas_zhpr2)(const enum CBLAS_ORDER order, const enum CBLAS_UPLO Uplo, const BLASINT N,
const void *alpha, const void *X, const BLASINT incX,
const void *Y, const BLASINT incY, void *Ap);
/*
* ===========================================================================
* Prototypes for level 3 BLAS
* ===========================================================================
*/
/*
* Level 3 BLAS routines operate on whole matrices: matrix-matrix products,
* rank-k updates, and triangular solves with multiple right-hand sides.
*/
/*
* Routines with standard 4 prefixes (S, D, C, Z)
*/
void BLASNAME(cblas_sgemm)(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N,
const BLASINT K, const float alpha, const float *A,
const BLASINT lda, const float *B, const BLASINT ldb,
const float beta, float *C, const BLASINT ldc);
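/*
* cblas_sgemm (declared above): general matrix multiply
* C := alpha * op(A) * op(B) + beta * C, where op(X) is X or X^T as selected
* by TransA and TransB. A and B are inputs, C is both input and output, and
* alpha and beta are scalars.
*/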
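The `lda`, `ldb` and `ldc` arguments are leading dimensions: the distance, in elements, between the starts of consecutive rows (row-major) or columns (column-major) in memory. They may exceed the logical width when the matrix is a view into a larger buffer. The sketch below is a naive reference for the row-major, no-transpose case only. `gemm_ref` is a made-up name for illustration and uses plain `int` where the real prototypes use `BLASINT`.

```c
#include <assert.h>

/* Hypothetical reference: C := alpha*A*B + beta*C, row-major, no transposes.
 * A is m x k, B is k x n, C is m x n. Each ld* is the row stride in
 * elements and must be at least the number of columns of that matrix. */
static void gemm_ref(int m, int n, int k, float alpha,
                     const float *a, int lda, const float *b, int ldb,
                     float beta, float *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

Passing `lda = 3` for a 2-by-2 matrix, for example, skips one padding element at the end of every row of `A`.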
void BLASNAME(cblas_ssymm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const float alpha, const float *A, const BLASINT lda,
const float *B, const BLASINT ldb, const float beta,
float *C, const BLASINT ldc);
/*
* cblas_ssymm (declared above): symmetric matrix multiply
* C := alpha * A * B + beta * C or C := alpha * B * A + beta * C, depending
* on Side. A is symmetric, B is an input, and C is both input and output.
*/
void BLASNAME(cblas_ssyrk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const float alpha, const float *A, const BLASINT lda,
const float beta, float *C, const BLASINT ldc);
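/*
* cblas_ssyrk (declared above): symmetric rank-k update
* C := alpha * A * A^T + beta * C or C := alpha * A^T * A + beta * C,
* depending on Trans. A is the input matrix and C is the symmetric
* input/output matrix.
*/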
void BLASNAME(cblas_ssyr2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const float alpha, const float *A, const BLASINT lda,
const float *B, const BLASINT ldb, const float beta,
float *C, const BLASINT ldc);
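/*
* cblas_ssyr2k (declared above): symmetric rank-2k update
* C := alpha * A * B^T + alpha * B * A^T + beta * C or
* C := alpha * A^T * B + alpha * B^T * A + beta * C, depending on Trans.
* A and B are input matrices and C is the symmetric input/output matrix.
*/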
void BLASNAME(cblas_strmm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const float alpha, const float *A, const BLASINT lda,
float *B, const BLASINT ldb);
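/*
* cblas_strmm (declared above): triangular matrix multiply
* B := alpha * op(A) * B or B := alpha * B * op(A), depending on Side.
* A is triangular and B is both input and output.
*/
// cblas_strsm: Level 3 triangular solve. Solves op(A) * X = alpha * B or X * op(A) = alpha * B for X and overwrites B with X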
void BLASNAME(cblas_strsm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const float alpha, const float *A, const BLASINT lda,
float *B, const BLASINT ldb);
// cblas_dgemm: Level 3 general matrix multiply C := alpha * op(A) * op(B) + beta * C
void BLASNAME(cblas_dgemm)(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N,
const BLASINT K, const double alpha, const double *A,
const BLASINT lda, const double *B, const BLASINT ldb,
const double beta, double *C, const BLASINT ldc);
// cblas_dsymm: Level 3 symmetric matrix multiply C := alpha * A * B + beta * C or C := alpha * B * A + beta * C, selected by Side
void BLASNAME(cblas_dsymm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const double alpha, const double *A, const BLASINT lda,
const double *B, const BLASINT ldb, const double beta,
double *C, const BLASINT ldc);
// cblas_dsyrk: Level 3 symmetric rank-k update C := alpha * A * A^T + beta * C or C := alpha * A^T * A + beta * C, selected by Trans
void BLASNAME(cblas_dsyrk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const double alpha, const double *A, const BLASINT lda,
const double beta, double *C, const BLASINT ldc);
// cblas_dsyr2k: Level 3 symmetric rank-2k update C := alpha * A * B^T + alpha * B * A^T + beta * C, or the transposed form, selected by Trans
void BLASNAME(cblas_dsyr2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const double alpha, const double *A, const BLASINT lda,
const double *B, const BLASINT ldb, const double beta,
double *C, const BLASINT ldc);
// cblas_dtrmm: Level 3 triangular matrix multiply B := alpha * op(A) * B or B := alpha * B * op(A), selected by Side
void BLASNAME(cblas_dtrmm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const double alpha, const double *A, const BLASINT lda,
double *B, const BLASINT ldb);
// cblas_dtrsm: Level 3 triangular solve. Solves op(A) * X = alpha * B or X * op(A) = alpha * B for X and overwrites B with X
void BLASNAME(cblas_dtrsm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const double alpha, const double *A, const BLASINT lda,
double *B, const BLASINT ldb);
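Unlike `trmm`, the `trsm` routines do not multiply by `A`; they solve a triangular system and overwrite `B` with the solution. As a rough sketch of what that means, here is forward substitution for one configuration only: row-major storage, `A` on the left, lower triangular, not transposed, non-unit diagonal. The name `trsm_lower_left_ref` is hypothetical, and no singularity check is performed.

```c
#include <assert.h>

/* Hypothetical reference: solve A * X = alpha * B for X, where A is an
 * m x m lower-triangular matrix and B is m x n. X overwrites B. */
static void trsm_lower_left_ref(int m, int n, double alpha,
                                const double *a, int lda,
                                double *b, int ldb)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double acc = alpha * b[i * ldb + j];
            /* Rows p < i of b already hold the solved values of X. */
            for (int p = 0; p < i; p++) {
                acc -= a[i * lda + p] * b[p * ldb + j];
            }
            b[i * ldb + j] = acc / a[i * lda + i];
        }
    }
}
```

Because row `i` of the solution depends only on rows above it, the update can be done in place, which is why the BLAS interface has no separate output argument.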
void BLASNAME(cblas_cgemm)(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N,
const BLASINT K, const void *alpha, const void *A,
const BLASINT lda, const void *B, const BLASINT ldb,
const void *beta, void *C, const BLASINT ldc);
void BLASNAME(cblas_csymm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);
void BLASNAME(cblas_csyrk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *beta, void *C, const BLASINT ldc);
void BLASNAME(cblas_csyr2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);
void BLASNAME(cblas_ctrmm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
void *B, const BLASINT ldb);
void BLASNAME(cblas_ctrsm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
void *B, const BLASINT ldb);
void BLASNAME(cblas_zgemm)(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N,
const BLASINT K, const void *alpha, const void *A,
const BLASINT lda, const void *B, const BLASINT ldb,
const void *beta, void *C, const BLASINT ldc);
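/*
* BLASNAME(cblas_zsymm):
* Double-precision complex symmetric matrix multiply. Computes
* C := alpha * A * B + beta * C or C := alpha * B * A + beta * C, depending on Side.
* Order: storage order (row-major or column-major)
* Side: whether A appears to the left or to the right of B
* Uplo: which triangle of A (upper or lower) is referenced
* M: number of rows of C
* N: number of columns of C
* alpha: complex scalar applied to the product
* A: complex symmetric matrix
* lda: leading dimension of A
* B: complex matrix
* ldb: leading dimension of B
* beta: complex scalar applied to C
* C: input/output matrix
* ldc: leading dimension of C
*/
void BLASNAME(cblas_zsymm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);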
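/*
* BLASNAME(cblas_zsyrk):
* Double-precision complex symmetric rank-k update. Computes
* C := alpha * A * A^T + beta * C or C := alpha * A^T * A + beta * C, depending on Trans.
* Order: storage order (row-major or column-major)
* Uplo: which triangle of C (upper or lower) is referenced and updated
* Trans: whether A is used as is or transposed
* N: order of C
* K: number of columns of A (no transpose) or rows of A (transpose)
* alpha: complex scalar applied to the product
* A: complex matrix
* lda: leading dimension of A
* beta: complex scalar applied to C
* C: input/output symmetric matrix
* ldc: leading dimension of C
*/
void BLASNAME(cblas_zsyrk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *beta, void *C, const BLASINT ldc);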
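/*
* BLASNAME(cblas_zsyr2k):
* Double-precision complex symmetric rank-2k update. Computes
* C := alpha * A * B^T + alpha * B * A^T + beta * C or
* C := alpha * A^T * B + alpha * B^T * A + beta * C, depending on Trans.
* Order: storage order (row-major or column-major)
* Uplo: which triangle of C (upper or lower) is referenced and updated
* Trans: whether A and B are used as is or transposed
* N: order of C
* K: number of columns of A and B (no transpose) or rows (transpose)
* alpha: complex scalar applied to the products
* A: complex matrix
* lda: leading dimension of A
* B: complex matrix
* ldb: leading dimension of B
* beta: complex scalar applied to C
* C: input/output symmetric matrix
* ldc: leading dimension of C
*/
void BLASNAME(cblas_zsyr2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);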
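/*
* BLASNAME(cblas_ztrmm):
* Double-precision complex triangular matrix multiply. Computes
* B := alpha * op(A) * B or B := alpha * B * op(A), depending on Side;
* op(A) is selected by TransA.
* Order: storage order (row-major or column-major)
* Side: whether op(A) appears to the left or to the right of B
* Uplo: whether A is upper or lower triangular
* TransA: whether A is used as is, transposed, or conjugate-transposed
* Diag: whether A is unit triangular (diagonal assumed to be 1 and not read)
* M: number of rows of B
* N: number of columns of B
* alpha: complex scalar applied to the product
* A: complex triangular matrix
* lda: leading dimension of A
* B: input/output matrix
* ldb: leading dimension of B
*/
void BLASNAME(cblas_ztrmm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
void *B, const BLASINT ldb);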
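/*
* BLASNAME(cblas_ztrsm):
* Double-precision complex triangular solve. Computes
* B := alpha * op(A)^{-1} * B or B := alpha * B * op(A)^{-1}, depending on
* Side; op(A) is selected by TransA.
* Order: storage order (row-major or column-major)
* Side: whether op(A) appears to the left or to the right of the unknown
* Uplo: whether A is upper or lower triangular
* TransA: whether A is used as is, transposed, or conjugate-transposed
* Diag: whether A is unit triangular (diagonal assumed to be 1 and not read)
* M: number of rows of B
* N: number of columns of B
* alpha: complex scalar applied to B
* A: complex triangular matrix
* lda: leading dimension of A
* B: right-hand sides on input, solution on output
* ldb: leading dimension of B
*/
void BLASNAME(cblas_ztrsm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const enum CBLAS_TRANSPOSE TransA,
const enum CBLAS_DIAG Diag, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
void *B, const BLASINT ldb);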
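/*
* BLASNAME(cblas_chemm):
* Single-precision complex Hermitian matrix multiply. Computes
* C := alpha * A * B + beta * C or C := alpha * B * A + beta * C, depending on Side.
* Order: storage order (row-major or column-major)
* Side: whether A appears to the left or to the right of B
* Uplo: which triangle of A (upper or lower) is referenced
* M: number of rows of C
* N: number of columns of C
* alpha: complex scalar applied to the product
* A: complex Hermitian matrix
* lda: leading dimension of A
* B: complex matrix
* ldb: leading dimension of B
* beta: complex scalar applied to C
* C: input/output matrix
* ldc: leading dimension of C
*/
void BLASNAME(cblas_chemm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);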
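/*
* BLASNAME(cblas_cherk):
* Single-precision complex Hermitian rank-k update. Computes
* C := alpha * A * A^H + beta * C or C := alpha * A^H * A + beta * C, depending on Trans.
* Order: storage order (row-major or column-major)
* Uplo: which triangle of C (upper or lower) is referenced and updated
* Trans: whether A is used as is or conjugate-transposed
* N: order of C
* K: number of columns of A (no transpose) or rows of A (conjugate transpose)
* alpha: real scalar applied to the product
* A: complex matrix
* lda: leading dimension of A
* beta: real scalar applied to C
* C: input/output Hermitian matrix
* ldc: leading dimension of C
*/
void BLASNAME(cblas_cherk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const float alpha, const void *A, const BLASINT lda,
const float beta, void *C, const BLASINT ldc);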
void BLASNAME(cblas_cher2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const float beta,
void *C, const BLASINT ldc);
void BLASNAME(cblas_zhemm)(const enum CBLAS_ORDER Order, const enum CBLAS_SIDE Side,
const enum CBLAS_UPLO Uplo, const BLASINT M, const BLASINT N,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const void *beta,
void *C, const BLASINT ldc);
void BLASNAME(cblas_zherk)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const double alpha, const void *A, const BLASINT lda,
const double beta, void *C, const BLASINT ldc);
void BLASNAME(cblas_zher2k)(const enum CBLAS_ORDER Order, const enum CBLAS_UPLO Uplo,
const enum CBLAS_TRANSPOSE Trans, const BLASINT N, const BLASINT K,
const void *alpha, const void *A, const BLASINT lda,
const void *B, const BLASINT ldb, const double beta,
void *C, const BLASINT ldc);
void BLASNAME(cblas_xerbla)(BLASINT p, const char *rout, const char *form, ...);
.\numpy\numpy\_core\src\common\npy_config.h
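/* blocklist */
/* Disable functions known to be broken on z/OS */
/* Disable MS math functions known to be broken under MinGW */
/* Disable math functions known to be broken under MSVC */
/* MSVC _hypot interferes with the floating-point precision mode in 32-bit builds, see gh-9567 */
/* The Intel C compiler on Windows uses POW for 64-bit longdouble */
/* powl gives a zero-division warning on OS X, see gh-8307 */
/* Disable some functions because of precision loss */
/* Disable some functions because of precision loss */
/* Disable some functions because of branch cuts */
/* Disable some functions because of branch cuts */
/* Disable some functions because of branch cuts */
/* Disable some functions because of branch cuts and precision loss */
/* Disable some functions because of branch cuts */
/* log2(exp2(i)) is off by a few eps */
/* np.power(..., dtype=np.complex256) does not report overflow */
/*
* Cygwin uses newlib, whose implementation of the complex log functions is
* naive.
*/
// Cygwin versions older than 3.3 are not supported; the user is told to update
/* Disable broken GNU trig functions */
/*
* Macros that keep certain complex math functions from being used with
* glibc older than 2.18, where they may be missing or behave incompatibly.
* musl libc, an independent C library, gets a similar set of definitions.
*/
/*
* If the C library is not glibc, it may be musl libc. In that case some
* complex math functions are undefined to avoid compatibility problems.
*/
/*
* musl's clog has low precision for some inputs. As of musl 1.2.5, the first
* comment in clog.c is "// FIXME".
* See https://github.com/numpy/numpy/pull/24416
* and https://github.com/numpy/numpy/pull/24448
* The complex log function and its float variant are therefore undefined here.
*/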
.\numpy\numpy\_core\src\common\npy_cpuinfo_parser.h
/*
* Copyright (C) 2010 The Android Open Source Project
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
* OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
* AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
// arch/arm/include/uapi/asm/hwcap.h
// arch/arm64/include/uapi/asm/hwcap.h
/*
* Get the size of a file by reading it until the end. This is needed
* because files under /proc do not always return a valid size when
* using fseek(0, SEEK_END) + ftell(). Nor can they be mmap()-ed.
*/
static int
get_file_size(const char* pathname)
{
    int fd, result = 0;
    char buffer[256];
    // Open the file read-only.
    fd = open(pathname, O_RDONLY);
    if (fd < 0) {
        return -1;  // open failed
    }
    // Read to end-of-file, summing the bytes returned by each read().
    for (;;) {
        int ret = read(fd, buffer, sizeof buffer);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal: retry
            }
            break;  // any other error: stop and report what was counted so far
        }
        if (ret == 0) {
            break;  // end of file
        }
        result += ret;  // accumulate the byte count
    }
    close(fd);
    return result;  // total number of bytes read
}
/*
* Read the content of /proc/cpuinfo into a user-provided buffer.
* Return the length of the data, or -1 on error. Does *not*
* zero-terminate the content. Will not read more
* than 'buffsize' bytes.
*/
static int
read_file(const char* pathname, char* buffer, size_t buffsize)
{
    int fd, count;
    fd = open(pathname, O_RDONLY);  // open the file read-only
    if (fd < 0) {
        return -1;  // open failed
    }
    count = 0;  // bytes stored in the buffer so far
    // Keep reading until the buffer is full, end-of-file is reached, or an error occurs.
    while (count < (int)buffsize) {
        int ret = read(fd, buffer + count, buffsize - count);  // append after the data already read
        if (ret < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal: retry
            }
            if (count == 0) {
                count = -1;  // nothing was read before the error: report failure
            }
            break;  // otherwise return the partial data
        }
        if (ret == 0) {
            break;  // end of file
        }
        count += ret;
    }
    close(fd);
    return count;  // number of bytes read, or -1 on error
}
/*
* Extract the content of the first occurrence of a given field in
* the content of /proc/cpuinfo and return it as a heap-allocated
* string that must be freed by the caller.
*
* Return NULL if not found
*/
static char*
extract_cpuinfo_field(const char* buffer, int buflen, const char* field)
{
    int fieldlen = strlen(field);          // length of the field name
    const char* bufend = buffer + buflen;  // one past the last valid byte
    char* result = NULL;                   // stays NULL unless the field is found
    int len;
    const char *p, *q;
    /* Look for the first occurrence of the field that starts a line. */
    p = buffer;
    for (;;) {
        p = memmem(p, bufend-p, field, fieldlen);  // next occurrence of the field name
        if (p == NULL) {
            goto EXIT;  // not present at all
        }
        if (p == buffer || p[-1] == '\n') {
            break;  // the match begins a line: accept it
        }
        p += fieldlen;  // mid-line match: keep searching after it
    }
    /* Skip to the first colon followed by a space */
    p += fieldlen;                 // step over the field name
    p = memchr(p, ':', bufend-p);  // find the colon after it
    if (p == NULL || p[1] != ' ') {
        goto EXIT;  // no colon, or the colon is not followed by a space
    }
    /* Find the end of the line */
    p += 2;                         // step over the colon and the space
    q = memchr(p, '\n', bufend-p);  // find the terminating newline
    if (q == NULL) {
        q = bufend;  // last line without a newline: the value runs to the end of the buffer
    }
    /* Copy the line into a heap-allocated buffer */
    len = q - p;               // length of the value
    result = malloc(len + 1);  // the caller must free this
    if (result == NULL) {
        goto EXIT;  // allocation failed
    }
    memcpy(result, p, len);
    result[len] = '\0';  // NUL-terminate the copy
EXIT:
    return result;  // the field value, or NULL if it was not found
}
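The matching rules in `extract_cpuinfo_field` (the field name must begin a line, and the value starts after `": "`) are easy to try on a sample string. The sketch below is a simplified variant, not the NumPy function: it assumes NUL-terminated input so that portable `strstr`/`strchr` can stand in for `memmem`/`memchr`, and it copies into a caller-supplied buffer instead of allocating. The name `cpuinfo_field` and the sample text used with it are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant for NUL-terminated text. Copies the field value
 * into `out` and returns 1, or returns 0 if the field is missing or the
 * value does not fit in `outsz` bytes. */
static int cpuinfo_field(const char *text, const char *field,
                         char *out, size_t outsz)
{
    size_t fieldlen = strlen(field);
    size_t len;
    const char *p = text;

    if (fieldlen == 0) {
        return 0;
    }
    for (;;) {
        p = strstr(p, field);
        if (p == NULL) {
            return 0;                /* field name not present */
        }
        if (p == text || p[-1] == '\n') {
            break;                   /* match starts a line: accept */
        }
        p += fieldlen;               /* mid-line match: keep looking */
    }
    p = strchr(p + fieldlen, ':');
    if (p == NULL || p[1] != ' ') {
        return 0;                    /* need a colon followed by a space */
    }
    p += 2;
    len = strcspn(p, "\n");          /* value ends at newline or end of text */
    if (len + 1 > outsz) {
        return 0;
    }
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

With input such as `"model name\t: x Features demo\nFeatures\t: fp asimd aes\n"`, the occurrence of `Features` inside the first line is skipped because it does not start a line, and the value returned is `fp asimd aes`.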
/*
* Checks that a space-separated list of items contains one given 'item'.
* Returns 1 if found, 0 otherwise.
*/
static int
has_list_item(const char* list, const char* item)
{
    const char* p = list;        // scan position within the list
    int itemlen = strlen(item);  // length of the item being searched for
    if (list == NULL) {
        return 0;  // no list: nothing can match
    }
    while (*p) {
        const char* q;
        /* skip spaces */
        while (*p == ' ' || *p == '\t') {
            p++;
        }
        /* find end of current list item */
        q = p;
        while (*q && *q != ' ' && *q != '\t') {
            q++;
        }
        // Match only whole tokens: the lengths must agree as well as the bytes.
        if (itemlen == q-p && !memcmp(p, item, itemlen)) {
            return 1;  // found
        }
        /* skip to next item */
        p = q;
    }
    return 0;  // not found
}
static void setHwcap(char* cpuFeatures, unsigned long* hwcap) {
    // For each feature token present in the "Features" list, set the matching HWCAP bit.
    *hwcap |= has_list_item(cpuFeatures, "neon") ? NPY__HWCAP_NEON : 0;
    *hwcap |= has_list_item(cpuFeatures, "half") ? NPY__HWCAP_HALF : 0;
    *hwcap |= has_list_item(cpuFeatures, "vfpv3") ? NPY__HWCAP_VFPv3 : 0;
    *hwcap |= has_list_item(cpuFeatures, "vfpv4") ? NPY__HWCAP_VFPv4 : 0;
    *hwcap |= has_list_item(cpuFeatures, "asimd") ? NPY__HWCAP_ASIMD : 0;
    *hwcap |= has_list_item(cpuFeatures, "fp") ? NPY__HWCAP_FP : 0;
    *hwcap |= has_list_item(cpuFeatures, "fphp") ? NPY__HWCAP_FPHP : 0;
    *hwcap |= has_list_item(cpuFeatures, "asimdhp") ? NPY__HWCAP_ASIMDHP : 0;
    *hwcap |= has_list_item(cpuFeatures, "asimddp") ? NPY__HWCAP_ASIMDDP : 0;
    *hwcap |= has_list_item(cpuFeatures, "asimdfhm") ? NPY__HWCAP_ASIMDFHM : 0;
}
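Whole-token matching matters here because several feature names are prefixes of others (`fp` and `fphp`, `asimd` and `asimdhp`), so a plain substring search would set bits that the CPU does not report. The sketch below shows the same idea in table-driven form. Everything in it is illustrative: the `DEMO_*` flag values are invented and do not correspond to the real `NPY__HWCAP_*` constants, and `demo_has_item`/`demo_hwcap` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Invented flag values for this sketch only. */
enum { DEMO_FP = 1 << 0, DEMO_FPHP = 1 << 1, DEMO_ASIMD = 1 << 2 };

/* Return 1 if `item` occurs as a whole space/tab-separated token in `list`. */
static int demo_has_item(const char *list, const char *item)
{
    size_t itemlen = strlen(item);
    const char *p = list;

    while (*p) {
        size_t toklen;
        p += strspn(p, " \t");        /* skip separators */
        toklen = strcspn(p, " \t");   /* length of the current token */
        if (toklen == itemlen && memcmp(p, item, itemlen) == 0) {
            return 1;
        }
        p += toklen;
    }
    return 0;
}

/* Build a bit mask from a feature list by walking a name/bit table. */
static unsigned long demo_hwcap(const char *features)
{
    static const struct { const char *name; unsigned long bit; } table[] = {
        {"fp", DEMO_FP}, {"fphp", DEMO_FPHP}, {"asimd", DEMO_ASIMD},
    };
    unsigned long caps = 0;
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (demo_has_item(features, table[i].name)) {
            caps |= table[i].bit;
        }
    }
    return caps;
}
```

A table keeps the name-to-bit mapping in one place, at the cost of an extra loop. The original spells out one statement per feature, which is just as clear for a list this short.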
static int
get_feature_from_proc_cpuinfo(unsigned long *hwcap, unsigned long *hwcap2) {
    char* cpuinfo = NULL;  // buffer for the contents of /proc/cpuinfo
    int cpuinfo_len;       // its size in bytes
    // Measure the file by reading it to the end.
    cpuinfo_len = get_file_size("/proc/cpuinfo");
    if (cpuinfo_len < 0) {
        return 0;  // could not determine the size
    }
    // Allocate a buffer of that size.
    cpuinfo = malloc(cpuinfo_len);
    if (cpuinfo == NULL) {
        return 0;  // allocation failed
    }
    // Read the file into the buffer; cpuinfo_len becomes the number of bytes actually read.
    cpuinfo_len = read_file("/proc/cpuinfo", cpuinfo, cpuinfo_len);
    // Extract the value of the first "Features" line as a heap-allocated string.
    char* cpuFeatures = extract_cpuinfo_field(cpuinfo, cpuinfo_len, "Features");
    if(cpuFeatures == NULL) {
        return 0;  // no Features field found
    }
    // Set the HWCAP bits for the features in the list.
    setHwcap(cpuFeatures, hwcap);
    // OR the same bits into hwcap2.
    *hwcap2 |= *hwcap;
    // Set the HWCAP2 bit for each of the following tokens that appears in the list.
    *hwcap2 |= has_list_item(cpuFeatures, "aes") ? NPY__HWCAP2_AES : 0;
    *hwcap2 |= has_list_item(cpuFeatures, "pmull") ? NPY__HWCAP2_PMULL : 0;
    *hwcap2 |= has_list_item(cpuFeatures, "sha1") ? NPY__HWCAP2_SHA1 : 0;
    *hwcap2 |= has_list_item(cpuFeatures, "sha2") ? NPY__HWCAP2_SHA2 : 0;
    *hwcap2 |= has_list_item(cpuFeatures, "crc32") ? NPY__HWCAP2_CRC32 : 0;
    // Note: as written, neither cpuinfo nor cpuFeatures is freed before returning.
    return 1;  // success
}
.\numpy\numpy\_core\src\common\npy_cpu_dispatch.c
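// Define NPY_NO_DEPRECATED_API, pinned to the current API version, to disable deprecated NumPy APIs
// Define _MULTIARRAYMODULE to mark this file as part of the multiarray module
// Include the required headers
// Initialize the CPU dispatch tracer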
NPY_VISIBILITY_HIDDEN int
npy_cpu_dispatch_tracer_init(PyObject *mod)
{
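    // Refuse to initialize twice: raise RuntimeError if the registry already exists.
    if (npy_static_pydata.cpu_dispatch_registry != NULL) {
        PyErr_Format(PyExc_RuntimeError, "CPU dispatcher tracer already initialized");
        return -1;
    }
    // Get the module's dictionary (a borrowed reference).
    PyObject *mod_dict = PyModule_GetDict(mod);
    if (mod_dict == NULL) {
        return -1;
    }
    // Create the empty dictionary that serves as the registry.
    PyObject *reg_dict = PyDict_New();
    if (reg_dict == NULL) {
        return -1;
    }
    // Expose it on the module as '__cpu_targets_info__'.
    int err = PyDict_SetItemString(mod_dict, "__cpu_targets_info__", reg_dict);
    Py_DECREF(reg_dict);  // drop our reference; on success the module dict keeps its own
    if (err != 0) {
        return -1;
    }
    // Cache the pointer in the static data. It stays valid while the module dict holds the registry.
    npy_static_pydata.cpu_dispatch_registry = reg_dict;
    return 0;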
}
// CPU 分发追踪函数
NPY_VISIBILITY_HIDDEN void
npy_cpu_dispatch_trace(const char *fname, const char *signature,
const char **dispatch_info)
{
    // Look up the per-function dictionary for fname (a borrowed reference).
    PyObject *func_dict = PyDict_GetItemString(npy_static_pydata.cpu_dispatch_registry, fname);
    if (func_dict == NULL) {
        // First trace for this function: create its dictionary.
        func_dict = PyDict_New();
        if (func_dict == NULL) {
            return;
        }
        // Store it in the registry under the function name.
        int err = PyDict_SetItemString(npy_static_pydata.cpu_dispatch_registry, fname, func_dict);
        Py_DECREF(func_dict);  // the registry now holds the reference
        if (err != 0) {
            return;
        }
    }
    // Create the dictionary that holds the target information for this signature.
    PyObject *sig_dict = PyDict_New();
    if (sig_dict == NULL) {
        return;
    }
    // Store it in the function dictionary, replacing any earlier entry for the same signature.
    int err = PyDict_SetItemString(func_dict, signature, sig_dict);
    Py_DECREF(sig_dict);  // the function dictionary now holds the reference
    if (err != 0) {
        return;
    }
    // Record the target currently dispatched to under the key "current".
    PyObject *current_target = PyUnicode_FromString(dispatch_info[0]);
    if (current_target == NULL) {
        return;
    }
    err = PyDict_SetItemString(sig_dict, "current", current_target);
    Py_DECREF(current_target);  // the signature dictionary now holds the reference
    if (err != 0) {
        return;
    }
    // Record the list of available targets under the key "available".
    PyObject *available = PyUnicode_FromString(dispatch_info[1]);
    if (available == NULL) {
        return;
    }
    err = PyDict_SetItemString(sig_dict, "available", available);
    Py_DECREF(available);  // the signature dictionary now holds the reference
    if (err != 0) {
        return;
    }
}
.\numpy\numpy\_core\src\common\npy_cpu_dispatch.h
/**
* This file is part of the NumPy CPU dispatcher.
*
* Please have a look at doc/reference/simd-optimizations.html
* To get a better understanding of the mechanism behind it.
*/
/*
* "altivec.h" header contains the definitions(bool, vector, pixel),
* usually in c++ we undefine them after including the header.
* It's better anyway to take them off and use built-in types(__vector, __pixel, __bool) instead,
* since c99 supports bool variables which may lead to ambiguous errors.
*/
// back up 'bool' before including 'npy_cpu_dispatch_config.h', since it may not be defined as a compiler token.
#define NPY__CPU_DISPATCH_GUARD_BOOL
typedef bool npy__cpu_dispatch_guard_bool;
#endif
/**
* Including the main configuration header 'npy_cpu_dispatch_config.h'.
* This header is generated by the 'ccompiler_opt' distutils module and the Meson build system.
*
* For the distutils-generated version, it contains:
* - Headers for platform-specific instruction sets.
* - Feature #definitions, e.g. NPY_HAVE_AVX2.
* - Helper macros that encapsulate enabled features through user-defined build options
* '--cpu-baseline' and '--cpu-dispatch'. These options are essential for implementing
* attributes like `__cpu_baseline__` and `__cpu_dispatch__` in the NumPy module.
*
* For the Meson-generated version, it contains:
* - Headers for platform-specific instruction sets.
* - Helper macros that encapsulate enabled features through user-defined build options
* '--cpu-baseline' and '--cpu-dispatch'. These options remain crucial for implementing
* attributes like `__cpu_baseline__` and `__cpu_dispatch__` in the NumPy module.
* - Additional helper macros necessary for runtime dispatching.
*
* Note: In the Meson build, features #definitions are conveyed via compiler arguments.
*/
#include "npy_cpu_dispatch_config.h"
#ifndef NPY__CPU_MESON_BUILD
// Define helper macros necessary for runtime dispatching for distutils.
#include "npy_cpu_dispatch_distutils.h"
#endif
#if defined(NPY_HAVE_VSX) || defined(NPY_HAVE_VX)
#undef bool
#undef vector
#undef pixel
#ifdef NPY__CPU_DISPATCH_GUARD_BOOL
#define bool npy__cpu_dispatch_guard_bool
#undef NPY__CPU_DISPATCH_GUARD_BOOL
#endif
#endif
/**
* Initialize the CPU dispatch tracer.
*
* This function simply adds an empty dictionary with the attribute
* '__cpu_targets_info__' to the provided module. The dictionary is filled
* later by npy_cpu_dispatch_trace(), which maps function names to dispatch
* information.
*
* It should be called only once during the loading of the NumPy module.
* Note: This function is not thread-safe. It is only declared here; the
* definition lives in npy_cpu_dispatch.c.
*
* @param mod The module to which the '__cpu_targets_info__' dictionary will be added.
* @return 0 on success, -1 on failure with a Python exception set.
*/
NPY_VISIBILITY_HIDDEN int
npy_cpu_dispatch_tracer_init(PyObject *mod);
/**
* Insert data into the initialized '__cpu_targets_info__' dictionary.
*
* This function adds the function name as a key and another dictionary as a value.
* The inner dictionary holds the 'signature' as a key and splits 'dispatch_info' into another dictionary.
* The innermost dictionary contains the current enabled target as 'current' and available targets as 'available'.
*
* Note: This function should not be used directly; it should be used through the macro NPY_CPU_DISPATCH_TRACE(),
* which is responsible for filling in the enabled CPU targets.
*
* Example:
*
* const char *dispatch_info[] = {"AVX2", "AVX512_SKX AVX2 baseline"};
* npy_cpu_dispatch_trace("add", "bbb", dispatch_info);
*
* const char *dispatch_info[] = {"AVX2", "AVX2 SSE41 baseline"};
* npy_cpu_dispatch_trace("add", "BBB", dispatch_info);
*
* This will insert the following structure into the '__cpu_targets_info__' dictionary:
*
* numpy._core._multiarray_umath.__cpu_targets_info__
* {
* "add": {
* "bbb": {
* "current": "AVX2",
* "available": "AVX512_SKX AVX2 baseline"
* },
* "BBB": {
* "current": "AVX2",
* "available": "AVX2 SSE41 baseline"
* },
* },
* }
*
* @param func_name The name of the function.
* @param signature The signature of the function.
* @param dispatch_info The information about CPU dispatching.
*/
NPY_VISIBILITY_HIDDEN void
npy_cpu_dispatch_trace(const char *func_name, const char *signature,
const char **dispatch_info);
/**
* Macro to trace CPU dispatch for the specified function name and signature.
*
* This macro extracts the enabled CPU targets from the generated configuration file
* and calls 'npy_cpu_dispatch_trace()' to insert a new item into the '__cpu_targets_info__' dictionary.
*
* Example usage:
* #include "arithmetic.dispatch.h"
* NPY_CPU_DISPATCH_CALL(BYTE_add_ptr = BYTE_add);
* NPY_CPU_DISPATCH_TRACE("add", "bbb");
*
* @param FNAME The name of the function.
* @param SIGNATURE The signature of the function.
*/
#define NPY_CPU_DISPATCH_TRACE(FNAME, SIGNATURE) \
do { \
const char *dinfo[] = NPY_CPU_DISPATCH_INFO(); \
npy_cpu_dispatch_trace(FNAME, SIGNATURE, dinfo); \
} while(0)
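The layout described in the comments above (function name, then signature, then the 'current' and 'available' targets) can be illustrated without the Python C API. The following is a minimal standalone sketch, not NumPy code: `trace_entry`, `trace_add`, `trace_find`, and `TRACE` are hypothetical names, and a fixed-size array stands in for the `__cpu_targets_info__` dictionary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one leaf of '__cpu_targets_info__'. */
typedef struct {
    const char *func_name;  /* outer key, e.g. "add" */
    const char *signature;  /* inner key, e.g. "bbb" */
    const char *current;    /* dispatch_info[0]: target enabled at runtime */
    const char *available;  /* dispatch_info[1]: all compiled targets */
} trace_entry;

#define TRACE_CAPACITY 16
static trace_entry trace_table[TRACE_CAPACITY];
static size_t trace_count = 0;

/* Record one (func_name, signature) pair; returns 0 on success, -1 if full. */
static int
trace_add(const char *func_name, const char *signature,
          const char **dispatch_info)
{
    assert(dispatch_info != NULL);
    if (trace_count >= TRACE_CAPACITY) {
        return -1;
    }
    trace_entry *e = &trace_table[trace_count++];
    e->func_name = func_name;
    e->signature = signature;
    e->current = dispatch_info[0];
    e->available = dispatch_info[1];
    return 0;
}

/* Look up a recorded entry; returns NULL if the pair was never traced. */
static const trace_entry *
trace_find(const char *func_name, const char *signature)
{
    for (size_t i = 0; i < trace_count; ++i) {
        if (strcmp(trace_table[i].func_name, func_name) == 0 &&
            strcmp(trace_table[i].signature, signature) == 0) {
            return &trace_table[i];
        }
    }
    return NULL;
}

/* Statement-like wrapper in the spirit of NPY_CPU_DISPATCH_TRACE. */
#define TRACE(FNAME, SIGNATURE, CURRENT, AVAILABLE)   \
    do {                                              \
        const char *dinfo[] = {CURRENT, AVAILABLE};   \
        trace_add(FNAME, SIGNATURE, dinfo);           \
    } while (0)
```

Wrapping the body in `do { ... } while (0)` lets `TRACE(...);` behave as a single statement, so it stays safe inside an unbraced `if`/`else`.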
.\numpy\numpy\_core\src\common\npy_cpu_dispatch_distutils.h
/**
* This header should be removed after support for distutils is removed.
* It provides helper macros required for CPU runtime dispatching,
* which are already defined within `meson_cpu/main_config.h.in`.
*
* The following macros are explained within `meson_cpu/main_config.h.in`,
* although there are some differences in their usage:
*
* - Dispatched targets must be defined at the top of each dispatch-able
* source file within an inline or multi-line comment block.
* For example: //@targets baseline SSE2 AVX2 AVX512_SKX
*
* - The generated configuration derived from each dispatch-able source
* file must be guarded with `
* For example:
*
*
*
*/
// 'NPY__CPU_TARGET_CURRENT': only defined by the dispatch-able sources
/**
* Defining the default behavior for the configurable macros of dispatch-able sources,
* 'NPY__CPU_DISPATCH_CALL(...)' and 'NPY__CPU_DISPATCH_BASELINE_CALL(...)'
*
* These macros are defined inside the generated config files that have been derived from
* the configuration statements of the dispatch-able sources.
*
* The generated config file takes the same name of the dispatch-able source with replacing
* the extension to '.h' instead of '.c', and it should be treated as a header template.
*/
&&"Expected config header of the dispatch-able source";
&&"Expected config header of the dispatch-able source";
/**
* We assume by default that all configuration statements contain 'baseline' option, however,
* if the dispatch-able source doesn't require it, then the dispatch-able source and following macros
* need to be guarded with '
*/
NPY_EXPAND(CB(__VA_ARGS__))
NPY__CPU_DISPATCH_CALL(NPY_CPU_DISPATCH_DECLARE_CHK_, NPY_CPU_DISPATCH_DECLARE_CB_, __VA_ARGS__) \
NPY__CPU_DISPATCH_BASELINE_CALL(NPY_CPU_DISPATCH_DECLARE_BASE_CB_, __VA_ARGS__)
// Preprocessor callbacks
// Placeholder macro for defining callback behavior based on dispatch targets
NPY_CAT(NPY_CAT(LEFT, _), TARGET_NAME) __VA_ARGS__;
LEFT __VA_ARGS__;
// Expands to the LEFT argument followed by the variadic arguments
// Dummy CPU runtime checking
// This runtime-check macro is a placeholder that expands to nothing
NPY__CPU_DISPATCH_CALL(NPY_CPU_DISPATCH_DECLARE_CHK_, NPY_CPU_DISPATCH_DECLARE_CB_, __VA_ARGS__)
// Expands to a call of NPY__CPU_DISPATCH_CALL with NPY_CPU_DISPATCH_DECLARE_CHK_, NPY_CPU_DISPATCH_DECLARE_CB_, and the given variadic arguments
NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_CB_, __VA_ARGS__) \
NPY__CPU_DISPATCH_BASELINE_CALL(NPY_CPU_DISPATCH_CALL_BASE_CB_, __VA_ARGS__)
// Expands to calls of NPY__CPU_DISPATCH_CALL and NPY__CPU_DISPATCH_BASELINE_CALL, passing NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_CB_, NPY_CPU_DISPATCH_CALL_BASE_CB_, and the given variadic arguments
// Preprocessor callbacks
(TESTED_FEATURES) ? (NPY_CAT(NPY_CAT(LEFT, _), TARGET_NAME) __VA_ARGS__) :
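// Target callback: when TESTED_FEATURES holds, select LEFT_TARGET_NAME applied to the given arguments; the trailing ':' hands over to the next alternative
(LEFT __VA_ARGS__)
// Baseline callback: expands to LEFT followed by the given variadic arguments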
NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_XB_CB_, __VA_ARGS__) \
((void) 0 /* discarded expression value */)
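// Expands to a call of NPY__CPU_DISPATCH_CALL with NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_XB_CB_, and the given variadic arguments, terminated by ((void) 0) so that the value of the expression is discarded
(TESTED_FEATURES) ? (void) (NPY_CAT(NPY_CAT(LEFT, _), TARGET_NAME) __VA_ARGS__) :
// Target callback: when TESTED_FEATURES holds, evaluate LEFT_TARGET_NAME and cast the result to void; the trailing ':' hands over to the next alternative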
(NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_ALL_CB_, __VA_ARGS__) \
NPY__CPU_DISPATCH_BASELINE_CALL(NPY_CPU_DISPATCH_CALL_ALL_BASE_CB_, __VA_ARGS__))
// Expands to calls of NPY__CPU_DISPATCH_CALL and NPY__CPU_DISPATCH_BASELINE_CALL, passing NPY_CPU_HAVE, NPY_CPU_DISPATCH_CALL_ALL_CB_, NPY_CPU_DISPATCH_CALL_ALL_BASE_CB_, and the given variadic arguments
// Preprocessor callbacks
((TESTED_FEATURES) ? (NPY_CAT(NPY_CAT(LEFT, _), TARGET_NAME) __VA_ARGS__) : (void) 0),
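// Target callback: when TESTED_FEATURES holds, evaluate LEFT_TARGET_NAME, otherwise evaluate (void) 0; the trailing comma chains to the next target
( LEFT __VA_ARGS__ )
// Baseline callback: expands to LEFT followed by the given variadic arguments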
{ \
NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, NPY_CPU_DISPATCH_INFO_HIGH_CB_, DUMMY) \
NPY__CPU_DISPATCH_BASELINE_CALL(NPY_CPU_DISPATCH_INFO_BASE_HIGH_CB_, DUMMY) \
"", \
NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, NPY_CPU_DISPATCH_INFO_CB_, DUMMY) \
NPY__CPU_DISPATCH_BASELINE_CALL(NPY_CPU_DISPATCH_INFO_BASE_CB_, DUMMY) \
""\
}
// Expands to a brace-enclosed initializer built from two groups of NPY__CPU_DISPATCH_CALL and NPY__CPU_DISPATCH_BASELINE_CALL invocations, using NPY_CPU_HAVE, NPY_CPU_DISPATCH_INFO_HIGH_CB_, NPY_CPU_DISPATCH_INFO_BASE_HIGH_CB_, NPY_CPU_DISPATCH_INFO_CB_, and NPY_CPU_DISPATCH_INFO_BASE_CB_ with a DUMMY argument
(TESTED_FEATURES) ? NPY_TOSTRING(TARGET_NAME) :
// Highest-target info callback: yields the stringified TARGET_NAME when TESTED_FEATURES holds; otherwise the chain continues
(1) ? "baseline(" NPY_WITH_CPU_BASELINE ")" :
// Baseline highest-target info callback: always yields the string "baseline(...)" built from NPY_WITH_CPU_BASELINE
// Preprocessor callbacks
NPY_TOSTRING(TARGET_NAME) " "
// Info callback: stringifies TARGET_NAME and appends a space
"baseline(" NPY_WITH_CPU_BASELINE ")"
// Baseline info callback: yields the string "baseline(...)" built from NPY_WITH_CPU_BASELINE
// End of the conditionally compiled macro definitions
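The call callbacks above chain into a single conditional expression: each dispatched target contributes `(check) ? LEFT_TARGET args :`, and the baseline callback supplies the final alternative. The following is a minimal sketch of that expansion, assuming a conforming preprocessor (GCC or Clang). All names in it (`CPU_HAVE`, `CALL_CB`, `DISPATCH_TARGETS`, `impl_*`, and so on) are hypothetical stand-ins, not NumPy macros.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature IDs and runtime flags, standing in for NPY_CPU_HAVE. */
enum { FEAT_AVX2, FEAT_AVX512F, FEAT_MAX };
static int cpu_flags[FEAT_MAX];
#define CPU_HAVE(NAME) (cpu_flags[FEAT_##NAME])

/* Toggle one runtime flag (hypothetical helper for experimenting). */
static void
set_feature(int id, int on)
{
    assert(id >= 0 && id < FEAT_MAX);
    cpu_flags[id] = on ? 1 : 0;
}

/* One implementation per target; the baseline keeps the undecorated name. */
static const char *impl_AVX512F(void) { return "AVX512F"; }
static const char *impl_AVX2(void) { return "AVX2"; }
static const char *impl(void) { return "baseline"; }

#define CAT_(A, B) A##B
#define CAT(A, B) CAT_(A, B)

/* Target callback: emits "(check) ? LEFT_TARGET args :" and leaves ':' open. */
#define CALL_CB(CHECK, TARGET, LEFT, ...) \
    (CHECK) ? CAT(CAT(LEFT, _), TARGET) __VA_ARGS__ :
/* Baseline callback: closes the chain with the undecorated symbol. */
#define CALL_BASE_CB(LEFT, ...) (LEFT __VA_ARGS__)

/* What a generated config header might list, highest-priority target first. */
#define DISPATCH_TARGETS(CB, ...)                   \
    CB(CPU_HAVE(AVX512F), AVX512F, __VA_ARGS__)     \
    CB(CPU_HAVE(AVX2), AVX2, __VA_ARGS__)

#define DISPATCH_CALL(...) \
    (DISPATCH_TARGETS(CALL_CB, __VA_ARGS__) CALL_BASE_CB(__VA_ARGS__))

/* Expands to: (have512 ? impl_AVX512F() : have2 ? impl_AVX2() : (impl())) */
static const char *
pick_impl(void)
{
    return DISPATCH_CALL(impl, ());
}
```

Because the targets are listed from highest to lowest priority, the first feature check that succeeds at runtime wins, and the baseline symbol is used only when every check fails.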
.\numpy\numpy\_core\src\common\npy_cpu_features.c
/*
* Include necessary headers for CPU feature detection and definition.
* These headers ensure that CPU baseline definitions are accessible.
*/
/******************** Private Definitions *********************/
// This array holds boolean values indicating whether each CPU feature is available.
// It is initialized during module initialization and remains immutable thereafter.
// It is not included in the global data struct due to shared usage across modules.
static unsigned char npy__cpu_have[NPY_CPU_FEATURE_MAX];
/******************** Private Declarations *********************/
// Function prototype for runtime CPU feature detection initialization
static void
npy__cpu_init_features(void);
/*
* Enable or disable CPU dispatched features at runtime based on environment variables
* `NPY_ENABLE_CPU_FEATURES` or `NPY_DISABLE_CPU_FEATURES`.
*
* Multiple features can be enabled or disabled, separated by space, comma, or tab.
* Raises an error if parsing fails or if a specified feature is not valid or could not be enabled/disabled.
*/
static int
npy__cpu_check_env(int disable, const char *env);
/* Ensure that CPU baseline features required by the build are supported at runtime */
static int
npy__cpu_validate_baseline(void);
/******************** Public Definitions *********************/
// Function to check if a specific CPU feature is available
NPY_VISIBILITY_HIDDEN int
npy_cpu_have(int feature_id)
{
// Check if the feature_id is within valid range
if (feature_id <= NPY_CPU_FEATURE_NONE || feature_id >= NPY_CPU_FEATURE_MAX)
return 0;
// Return the boolean value indicating if the feature is available
return npy__cpu_have[feature_id];
}
// Function to initialize CPU features detection at module initialization
NPY_VISIBILITY_HIDDEN int
npy_cpu_init(void)
{
// Initialize CPU features detection
npy__cpu_init_features();
// Validate CPU baseline features required by the build
if (npy__cpu_validate_baseline() < 0) {
return -1;
}
// Check if both enable and disable environment variables are set, which is not allowed
char *enable_env = getenv("NPY_ENABLE_CPU_FEATURES");
char *disable_env = getenv("NPY_DISABLE_CPU_FEATURES");
int is_enable = enable_env && enable_env[0];
int is_disable = disable_env && disable_env[0];
if (is_enable & is_disable) {
PyErr_Format(PyExc_ImportError,
"Both NPY_DISABLE_CPU_FEATURES and NPY_ENABLE_CPU_FEATURES "
"environment variables cannot be set simultaneously."
);
return -1;
}
// If either enable or disable environment variable is set, process it
if (is_enable | is_disable) {
if (npy__cpu_check_env(is_disable, is_disable ? disable_env : enable_env) < 0) {
return -1;
}
}
// Initialization successful
return 0;
}
// Structure definition to hold CPU features and their string representations
static struct {
enum npy_cpu_features feature;
char const *string;
} features[] = {{NPY_CPU_FEATURE_MMX, "MMX"},
{NPY_CPU_FEATURE_SSE, "SSE"},
{NPY_CPU_FEATURE_SSE2, "SSE2"},
{NPY_CPU_FEATURE_SSE3, "SSE3"},
{NPY_CPU_FEATURE_SSSE3, "SSSE3"},
{NPY_CPU_FEATURE_SSE41, "SSE41"},
{NPY_CPU_FEATURE_POPCNT, "POPCNT"},
{NPY_CPU_FEATURE_SSE42, "SSE42"},
{NPY_CPU_FEATURE_AVX, "AVX"},
{NPY_CPU_FEATURE_F16C, "F16C"},
{NPY_CPU_FEATURE_XOP, "XOP"},
{NPY_CPU_FEATURE_FMA4, "FMA4"},
{NPY_CPU_FEATURE_FMA3, "FMA3"},
{NPY_CPU_FEATURE_AVX2, "AVX2"},
{NPY_CPU_FEATURE_AVX512F, "AVX512F"},
{NPY_CPU_FEATURE_AVX512CD, "AVX512CD"},
{NPY_CPU_FEATURE_AVX512ER, "AVX512ER"},
{NPY_CPU_FEATURE_AVX512PF, "AVX512PF"},
{NPY_CPU_FEATURE_AVX5124FMAPS, "AVX5124FMAPS"},
{NPY_CPU_FEATURE_AVX5124VNNIW, "AVX5124VNNIW"},
{NPY_CPU_FEATURE_AVX512VPOPCNTDQ, "AVX512VPOPCNTDQ"},
{NPY_CPU_FEATURE_AVX512VL, "AVX512VL"},
{NPY_CPU_FEATURE_AVX512BW, "AVX512BW"},
{NPY_CPU_FEATURE_AVX512DQ, "AVX512DQ"},
{NPY_CPU_FEATURE_AVX512VNNI, "AVX512VNNI"},
{NPY_CPU_FEATURE_AVX512IFMA, "AVX512IFMA"},
{NPY_CPU_FEATURE_AVX512VBMI, "AVX512VBMI"},
{NPY_CPU_FEATURE_AVX512VBMI2, "AVX512VBMI2"},
{NPY_CPU_FEATURE_AVX512BITALG, "AVX512BITALG"},
{NPY_CPU_FEATURE_AVX512FP16 , "AVX512FP16"},
{NPY_CPU_FEATURE_AVX512_KNL, "AVX512_KNL"},
{NPY_CPU_FEATURE_AVX512_KNM, "AVX512_KNM"},
{NPY_CPU_FEATURE_AVX512_SKX, "AVX512_SKX"},
{NPY_CPU_FEATURE_AVX512_CLX, "AVX512_CLX"},
{NPY_CPU_FEATURE_AVX512_CNL, "AVX512_CNL"},
{NPY_CPU_FEATURE_AVX512_ICL, "AVX512_ICL"},
{NPY_CPU_FEATURE_AVX512_SPR, "AVX512_SPR"},
{NPY_CPU_FEATURE_VSX, "VSX"},
{NPY_CPU_FEATURE_VSX2, "VSX2"},
{NPY_CPU_FEATURE_VSX3, "VSX3"},
{NPY_CPU_FEATURE_VSX4, "VSX4"},
{NPY_CPU_FEATURE_VX, "VX"},
{NPY_CPU_FEATURE_VXE, "VXE"},
{NPY_CPU_FEATURE_VXE2, "VXE2"},
{NPY_CPU_FEATURE_NEON, "NEON"},
{NPY_CPU_FEATURE_NEON_FP16, "NEON_FP16"},
{NPY_CPU_FEATURE_NEON_VFPV4, "NEON_VFPV4"},
{NPY_CPU_FEATURE_ASIMD, "ASIMD"},
{NPY_CPU_FEATURE_FPHP, "FPHP"},
{NPY_CPU_FEATURE_ASIMDHP, "ASIMDHP"},
{NPY_CPU_FEATURE_ASIMDDP, "ASIMDDP"},
{NPY_CPU_FEATURE_ASIMDFHM, "ASIMDFHM"},
{NPY_CPU_FEATURE_SVE, "SVE"},
{NPY_CPU_FEATURE_RVV, "RVV"}};
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_features_dict(void)
{
PyObject *dict = PyDict_New();
if (dict) {
for(unsigned i = 0; i < sizeof(features)/sizeof(features[0]); ++i)
if (PyDict_SetItemString(dict, features[i].string,
npy__cpu_have[features[i].feature] ? Py_True : Py_False) < 0) {
Py_DECREF(dict);
return NULL;
}
}
return dict;
}
/******************** Private Definitions *********************/
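/**
* Callback macro that appends one string item to a PyList: FEATURE is converted
* to a PyUnicode object; if the conversion fails, LIST is released and NULL is returned.
*/
#define NPY__CPU_PYLIST_APPEND_CB(FEATURE, LIST) \
item = PyUnicode_FromString(NPY_TOSTRING(FEATURE)); \
if (item == NULL) { \
Py_DECREF(LIST); \
return NULL; \
} \
PyList_SET_ITEM(LIST, index++, item);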
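/**
* Return a PyList holding the CPU baseline features.
* When optimization is not disabled and NPY_WITH_CPU_BASELINE_N is greater than 0,
* the list has one entry per baseline feature; otherwise an empty list is returned.
*/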
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_baseline_list(void)
{
PyObject *list = PyList_New(NPY_WITH_CPU_BASELINE_N), *item;
int index = 0;
if (list != NULL) {
// Expand the callback once per baseline feature to fill the list
NPY_WITH_CPU_BASELINE_CALL(NPY__CPU_PYLIST_APPEND_CB, list)
}
return list;
return PyList_New(0);
}
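/**
* Return a PyList holding the CPU dispatched features.
* When optimization is not disabled and NPY_WITH_CPU_DISPATCH_N is greater than 0,
* the list has one entry per dispatched feature; otherwise an empty list is returned.
*/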
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_dispatch_list(void)
{
PyObject *list = PyList_New(NPY_WITH_CPU_DISPATCH_N), *item;
int index = 0;
if (list != NULL) {
// Expand the callback once per dispatched feature to fill the list
NPY_WITH_CPU_DISPATCH_CALL(NPY__CPU_PYLIST_APPEND_CB, list)
}
return list;
return PyList_New(0);
}
/**
* Return the ID of the given CPU feature name if it is one of the baseline
* features configured via --cpu-baseline; otherwise return 0.
*/
static inline int
npy__cpu_baseline_fid(const char *feature)
{
NPY_WITH_CPU_BASELINE_CALL(NPY__CPU_FEATURE_ID_CB, feature)
return 0;
}
/**
* Return the ID of the given CPU feature name if it is one of the dispatched
* features configured via --cpu-dispatch; otherwise return 0.
*/
static inline int
npy__cpu_dispatch_fid(const char *feature)
{
NPY_WITH_CPU_DISPATCH_CALL(NPY__CPU_FEATURE_ID_CB, feature)
return 0;
}
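/**
* Validate the baseline CPU features.
* When optimization is not disabled and NPY_WITH_CPU_BASELINE_N is greater than 0,
* check that every required feature is supported by the machine;
* if any is missing, set a RuntimeError and return -1.
*/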
static int
npy__cpu_validate_baseline(void)
{
char baseline_failure[sizeof(NPY_WITH_CPU_BASELINE) + 1];
char *fptr = &baseline_failure[0];
// Callback macro: record every baseline feature that the machine does not support
#define NPY__CPU_VALIDATE_CB(FEATURE, DUMMY) \
if (!npy__cpu_have[NPY_CAT(NPY_CPU_FEATURE_, FEATURE)]) { \
const int size = sizeof(NPY_TOSTRING(FEATURE)); \
memcpy(fptr, NPY_TOSTRING(FEATURE), size); \
fptr[size] = ' '; fptr += size + 1; \
}
NPY_WITH_CPU_BASELINE_CALL(NPY__CPU_VALIDATE_CB, DUMMY) // extra arg for MSVC
*fptr = '\0';
if (baseline_failure[0] != '\0') {
*(fptr-1) = '\0'; // trim the trailing space
// Set a RuntimeError that names the unsupported CPU features
PyErr_Format(PyExc_RuntimeError,
"NumPy was built with baseline optimizations: \n"
"(" NPY_WITH_CPU_BASELINE ") but your machine "
"doesn't support:\n(%s).",
baseline_failure
);
return -1;
}
return 0;
}
static int
npy__cpu_check_env(int disable, const char *env) {
static const char *names[] = {
"enable", "disable",
"NPY_ENABLE_CPU_FEATURES", "NPY_DISABLE_CPU_FEATURES",
"During parsing environment variable: 'NPY_ENABLE_CPU_FEATURES':\n",
"During parsing environment variable: 'NPY_DISABLE_CPU_FEATURES':\n"
};
// Normalize disable to 0 or 1
disable = disable ? 1 : 0;
// Select the strings that match the requested action
const char *act_name = names[disable];
const char *env_name = names[disable + 2];
const char *err_head = names[disable + 4];
// Maximum accepted length of the environment variable: 1024
#define NPY__MAX_VAR_LEN 1024
size_t var_len = strlen(env) + 1;
// Reject values longer than the maximum
if (var_len > NPY__MAX_VAR_LEN) {
// Too long: set a RuntimeError
PyErr_Format(PyExc_RuntimeError,
"Length of environment variable '%s' is %zd, only %d accepted",
env_name, var_len, NPY__MAX_VAR_LEN
);
return -1;
}
// Copy the value into a local buffer, because strtok modifies its input
char features[NPY__MAX_VAR_LEN];
memcpy(features, env, var_len);
// Buffers that collect the names of non-existent and unsupported features
char nexist[NPY__MAX_VAR_LEN];
char *nexist_cur = &nexist[0];
char notsupp[sizeof(NPY_WITH_CPU_DISPATCH) + 1];
char *notsupp_cur = &notsupp[0];
// Delimiters: comma and whitespace, that is, space, horizontal tab,
// vertical tab, carriage return, newline, and form feed
const char *delim = ", \t\v\r\n\f";
// Split the features string into tokens with strtok
char *feature = strtok(features, delim);
while (feature) {
// Is the feature part of the baseline optimizations?
if (npy__cpu_baseline_fid(feature) > 0){
if (disable) {
// Disabling a baseline feature is not allowed: set a RuntimeError
PyErr_Format(PyExc_RuntimeError,
"%s"
"You cannot disable CPU feature '%s', since it is part of "
"the baseline optimizations:\n"
"(" NPY_WITH_CPU_BASELINE ").",
err_head, feature
);
return -1;
}
// Enabling a baseline feature changes nothing; move on to the next token
goto next;
}
// Is the feature one of the dispatched features?
int feature_id = npy__cpu_dispatch_fid(feature);
if (feature_id == 0) {
// Not dispatched: record the name in nexist
int flen = strlen(feature);
memcpy(nexist_cur, feature, flen);
nexist_cur[flen] = ' '; nexist_cur += flen + 1;
// Move on to the next token
goto next;
}
// Is the feature supported by the running machine?
if (!npy__cpu_have[feature_id]) {
// Not supported: record the name in notsupp
int flen = strlen(feature);
memcpy(notsupp_cur, feature, flen);
notsupp_cur[flen] = ' '; notsupp_cur += flen + 1;
// Move on to the next token
goto next;
}
// Finally, set the feature state: 0 disables it, 2 marks it as explicitly enabled
npy__cpu_have[feature_id] = disable ? 0 : 2;
next:
// Fetch the next token
feature = strtok(NULL, delim);
}
if (!disable){
// Disable every dispatched feature that was not marked above
#define NPY__CPU_DISABLE_DISPATCH_CB(FEATURE, DUMMY) \
if(npy__cpu_have[NPY_CAT(NPY_CPU_FEATURE_, FEATURE)] != 0)\
{npy__cpu_have[NPY_CAT(NPY_CPU_FEATURE_, FEATURE)]--;}
// Marked features drop from 2 to 1; unmarked detected features drop from 1 to 0
NPY_WITH_CPU_DISPATCH_CALL(NPY__CPU_DISABLE_DISPATCH_CB, DUMMY) // extra arg for msvc
}
// NUL-terminate the nexist buffer
*nexist_cur = '\0';
if (nexist[0] != '\0') {
*(nexist_cur-1) = '\0'; // trim the trailing space
// Warn that some CPU features cannot be used because they are not part of the dispatched optimizations
if (PyErr_WarnFormat(PyExc_ImportWarning, 1,
"%sYou cannot %s CPU features (%s), since "
"they are not part of the dispatched optimizations\n"
"(" NPY_WITH_CPU_DISPATCH ").",
err_head, act_name, nexist
) < 0) {
return -1; // the warning was raised as an error
}
}
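// Message body reporting CPU features that the machine does not support
#define NOTSUPP_BODY \
"%s" \
"You cannot %s CPU features (%s), since " \
"they are not supported by your machine.", \
err_head, act_name, notsupp
*notsupp_cur = '\0';
// At least one unsupported feature was recorded
if (notsupp[0] != '\0') {
*(notsupp_cur-1) = '\0'; // trim the trailing space
// When enabling (disable is false), unsupported features are a hard error
if (!disable){
PyErr_Format(PyExc_RuntimeError, NOTSUPP_BODY);
return -1; // signal failure
}
}
// In builds with optimization disabled or without dispatched features, warn that the variable cannot be used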
if (PyErr_WarnFormat(PyExc_ImportWarning, 1,
"%s"
"You cannot use environment variable '%s', since "
"the NumPy library was compiled with optimization disabled.",
"the NumPy library was compiled without any dispatched optimizations.",
err_head, env_name, act_name
) < 0) {
return -1;
}
// Success
return 0;
}
/****************************************************************
* This section is reserved to defining @npy__cpu_init_features
* for each CPU architecture, please try to keep it clean. Ty
****************************************************************/
/***************** X86 ******************/
static int
npy__cpu_getxcr0(void)
{
// Read XCR0 through the platform-specific _xgetbv intrinsic
return _xgetbv(0);
/* named form of xgetbv not supported on OSX, so must use byte form, see:
* https://github.com/asmjit/asmjit/issues/78
*/
unsigned int eax, edx;
// Read XCR0 directly with the byte-encoded xgetbv instruction
__asm(".byte 0x0F, 0x01, 0xd0" : "=a"(eax), "=d"(edx) : "c"(0));
return eax;
// Otherwise return 0 by default
return 0;
}
static void
npy__cpu_cpuid(int reg[4], int func_id)
{
// Microsoft compiler: use the __cpuidex intrinsic
__cpuidex(reg, func_id, 0);
// Intel compiler: use the __cpuid intrinsic
__cpuid(reg, func_id);
// In PIC mode, save and restore %ebx with xchg around the cpuid instruction
__asm__("xchg{l}\t{%%}ebx, %1\n\t"
"cpuid\n\t"
"xchg{l}\t{%%}ebx, %1\n\t"
: "=a" (reg[0]), "=r" (reg[1]), "=c" (reg[2]),
"=d" (reg[3])
: "a" (func_id), "c" (0)
);
// Otherwise issue the cpuid instruction directly
__asm__("cpuid\n\t"
: "=a" (reg[0]), "=b" (reg[1]), "=c" (reg[2]),
"=d" (reg[3])
: "a" (func_id), "c" (0)
);
// Default: clear reg[0] to report that CPUID is unavailable
reg[0] = 0;
}
static void
npy__cpu_init_features(void)
{
// Clear the CPU feature flag array
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
// Query CPUID leaf 0 to find out whether CPUID is usable
int reg[] = {0, 0, 0, 0};
npy__cpu_cpuid(reg, 0);
if (reg[0] == 0) {
// CPUID is unavailable: assume the basic MMX, SSE, SSE2, and SSE3 features
npy__cpu_have[NPY_CPU_FEATURE_MMX] = 1;
npy__cpu_have[NPY_CPU_FEATURE_SSE] = 1;
npy__cpu_have[NPY_CPU_FEATURE_SSE2] = 1;
npy__cpu_have[NPY_CPU_FEATURE_SSE3] = 1;
return;
}
// Query CPUID leaf 1 and record the supported features
npy__cpu_cpuid(reg, 1);
npy__cpu_have[NPY_CPU_FEATURE_MMX] = (reg[3] & (1 << 23)) != 0;
npy__cpu_have[NPY_CPU_FEATURE_SSE] = (reg[3] & (1 << 25)) != 0;
npy__cpu_have[NPY_CPU_FEATURE_SSE2] = (reg[3] & (1 << 26)) != 0;
npy__cpu_have[NPY_CPU_FEATURE_SSE3] = (reg[2] & (1 << 0)) != 0;
// SSSE3: ECX bit 9
npy__cpu_have[NPY_CPU_FEATURE_SSSE3] = (reg[2] & (1 << 9)) != 0;
// SSE4.1: ECX bit 19
npy__cpu_have[NPY_CPU_FEATURE_SSE41] = (reg[2] & (1 << 19)) != 0;
// POPCNT: ECX bit 23
npy__cpu_have[NPY_CPU_FEATURE_POPCNT] = (reg[2] & (1 << 23)) != 0;
// SSE4.2: ECX bit 20
npy__cpu_have[NPY_CPU_FEATURE_SSE42] = (reg[2] & (1 << 20)) != 0;
// F16C: ECX bit 29
npy__cpu_have[NPY_CPU_FEATURE_F16C] = (reg[2] & (1 << 29)) != 0;
// If the OSXSAVE bit (ECX bit 27) is clear, stop here: AVX needs OS support for XSAVE
if ((reg[2] & (1 << 27)) == 0)
return;
// Read XCR0 and check that the OS saves both XMM and YMM state (bits 1 and 2)
int xcr = npy__cpu_getxcr0();
if ((xcr & 6) != 6)
return;
// AVX: ECX bit 28
npy__cpu_have[NPY_CPU_FEATURE_AVX] = (reg[2] & (1 << 28)) != 0;
// Without AVX, none of the remaining features can be used
if (!npy__cpu_have[NPY_CPU_FEATURE_AVX])
return;
// FMA3: ECX bit 12
npy__cpu_have[NPY_CPU_FEATURE_FMA3] = (reg[2] & (1 << 12)) != 0;
// Query CPUID leaf 0x80000001 for the extended AMD feature bits
npy__cpu_cpuid(reg, 0x80000001);
// XOP: ECX bit 11
npy__cpu_have[NPY_CPU_FEATURE_XOP] = (reg[2] & (1 << 11)) != 0;
// FMA4: ECX bit 16
npy__cpu_have[NPY_CPU_FEATURE_FMA4] = (reg[2] & (1 << 16)) != 0;
// Query CPUID leaf 7 for the AVX2 and AVX512 feature bits
npy__cpu_cpuid(reg, 7);
// AVX2: EBX bit 5
npy__cpu_have[NPY_CPU_FEATURE_AVX2] = (reg[1] & (1 << 5)) != 0;
// Without AVX2, skip the AVX512 checks
if (!npy__cpu_have[NPY_CPU_FEATURE_AVX2])
return;
// Check that the OS enables the AVX512 register state in XCR0
int avx512_os = (xcr & 0xe6) == 0xe6;
/**
* On darwin, on machines with AVX512 support, threads are created by default
* with AVX512 masked off in XCR0 and an AVX-sized save area is used.
* However, AVX512 capabilities are advertised in the commpage and via sysctl.
* For more information, see:
* - https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c
* - https://github.com/golang/go/issues/43089
* - https://github.com/numpy/numpy/issues/19319
*/
if (!avx512_os) {
npy_uintp commpage64_addr = 0x00007fffffe00000ULL;
npy_uint16 commpage64_ver = *((npy_uint16*)(commpage64_addr + 0x01E));
// For commpage versions greater than 12, read the commpage64 capability bits
if (commpage64_ver > 12) {
npy_uint64 commpage64_cap = *((npy_uint64*)(commpage64_addr + 0x010));
avx512_os = (commpage64_cap & 0x0000004000000000ULL) != 0;
}
}
if (!avx512_os) {
return; // no OS support for AVX512 was detected
}
npy__cpu_have[NPY_CPU_FEATURE_AVX512F] = (reg[1] & (1 << 16)) != 0;
npy__cpu_have[NPY_CPU_FEATURE_AVX512CD] = (reg[1] & (1 << 28)) != 0;
}
/***************** POWER ******************/
static void
npy__cpu_init_features(void)
{
// Initialize the npy__cpu_have array to 0
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
unsigned int hwcap = getauxval(AT_HWCAP);
// Without VSX support in hardware, return immediately
if ((hwcap & PPC_FEATURE_HAS_VSX) == 0)
return;
hwcap = getauxval(AT_HWCAP2);
unsigned long hwcap;
elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap));
// Without VSX support in hardware, return immediately
if ((hwcap & PPC_FEATURE_HAS_VSX) == 0)
return;
elf_aux_info(AT_HWCAP2, &hwcap, sizeof(hwcap));
// If the hardware implements the 3.1 architecture level, set all VSX feature flags
if (hwcap & PPC_FEATURE2_ARCH_3_1)
{
npy__cpu_have[NPY_CPU_FEATURE_VSX] =
npy__cpu_have[NPY_CPU_FEATURE_VSX2] =
npy__cpu_have[NPY_CPU_FEATURE_VSX3] =
npy__cpu_have[NPY_CPU_FEATURE_VSX4] = 1;
return;
}
// Otherwise set the VSX flags individually
npy__cpu_have[NPY_CPU_FEATURE_VSX] = 1;
npy__cpu_have[NPY_CPU_FEATURE_VSX2] = (hwcap & PPC_FEATURE2_ARCH_2_07) != 0;
npy__cpu_have[NPY_CPU_FEATURE_VSX3] = (hwcap & PPC_FEATURE2_ARCH_3_00) != 0;
npy__cpu_have[NPY_CPU_FEATURE_VSX4] = (hwcap & PPC_FEATURE2_ARCH_3_1) != 0;
// TODO: AIX, OpenBSD
// On systems other than Linux or FreeBSD, no runtime query is made and the VSX flags are set directly
npy__cpu_have[NPY_CPU_FEATURE_VSX] = 1;
npy__cpu_have[NPY_CPU_FEATURE_VSX2] = 1;
npy__cpu_have[NPY_CPU_FEATURE_VSX3] = 1;
npy__cpu_have[NPY_CPU_FEATURE_VSX4] = 1;
}
/***************** ZARCH ******************/
// Static function that initializes the CPU feature flag array
static void
npy__cpu_init_features(void)
{
// Initialize the npy__cpu_have array to zero
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
// Read the hardware capability bits of the current process
unsigned int hwcap = getauxval(AT_HWCAP);
// Without the S390 vector facility (VX), return immediately
if ((hwcap & HWCAP_S390_VX) == 0) {
return;
}
// If vector enhancements facility 2 is present, set VX, VXE, and VXE2
if (hwcap & HWCAP_S390_VXRS_EXT2) {
npy__cpu_have[NPY_CPU_FEATURE_VX] =
npy__cpu_have[NPY_CPU_FEATURE_VXE] =
npy__cpu_have[NPY_CPU_FEATURE_VXE2] = 1;
return;
}
// Otherwise set VX, and set VXE only when the hardware reports it
npy__cpu_have[NPY_CPU_FEATURE_VXE] = (hwcap & HWCAP_S390_VXE) != 0;
npy__cpu_have[NPY_CPU_FEATURE_VX] = 1;
}
/***************** ARM ******************/
// Inline helper that sets the CPU feature flags implied by ARMv8
static inline void
npy__cpu_init_features_arm8(void)
{
// Set the NEON and ASIMD related feature flags
npy__cpu_have[NPY_CPU_FEATURE_NEON] =
npy__cpu_have[NPY_CPU_FEATURE_NEON_FP16] =
npy__cpu_have[NPY_CPU_FEATURE_NEON_VFPV4] =
npy__cpu_have[NPY_CPU_FEATURE_ASIMD] = 1;
}
/*
* we aren't sure of what kind kernel or clib we deal with
* so we play it safe
*/
#include <stdio.h>
#include "npy_cpuinfo_parser.h"
#if defined(__linux__)
// Declare getauxval as a weak symbol so that it may be absent at link time
__attribute__((weak)) unsigned long getauxval(unsigned long); // linker should handle it
#endif
#ifdef __FreeBSD__
// Declare elf_aux_info as a weak symbol so that it may be absent at link time
__attribute__((weak)) int elf_aux_info(int, void *, int); // linker should handle it
// Replacement for getauxval on FreeBSD
static unsigned long getauxval(unsigned long k)
{
unsigned long val = 0ul;
// If elf_aux_info is missing or the call fails, return the default value 0
if (elf_aux_info == 0 || elf_aux_info((int)k, (void *)&val, (int)sizeof(val)) != 0) {
return 0ul;
}
return val;
}
#endif
// Initialize CPU feature detection from the hardware capability bits (Linux, FreeBSD)
static int
npy__cpu_init_features_linux(void)
{
unsigned long hwcap = 0, hwcap2 = 0;
#ifdef __linux__
// If getauxval is available, use it to read the hardware capability bits
if (getauxval != 0) {
hwcap = getauxval(NPY__HWCAP);
#ifdef __arm__
hwcap2 = getauxval(NPY__HWCAP2);
#endif
} else {
// Otherwise open /proc/self/auxv and read it one entry at a time
unsigned long auxv[2];
int fd = open("/proc/self/auxv", O_RDONLY);
if (fd >= 0) {
while (read(fd, &auxv, sizeof(auxv)) == sizeof(auxv)) {
if (auxv[0] == NPY__HWCAP) {
hwcap = auxv[1];
}
#ifdef __arm__
else if (auxv[0] == NPY__HWCAP2) {
hwcap2 = auxv[1];
}
#endif
// End-of-vector marker reached: leave the loop
else if (auxv[0] == 0 && auxv[1] == 0) {
break;
}
}
close(fd);
}
}
#else
// On non-Linux platforms, read the hardware capability bits with getauxval directly
hwcap = getauxval(NPY__HWCAP);
#ifdef __arm__
hwcap2 = getauxval(NPY__HWCAP2);
#endif
#endif
// If no hardware capability bits were obtained, try a fallback
if (hwcap == 0 && hwcap2 == 0) {
#ifdef __linux__
/*
* When compiled for Linux:
* try parsing /proc/cpuinfo, which helps in sandboxed environments;
* if that fails, fall back to the compiler definitions
*/
if (!get_feature_from_proc_cpuinfo(&hwcap, &hwcap2)) {
// Parsing failed: return 0
return 0;
}
#else
// Not compiled for Linux: return 0 directly
return 0;
#endif
}
#ifdef __arm__
// Compiling for 32-bit ARM
// Detect Arm8 (aarch32 state): hwcap2 reporting AES, SHA1, SHA2, PMULL, or CRC32 identifies it
if ((hwcap2 & NPY__HWCAP2_AES) || (hwcap2 & NPY__HWCAP2_SHA1) ||
(hwcap2 & NPY__HWCAP2_SHA2) || (hwcap2 & NPY__HWCAP2_PMULL) ||
(hwcap2 & NPY__HWCAP2_CRC32))
{
hwcap = hwcap2;
#else
// Not compiling for 32-bit ARM (for example AArch64)
// This branch is always taken
if (1)
{
// If hwcap reports neither NPY__HWCAP_FP nor NPY__HWCAP_ASIMD, return 1
if (!(hwcap & (NPY__HWCAP_FP | NPY__HWCAP_ASIMD))) {
// Can this happen? Perhaps the kernel disabled it
// By the way, this would break the AArch64 baseline
return 1;
}
#endif
// Set the flags that correspond to the reported hardware capabilities
npy__cpu_have[NPY_CPU_FEATURE_FPHP] = (hwcap & NPY__HWCAP_FPHP) != 0;
npy__cpu_have[NPY_CPU_FEATURE_ASIMDHP] = (hwcap & NPY__HWCAP_ASIMDHP) != 0;
npy__cpu_have[NPY_CPU_FEATURE_ASIMDDP] = (hwcap & NPY__HWCAP_ASIMDDP) != 0;
npy__cpu_have[NPY_CPU_FEATURE_ASIMDFHM] = (hwcap & NPY__HWCAP_ASIMDFHM) != 0;
npy__cpu_have[NPY_CPU_FEATURE_SVE] = (hwcap & NPY__HWCAP_SVE) != 0;
// Set the features implied by ARMv8
npy__cpu_init_features_arm8();
} else {
// Set the NEON flag and, when NEON is present, its dependent flags
npy__cpu_have[NPY_CPU_FEATURE_NEON] = (hwcap & NPY__HWCAP_NEON) != 0;
if (npy__cpu_have[NPY_CPU_FEATURE_NEON]) {
// NEON is available: set the NEON_FP16 and NEON_VFPV4 flags
npy__cpu_have[NPY_CPU_FEATURE_NEON_FP16] = (hwcap & NPY__HWCAP_HALF) != 0;
npy__cpu_have[NPY_CPU_FEATURE_NEON_VFPV4] = (hwcap & NPY__HWCAP_VFPv4) != 0;
}
}
// Return 1: initialization succeeded
return 1;
}
#endif
static void
npy__cpu_init_features(void)
{
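// Initialize the npy__cpu_have array to 0
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
#ifdef __linux__
// On Linux, try runtime detection first and return if it succeeds
if (npy__cpu_init_features_linux())
return;
#endif
// Nothing else can be queried at runtime here
// The blocks below set the ARM64 and other feature flags from compile-time definitions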
#if defined(NPY_HAVE_ASIMD) || defined(__aarch64__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 8) || defined(_M_ARM64)
#if defined(NPY_HAVE_FPHP) || defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
// FPHP is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_FPHP] = 1;
#endif
#if defined(NPY_HAVE_ASIMDHP) || defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
// ASIMDHP is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_ASIMDHP] = 1;
#endif
#if defined(NPY_HAVE_ASIMDDP) || defined(__ARM_FEATURE_DOTPROD)
// ASIMDDP is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_ASIMDDP] = 1;
#endif
#if defined(NPY_HAVE_ASIMDFHM) || defined(__ARM_FEATURE_FP16FML)
// ASIMDFHM is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_ASIMDFHM] = 1;
#endif
#if defined(NPY_HAVE_SVE) || defined(__ARM_FEATURE_SVE)
// SVE is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_SVE] = 1;
#endif
// Set the features implied by ARMv8
npy__cpu_init_features_arm8();
#else
#if defined(NPY_HAVE_NEON) || defined(__ARM_NEON__)
// NEON is available: set its flag
npy__cpu_have[NPY_CPU_FEATURE_NEON] = 1;
#endif
#if defined(NPY_HAVE_NEON_FP16) || defined(__ARM_FP16_FORMAT_IEEE) || (defined(__ARM_FP) && (__ARM_FP & 2))
// NEON_FP16 is compiled in: its flag follows the availability of NEON
npy__cpu_have[NPY_CPU_FEATURE_NEON_FP16] = npy__cpu_have[NPY_CPU_FEATURE_NEON];
#endif
#if defined(NPY_HAVE_NEON_VFPV4) || defined(__ARM_FEATURE_FMA)
// NEON_VFPV4 is compiled in: its flag follows the availability of NEON
npy__cpu_have[NPY_CPU_FEATURE_NEON_VFPV4] = npy__cpu_have[NPY_CPU_FEATURE_NEON];
#endif
#endif
}
#ifdef HWCAP_RVV
// If HWCAP_RVV is defined, the system-provided hardware capability definitions are used directly
#include <sys/auxv.h>
#ifndef HWCAP_RVV
// If HWCAP_RVV is not defined, define COMPAT_HWCAP_ISA_V as the bit mask of the 'V' ISA extension
// See: https://github.com/torvalds/linux/blob/v6.8/arch/riscv/include/uapi/asm/hwcap.h#L24
#define COMPAT_HWCAP_ISA_V (1 << ('V' - 'A'))
#endif
static void
npy__cpu_init_features(void)
{
// Clear the npy__cpu_have array before recording the CPU features
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
// Read the hardware capability bits from the system
unsigned int hwcap = getauxval(AT_HWCAP);
// Set the RVV feature when the 'V' ISA extension is reported
if (hwcap & COMPAT_HWCAP_ISA_V) {
npy__cpu_have[NPY_CPU_FEATURE_RVV] = 1;
}
}
/*********** Unsupported ARCH ***********/
#else
static void
npy__cpu_init_features(void)
{
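/*
* The current architecture is not supported, so clear npy__cpu_have to
* disable all CPU features. Clearing also matters because
* npy__cpu_init_features may be called more than once, and features
* disabled through the environment variable (or, possibly in the future,
* through other methods such as global variables) have to be reset here.
* See npy__cpu_try_disable_env for more background.
*/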
memset(npy__cpu_have, 0, sizeof(npy__cpu_have[0]) * NPY_CPU_FEATURE_MAX);
}
#endif
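The enable path of `npy__cpu_check_env` above relies on a small counting trick: each requested feature is marked with 2, and afterwards every nonzero dispatched feature is decremented, so requested features end at 1 while all other detected features end at 0. The following is a minimal standalone sketch of that idea. The feature table and the names `feat_names`, `feat_have`, `feat_index`, and `apply_feature_list` are hypothetical, and error reporting is reduced to a count of skipped names (the NumPy code sets Python errors and warnings instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dispatched features; the real list comes from the build. */
static const char *const feat_names[] = {"AVX2", "AVX512F", "FMA3"};
#define FEAT_COUNT (sizeof(feat_names) / sizeof(feat_names[0]))

/* 1 = detected on this machine, 0 = absent (filled in by the caller). */
static unsigned char feat_have[FEAT_COUNT];

/* Return the index of a feature name, or -1 if it is not dispatched. */
static int
feat_index(const char *name)
{
    for (size_t i = 0; i < FEAT_COUNT; ++i) {
        if (strcmp(feat_names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/*
 * Apply an enable (disable == 0) or disable (disable != 0) list such as
 * "AVX2, FMA3". Returns the number of skipped names (unknown or not
 * detected), or -1 if the list does not fit into the local buffer.
 */
static int
apply_feature_list(const char *env, int disable)
{
    char buf[128];
    int skipped = 0;
    assert(env != NULL);
    size_t len = strlen(env) + 1;
    if (len > sizeof(buf)) {
        return -1;
    }
    memcpy(buf, env, len);  /* strtok modifies its input */

    const char *delim = ", \t\v\r\n\f";
    for (char *tok = strtok(buf, delim); tok; tok = strtok(NULL, delim)) {
        int id = feat_index(tok);
        if (id < 0 || !feat_have[id]) {
            ++skipped;
            continue;
        }
        /* Disabling clears the flag; enabling marks it with 2 for now. */
        feat_have[id] = disable ? 0 : 2;
    }
    if (!disable) {
        /* Marked features drop 2 -> 1; unmarked detected ones drop 1 -> 0. */
        for (size_t i = 0; i < FEAT_COUNT; ++i) {
            if (feat_have[i] != 0) {
                feat_have[i]--;
            }
        }
    }
    return skipped;
}
```

The single decrement pass avoids a second bookkeeping array: the value 2 serves as a temporary marker that survives the pass as 1.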
.\numpy\numpy\_core\src\common\npy_cpu_features.h
extern "C" {
enum npy_cpu_features
{
NPY_CPU_FEATURE_NONE = 0, // 定义枚举常量 NPY_CPU_FEATURE_NONE,值为 0,表示无 CPU 特性
// X86
NPY_CPU_FEATURE_MMX = 1, // 定义 MMX 特性的枚举值为 1
NPY_CPU_FEATURE_SSE = 2, // 定义 SSE 特性的枚举值为 2
NPY_CPU_FEATURE_SSE2 = 3, // 定义 SSE2 特性的枚举值为 3
NPY_CPU_FEATURE_SSE3 = 4, // 定义 SSE3 特性的枚举值为 4
NPY_CPU_FEATURE_SSSE3 = 5, // 定义 SSSE3 特性的枚举值为 5
NPY_CPU_FEATURE_SSE41 = 6, // 定义 SSE4.1 特性的枚举值为 6
NPY_CPU_FEATURE_POPCNT = 7, // 定义 POPCNT 特性的枚举值为 7
NPY_CPU_FEATURE_SSE42 = 8, // 定义 SSE4.2 特性的枚举值为 8
NPY_CPU_FEATURE_AVX = 9, // 定义 AVX 特性的枚举值为 9
NPY_CPU_FEATURE_F16C = 10, // 定义 F16C 特性的枚举值为 10
NPY_CPU_FEATURE_XOP = 11, // 定义 XOP 特性的枚举值为 11
NPY_CPU_FEATURE_FMA4 = 12, // 定义 FMA4 特性的枚举值为 12
NPY_CPU_FEATURE_FMA3 = 13, // 定义 FMA3 特性的枚举值为 13
NPY_CPU_FEATURE_AVX2 = 14, // 定义 AVX2 特性的枚举值为 14
NPY_CPU_FEATURE_FMA = 15, // AVX2 和 FMA3,提供向后兼容性,枚举值为 15
NPY_CPU_FEATURE_AVX512F = 30, // 定义 AVX-512F 特性的枚举值为 30
NPY_CPU_FEATURE_AVX512CD = 31, // 定义 AVX-512CD 特性的枚举值为 31
NPY_CPU_FEATURE_AVX512ER = 32, // 定义 AVX-512ER 特性的枚举值为 32
NPY_CPU_FEATURE_AVX512PF = 33, // 定义 AVX-512PF 特性的枚举值为 33
NPY_CPU_FEATURE_AVX5124FMAPS = 34, // 定义 AVX-5124FMAPS 特性的枚举值为 34
NPY_CPU_FEATURE_AVX5124VNNIW = 35, // 定义 AVX-5124VNNIW 特性的枚举值为 35
NPY_CPU_FEATURE_AVX512VPOPCNTDQ = 36, // 定义 AVX-512VPOPCNTDQ 特性的枚举值为 36
NPY_CPU_FEATURE_AVX512BW = 37, // 定义 AVX-512BW 特性的枚举值为 37
NPY_CPU_FEATURE_AVX512DQ = 38, // 定义 AVX-512DQ 特性的枚举值为 38
NPY_CPU_FEATURE_AVX512VL = 39, // 定义 AVX-512VL 特性的枚举值为 39
NPY_CPU_FEATURE_AVX512IFMA = 40, // 定义 AVX-512IFMA 特性的枚举值为 40
NPY_CPU_FEATURE_AVX512VBMI = 41, // 定义 AVX-512VBMI 特性的枚举值为 41
NPY_CPU_FEATURE_AVX512VNNI = 42, // 定义 AVX-512VNNI 特性的枚举值为 42
NPY_CPU_FEATURE_AVX512VBMI2 = 43, // 定义 AVX-512VBMI2 特性的枚举值为 43
NPY_CPU_FEATURE_AVX512BITALG = 44, // 定义 AVX-512BITALG 特性的枚举值为 44
NPY_CPU_FEATURE_AVX512FP16 = 45, // 定义 AVX-512FP16 特性的枚举值为 45
// X86 CPU Groups
// Knights Landing (F,CD,ER,PF)
NPY_CPU_FEATURE_AVX512_KNL = 101, // Knights Landing feature group (enum value 101)
// Knights Mill (F,CD,ER,PF,4FMAPS,4VNNIW,VPOPCNTDQ)
NPY_CPU_FEATURE_AVX512_KNM = 102, // Knights Mill feature group (enum value 102)
// Skylake-X (F,CD,BW,DQ,VL)
NPY_CPU_FEATURE_AVX512_SKX = 103, // Skylake-X feature group (enum value 103)
// Cascade Lake (F,CD,BW,DQ,VL,VNNI)
NPY_CPU_FEATURE_AVX512_CLX = 104, // Cascade Lake feature group (enum value 104)
// Cannon Lake (F,CD,BW,DQ,VL,IFMA,VBMI)
NPY_CPU_FEATURE_AVX512_CNL = 105, // Cannon Lake feature group (enum value 105)
// Ice Lake (F,CD,BW,DQ,VL,IFMA,VBMI,VNNI,VBMI2,BITALG,VPOPCNTDQ)
NPY_CPU_FEATURE_AVX512_ICL = 106, // Ice Lake feature group (enum value 106)
// Sapphire Rapids (Ice Lake, AVX-512FP16)
NPY_CPU_FEATURE_AVX512_SPR = 107, // Sapphire Rapids feature group (enum value 107)
// IBM/POWER VSX
// POWER7
NPY_CPU_FEATURE_VSX = 200, // VSX on POWER7 (enum value 200)
// POWER8
NPY_CPU_FEATURE_VSX2 = 201, // VSX2 on POWER8 (enum value 201)
// POWER9
NPY_CPU_FEATURE_VSX3 = 202, // VSX3 on POWER9 (enum value 202)
// POWER10
NPY_CPU_FEATURE_VSX4 = 203, // VSX4 on POWER10 (enum value 203)
// ARM
NPY_CPU_FEATURE_NEON = 300, // ARM NEON (enum value 300)
NPY_CPU_FEATURE_NEON_FP16 = 301, // ARM NEON FP16 (enum value 301)
// FMA
NPY_CPU_FEATURE_NEON_VFPV4 = 302, // ARM NEON VFPV4 (enum value 302)
// Advanced SIMD
NPY_CPU_FEATURE_ASIMD = 303, // ARM Advanced SIMD (enum value 303)
// ARMv8.2 half-precision
NPY_CPU_FEATURE_FPHP = 304, // ARMv8.2 FPHP (enum value 304)
// ARMv8.2 half-precision vector arithmetic
NPY_CPU_FEATURE_ASIMDHP = 305,
// ARMv8.2 dot product
NPY_CPU_FEATURE_ASIMDDP = 306,
// ARMv8.2 single&half-precision multiply
NPY_CPU_FEATURE_ASIMDFHM = 307,
// Scalable Vector Extensions (SVE)
NPY_CPU_FEATURE_SVE = 308,
// IBM/ZARCH
NPY_CPU_FEATURE_VX = 350,
// Vector-Enhancements Facility 1
NPY_CPU_FEATURE_VXE = 351,
// Vector-Enhancements Facility 2
NPY_CPU_FEATURE_VXE2 = 352,
// RISC-V
NPY_CPU_FEATURE_RVV = 400,
// Sentinel, one greater than the last feature ID above; used for bounds checks and iteration
NPY_CPU_FEATURE_MAX
};
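/*
* Illustration only, not part of NumPy: per the comments in the enum above,
* each group ID stands for a set of individual AVX-512 features. The sketch
* below assumes a group counts as available only when every member is
* available. The names `sketch_skx_members` and `sketch_group_available`
* and the local ID copies are hypothetical.
*/

```c
#include <stddef.h>

/* Local copies of a few IDs from the enum above, so the sketch compiles alone. */
enum {
    SKETCH_AVX512F = 30,
    SKETCH_AVX512CD = 31,
    SKETCH_AVX512BW = 37,
    SKETCH_AVX512DQ = 38,
    SKETCH_AVX512VL = 39
};

/* Skylake-X members, taken from the comment "(F,CD,BW,DQ,VL)". */
static const int sketch_skx_members[] = {
    SKETCH_AVX512F, SKETCH_AVX512CD, SKETCH_AVX512BW,
    SKETCH_AVX512DQ, SKETCH_AVX512VL
};

/*
 * Hypothetical helper: `have` is a table indexed by feature ID where a
 * nonzero entry means "available". Returns 1 only if all members are set.
 */
static int
sketch_group_available(const unsigned char *have, const int *members, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (!have[members[i]]) {
            return 0;  /* one missing member disables the whole group */
        }
    }
    return 1;
}
```

/*
* With F, CD, BW and DQ flagged but VL missing, the helper reports the
* Skylake-X group as unavailable; flagging VL as well makes it available.
*/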
/*
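* Initialize CPU features
*
* This function
* - detects the runtime CPU features
* - checks that the baseline CPU features are present
* - uses 'NPY_DISABLE_CPU_FEATURES' to disable dispatchable features
* - uses 'NPY_ENABLE_CPU_FEATURES' to enable dispatchable features
*
* It sets a RuntimeError when:
* - CPU baseline features from the build are not supported at runtime
* - 'NPY_DISABLE_CPU_FEATURES' tries to disable a baseline feature
* - 'NPY_DISABLE_CPU_FEATURES' and 'NPY_ENABLE_CPU_FEATURES' are set at the same time
* - 'NPY_ENABLE_CPU_FEATURES' tries to enable a feature that is not supported by the machine or build
* - a feature is enabled although the project was built without any feature optimization support
*
* It sets an ImportWarning when:
* - 'NPY_DISABLE_CPU_FEATURES' tries to disable a feature that is not supported by the machine or build
* - 'NPY_DISABLE_CPU_FEATURES' or 'NPY_ENABLE_CPU_FEATURES' tries to disable/enable a feature
* although the project was built without any feature optimization support
*
* Returns 0 on success, otherwise -1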
*/
NPY_VISIBILITY_HIDDEN int
npy_cpu_init(void);
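/*
* Illustration only, not NumPy's implementation: one of the error cases
* listed above is that both environment variables are set at once. A minimal
* sketch of such a check follows. The helper name `sketch_check_env_conflict`
* is hypothetical, and treating an empty string as "not set" is an
* assumption of this sketch.
*/

```c
#include <stddef.h>

/*
 * Returns -1 when both settings are non-empty (the conflicting case),
 * and 0 otherwise. NULL and "" are both treated as "not set".
 */
static int
sketch_check_env_conflict(const char *enable_env, const char *disable_env)
{
    int is_enable = enable_env != NULL && enable_env[0] != '\0';
    int is_disable = disable_env != NULL && disable_env[0] != '\0';
    return (is_enable && is_disable) ? -1 : 0;
}
```

/*
* A caller could pass the results of getenv("NPY_ENABLE_CPU_FEATURES") and
* getenv("NPY_DISABLE_CPU_FEATURES") from <stdlib.h> and raise its error
* when the helper returns -1.
*/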
/*
* Returns 0 if the CPU feature is not available
* Note: `npy_cpu_init` must be called first, otherwise this always returns 0
*/
NPY_VISIBILITY_HIDDEN int
npy_cpu_have(int feature_id);
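/*
* Illustration only: a lookup with this contract can be sketched as a
* bounds-checked read from a static table. The names `sketch_have` and
* `sketch_cpu_have` are hypothetical, and the lower bound of 0 for "no
* feature" is an assumption; the upper bound follows from the enum, where
* the sentinel comes right after NPY_CPU_FEATURE_RVV = 400.
*/

```c
enum { SKETCH_FEATURE_NONE = 0, SKETCH_FEATURE_MAX = 401 };

/*
 * Static storage is zero-initialized, so every feature reads as
 * unavailable until some detection step fills the table in.
 */
static unsigned char sketch_have[SKETCH_FEATURE_MAX];

static int
sketch_cpu_have(int feature_id)
{
    /* Reject IDs outside the open interval (NONE, MAX). */
    if (feature_id <= SKETCH_FEATURE_NONE || feature_id >= SKETCH_FEATURE_MAX) {
        return 0;
    }
    return sketch_have[feature_id];
}
```

/*
* Zero-initialized storage gives one plausible reading of the note above:
* before any initialization has run, every query returns 0.
*/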
/*
* Returns a new dictionary containing the CPU feature names and their runtime availability
* As with `npy_cpu_have`, `npy_cpu_init` must be called first
*/
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_features_dict(void);
/*
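* Returns a new Python list containing the minimal set of required optimizations
* supported by the compiler and platform, according to the values passed to the
* '--cpu-baseline' command argument.
*
* This function is mainly used to implement umath's attribute '__cpu_baseline__',
* and the items are sorted from the lowest to highest interest.
*
* For example, with the default build configuration and assuming the compiler
* supports all the involved optimizations, the returned list should be equivalent to:
*
* On x86: ['SSE', 'SSE2']
* On x64: ['SSE', 'SSE2', 'SSE3']
* On armhf: []
* On aarch64: ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD']
* On ppc64: []
* On ppc64le: ['VSX', 'VSX2']
* On s390x: []
* On any other arch or if the optimization is disabled: []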
*/
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_baseline_list(void);
/*
* Returns a new Python list containing the dispatched set of additional optimizations
* supported by the compiler and platform, according to the values passed to the
* '--cpu-dispatch' command argument.
*
* This function is mainly used to implement umath's attribute '__cpu_dispatch__',
* and the items are sorted from the lowest to highest interest.
*
* For example, with the default build configuration and assuming the compiler
* supports all the involved optimizations, the returned list should be equivalent to:
*
* On x86: ['SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', ...]
* On x64: ['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', ...]
* On armhf: ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'ASIMDHP', 'ASIMDDP', 'ASIMDFHM']
* On aarch64: ['ASIMDHP', 'ASIMDDP', 'ASIMDFHM']
* On ppc64: ['VSX', 'VSX2', 'VSX3', 'VSX4']
* On ppc64le: ['VSX3', 'VSX4']
* On s390x: ['VX', 'VXE', 'VXE2']
* On any other arch or if the optimization is disabled: []
*/
NPY_VISIBILITY_HIDDEN PyObject *
npy_cpu_dispatch_list(void);