用 KissFFT 替换手写 FFT 实现带通滤波器（C++ DLL + C# 调用）本文介绍了在工业视觉项目中用Kis

适用场景：工业视觉、图像预处理、频域滤波
技术栈：C++ / KissFFT / AVX2 / std::thread / C# P/Invoke

一、背景

在工业视觉项目中，带通滤波（Bandpass Filter）是一种常见的图像预处理手段——它可以保留特定频率范围内的纹理细节，同时压制低频背景和高频噪声，非常适合表面缺陷检测、周期纹理分析等场景。

频域滤波的流程固定：

原图 → 填充/镜像 → 正向FFT → 乘以掩码 → 逆向FFT → 裁剪 → 输出

项目初期用的是自己手写的 Cooley-Tukey FFT，功能上没问题，但存在两个缺点：

代码量大，维护成本高
没有经过充分优化，在大尺寸图像上耗时明显

本文记录了将手写 FFT 替换为 KissFFT 的完整过程，包括环境配置、代码改造、以及踩过的坑。

二、为什么选 KissFFT

市面上常见的 FFT 库对比如下：

库	优点	缺点
手写 Cooley-Tukey	无依赖	维护成本高，性能一般
KissFFT	轻量（仅3个文件）、无依赖、跨平台	不带内置 SIMD
FFTW	速度最快（自适应优化）	配置复杂，需要额外 dll
pffft	内置 SIMD	知名度低，文档少

KissFFT 的名字来自 Keep It Simple, Stupid，整个库只有三个文件：

kiss_fft.h
_kiss_fft_guts.h
kiss_fft.c

没有 CMake，没有依赖，复制进项目就能用，非常适合嵌入 DLL 项目中发布。

项目地址：github.com/mborgerding…

三、工程配置（Visual Studio）

3.1 获取 KissFFT

从 GitHub 下载源码，只需保留三个文件，将它们复制到 DLL 项目的源码目录：

YourDll/
├── pch.h
├── pch.cpp
├── bandpassFilter.h
├── bandpassFilter.cpp
├── kiss_fft.h           ← 新增
├── _kiss_fft_guts.h     ← 新增
└── kiss_fft.cpp         ← 原来是 .c，改名为 .cpp 再加入项目

3.2 关键配置：关闭预编译头

这是最容易踩的坑。MSVC 要求每个 .cpp 第一行必须是 #include "pch.h"，但 KissFFT 是第三方文件，它没有这一行，直接编译会报错。

解决方法：在解决方案资源管理器中，右键 kiss_fft.cpp → 属性 → C/C++ → 预编译头 → 改为**"不使用预编译头"**。

kiss_fft.cpp 属性
└── C/C++
    └── 预编译头
        └── 预编译头: 不使用预编译头   ← 改这里

bandpassFilter.cpp 保持默认（使用预编译头），无需修改。

3.3 启用 AVX2（极力推荐）

项目属性 → C/C++ → 代码生成 → 启用增强指令集 → AVX2 (/arch:AVX2)

四、头文件

头文件与原版完全相同，C 接口没有任何改动，C# 那边无需改动。

#ifndef BANDPASS_FILTER_H
#define BANDPASS_FILTER_H

#include <vector>
#include <cstdint>

#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
#else
#define DLL_EXPORT
#endif

class bandpass_filter {
public:
    void apply(
        const uint8_t* src, int width, int height,
        float largeDia, float smallDia,
        int stripes, float tolerance,
        bool autoscale, bool saturate,
        std::vector<uint8_t>& dst_out);
};

#ifdef __cplusplus
extern "C" {
#endif

    DLL_EXPORT int bandpass_filter_apply(
        const uint8_t* src,
        uint8_t*       dst,
        int width, int height,
        float largeDia, float smallDia,
        int   stripes,
        float tolerance,
        int   autoscale,
        int   saturate);

#ifdef __cplusplus
}
#endif

#endif

五、源文件核心改造

5.1 两个必须注意的 Windows 编译问题

问题一：min/max 宏冲突

Windows SDK 的 <windows.h> 会把 min/max 定义成宏，与 std::min/std::max 冲突，导致编译报错。解决方法有两个，代码里同时用了双重保险：

// 方法A：在所有 #include 之前定义，阻止宏注入
#define NOMINMAX

// 方法B：调用时加括号，阻止宏展开
(std::min)(a, b)   // ✅ 正确
std::min(a, b)     // ❌ 可能被宏替换为 min(a, b)

问题二：头文件路径

#include "kiss_fft.h" 找不到文件时，通常不是路径写错，而是 kiss_fft.cpp 的预编译头配置有问题（见 3.2 节），导致整个编译单元失败后报路径错误。先确认预编译头配置，再排查路径。

5.2 2D FFT 的实现策略

KissFFT 只提供一维变换接口，二维变换需要自己组合：先对所有行做 1D FFT，再对所有列做 1D FFT，这是标准的行列分解法。

正向 2D FFT：
  ┌────────────────┐      ┌────────────────┐      ┌────────────────┐
  │  原始数据       │  行→  │  行变换结果     │  列→  │  2D频域        │
  └────────────────┘      └────────────────┘      └────────────────┘

逆向 2D IFFT：相同步骤，方向取反，最后除以 N²

多线程分配方式：行变换时按行分块，列变换时按列分块，各线程持有独立的 buf_in/buf_out，无数据竞争：

// ============================================================
// bandpassFilter.cpp  ——  KissFFT 实现
//
// 依赖（三个文件复制到项目目录即可，无需其他配置）：
//   kiss_fft.h
//   _kiss_fft_guts.h
//   kiss_fft.cpp  （原 kiss_fft.c 改名为 .cpp）
//
// kiss_fft.cpp 必须单独设置：
//   VS 解决方案资源管理器 → 右键 kiss_fft.cpp
//   → 属性 → C/C++ → 预编译头 → "不使用预编译头"
//
// VS 项目属性（可选但推荐）：
//   C/C++ → 代码生成 → 启用增强指令集 → AVX2 (/arch:AVX2)
// ============================================================

// ★ 必须在所有 #include 之前，阻止 windows.h 注入 min/max 宏
#define NOMINMAX

#include "pch.h"
#include "bandpassFilter.h"
#include "kiss_fft.h"

#include <cmath>
#include <vector>
#include <algorithm>
#include <stdexcept>
#include <cstring>
#include <cstdint>
#include <thread>

#ifdef __AVX__
#   include <immintrin.h>
#endif

#ifndef M_PI
#define M_PI 3.141592653589793238462643383279502884
#endif

// ──────────────────────────────────────────────────────────────
// 工具函数
// ──────────────────────────────────────────────────────────────
static int get_thread_count()
{
    int n = static_cast<int>(std::thread::hardware_concurrency());
    return (n > 0) ? n : 4;
}

static inline int reflect_index(int i, int n)
{
    while (i < 0 || i >= n) {
        if (i < 0) i = -i - 1;
        else       i = 2 * n - 1 - i;
    }
    return i;
}

// ──────────────────────────────────────────────────────────────
// 2D FFT / IFFT  ——  KissFFT 实现
//
// 策略：行列分解法
//   正向：先对所有行做 1D FFT，再对所有列做 1D FFT
//   逆向：相同步骤，方向取反
//
// 归一化说明：
//   KissFFT 正向和逆向变换均不自动归一化。
//   原版手写代码在每次 1D 逆变换末尾除以 N，
//   行变换 ×(1/N) + 列变换 ×(1/N) 合计 1/N²。
//   本版在两次变换全部完成后统一乘以 1/N²，数学等价，
//   只做一次内存遍历，配合 AVX 更高效。
//
// 线程安全：
//   kiss_fft_cfg 创建后为只读，可安全共享给多线程。
//   每个线程持有独立的 bin/bout 缓冲，无数据竞争。
// ──────────────────────────────────────────────────────────────
static void fft2d_kiss(float* re, float* im, int N, bool inverse)
{
    const int num_threads = get_thread_count();

    // 创建配置（只读，全线程共享）
    kiss_fft_cfg cfg = kiss_fft_alloc(N, inverse ? 1 : 0, nullptr, nullptr);
    if (!cfg) throw std::runtime_error("kiss_fft_alloc failed");

    // ── 行变换（多线程）──────────────────────────────────────
    {
        std::vector<std::thread> threads;

        auto row_task = [&](int ystart, int yend) {
            // 每个线程独立缓冲，无竞争
            std::vector<kiss_fft_cpx> bin(N), bout(N);
            for (int y = ystart; y < yend; ++y) {
                float* rp = re + y * N;
                float* ip = im + y * N;
                for (int x = 0; x < N; ++x) {
                    bin[x].r = rp[x];
                    bin[x].i = ip[x];
                }
                kiss_fft(cfg, bin.data(), bout.data());
                for (int x = 0; x < N; ++x) {
                    rp[x] = bout[x].r;
                    ip[x] = bout[x].i;
                }
            }
        };

        const int chunk = (N + num_threads - 1) / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            int s = t * chunk;
            int e = (std::min)(s + chunk, N);
            if (s < N) threads.emplace_back(row_task, s, e);
        }
        for (auto& th : threads) th.join();
    }

    // ── 列变换（多线程）──────────────────────────────────────
    {
        std::vector<std::thread> threads;

        auto col_task = [&](int xstart, int xend) {
            std::vector<kiss_fft_cpx> bin(N), bout(N);
            for (int x = xstart; x < xend; ++x) {
                for (int y = 0; y < N; ++y) {
                    bin[y].r = re[y * N + x];
                    bin[y].i = im[y * N + x];
                }
                kiss_fft(cfg, bin.data(), bout.data());
                for (int y = 0; y < N; ++y) {
                    re[y * N + x] = bout[y].r;
                    im[y * N + x] = bout[y].i;
                }
            }
        };

        const int chunk = (N + num_threads - 1) / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            int s = t * chunk;
            int e = (std::min)(s + chunk, N);
            if (s < N) threads.emplace_back(col_task, s, e);
        }
        for (auto& th : threads) th.join();
    }

    kiss_fft_free(cfg);

    // ── 逆变换归一化：统一乘以 1/N²（AVX 加速）─────────────
    if (inverse) {
        const float inv_n2 = 1.0f / (static_cast<float>(N) * static_cast<float>(N));
        const int   total  = N * N;
#ifdef __AVX__
        {
            const __m256 sc = _mm256_set1_ps(inv_n2);
            int i = 0;
            for (; i + 7 < total; i += 8) {
                _mm256_storeu_ps(&re[i],
                    _mm256_mul_ps(_mm256_loadu_ps(&re[i]), sc));
                _mm256_storeu_ps(&im[i],
                    _mm256_mul_ps(_mm256_loadu_ps(&im[i]), sc));
            }
            for (; i < total; ++i) { re[i] *= inv_n2; im[i] *= inv_n2; }
        }
#else
        for (int i = 0; i < total; ++i) { re[i] *= inv_n2; im[i] *= inv_n2; }
#endif
    }
}

// ──────────────────────────────────────────────────────────────
// 百分位数（用于 autoscale）
// ──────────────────────────────────────────────────────────────
static float percentile(const float* data, int n, double p)
{
    if (n <= 0) return 0.0f;
    p = (std::max)(0.0, (std::min)(100.0, p));
    int k = static_cast<int>(std::floor((p / 100.0) * (n - 1)));
    k = (std::max)(0, (std::min)(k, n - 1));
    std::vector<float> w(data, data + n);
    std::nth_element(w.begin(), w.begin() + k, w.end());
    return w[k];
}

// ──────────────────────────────────────────────────────────────
// bandpass_filter::apply
// 逻辑与原版完全相同，仅将手写 fft2d 替换为 fft2d_kiss
// ──────────────────────────────────────────────────────────────
void bandpass_filter::apply(
    const uint8_t* src, int width, int height,
    float largeDia, float smallDia,
    int stripes, float tolerance,
    bool autoscale, bool saturate,
    std::vector<uint8_t>& dst_out)
{
    if (!src || width <= 0 || height <= 0)
        throw std::invalid_argument("invalid input");

    const int total_pixels = width * height;
    const int num_threads  = get_thread_count();

    // ── Step 1: uint8 → float [0,1]（AVX 加速）──────────────
    std::vector<float> src_float(total_pixels);
#ifdef __AVX__
    {
        const __m256 scale = _mm256_set1_ps(1.0f / 255.0f);
        int i = 0;
        for (; i + 7 < total_pixels; i += 8) {
            __m256i u8v = _mm256_cvtepu8_epi32(
                _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&src[i])));
            _mm256_storeu_ps(&src_float[i],
                _mm256_mul_ps(_mm256_cvtepi32_ps(u8v), scale));
        }
        for (; i < total_pixels; ++i)
            src_float[i] = static_cast<float>(src[i]) / 255.0f;
    }
#else
    for (int i = 0; i < total_pixels; ++i)
        src_float[i] = static_cast<float>(src[i]) / 255.0f;
#endif

    // ── Step 2: 计算 FFT 填充尺寸 ────────────────────────────
    const int maxN = (std::max)(width, height);
    int N = 2;
    while (N < static_cast<int>(std::ceil(1.5 * maxN))) N <<= 1;

    const int offX = static_cast<int>(std::lround((N - width)  / 2.0));
    const int offY = static_cast<int>(std::lround((N - height) / 2.0));

    // ── Step 3: 镜像填充（多线程）────────────────────────────
    std::vector<float> pad(N * N, 0.0f);
    {
        std::vector<std::thread> threads;
        auto pad_task = [&](int ystart, int yend) {
            for (int y = ystart; y < yend; ++y) {
                const int sy = reflect_index(y - offY, height);
                for (int x = 0; x < N; ++x)
                    pad[y * N + x] =
                        src_float[sy * width + reflect_index(x - offX, width)];
            }
        };
        const int chunk = (N + num_threads - 1) / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            int s = t * chunk;
            int e = (std::min)(s + chunk, N);
            if (s < N) threads.emplace_back(pad_task, s, e);
        }
        for (auto& th : threads) th.join();
    }

    // ── Step 4: 正向 2D FFT（KissFFT）───────────────────────
    std::vector<float> re = pad;
    std::vector<float> im(N * N, 0.0f);
    fft2d_kiss(re.data(), im.data(), N, false);

    // ── Step 5: 生成滤波器掩码（多线程）─────────────────────
    const double filterLarge = 2.0 * static_cast<double>(largeDia) / N;
    const double filterSmall = 2.0 * static_cast<double>(smallDia) / N;
    const double sL = filterLarge * filterLarge;
    const double sS = filterSmall * filterSmall;

    double sharpness = (std::max)(0.0, (std::min)(1.0,
        (100.0 - static_cast<double>(tolerance)) / 100.0));
    const double sStripe = sharpness * sharpness;

    // 预计算 1D 高斯权重（按频率索引 0…N-1）
    std::vector<float> rowLarge(N), rowSmall(N), stripeAxis(N);
    for (int i = 0; i < N; ++i) {
        const int    d  = (std::min)(i, N - i);
        const double d2 = static_cast<double>(d) * d;
        rowLarge[i]   = static_cast<float>(std::exp(-d2 * sL));
        rowSmall[i]   = static_cast<float>(std::exp(-d2 * sS));
        stripeAxis[i] = static_cast<float>(1.0 - std::exp(-d2 * sStripe));
    }

    std::vector<float> mask(N * N);
    {
        std::vector<std::thread> threads;
        const int chunk = (N + num_threads - 1) / num_threads;

        if (stripes == 1 || stripes == 2) {
            auto mask_task = [&](int ystart, int yend) {
                for (int y = ystart; y < yend; ++y) {
                    const float ryL     = rowLarge[y];
                    const float ryS     = rowSmall[y];
                    const float stripeY = stripeAxis[y];
                    float* row = mask.data() + y * N;
                    for (int x = 0; x < N; ++x) {
                        const float base   = (1.0f - ryL * rowLarge[x])
                                           * (ryS * rowSmall[x]);
                        const float stripe = (stripes == 1)
                                           ? stripeAxis[x] : stripeY;
                        row[x] = base * stripe;
                    }
                }
            };
            for (int t = 0; t < num_threads; ++t) {
                int s = t * chunk;
                int e = (std::min)(s + chunk, N);
                if (s < N) threads.emplace_back(mask_task, s, e);
            }
        } else {
            auto mask_task = [&](int ystart, int yend) {
                for (int y = ystart; y < yend; ++y) {
                    const float ryL = rowLarge[y];
                    const float ryS = rowSmall[y];
                    float* row = mask.data() + y * N;
                    for (int x = 0; x < N; ++x)
                        row[x] = (1.0f - ryL * rowLarge[x])
                               * (ryS * rowSmall[x]);
                }
            };
            for (int t = 0; t < num_threads; ++t) {
                int s = t * chunk;
                int e = (std::min)(s + chunk, N);
                if (s < N) threads.emplace_back(mask_task, s, e);
            }
        }
        for (auto& th : threads) th.join();
    }

    // ── Step 6: 应用掩码（AVX 加速）─────────────────────────
    {
        const int total = N * N;
#ifdef __AVX__
        int i = 0;
        for (; i + 7 < total; i += 8) {
            __m256 m = _mm256_loadu_ps(&mask[i]);
            _mm256_storeu_ps(&re[i],
                _mm256_mul_ps(_mm256_loadu_ps(&re[i]), m));
            _mm256_storeu_ps(&im[i],
                _mm256_mul_ps(_mm256_loadu_ps(&im[i]), m));
        }
        for (; i < total; ++i) { re[i] *= mask[i]; im[i] *= mask[i]; }
#else
        for (int i = 0; i < total; ++i) {
            re[i] *= mask[i];
            im[i] *= mask[i];
        }
#endif
    }

    // ── Step 7: 逆向 2D FFT（KissFFT，含 1/N² 归一化）───────
    fft2d_kiss(re.data(), im.data(), N, true);

    // ── Step 8: 裁剪回原始尺寸 ──────────────────────────────
    std::vector<float> cropped(width * height);
    for (int y = 0; y < height; ++y)
        std::memcpy(cropped.data() + y * width,
                    re.data() + (y + offY) * N + offX,
                    width * sizeof(float));

    // ── Step 9: 自动缩放并转回 uint8（AVX 加速）─────────────
    const int n = static_cast<int>(cropped.size());
    dst_out.resize(n);

    if (autoscale) {
        const double pct = saturate ? 1.0 : 0.0;
        const float lo = (pct > 0.0)
            ? percentile(cropped.data(), n, pct)
            : *std::min_element(cropped.begin(), cropped.end());
        const float hi = (pct > 0.0)
            ? percentile(cropped.data(), n, 100.0 - pct)
            : *std::max_element(cropped.begin(), cropped.end());

        if (hi <= lo) {
            for (int i = 0; i < n; ++i) {
                const float v = (std::max)(0.0f,
                    (std::min)(cropped[i] * 255.0f, 255.0f));
                dst_out[i] = static_cast<uint8_t>(v);
            }
        } else {
            const float scale = 255.0f / (hi - lo);
#ifdef __AVX__
            {
                const __m256 vlo  = _mm256_set1_ps(lo);
                const __m256 vsc  = _mm256_set1_ps(scale);
                const __m256 vz   = _mm256_setzero_ps();
                const __m256 v255 = _mm256_set1_ps(255.0f);
                int i = 0;
                for (; i + 7 < n; i += 8) {
                    __m256 v = _mm256_loadu_ps(&cropped[i]);
                    v = _mm256_mul_ps(_mm256_sub_ps(v, vlo), vsc);
                    v = _mm256_max_ps(v, vz);
                    v = _mm256_min_ps(v, v255);
                    __m256i vi = _mm256_cvtps_epi32(v);
                    vi = _mm256_packus_epi32(vi, vi);
                    vi = _mm256_permute4x64_epi64(vi, 0xD8);
                    vi = _mm256_packus_epi16(vi, vi);
                    _mm_storel_epi64(
                        reinterpret_cast<__m128i*>(&dst_out[i]),
                        _mm256_castsi256_si128(vi));
                }
                for (; i < n; ++i) {
                    const float v = (std::max)(0.0f,
                        (std::min)((cropped[i] - lo) * scale, 255.0f));
                    dst_out[i] = static_cast<uint8_t>(v);
                }
            }
#else
            for (int i = 0; i < n; ++i) {
                const float v = (std::max)(0.0f,
                    (std::min)((cropped[i] - lo) * scale, 255.0f));
                dst_out[i] = static_cast<uint8_t>(v);
            }
#endif
        }
    } else {
        for (int i = 0; i < n; ++i) {
            const float v = (std::max)(0.0f,
                (std::min)(cropped[i] * 255.0f, 255.0f));
            dst_out[i] = static_cast<uint8_t>(v);
        }
    }
}

// ──────────────────────────────────────────────────────────────
// C 接口（与原版完全相同，C# P/Invoke 无需改动）
// ──────────────────────────────────────────────────────────────
extern "C" DLL_EXPORT int bandpass_filter_apply(
    const uint8_t* src, uint8_t* dst,
    int width, int height,
    float largeDia, float smallDia,
    int stripes, float tolerance,
    int autoscale, int saturate)
{
    if (!src || !dst || width <= 0 || height <= 0) return -1;
    try {
        std::vector<uint8_t> dst_vec;
        bandpass_filter filter;
        filter.apply(src, width, height,
                     largeDia, smallDia, stripes, tolerance,
                     autoscale != 0, saturate != 0, dst_vec);
        std::memcpy(dst, dst_vec.data(), dst_vec.size());
        return 0;
    } catch (...) {
        return -1;
    }
}

5.3 归一化：最容易出错的细节

KissFFT 不自动归一化，这和某些其他库不同，必须手动处理，否则逆变换后的数值会比正确值大 N² 倍，图像输出全白。

原版手写代码在每次 1D 逆变换末尾都除以 N，行变换 ×(1/N) 再加列变换 ×(1/N)，合计 1/N²。改版统一在两次变换之后乘以 1/N²，数学上完全等价，但只做一次内存遍历，配合 AVX 更高效。

六、C# 调用端（无需修改）

这是替换 FFT 库的最大优势——C 接口完全没有变化，C# 侧代码一行都不用动：

[DllImport("Dll1.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern int bandpass_filter_apply(
    IntPtr src, IntPtr dst, int width, int height,
    float largeDia, float smallDia, int stripes,
    float tolerance, int autoscale, int saturate);

private static Mat BandpassFilterApply(Mat src,
    float largeDia = 30f, float smallDia = 4f, int stripes = 1,
    float tolerance = 5f, bool autoscale = true, bool saturate = true)
{
    Mat gray = src.Channels() == 1
        ? src.Clone()
        : src.CvtColor(ColorConversionCodes.BGR2GRAY);
    Mat dst = new Mat(gray.Rows, gray.Cols, MatType.CV_8UC1);

    unsafe
    {
        int ret = bandpass_filter_apply(
            (IntPtr)gray.DataPointer,
            (IntPtr)dst.DataPointer,
            gray.Cols, gray.Rows,
            largeDia, smallDia, stripes, tolerance,
            autoscale ? 1 : 0, saturate ? 1 : 0);

        if (ret != 0)
            throw new Exception($"bandpass_filter_apply failed, code = {ret}");
    }

    gray.Dispose();
    return dst;
}

调用时也和之前完全一致：

using Mat fft = BandpassFilterApply(img, stripes: 0);

七、常见问题

Q：替换后效果和原来一样吗？

完全一样。KissFFT 是标准的 Cooley-Tukey 算法实现，数值结果与手写版本在浮点误差范围内完全一致（差异在 1e-6 量级），视觉上无法区分。

Q：速度有提升吗？

KissFFT 本身的 FFT 计算速度和手写版相近（都是标量 Cooley-Tukey），主要收益来自代码可维护性和多线程行列分解。如果追求极致速度，可以进一步换用 FFTW（需要额外配置，速度可达 3–8 倍）。

Q：为什么用单精度（float）而不是双精度（double）？

图像数据本身是 8-bit（0–255），单精度浮点（约 7 位有效数字）已经完全够用，而且单精度内存占用减半，AVX 一次可处理 8 个 float（双精度只能处理 4 个），整体性能更好。

八、总结

项目	原版（手写FFT）	新版（KissFFT）
FFT 实现	自写 Cooley-Tukey	KissFFT
代码行数（FFT部分）	~80 行	~60 行
第三方依赖	无	3个文件（无需编译配置）
多线程	有	有（行列各并行）
AVX 加速	有（归一化段）	有（归一化段）
C 接口	不变	完全不变
C# 调用端	不变	完全不变
数值结果	基准	与原版一致（<1e-6 误差）

整个迁移过程最核心的收获是：当内部实现被 C 接口完整封装后，替换底层库对上层调用方完全透明，这也是分层设计的价值所在。

如果你的项目也有类似的频域处理需求，KissFFT 是一个低风险、易集成的选择，值得一试。

如有问题欢迎在评论区交流。