Nearest-neighbor search between two DataFrames with `cKDTree`

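You can run a nearest-neighbor search between two DataFrames by building a cKDTree. Suppose a DataFrame df1 has a column col1, and a DataFrame df2 has columns col2 and col3. For each value in df1's col1, we want to find the closest value in df2's col2 and return the corresponding value from col3. The steps and a code example follow: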

Sample data

Suppose we have the following two DataFrames:

import pandas as pd

data1 = {
    'col1': [1.5, 3.5, 5.5]
}
data2 = {
    'col2': [1, 2, 3, 4, 5],
    'col3': ['a', 'b', 'c', 'd', 'e']
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

Nearest-neighbor search with `cKDTree`

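Build the cKDTree: build a cKDTree from df2's col2 column.
Query the nearest neighbors: run a nearest-neighbor query for each value in df1's col1.
Retrieve the matching values: look up the corresponding values in col3.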

import numpy as np
from scipy.spatial import cKDTree

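# Step 1: build a cKDTree from df2's col2 (cKDTree expects a 2-D array of shape (n, 1))
tree = cKDTree(df2[['col2']].to_numpy())

# Step 2: query the nearest neighbor of each value in df1's col1
distances, indices = tree.query(df1[['col1']].to_numpy())

# Step 3: look up the col3 value of each nearest neighbor
# (indices are row positions, so use iloc rather than loc)
nearest_col3 = df2['col3'].iloc[indices].to_numpy()

# Add the nearest col3 values to df1
df1['nearest_col3'] = nearest_col3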

print(df1)

Result

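Running the code above gives, for each value of df1's col1, the col3 value belonging to the closest col2 value in df2. Note that 1.5 and 3.5 are exactly as far from two col2 values (1 and 2, and 3 and 4 respectively). SciPy's documentation does not specify which of two equally distant neighbors query returns. The output below shows the lower one, but you should not rely on that.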

   col1 nearest_col3
0   1.5            a
1   3.5            c
2   5.5            e
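Because the rows for 1.5 and 3.5 depend on how ties are broken, it can be worth detecting ties explicitly. The following is a minimal sketch that assumes the same sample df1/df2. It queries the two nearest neighbors with `k=2` and flags rows whose two distances are equal. `np.isclose` is used so that floating-point noise does not hide a tie.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({'col1': [1.5, 3.5, 5.5]})
df2 = pd.DataFrame({'col2': [1, 2, 3, 4, 5], 'col3': ['a', 'b', 'c', 'd', 'e']})

tree = cKDTree(df2[['col2']].to_numpy())
# Ask for the two nearest neighbors of each query point; distances has shape (3, 2)
distances, indices = tree.query(df1[['col1']].to_numpy(), k=2)

# A query point is a tie if its first and second neighbors are equally far away
is_tie = np.isclose(distances[:, 0], distances[:, 1])
print(is_tie)  # [ True  True False]
```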

Explanation

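Build the cKDTree: create a cKDTree from the data in df2's col2 column.
Query the nearest neighbors: for each value in df1's col1, use the cKDTree.query method to find the row position in df2 of the closest col2 value. query also returns the distance to that value.
Retrieve the matching values: use those positions to fetch the corresponding col3 values from df2, and add them to df1.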
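On small data, you can check what `cKDTree.query` returns against a brute-force NumPy version. This sketch is a cross-check, not part of the original method. It computes every pairwise distance (an n × m matrix) and takes the position of the minimum in each row. `argmin` resolves ties to the lowest position, which here reproduces the table above. A k-d tree avoids building this full matrix, which is why it is the better choice for large DataFrames.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [1.5, 3.5, 5.5]})
df2 = pd.DataFrame({'col2': [1, 2, 3, 4, 5], 'col3': ['a', 'b', 'c', 'd', 'e']})

queries = df1['col1'].to_numpy()
points = df2['col2'].to_numpy()

# |query - point| for every (query, point) pair: shape (len(df1), len(df2))
dist_matrix = np.abs(queries[:, None] - points[None, :])

# argmin returns the first position of the minimum, so ties go to the lower position
indices = dist_matrix.argmin(axis=1)
nearest_col3 = df2['col3'].to_numpy()[indices]
print(nearest_col3)  # ['a' 'c' 'e']
```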

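In this way you can find nearest neighbors between two DataFrames and fetch the corresponding values without comparing every value of df1 against every value of df2. The tree is built once from df2, and each query then examines only part of it.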
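To reuse the approach, you can wrap the three steps in a small function. The sketch below is an illustration under stated assumptions, not part of the original example. `nearest_lookup` is a hypothetical helper name, and it adds two things the code above does not do:

- It passes `distance_upper_bound` to `cKDTree.query`, so values with no sufficiently close match get NaN instead of an arbitrarily distant neighbor.
- It accepts lists of columns, since cKDTree works on points of any dimension.

The x/y/label columns in the 2-D usage are made-up sample data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_lookup(left, right, left_on, right_on, value_col, max_distance=np.inf):
    """Hypothetical helper: for each row of `left`, return `value_col` from the
    nearest row of `right`, matching on the coordinate columns `left_on` / `right_on`.

    Rows with no neighbor closer than `max_distance` get NaN.
    Returns (values, distances) as Series aligned with `left.index`.
    """
    # cKDTree expects a 2-D array of shape (n_points, n_columns)
    tree = cKDTree(right[right_on].to_numpy(dtype=float))
    distances, indices = tree.query(
        left[left_on].to_numpy(dtype=float),
        k=1,
        distance_upper_bound=max_distance,
    )
    # A missing neighbor is reported as distance=inf and index=len(right)
    found = np.isfinite(distances)
    values = np.full(len(left), np.nan, dtype=object)
    # indices are row positions, so index the underlying array instead of using .loc
    values[found] = right[value_col].to_numpy()[indices[found]]
    return (pd.Series(values, index=left.index),
            pd.Series(distances, index=left.index))


# 1-D usage; df2 deliberately has a non-default index
df1 = pd.DataFrame({'col1': [1.2, 3.9, 9.0]})
df2 = pd.DataFrame({'col2': [1, 2, 3, 4, 5], 'col3': ['a', 'b', 'c', 'd', 'e']},
                   index=[10, 11, 12, 13, 14])
df1['nearest_col3'], df1['distance'] = nearest_lookup(
    df1, df2, ['col1'], ['col2'], 'col3', max_distance=1.0)
print(df1)  # 9.0 has no col2 value within 1.0, so it gets NaN and distance inf

# 2-D usage (hypothetical x/y columns): the same call works on several columns
pts = pd.DataFrame({'x': [0.1, 2.9], 'y': [0.2, 3.1]})
refs = pd.DataFrame({'x': [0.0, 3.0, 5.0], 'y': [0.0, 3.0, 0.0],
                     'label': ['origin', 'diag', 'right']})
pts['nearest_label'], _ = nearest_lookup(pts, refs, ['x', 'y'], ['x', 'y'], 'label')
print(pts)
```

The positional `indices` are resolved through `to_numpy()`. This keeps the lookup correct even when df2 does not have a default 0..n-1 index, which the shifted index in the 1-D usage exercises.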
