基于CUDA为胶囊网络实现TensorFlow定制操作编者按：瑞典皇家理工学院（KTH）研究工程师Jos van de

作者：Jos van de Wolfshaar 编译：weakish

编者按：瑞典皇家理工学院（KTH）研究工程师Jos van de Wolfshaar最近撰写了一篇教程，以胶囊网络为例，展示了如何基于CUDA实现TensorFlow定制操作。

介绍

最近我花了一点时间进行CUDA编程，为TensorFlow实现了一些定制操作。作为练习，我决定尝试定制胶囊网络中的一个操作，该操作本来需要一些reshape的奇技淫巧或者至少两三个中间TensorFlow操作才能完成。如果你不熟悉胶囊网络，可以看下这篇文章。论文原作者开源了胶囊网络代码。本文讨论的代码：https://github.com/jostosh/capsnet

文中涉及的很多概念（CUDA、TensorFlow定制操作、梯度测试）可以通过相应的文档学习，但我总觉得，看看这些独立的元素组合起来是什么样会很有启发。这也是本文背后的主要动机：展示一下如何从头到尾开发基于CUDA的TensorFlow定制操作。所以让我们开始吧。

胶囊预测

本文关注的运算如下：

其中，û_j|i是由胶囊i通过矩阵向量乘法W_iju_i“预测”的胶囊j的激活向量。矩阵W_ij的形状为[out_dim, in_dim] ，向量u_i为胶囊i的输出向量。单个胶囊层在一个batch内为所有i,j对、所有样本进行该运算。因此，张量W_ij 和u_i实际上的形状为 [batch_size, in_caps, in_dim]和 [in_caps, out_caps, out_dim, in_dim]。这意味着我们将创建这样一个运算，该运算接受这两个张量，然后计算输出张量 u_hat_ji，该输出张量的形状为[batch_size, in_caps, out_caps, out_dim] 。换句话说，我们需要为所有batch索引[0,1,...,batch_size-1]和所有输入胶囊 [0,1,...,in_caps-1]和输出胶囊 [0,1,...,out_caps-1]的组合计算矩阵向量乘积。

TensorFlow核实现

基于GPU实现定制操作最有价值。TensorFlow文档提供了为TensorFlow编写C++核的必要材料。之后你可以阅读CUDA by Example: An Introduction to General-Purpose GPU Programming 一书，或者Nvidia Developer的教程An Even Easier Introduction to CUDA。本文不会重复这些细节，但会提供一个实际的例子，希望能帮助你理解如何基于CUDA编写自己的TensorFlow操作。对我而言，了解一些CUDA比我预想的要容易，但压榨出所有性能很需要技巧，我将把优化工作留待以后。

Op注册

先让我们实现Op（操作）的前馈。首先，我们将注册Op：

REGISTER_OP("CapsulePrediction")
.Input("input: T")
.Input("weights: T")
.Output("output: T")
.Attr("T: type")
.SetShapeFn([](InferenceContext* ctx) {
    // 获取形状并确保维度正确
    ShapeHandle in_shape;
    ShapeHandle weights_shape;
    TF_RETURN_IF_ERROR(ctx->WithRank(ctx->input(0), 3, &in_shape));
    TF_RETURN_IF_ERROR(ctx->WithRank(ctx->input(1), 4, &weights_shape));
    // 创建并设定输出形状
    DimensionHandle out_d0, out_d1, out_d2, out_d3;
    std::vector<DimensionHandle> out_dims;
    out_dims.push_back(ctx->MakeDim(ctx->Dim(ctx->input(0), 0)));
    out_dims.push_back(ctx->MakeDim(ctx->Dim(ctx->input(1), 0)));
    out_dims.push_back(ctx->MakeDim(ctx->Dim(ctx->input(1), 1)));
    out_dims.push_back(ctx->MakeDim(ctx->Dim(ctx->input(1), 2)));
    ShapeHandle out_shape = ctx->MakeShape(out_dims);
    ctx->set_output(0, out_shape);
    return Status::OK();
});

目前我就定义了这些，以后可以通过添加..: T 规格为不同的dtype指定不同的TensorFlow核。 ShapeHandle、 DimensionHandle、 InferenceContext定义在 tensorflow命名空间。代码中的形状函数实现为一个lambda函数，该函数首先确保 ctx->input(0)（输入u_i）及 ctx->input(1)（权重W_ij）具备正确的秩。接着，我们根据输入张量决定输出张量的维度。Op输出的维度为 [batch_size, in_caps, out_caps, out_dim]，因此我们从u_i张量取得 batch_size、 in_caps，从W_ij张量取得 out_caps和 out_dim。

前馈胶囊预测

现在，让我们看下Op的核。TensorFlow的“核”指对应特定设备的Op实现。定义一个定制的核时，应该从TensorFlow的OpKernel 继承，并实现Compute方法：

                                                                                    
                                                                                        
                                                                                            class CapsulePredictionOp : public OpKernel
                                                                                        
                                                                                        
                                                                                            {
                                                                                        
                                                                                        
                                                                                             public:
                                                                                        
                                                                                        
                                                                                              explicit CapsulePredictionOp(OpKernelConstruction* ctx) : OpKernel(ctx) { }
                                                                                        
                                                                                        
                                                                                            
                                                                                        
                                                                                        
                                                                                              void Compute(OpKernelContext* ctx) override
                                                                                        
                                                                                        
                                                                                              {
                                                                                        
                                                                                        
                                                                                                // 获取输入
                                                                                        
                                                                                        
                                                                                                const Tensor& input = ctx->input(0);
                                                                                        
                                                                                        
                                                                                                const Tensor& weights = ctx->input(1);
                                                                                        
                                                                                        
                                                                                            
                                                                                        
                                                                                        
                                                                                                // 设定输出形状
                                                                                        
                                                                                        
                                                                                                const TensorShape& input_shape(input.shape());
                                                                                        
                                                                                        
                                                                                                TensorShape output_shape(weights.shape());
                                                                                        
                                                                                        
                                                                                                output_shape.InsertDim(0, input_shape.dim_size(0));
                                                                                        
                                                                                        
                                                                                                output_shape.RemoveDim(4);
                                                                                        
                                                                                        
                                                                                            
                                                                                        
                                                                                        
                                                                                                // 分配输出张量
                                                                                        
                                                                                        
                                                                                                Tensor* output = nullptr;
                                                                                        
                                                                                        
                                                                                                OP_REQUIRES_OK(ctx, ctx->allocate_output(0, output_shape, &output));
                                                                                        
                                                                                        
                                                                                            
                                                                                        
                                                                                        
                                                                                                // 获取本征（Eigen）张量并传给启动器
                                                                                        
                                                                                        
                                                                                                auto input_tensor   = input.tensor<float, 3>();
                                                                                        
                                                                                        
                                                                                                auto weights_tensor = weights.tensor<float, 4>();
                                                                                        
                                                                                        
                                                                                                auto output_tensor  = output->tensor<float, 4>();
                                                                                        
                                                                                        
                                                                                                launchCapsulePrediction(ctx->eigen_device(), input_tensor, weights_tensor,
                                                                                        
                                                                                        
                                                                                                  output_tensor);
                                                                                        
                                                                                        
                                                                                              }
                                                                                        
                                                                                        
                                                                                            };

上面的实现还没有涉及到CUDA，但我们很快就会涉及，所以不必担心。上面的代码仅仅基于输入的形状初始化输出形状，并分配内存。参数OpKernelContext 对象确保在当前使用设备上分配内存。在我们的情形中，这将是GPU。接着，我们通过tensor 方法获取Eigen张量并将它们传递给我们自己的 launchCapsulePrediction函数（具体干活的函数）。

运行内核

我们的launchCapsulePrediction 函数像字面上一样（至少在CUDA术语中）在GPU上运行代码。也许有点让人迷惑的是，CUDA把在设备上运行代码的函数称为核。而在TensorFlow术语中，核不一定是GPU实现。让我们不要太纠结这些术语，直接看代码吧：

void launchCapsulePrediction(
  const GPUDevice& d,
  typename TTypes<float, 3>::ConstTensor x,
  typename TTypes<float, 4>::ConstTensor weights,
  typename TTypes<float, 4>::Tensor out)
{
  // 获取维度
  const int64 batch_size  = x.dimension(0);
  const int64 in_caps     = x.dimension(1);
  const int64 in_dim      = x.dimension(2);
  const int64 out_dim     = weights.dimension(2);
  const int64 out_caps    = weights.dimension(1);
  // 维度一的尺寸
  const int64 w_d0 = out_caps * out_dim * in_dim;
  const int64 x_d0 = in_caps * in_dim;
  const int64 o_d0 = in_caps * out_caps * out_dim;
  // 维度二
  const int64 w_d1 = out_dim * in_dim;
  const int64 x_d1 = in_dim;
  const int64 o_d1 = out_caps * out_dim;
  // 维度三
  const int64 w_d2 = in_dim;
  const int64 o_d2 = out_dim;
  // 为前馈操作运行CUDA核
  CudaLaunchConfig config = GetCudaLaunchConfig(out.size(), d);
  capsulePredictionKernel
    <<<config.block_count, config.thread_per_block, 0, d.stream()>>>(
      x.data(), weights.data(), out.data(),
      o_d0, o_d1, o_d2, x_d0, x_d1, w_d0, w_d1, w_d2,
      in_dim, out.size());
}

函数参数中的TTypes 模板和int64类型在 tensorflow命名空间中定义。接下来的关于维度的部分应该差不多是自解释的。因为我们以一维数组的形式将张量数据传给具体的CUDA核，我们需要搞明白每个维度和每个核的内存尺寸。注意当我说“内存尺寸”的时候，我指的是每轴的浮点数的数目而不是字节数。考虑每个张量的第一轴的内存尺寸：

                                                                                            
                                                                                                
                                                                                                    // 维度一的尺寸
                                                                                                
                                                                                                
                                                                                                    const int64 w_d0 = out_caps * out_dim * in_dim;
                                                                                                
                                                                                                
                                                                                                    const int64 x_d0 = in_caps * in_dim;
                                                                                                
                                                                                                
                                                                                                    const int64 o_d0 = in_caps * out_caps * out_dim;

不错，所以我们可以直接使用之前决定的维度。代码告诉我们，w_d0 不过是out_caps、 out_dim、 in_dim的乘积。所以如果我们希望从索引W_i,j,k,l跳到W_i+1,j,k,l，我们应该将 w_d0加到一维索引。你也许已经想到了，索引j和 w_d1同理。

代码最后是CUDA核运行器：

// 为前馈操作运行CUDA核
CudaLaunchConfig config = GetCudaLaunchConfig(out.size(), d);
capsulePredictionKernel
<<<config.block_count, config.thread_per_block, 0, d.stream()>>>(
  x.data(), weights.data(), out.data(),
  o_d0, o_d1, o_d2, x_d0, x_d1, w_d0, w_d1, w_d2,
  in_dim, out.size());

这些语句涉及一些新概念。第一个语句使用GetCudaLaunchConfig 实例以确定块（block）的数目，和每块的线程（thread）的数目（见TensorFlow头文件tensorflow/core/util/cuda_kernel_helper.h ）。如果你打算开发自己的Op，你当然应该看看这个头文件。capsulePredictionKernel 函数在GPU上使用CUDA并行化，通过三折定界符<<<config.block_count, config.thread_per_block, 0, d.stream()>>> 运行（译者注：“<<<...>>>”是CUDA的C++语法扩展，用于指定核的参数）。运行核时，必须指定块数和每块的线程数。第三个参数0 就目前而言没什么关系，你要是打算实现自己的核，它最有可能的值为零。CUDA流d.stream() 可以看成GPU指令的管道。当你给流添加核的时候，流将确保当前核在下一个核被调用前终结。如果你希望并行运行两个独立的任务，你可以使用两个流。

线程与块

分配给同一调用的所有块可以并行运行。如果你运行了带N 个块的核，那么你可以把它想象为运行N 个核函数的独立实例。这个真方便呀！nvcc 编译器将确保核函数访问精确的块索引，这样核的特定的块实例就知道该处理接收到的数组的哪些部分。

一个块可以包含多个线程。线程不过是并行的另一个层次，所以它们可以并行运行。你也许会问，为什么搞出另一个层次？因为线程可以做到块做不到的事。线程可以共享内存，当你想要多次使用同一个块中的某个输入数组的相同值时，这很有用。访问共享内存要快很多，也是优化最终CUDA实现的许多方式之一。下面是两个并行层次的图解：

CUDA核

简单温习CUDA上的线程和块之后，我们终于可以来看看前馈胶囊预测的CUDA实现了：

__global__ void capsulePredictionKernel(
    const float* in, const float* weights, float* out,
    const int64 o_d0, const int64 o_d1, const int64 o_d2,
    const int64 x_d0, const int64 x_d1,
    const int64 w_d0, const int64 w_d1, const int64 w_d2,
    const int64 in_dim, const int64 output_size)
{
  CUDA_1D_KERNEL_LOOP(i, output_size)
  {
    // 所以这里我们有 out[b,ci,cj,e]
    const int64 b     = i / o_d0;
    const int64 ci    = (i % o_d0) / o_d1;
    const int64 cj    = (i % o_d1) / o_d2;
    const int64 e_out  = i % o_d2;
    // 接着我们可以看看如何计算in和W的数组索引
    int64 in_idx = b * x_d0 + ci * x_d1;
    int64 w_idx = ci * w_d0 + cj * w_d1 + e_out * w_d2;
    // 初始化结果
    float result = 0.0;
    for (int64 v = 0; v < in_dim; ++v)
      // 对`in`和`weights`而言，前馈计算的后续元素也处于内存中的后续位置
      result += ldg(in + in_idx++) * ldg(weights + w_idx++);
    // 写入结果
    out[i] = result;
  }
}

你首先注意到的可能是函数定义前的__global__ 修饰符。nvcc编译器使用这个修饰符使函数全局可用，意味着它可以由CPU运行，或者说，CUDA术语中的宿主代码。函数的参数从 launchCapsulePrediction函数继承了名称，因此它们应该不会导致太多困惑。 CUDA_1D_KERNEL_LOOP是 tensorflow/core/util/cuda_kernel_helper.h 定义的宏：

for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < output_size;
     i += blockDim.x * gridDim.x)

使用TensorFlow的这个宏运行CUDA核迫使我们以一个抽象而便利的方式思考：它给我们某个索引i ，该索引对应输出数组out 的第i个元素。核的不同块/线程实例将获取对应i 的各自的值。现在我们只需计算出另外两个数组in 和weights 的索引。为了计算这个索引，我们判定batch索引b ，输入胶囊索引ci ，输出胶囊索引cj 和输出胶囊元素索引e_out ：

// 所以这里我们有 out[b,ci,cj,e]
const int64 b     = i / o_d0;
const int64 ci    = (i % o_d0) / o_d1;
const int64 cj    = (i % o_d1) / o_d2;
const int64 e_out = i % o_d2;

一旦我们知道了每个轴中包含的元素数目，判定这些就变得很容易了。事实上，我们将内存尺寸作为函数的参数。至于其他数组，我们可以将 b、 ci、 cj、 e_out转换为相应的一维索引：

                                                                                                        
                                                                                                            
                                                                                                                // 接着我们可以看看如何计算in和W的数组索引
                                                                                                            
                                                                                                            
                                                                                                                int64 in_idx = b * x_d0 + ci * x_d1;
                                                                                                            
                                                                                                            
                                                                                                                int64 w_idx  = ci * w_d0 + cj * w_d1 + e_out * w_d2;

再一次，我们使用每根轴已经提供的内存尺寸得出一维索引。这些是(i)batch索引为b 的输入胶囊ci 和(ii)输入胶囊ci 、输出胶囊cj 、输出胶囊元素e_out 的权重的第一个输入胶囊元素。如果你熟悉Matlab，那么这基本上就是Matlab中的sub2ind 函数。

我们假定in 和W的最后轴对应输入胶囊元素。这意味着它们在内存的后续位置，因此构建遍历单个输入胶囊元素的循环很直截了当：

                                                                                                            
                                                                                                                
                                                                                                                    // 初始化结果
                                                                                                                
                                                                                                                
                                                                                                                    float result = 0.0;
                                                                                                                
                                                                                                                
                                                                                                                    for (int64 v = 0; v < in_dim; ++v)
                                                                                                                
                                                                                                                
                                                                                                                        // 对`in`和`weights`而言，前馈计算的后续元素也处于内存中的后续位置
                                                                                                                
                                                                                                                
                                                                                                                        result += ldg(in + in_idx++) * ldg(weights + w_idx++);
                                                                                                                
                                                                                                                
                                                                                                                    // 写入结果
                                                                                                                
                                                                                                                
                                                                                                                    out[i] = result;

ldg函数为只读数据缓存加载函数。它接受指向要读取的元素的指针。别忘了我们正计算矩阵向量乘积，也就是内积的一些集合。潜在的改进是使用共享内存，因为单个输入胶囊的值使用了很多次，但我们将把这一优化留待未来。

测试

我希望本文是一个为GPU开发TensorFlow定制操作（Op）的端到端的示例。这包括测试Op。下面是基于 numpy进行的前馈运算：

                                                                                                                    
                                                                                                                        
                                                                                                                            import tensorflow as tf
                                                                                                                        
                                                                                                                        
                                                                                                                            from ops.capsuleprediction import capsule_prediction
                                                                                                                        
                                                                                                                        
                                                                                                                            import numpy as np
                                                                                                                        
                                                                                                                        
                                                                                                                            from parameterized import parameterized
                                                                                                                        
                                                                                                                        
                                                                                                                            import itertools
                                                                                                                        
                                                                                                                        
                                                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                            class CapsulePredictionOpTest(tf.test.TestCase):
                                                                                                                        
                                                                                                                        
                                                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                                @staticmethod
                                                                                                                        
                                                                                                                        
                                                                                                                                def _numpy_capsule_prediction(x, weights):
                                                                                                                        
                                                                                                                        
                                                                                                                                    """ 使用numpy为x和weights生成输出 """
                                                                                                                        
                                                                                                                        
                                                                                                                                    batch_size, in_caps, in_dim = x.shape
                                                                                                                        
                                                                                                                        
                                                                                                                                    _, out_caps, out_dim, _ = weights.shape
                                                                                                                        
                                                                                                                        
                                                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                                    out_shape = (batch_size, in_caps, out_caps, out_dim)
                                                                                                                        
                                                                                                                        
                                                                                                                                    out = np.zeros(out_shape)
                                                                                                                        
                                                                                                                        
                                                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                                    for b in range(batch_size):
                                                                                                                        
                                                                                                                        
                                                                                                                                        for i in range(in_caps):
                                                                                                                        
                                                                                                                        
                                                                                                                                            for j in range(out_caps):
                                                                                                                        
                                                                                                                        
                                                                                                                                                for c in range(out_dim):
                                                                                                                        
                                                                                                                        
                                                                                                                                                    out[b, i, j, c] = np.dot(x[b, i], weights[i, j, c])
                                                                                                                        
                                                                                                                        
                                                                                                                                    return out

ops/capsuleprediction.py 包含capsule_prediction （实际负责从编译好的共享库中加载Op）。解读上面的函数应该很直接：我们遍历batch、输入胶囊、输出胶囊、输出胶囊元素，然后为输出中的每个组合计算点积。结果将用于验证Op的前馈运算。另一个值得一提的是我们继承了tf.test.TestCase 类。该类提供了一些用于TensorFlow测试的辅助函数。

测试前馈胶囊预测：

@parameterized.expand([
    (batch_size, in_caps, out_caps, in_dim, out_dim) for
    batch_size, in_caps, out_caps, in_dim, out_dim in
    itertools.product([4, 8], [4, 8], [4, 8], [4, 8], [4, 8])
])
def test_capsule_prediction_op(self,
    batch_size,
    in_caps, out_caps, in_dim, out_dim):
    """ 测试capsmatmul（胶囊矩阵乘法）操作 """
    x = np.random.rand(batch_size, in_caps, in_dim)
    weights = np.random.rand(in_caps, out_caps, out_dim, in_dim)
    truth = self._numpy_capsule_prediction(x, weights)
    with self.test_session() as sess:
        x_ph = tf.placeholder(tf.float32, x.shape)
        w_ph = tf.placeholder(tf.float32, weights.shape)
        ret = capsule_prediction(x_ph, w_ph)
        out = sess.run(ret, {x_ph: x, w_ph: weights})
    self.assertAllClose(truth, out)

这里用到了不少技巧。首先，parameterized 装饰器提供了使用不同参数调用测试的方式，其中每个测试都应该成功，并且会被pytest 认为是独立的测试。如果失败了，给定的输入也会在测试日志中显示，所以就我的经验而言，如有必要，使用这个装饰器确实加快了调试过程。parameterized.expand 期望一个元组列表。每个元组将被展开成函数的位置参数。我们可以使用itertools.product 很容易地生成许多维度不同的元组，itertools.product 获取所有参数的笛卡尔积。

随机初始化x 和weights 数组。我们创建的TensorFlow图很简单：它只包含两个占位符和capsule_prediction Op。返回值应该和_numpy_capsule_prediction 的值相同。让我们运行测试：

pytest 的一个很不错的特性是你可以使用-k 标识以选择特定的测试集合。万岁，所有测试都通过了！

反向胶囊预测

接下来是反向运算。你会注意到，下面碰到的一些概念我们已经在前面讨论过了。我们已经了解通过维度尺寸计算一维数组的正确索引的方法，我们已经写了一个CUDA核，我们注册了Op，也配置了测试。因此，从现在开始我会加快一点推进速度。我们还没有讨论的内容只剩下梯度的精确定义。首先考虑一个普通的密集层：

给定对应z的损失函数L的梯度，我们可以计算x和W的梯度：

其中w_i是i行的向量，z'是储存输出的局部梯度的向量。基于以上i的梯度，x的整个梯度可以通过下式计算：

胶囊预测操作的梯度计算同理，我们只需将z'替换为û'_j|i，然后将W替换为W_ij。

计算特定权重的梯度可能更简单：

因此储存梯度的矩阵为外积：

直觉上，这告诉我们只需选择来自两层的两个连接的神经元，并将输出神经元和输入神经元的局部梯度相乘。这意味着我们可以对胶囊预测层进行同样的操作。唯一的不同是设计的张量维度。

梯度OpKernel

我不会再用更多细节来打扰你了：

                                                                                                                                                                                        
                                                                                                                                                                                            
                                                                                                                                                                                                class CapsulePredictionGradOp : public OpKernel
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                {
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                 public:
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                  explicit CapsulePredictionGradOp(OpKernelConstruction* ctx) : OpKernel(ctx) { }
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                  void Compute(OpKernelContext* ctx) override
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                  {
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    // 获取输入张量
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    const Tensor& grad = ctx->input(0);
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    const Tensor& input = ctx->input(1);
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    const Tensor& weights = ctx->input(2);
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    // 获取形状以便分配输出
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    const TensorShape& input_shape(input.shape());
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    const TensorShape& weights_shape(weights.shape());
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    // 分配输出
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    Tensor* grad_input = nullptr;
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    Tensor* grad_weights = nullptr;
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input_shape, &grad_input));
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    OP_REQUIRES_OK(ctx, ctx->allocate_output(1, weights_shape, &grad_weights));
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    // 获取本征（Eigen）张量并传给启动器
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    auto input_tensor         = input.tensor<float, 3>();
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    auto weights_tensor       = weights.tensor<float, 4>();
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    auto grad_tensor          = grad.tensor<float, 4>();
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    auto grad_input_tensor    = grad_input->tensor<float, 3>();
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    auto grad_weights_tensor  = grad_weights->tensor<float, 4>();
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    launchCapsulePredictionGrad(
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                      ctx->eigen_device<GPUDevice>(), input_tensor, weights_tensor, grad_tensor,
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                      grad_input_tensor, grad_weights_tensor
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                    );
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                  }
                                                                                                                                                                                            
                                                                                                                                                                                            
                                                                                                                                                                                                };

这里没有什么真正的新东西。一个重要的不同是我们现在需要分配两个输出张量：一个用于权重梯度，一个用于输入梯度。张量的梯度的形状和张量自身的形状相同。因此设定分配的张量的形状是小菜一碟。让我们查看下 launchCapsulePredictionGrad 函数：

void launchCapsulePredictionGrad(
  const GPUDevice& d,
  typename TTypes<float, 3>::ConstTensor input,
  typename TTypes<float, 4>::ConstTensor weights,
  typename TTypes<float, 4>::ConstTensor grad,
  typename TTypes<float, 3>::Tensor grad_input,
  typename TTypes<float, 4>::Tensor grad_weights)
{
  const int64 batch_size  = input.dimension(0);
  const int64 in_caps     = input.dimension(1);
  const int64 in_dim      = input.dimension(2);
  const int64 out_dim     = weights.dimension(2);
  const int64 out_caps    = weights.dimension(1);
  // 维度一的尺寸
  const int64 w_d0 = out_caps * out_dim * in_dim;
  const int64 x_d0 = in_caps * in_dim;
  const int64 o_d0 = in_caps * out_caps * out_dim;
  // 维度二
  const int64 w_d1 = out_dim * in_dim;
  const int64 x_d1 = in_dim;
  const int64 o_d1 = out_caps * out_dim;
  // 维度三
  const int64 w_d2 = in_dim;
  const int64 o_d2 = out_dim;
  // 运行输入梯度核
  CudaLaunchConfig config = GetCudaLaunchConfig(grad_input.size(), d);
  capsulePredictionInputGradKernel
    <<<config.block_count, config.thread_per_block, 0, d.stream()>>>(
      grad.data(), weights.data(), grad_input.data(),
      w_d0, x_d0, x_d1, o_d0, o_d1, out_caps, out_dim, in_dim,
      grad_input.size());
  // 运行权重梯度核
  config = GetCudaLaunchConfig(grad_weights.size(), d);
  capsulePredictionWeightsGradKernel
    <<<config.block_count, config.thread_per_block, 0, d.stream()>>>(
      grad.data(), input.data(), grad_weights.data(), batch_size,
      grad_weights.size(), w_d0, w_d1, w_d2, x_d0, x_d1, o_d0, o_d1, o_d2);
}

再一次，我们看到了相似的代码结构。我们获取维度，决定内存尺寸，最后，我们运行两个而不是一个核。

我很高兴你还在阅读！现在变得复杂一点了。瞧瞧输入梯度CUDA核：

                                                                                                                                                                                                
                                                                                                                                                                                                    
                                                                                                                                                                                                        __global__ void capsulePredictionInputGradKernel(
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const float* grad, const float* weights, float* grad_input,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 w_d0,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 x_d0, const int64 x_d1,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 o_d0, const int64 o_d1,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 out_caps,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 out_dim,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 in_dim,
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          const int64 output_size)
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                        {
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          CUDA_1D_KERNEL_LOOP(i, output_size)
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          {
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            // 所以这里我们有 in_grad[b,ci,e]
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            const int64 b     = i / x_d0;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            const int64 ci    = (i % x_d0) / x_d1;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            const int64 e_in  = i % x_d1;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                        
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            // 接着我们可以看看如何计算in和W的数组索引
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            int64 w_idx       = ci * w_d0 + e_in;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            int64 grad_idx    = b * o_d0 + ci * o_d1;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                        
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            // 初始化结果
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            float result      = 0.0;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            // 迭代cj和e_out，我们已经具备其他索引
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            for (int cj = 0; cj < out_caps; ++cj)
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            {
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                              for (int e_out = 0; e_out < out_dim; ++e_out)
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                              {
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                                // 增加grad_idx可以得到梯度的下一个元素
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                                result  += ldg(grad + grad_idx++) * ldg(weights + w_idx);
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                                // 访问下一个输出胶囊元素，可以得到权重的下一个元素，
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                                // 也就是说我们将w_idx加上in_dim
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                                w_idx   += in_dim;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                              }
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            }
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            // 写入结果
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                            grad_input[i] = result;
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                          }
                                                                                                                                                                                                    
                                                                                                                                                                                                    
                                                                                                                                                                                                        }

我给代码加上了不少注释，以提高其可读性。和我们之前的CUDA核类似，这个函数判定相关的轴索引： b 、ci 、e_in 。这些索引接着用于计算w 和grad （输出梯度）的一维索引。一个单一的输入神经元不仅用于矩阵向量乘积，同时涉及所有预测矩阵

因此，我们需要两个循环，一个处理输出胶囊，另一个处理独立的胶囊元素。我们现在必须在内循环中跳至下一个输出胶囊，而不能仅仅增加权重的索引。这意味着我们需要在每次迭代中给索引加上 in_dim 。

权重梯度CUDA核实现为：

                                                                                                                                                                                                                
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        __global__ void capsulePredictionWeightsGradKernel(
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          const float* grad, const float* input, float* grad_weights,
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          const int64 batch_size, const int64 output_size,
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          const int64 w_d0, const int64 w_d1, const int64 w_d2,
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          const int64 x_d0, const int64 x_d1,
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          const int64 o_d0, const int64 o_d1, const int64 o_d2
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        )
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        {
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          CUDA_1D_KERNEL_LOOP(i, output_size)
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          {
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            // 所以这里我们有 w[ci,cj,e_out,e_in]
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            const int64 ci    = i / w_d0;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            const int64 cj    = (i % w_d0) / w_d1;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            const int64 e_out = (i % w_d1) / w_d2;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            const int64 e_in  = i % w_d2;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            // 接下来，我们可以看下如何为in和grad计算数组索引
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            int64 input_idx   = ci * x_d1 + e_in;               // (b == 0)
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            int64 grad_idx    = ci * o_d1 + cj * o_d2 + e_out;  // (b == 0)
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            // 初始化结果
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            float result = 0.0;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            // 我们仅仅遍历b，因为我们已经有了其他索引
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            for (int64 b = 0; b < batch_size; b++)
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            {
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                              result += ldg(grad + grad_idx) * ldg(input + input_idx);
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                              // 跳至下个batch可以得到新元素
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                              input_idx += x_d0;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                              grad_idx  += o_d0;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            }
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                            grad_weights[i] = result;
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                          }
                                                                                                                                                                                                                    
                                                                                                                                                                                                                    
                                                                                                                                                                                                                        }

这里我们运用了同样的技巧：我们计算了张量轴索引，我们计算了一维索引，然后遍历所需元素以得到输出。我之前没有提到的是，我们需要计算batch中的所有样本的每个权重的梯度的总和。给定0轴的输入张量和输出张量的内存尺寸，这是一件直截了当的事。

在将我们的梯度实现集成到TensorFlow前，我们需要注册梯度到我们的 CapsulePrediction Op：

@ops.RegisterGradient("CapsulePrediction")
def _capsule_prediction_grad(op, grad):
    """ 为胶囊预测操作计算梯度 """
    return op_module.capsule_prediction_grad(grad, op.inputs[0], op.inputs[1])

现在我们可以直接使用 tf.gradients ，之后梯度计算图会包含我们的梯度操作。太棒了！

测试梯度

我们已经到了最后阶段：测试梯度。实际上这没有听起来那么难。TensorFlow自带梯度测试工具，我们这里将使用它。我们给 CapsulePredictionOpTest 类加上如下方法：

@parameterized.expand([
    (batch_size, in_caps, out_caps, in_dim, out_dim) for
    batch_size, in_caps, out_caps, in_dim, out_dim in
    itertools.product([4, 8], [4, 8], [4, 8], [4, 8], [4, 8])
])
def test_capsule_prediction_weights_grad(self, batch_size, in_caps, out_caps,
                                         in_dim, out_dim):
    """ 测试对应权重的输出梯度 """
    x = np.random.rand(batch_size, in_caps, in_dim)
    weights = np.random.rand(in_caps, out_caps, out_dim, in_dim)
    out_shape = (batch_size, in_caps, out_caps, out_dim)
    with self.test_session():
        x_ph = tf.placeholder(tf.float32, x.shape)
        w_ph = tf.placeholder(tf.float32, weights.shape)
        fd = {x_ph: x, w_ph: weights}
        caps_out = capsule_prediction(x_ph, w_ph)
        grad_w = tf.test.compute_gradient(
            w_ph, weights.shape, caps_out, out_shape, extra_feed_dict=fd
        )
    self.assertAllClose(grad_w[0], grad_w[1], atol=1e-3, rtol=1e-3)
@parameterized.expand([
    (batch_size, in_caps, out_caps, in_dim, out_dim) for
    batch_size, in_caps, out_caps, in_dim, out_dim in
    itertools.product([4, 8], [4, 8], [4, 8], [4, 8], [4, 8])
])
def test_capsule_prediction_input_grad(self, batch_size, in_caps, out_caps,
                                       in_dim, out_dim):
    """ 测试对应x的输出梯度 """
    x = np.random.rand(batch_size, in_caps, in_dim)
    weights = np.random.rand(in_caps, out_caps, out_dim, in_dim)
    out_shape = (batch_size, in_caps, out_caps, out_dim)
    with self.test_session():
        x_ph = tf.placeholder(tf.float32, x.shape)
        w_ph = tf.placeholder(tf.float32, weights.shape)
        fd = {x_ph: x, w_ph: weights}
        caps_out = capsule_prediction(x_ph, w_ph)
        grad_x = tf.test.compute_gradient(
            x_ph, x.shape, caps_out, out_shape, extra_feed_dict=fd
        )
    self.assertAllClose(grad_x[0], grad_x[1], atol=1e-3, rtol=1e-3)

tf.test.compute_gradient 函数决定“理论”和数值上的梯度。数值梯度通过差分计算数值梯度，而我们的Op注册的梯度计算理论梯度。它们应该接近相等，所以我们使用继承自 tf.test.TestCase 的 assertAllClose 方法断言两者接近。下面是结果输出：

万岁！它成功了！

在MNIST分类上运行

在我的前一篇文章，我讨论了应用胶囊网络至MNIST分类问题。我们可以在代码中插入 capsule_prediction ：

                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    def
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                        digit_caps(incoming,
                                                                                                                                                                                                                                                                                            n_digit_caps,
                                                                                                                                                                                                                                                                                            dim_digit_caps,
                                                                                                                                                                                                                                                                                            name=
                                                                                                                                                                                                                                                                                            "DigitCaps"
                                                                                                                                                                                                                                                                                                ,
                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        neuron_axis=-
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                capsule_axis=-
                                                                                                                                                                                                                                                                                                2
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        routing_iters=
                                                                                                                                                                                                                                                                                                        3
                                                                                                                                                                                                                                                                                                            ):
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        """
                                                                                                                                                                                                                                                                                            数字胶囊层
                                                                                                                                                                                                                                                                                            """
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        with
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            tf.variable_scope(name):
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        获取前一层的胶囊数和维度
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     in_shape
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    incoming.shape.as_list()
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     n_primary_caps
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    in_shape[capsule_axis]
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     dim_primary_caps
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    in_shape[neuron_axis]
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        初始化所有权重矩阵
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        if
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            args.custom_op:
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     w_shape
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    [
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     n_primary_caps,
                                                                                                                                                                                                                                                                                    n_digit_caps,
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     dim_digit_caps,
                                                                                                                                                                                                                                                                                    dim_primary_caps
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     ]
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        else
                                                                                                                                                                                                                                                                                            :
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     w_shape
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    [
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     n_primary_caps,
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     n_digit_caps
                                                                                                                                                                                                                                                                                    *
                                                                                                                                                                                                                                                                                    dim_digit_caps,
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     dim_primary_caps
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     ]
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     W_ij
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.get_variable(
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        "weights"
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                shape=w_shape,
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     initializer=tf.keras.initializers.glorot_uniform()
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     )
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        初始化路由logit，
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        加上一个尺寸为1的第一轴，这样比较方便
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     b_ij
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.get_variable(
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        "logits"
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                shape=[
                                                                                                                                                                                                                                                                                                1
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        n_primary_caps,
                                                                                                                                                                                                                                                                                                        n_digit_caps],
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     initializer=tf.zeros_initializer(),
                                                                                                                                                                                                                                                                                    trainable=args.logits_trainable
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     )
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        if
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            args.custom_op:
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        定制操作
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     u_hat
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    capsule_prediction(incoming,
                                                                                                                                                                                                                                                                                    W_ij)
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        else
                                                                                                                                                                                                                                                                                            :
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        reshape和转置的奇技淫巧
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         u_i
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        tf.transpose(incoming,
                                                                                                                                                                                                                                                                                        (
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                2
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                                            ))
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     u_hat
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.matmul(W_ij,
                                                                                                                                                                                                                                                                                    u_i)
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     u_hat
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.reshape(
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         tf.transpose(u_hat,
                                                                                                                                                                                                                                                                                        (
                                                                                                                                                                                                                                                                                        2
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                0
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                                            )),
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         (-
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                n_primary_caps,
                                                                                                                                                                                                                                                                                                n_digit_caps,
                                                                                                                                                                                                                                                                                                dim_digit_caps)
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     )
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        def
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            capsule_out(b_ij):
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        """
                                                                                                                                                                                                                                                                                            给定logit
                                                                                                                                                                                                                                                                                            b_ij，计算该层的输出。
                                                                                                                                                                                                                                                                                            """
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         c_ij
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        tf.nn.softmax(b_ij,
                                                                                                                                                                                                                                                                                        axis=
                                                                                                                                                                                                                                                                                        2
                                                                                                                                                                                                                                                                                            )
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     s_j
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.reduce_sum(
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         tf.reshape(c_ij,
                                                                                                                                                                                                                                                                                        (-
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                n_primary_caps,
                                                                                                                                                                                                                                                                                                n_digit_caps,
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                1
                                                                                                                                                                                                                                                                                                    ))
                                                                                                                                                                                                                                                                                                        *
                                                                                                                                                                                                                                                                                                        u_hat,
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         axis=
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     )
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     v_j
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    squash(s_j)
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        return
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            v_j
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        def
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            routing_iteration(iter,
                                                                                                                                                                                                                                                                                                logits):
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        """
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     给定一组logit，
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     基于论文中定义的路由计算新logit。
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     """
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     v_j
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    capsule_out(logits)
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         a_ij
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        tf.reduce_sum(tf.expand_dims(v_j,
                                                                                                                                                                                                                                                                                        axis=
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            )
                                                                                                                                                                                                                                                                                                *
                                                                                                                                                                                                                                                                                                u_hat,
                                                                                                                                                                                                                                                                                                axis=
                                                                                                                                                                                                                                                                                                3
                                                                                                                                                                                                                                                                                                    )
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         logits
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        tf.reshape(logits
                                                                                                                                                                                                                                                                                        +
                                                                                                                                                                                                                                                                                        a_ij,
                                                                                                                                                                                                                                                                                        (-
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ,
                                                                                                                                                                                                                                                                                                n_primary_caps,
                                                                                                                                                                                                                                                                                                n_digit_caps))
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        return
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            [iter
                                                                                                                                                                                                                                                                                                +
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                1
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        logits]
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        计算路由
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         i
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        tf.constant(
                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                            )
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     routing_result
                                                                                                                                                                                                                                                                                    =
                                                                                                                                                                                                                                                                                    tf.while_loop(
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        lambda
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            i,
                                                                                                                                                                                                                                                                                                logits:
                                                                                                                                                                                                                                                                                                tf.less(i,
                                                                                                                                                                                                                                                                                                routing_iters),
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     routing_iteration,
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         [i,
                                                                                                                                                                                                                                                                                        tf.tile(b_ij,
                                                                                                                                                                                                                                                                                        tf.stack([tf.shape(incoming)[
                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                            ],
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                1
                                                                                                                                                                                                                                                                                                    ,
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                                            ]))]
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                     )
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         #
                                                                                                                                                                                                                                                                                        路由结果的第二个元素包含我们的最终logit
                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                         v_j
                                                                                                                                                                                                                                                                                        =
                                                                                                                                                                                                                                                                                        capsule_out(routing_result[
                                                                                                                                                                                                                                                                                        1
                                                                                                                                                                                                                                                                                            ])
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                        return
                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                            v_j

那么性能怎么样呢？好吧，结果发现使用定制操作进行训练不知怎么比使用reshape和移置的训练慢。就像我之前提到的那样，这些代码可以进一步优化。也许这会是我下一篇文章的内容。不管怎么样，感谢你读到结尾，欢迎建议和反馈。

原文地址：https://jostosh.github.io/posts/capscuda.html

相关阅读： CapsNet入门系列之一：胶囊网络背后的直觉

CapsNet入门系列之二：胶囊如何工作

CapsNet入门系列之三：囊间动态路由算法

CapsNet入门系列之四：胶囊网络架构

CapsNet入门系列番外：基于TensorFlow实现胶囊网络