Neural Networks From Scratch, Part I: Computational Graphs

Original article: Deep Learning From Scratch I: Computational Graphs

Translator: 孙一萌

Reviewer: Kaiser

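This is the first chapter of this tutorial series. It introduces the mathematical and algorithmic foundations of deep neural networks. We will then implement our own neural network library in Python, modeled on the TensorFlow API.

No prior knowledge of machine learning or neural networks is required. You should, however, have some background in undergraduate-level calculus, linear algebra, basic algorithms, and probability. If you run into difficulties along the way, please leave a comment.

By the end of this chapter, you will have a deep understanding of the math behind neural networks and of what deep learning libraries do under the hood.

I will keep the code as simple and clear as possible, favoring readability over runtime efficiency. Because our API mimics TensorFlow's, once you finish this chapter you will naturally know how to use the TensorFlow API and how TensorFlow works under the hood (rather than spending time learning some all-purpose, maximally efficient API).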

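Computational graphs

We start with the theory of computational graphs, because a neural network is itself a special form of computational graph.

A computational graph is a directed graph whose nodes correspond to operations or variables.

Variables can feed their values into operations, and operations can feed their outputs into other operations. In this way, every node in the graph defines a function of the variables in the graph (in the usual sense of a function: each input maps to exactly one output).

The values fed into nodes and passed out of nodes are called tensors, which is simply a word for multi-dimensional arrays. The term therefore covers scalars, vectors, and matrices as well as higher-order tensors.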
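
To make the term concrete, here is a small sketch of tensors of increasing order represented as NumPy arrays. The variable names are chosen only for illustration; the only assumption taken from this chapter is that tensors are stored as NumPy arrays.

```python
import numpy as np

# Tensors of increasing order, represented as NumPy arrays
scalar = np.array(3.0)                # order 0, shape ()
vector = np.array([1.0, 2.0])         # order 1, shape (2,)
matrix = np.array([[1.0, 0.0],
                   [0.0, -1.0]])      # order 2, shape (2, 2)
tensor3 = np.zeros((2, 3, 4))         # order 3, shape (2, 3, 4)

# ndim is the order of the tensor, shape lists the size along each axis
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, t.ndim, t.shape)
```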

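As an example, consider a computational graph that adds two inputs x and y to compute their sum z.

Here, x and y are input nodes of z, and z is a consumer of x and y. z thus defines a function:

$z: \mathbb{R}^2 \to \mathbb{R}$ where $z(x, y) = x + y$

The concept of a computational graph becomes more important as computations grow more complex. For example, a computational graph can define the affine transformation $z(A, x, b) = Ax + b$.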
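
Before building any graph machinery, it may help to see the affine transformation written directly in NumPy. This is only a sketch: the helper name `affine` is hypothetical, and the values of `A`, `b` and `x` are the ones used in the example later in this chapter.

```python
import numpy as np


def affine(A, x, b):
    """Return A.dot(x) + b for NumPy arrays (plain NumPy, no graph involved)."""
    return A.dot(x) + b


A = np.array([[1, 0], [0, -1]])  # parameter matrix
b = np.array([1, 1])             # parameter (offset) vector
x = np.array([1, 2])             # input vector

z = affine(A, x, b)
print(z)  # the components of z are 2 and -1
```

The graph-based implementation developed below computes the same value, but it splits the expression into separate nodes (a matrix multiplication followed by an addition) so that each step can be inspected on its own.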

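Operations

Every operation has three characteristics:

A compute function: computes the operation's output for the given input values
Input nodes: there may be several; each is a variable or another operation
Consumers: there may be several; they take this operation's output as their input

Implementation: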

class Operation:
    """Represents a graph node that performs a computation.

    An `Operation` is a node in a `Graph` that takes zero or
    more objects as input, and produces zero or more objects
    as output.
    """

    def __init__(self, input_nodes=None):
        """Construct Operation
        """
        # Use None as the default to avoid sharing one mutable list across instances
        self.input_nodes = input_nodes if input_nodes is not None else []

        # Initialize list of consumers (i.e. nodes that receive this operation's output as input)
        self.consumers = []

        # Append this operation to the list of consumers of all input nodes
        for input_node in self.input_nodes:
            input_node.consumers.append(self)

        # Append this operation to the list of operations in the currently active default graph
        _default_graph.operations.append(self)

    def compute(self):
        """Computes the output of this operation.
        Must be implemented by the particular operation.
        """
        raise NotImplementedError

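Some simple operations

To get familiar with the Operation class (we will need it later), let's implement a few simple operations. In both of them we assume that all tensors are NumPy arrays, so that element-wise addition and matrix multiplication (.dot) do not have to be implemented by us.

Addition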

class add(Operation):
    """Returns x + y element-wise.
    """

    def __init__(self, x, y):
        """Construct add

        Args:
          x: First summand node
          y: Second summand node
        """
        super().__init__([x, y])

    def compute(self, x_value, y_value):
        """Compute the output of the add operation

        Args:
          x_value: First summand value
          y_value: Second summand value
        """
        return x_value + y_value

Matrix multiplication

class matmul(Operation):
    """Multiplies matrix a by matrix b, producing the matrix product a.dot(b).
    """

    def __init__(self, a, b):
        """Construct matmul

        Args:
          a: First matrix
          b: Second matrix
        """
        super().__init__([a, b])

    def compute(self, a_value, b_value):
        """Compute the output of the matmul operation

        Args:
          a_value: First matrix value
          b_value: Second matrix value
        """
        return a_value.dot(b_value)
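
Because both operations delegate to NumPy, it is worth being clear about which NumPy operator does what. The sketch below uses made-up example matrices to contrast element-wise arithmetic with the matrix product used in `matmul`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise_sum = a + b        # what add.compute relies on
elementwise_product = a * b    # NOT a matrix product: multiplies entry by entry
matrix_product = a.dot(b)      # what matmul.compute relies on

print(elementwise_sum)
print(elementwise_product)
print(matrix_product)
```

For NumPy arrays, `*` multiplies entry by entry, so it is not interchangeable with `.dot`.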

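Placeholders

Not all nodes in a computational graph are operations. In the affine transformation graph, for example, $A$, $x$ and $b$ are not operations. Rather, they are inputs to the graph, and we must supply a value for each of them if we want to compute the graph's output. To supply such values, we introduce placeholders.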

class placeholder:
    """Represents a placeholder node that has to be provided with a value
       when computing the output of a computational graph
    """

    def __init__(self):
        """Construct placeholder
        """
        self.consumers = []

        # Append this placeholder to the list of placeholders in the currently active default graph
        _default_graph.placeholders.append(self)

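Variables

In the affine transformation graph, $x$ differs fundamentally from $A$ and $b$. While $x$ is an input to the operation, $A$ and $b$ are parameters of the operation, i.e. they are intrinsic to the graph itself. We call parameters such as $A$ and $b$ variables.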

class Variable:
    """Represents a variable (i.e. an intrinsic, changeable parameter of a computational graph).
    """

    def __init__(self, initial_value=None):
        """Construct Variable

        Args:
          initial_value: The initial value of this variable
        """
        self.value = initial_value
        self.consumers = []

        # Append this variable to the list of variables in the currently active default graph
        _default_graph.variables.append(self)

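The Graph class

Finally, we need a class that holds all operations, placeholders, and variables together. When creating a new graph, we can call its as_default method to make it the _default_graph.

This way, we can create operations, placeholders, and variables without passing in a reference to the graph every time.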

class Graph:
    """Represents a computational graph
    """

    def __init__(self):
        """Construct Graph"""
        self.operations = []
        self.placeholders = []
        self.variables = []

    def as_default(self):
        global _default_graph
        _default_graph = self
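
The default-graph pattern can be seen in isolation in the following sketch. `MiniGraph` and `MiniNode` are hypothetical stand-ins invented for this illustration, not part of the library built in this chapter. Unlike `Graph.as_default` above, the sketch's `as_default` returns `self`, purely so that the graphs can be inspected afterwards.

```python
class MiniGraph:
    """Hypothetical stand-in for Graph: only tracks registered nodes."""

    def __init__(self):
        self.nodes = []

    def as_default(self):
        # Make this graph the target for all nodes created from now on
        global _default_graph
        _default_graph = self
        return self  # assumption for this sketch only; Graph.as_default returns None


class MiniNode:
    """Hypothetical node that registers itself with the active default graph."""

    def __init__(self, name):
        self.name = name
        _default_graph.nodes.append(self)


g1 = MiniGraph().as_default()
MiniNode("a")
MiniNode("b")

g2 = MiniGraph().as_default()
MiniNode("c")

print([n.name for n in g1.nodes])  # nodes created while g1 was the default
print([n.name for n in g2.nodes])  # nodes created while g2 was the default
```

Each node lands in whichever graph was active when it was constructed, which is why no graph reference has to be passed to the node constructors.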

Example

Let's now use the classes above to build the computational graph for the affine transformation:

# Create a new graph
Graph().as_default()

# Create variables
A = Variable([[1, 0], [0, -1]])
b = Variable([1, 1])

# Create placeholder
x = placeholder()

# Create hidden node y
y = matmul(A, x)

# Create output node z
z = add(y, b)

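Computing the output of an operation

Now that we know how to build computational graphs, we can turn to computing the output of an operation.

We create a Session class that encapsulates the execution of an operation. We want to be able to call a run method on a session instance, passing in the operation to compute and a dictionary containing the values for all required placeholders.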

session = Session()
output = session.run(z, {
    x: [1, 2]
})

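This should compute the following value:

$z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$

To compute the function an operation represents, we have to perform the computations in the right order. For example, we cannot compute z before the intermediate result y has been computed. We therefore have to make sure the operations are executed in the right order, so that the values of all of an operation's input nodes are available before the operation itself is computed. This can be achieved with a post-order traversal.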

import numpy as np


class Session:
    """Represents a particular execution of a computational graph.
    """

    def run(self, operation, feed_dict=None):
        """Computes the output of an operation

        Args:
          operation: The operation whose output we'd like to compute.
          feed_dict: A dictionary that maps placeholders to values for this session
        """
        # Use None as the default to avoid a shared mutable default argument
        feed_dict = feed_dict if feed_dict is not None else {}

        # Perform a post-order traversal of the graph to bring the nodes into the right order
        nodes_postorder = traverse_postorder(operation)

        # Iterate all nodes to determine their value
        for node in nodes_postorder:

            if isinstance(node, placeholder):
                # Set the node value to the placeholder value from feed_dict
                node.output = feed_dict[node]
            elif isinstance(node, Variable):
                # Set the node value to the variable's value attribute
                node.output = node.value
            else:  # Operation
                # Collect the already-computed outputs of this operation's input nodes
                node.inputs = [input_node.output for input_node in node.input_nodes]

                # Compute the output of this operation
                node.output = node.compute(*node.inputs)

            # Convert lists to numpy arrays
            if isinstance(node.output, list):
                node.output = np.array(node.output)

        # Return the requested node value
        return operation.output


def traverse_postorder(operation):
    """Performs a post-order traversal, returning a list of nodes
    in the order in which they have to be computed

    Args:
       operation: The operation to start traversal at
    """

    nodes_postorder = []

    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)

    recurse(operation)
    return nodes_postorder
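
One property of the traversal above is worth noting: a node that can be reached along several paths is appended once per path. The result is still correct, but shared operations are computed more than once. The sketch below shows one possible variant that lists every node exactly once by remembering which nodes it has already visited. The `Node` class and the function name `traverse_postorder_unique` are hypothetical and exist only for this illustration.

```python
class Node:
    """Hypothetical minimal node: a name plus a list of input nodes."""

    def __init__(self, name, input_nodes=()):
        self.name = name
        self.input_nodes = list(input_nodes)


def traverse_postorder_unique(root):
    """Post-order traversal that lists every reachable node exactly once."""
    ordered = []
    visited = set()  # ids of nodes that have already been handled

    def recurse(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        # Visit all inputs first, so they precede the node in the result
        for input_node in node.input_nodes:
            recurse(input_node)
        ordered.append(node)

    recurse(root)
    return ordered


# A diamond-shaped graph: x feeds both y1 and y2, which both feed z
x = Node("x")
y1 = Node("y1", [x])
y2 = Node("y2", [x])
z = Node("z", [y1, y2])

names = [n.name for n in traverse_postorder_unique(z)]
print(names)  # x appears once, and every node comes after its inputs
```

For the small affine example in this chapter the two traversals give the same order, since no node is shared between paths.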

Let's test the classes we wrote on the example above:

session = Session()
output = session.run(z, {
    x: [1, 2]
})
print(output)

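Looks good: the result is the vector $(2, -1)$, which is $Ax + b$ for the values used in this example.

If you have any questions, feel free to ask in the comments.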
