First, a note: some of what follows (various small optimizations) is only loosely related to automatic differentiation. It is worth studying and the code is worth reading, but covering it in depth is hard to do in one blog post. You can follow my implementation to understand it better; for the steps that don't matter much (the focus here is automatic differentiation) I only give the code plus a little commentary. The reason this code is so detailed is that my other blog posts rely on the automatic differentiation built here.
1. Derivatives
A derivative expresses the amount of change at an instant, and can be defined as:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \tag{1}$$

Equation (1) is the derivative of a function, already familiar from undergraduate study. Let's implement this computation in Python.
def numercial_diff(f,x):
h = 10e-50
return (f(x + h) - f(x)) / h
def f(x):
return x*x
x = 2
print(numercial_diff(f, x))
This prints 0, but by common sense the derivative at 2 should be 4. The code has some flaws, hence the disappointing result. First, even though h = 10e-50 is about as small a number as we could think of, Python suffers from rounding error here. To see it:
import numpy as np
print(np.float32(10e-50))
This prints 0 because Python incurs rounding error (as we learned in computer organization, a float's mantissa has finite length, and anything beyond it is dropped). So if we set h to a different, larger value and try again, we get close to the answer.
But the result still carries an error, because h cannot truly approach 0: the derivative we want is the one at x = 2, yet what we compute is the slope at some point inside [x, x+h] (the mean value theorem guarantees such a point exists), so the two do not agree exactly.
To reduce this error, we can instead compute the slope over [x - h, x + h], i.e. $\frac{f(x+h) - f(x-h)}{2h}$ (known as the central difference). For example:
def numercial_diff(f,x):
    h = 1e-4
return (f(x + h) - f(x - h)) / (2 * h)
def f(x):
return x*x
x = 2
print(numercial_diff(f, x))
Even so, some error remains. If we know the derivative of f(x) in advance and compute it directly, the result has no such error:
def numercial_diff_squre(x):
return 2* x
def f(x):
return x*x
x = 2
print(numercial_diff_squre(x))
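A quick check of my own (not in the original post): the central difference above agrees with the analytic derivative 2*x up to floating-point rounding.
print(abs(numercial_diff(f, 2) - numercial_diff_squre(2))) # a tiny value, float rounding only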
2. Implementing functions with classes
For now we implement scalar operations only.
We need two classes, one representing values and one representing functions, to carry out a computation like the following:
(figure: the computation; circles are value instances, squares are function instances)
class Variable:
    # the code under __init__ runs automatically every time a Variable is created
def __init__(self,data):
self.data = data
class Function:
    # __call__ lets us invoke the function as Function()(arguments of __call__)
def __call__(self,input):
x = input.data
y= x ** 2
output = Variable(y)
return output
x = Variable(2)
y = Function()(x)
print(y.data)
But in practice we use many functions, not just this one. To give Function broader applicability we can use inheritance: turn Function into a parent class that mainly handles the calling machinery, with the actual computation implemented in a forward method.
class Variable:
    # the code under __init__ runs automatically every time a Variable is created
def __init__(self,data):
self.data = data
class Function:
    # __call__ lets us invoke the function as Function()(arguments of __call__)
def __call__(self,input):
x = input.data
y= self.forward(x)
output = Variable(y)
return output
def forward(self,x):
raise NotImplementedError()
class Square(Function):
def forward(self, x):
return x ** 2
x = Variable(2)
y = Square()(x)
print(y.data)
3. A brief introduction to the backpropagation process
The theoretical basis of backpropagation is the chain rule (a highlight of undergraduate calculus). As an example, take a composite function y = C(B(A(x))) with intermediate results a = A(x) and b = B(a). Then

$$\frac{dy}{dx} = \frac{dy}{dy} \cdot \frac{dy}{db} \cdot \frac{db}{da} \cdot \frac{da}{dx}$$

where dy/dy = 1. Mathematically it is redundant, but with our backpropagation implementation in mind we need to carry this term.
In code, we compute in the order shown in the figure below:
(figure: the computation graph) Following that order, y sits at the output: we first differentiate with respect to y, then with respect to a, and finally with respect to x, whereas the normal forward computation runs in the order x -> a -> y.
4. Adding the backpropagation computation to the classes
That was a brief introduction to backpropagation (if I keep posting daily, a detailed treatment of the process will come soon). Now let's add it to the existing code. Note that computing the y -> a step uses the data value of the instance a, so, much as with recursion, every function instance has to keep the value instance that preceded it, to make the computation possible.
import numpy as np
class Variable:
def __init__(self,data):
self.data = data
        self.grad = None # the gradient used by backprop; stores the current gradient value
class Function:
def __call__(self,input):
x = input.data
y= self.forward(x)
output = Variable(y)
        self.input = input # keep the input variable; the backward computation needs the data of the preceding value (circle) instance
return output
def forward(self,x):
raise NotImplementedError()
    def backward(self,dy): # newly added to Function
        raise NotImplementedError() # newly added to Function
class Square(Function):
def forward(self, x):
return x ** 2
def backward(self, gy):
        x = self.input.data
return 2 * gy * x
class Exp(Function):
def forward(self, x):
return np.exp(x)
def backward(self, gy):
        x = self.input.data
return gy * np.exp(x)
x = Variable(2)
A = Square()
a = A(x)
B = Exp()
b = B(a)
C = Square()
y = C(b)
print(y.data)
This implements function instances for y = x² (Square) and y = eˣ (Exp). Let's draw the computation graph: (figure)
Now let's run backpropagation (continuing from the code above).
y.grad = np.array(1.0) # dy/dy
b.grad = C.backward(y.grad) # dy/dy * dy/db
a.grad = B.backward(b.grad)# dy/dy * dy/db * db/da
x.grad = A.backward(a.grad)# dy/dy * dy/db * db/da * da/dx
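As a sanity check of my own (not in the original post), we can compare the backprop result against the numerical derivative of the composite y = (exp(x**2))**2 at x = 2, reusing the central-difference idea from section 1:
def composite(t):
    return np.exp(t ** 2) ** 2
h = 1e-4
numerical = (composite(2 + h) - composite(2 - h)) / (2 * h)
print(x.grad, numerical) # the two values should agree closely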
5. Automating backpropagation
The backpropagation above is very cumbersome to use. We want it to run automatically: calling a single method on y should produce x's grad. First we build a data structure that supports backpropagation.
import numpy as np
def as_array(x): # turn a scalar into an ndarray
    if np.isscalar(x): # new
        return np.array(x) # new
    return x # new
class Variable:
    def __init__(self,data):
        if data is not None: # only allow numpy's ndarray type, to keep management simple (matrix operations come later, so float, int and the like must not get in)
            if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
        self.data = data
        self.grad = None
        self.creator = None # lets backpropagation, recursion-style, find the preceding function (square) instance
    def set_creator(self,func): # records the preceding function (square) instance
        self.creator = func # newly added
class Function:
def __call__(self,input):
x = input.data
y= self.forward(x)
        output = Variable(as_array(y)) # e.g. x ** 2 on a 0-d ndarray x yields a numpy scalar, which must be turned back into an ndarray
        output.set_creator(self) # initialize output's creator, so backprop can find the preceding function (square) node
        self.input = input
        self.output = output # also keep the output value instance
return output
def forward(self,x):
raise NotImplementedError()
def backward(self,dy):
raise NotImplementedError()
class Square(Function):
def forward(self, x):
return x ** 2
def backward(self, gy):
        x = self.input.data
return 2 * gy * x
def square(x): # convenience wrapper
    return Square()(x) # convenience wrapper
class Exp(Function):
def forward(self, x):
return np.exp(x)
def backward(self, gy):
        x = self.input.data
return gy * np.exp(x)
def exp(x): # convenience wrapper
    return Exp()(x) # convenience wrapper
x = Variable(np.array(2))
A = Square()
a = A(x)
B = Exp()
b = B(a)
C = Square()
y = C(b)
# or run the experiment with the convenience wrappers
#a = square(x)
#b = exp(a)
#y = square(b)
# sanity-check the structure we built; assert raises an error if its condition is false
assert y.creator == C
assert y.creator.input == b
assert y.creator.input.creator == B
assert y.creator.input.creator.input == a
assert y.creator.input.creator.input.creator == A
assert y.creator.input.creator.input.creator.input == x
Next we implement backpropagation on the value (circle) instances themselves, achieving the effect of y.backward() directly producing x.grad. Add the following backward function to the Variable class:
def backward(self):
    if self.grad is None:
        self.grad = np.ones_like(self.data) # initialize dy/dy to 1
    funcs = [self.creator]
    while funcs: # loop while funcs is not empty
        f = funcs.pop() # take out one function (square) instance and remove it from funcs
        x,y = f.input,f.output
        x.grad = f.backward(y.grad) # compute this step of the backward pass
        if x.creator is not None:
            funcs.append(x.creator)
Run an experiment with the classes:
x = Variable(np.array(0.5))
a = square(x)
b = exp(a)
y = square(b)
y.backward()
print(x.grad)
The result is 3.297442541400256, obtained with a single grad call. By now you can probably feel it: once automatic differentiation is complete, even the most complex models crumble to ash in the blink of an eye, haha.
6. Handling more complex computations
Right now we can run the forward and backward passes only for functions with a single input and a single output, but real computations may have several of each. A function like x + y in the figure below cannot be expressed yet.
We need the operations in the Function class to support multiple inputs and outputs, which we can implement with the variadic-argument technique.
class Function:
    def __call__(self,*inputs):
        xs = [x.data for x in inputs] # pull the values out of the variable-length inputs into xs
        ys = self.forward(*xs) # unpack the inputs; forward may return one value or a tuple of values
        if not isinstance(ys,tuple): # wrap a single output in a tuple so the loop below works
            ys = (ys,)
        outputs = [Variable(as_array(y)) for y in ys] # wrap every output value in a Variable instance
        for output in outputs: # set each output instance's creator to the current function
            output.set_creator(self) # configure one output instance at a time
        self.inputs = inputs
        self.outputs = outputs
        return outputs if len(outputs) > 1 else outputs[0] # with a single element, return the value itself rather than a list
    def forward(self,xs):
        raise NotImplementedError()
    def backward(self,gys):
        raise NotImplementedError()
Now let's use an Add function to complete the computation graph above:
class Add(Function): # computes y = x0 + x1
def forward(self, x0,x1):
y = x0 + x1
return y
def add(x0,x1):
return Add()(x0,x1)
x0,x1 = Variable(np.array(1)),Variable(np.array(2))
y = add(x0, x1)
print(y.data)
Building on this code, let's implement the backward pass. Take y = x0 + x1 as the example: since ∂y/∂x0 = ∂y/∂x1 = 1, backpropagation simply passes the upstream derivative through to each input unchanged. So the backward pass of the Add class is:
class Add(Function):
def forward(self, x0,x1):
y = x0 + x1
return y
    def backward(self,gy): # the backward pass
return gy,gy
We also modify the backward code in the Variable class so it supports multiple inputs and outputs:
class Variable:
def __init__(self,data):
if data is not None:
if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
self.data = data
self.grad = None
self.creator = None
def set_creator(self,func):
self.creator = func
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = [self.creator]
while funcs:
f = funcs.pop()
            gys = [output.grad for output in f.outputs] # collect the upstream gradients from all of f's output instances
            gxs = f.backward(*gys) # run the function's backward pass to get the gradients for its inputs
            if not isinstance(gxs,tuple): # if gxs is not a tuple, wrap it in one, to cooperate with zip below
                gxs = (gxs,)
            for x,gx in zip(f.inputs,gxs): # pair up each input x with its gradient gx and keep walking up the graph
                x.grad = gx
                if x.creator is not None:
                    funcs.append(x.creator)
Then, on top of the code above, let's implement the forward and backward passes of one more function and verify the result:
class Square(Function): # implements the function y = x * x
def forward(self, x):
y = x ** 2
return y
def backward(self, gy):
x = self.inputs[0].data
gx = gy *x*2
return gx
def square(x):
return Square()(x)
x0 = Variable(np.array(1.0))
x1 = Variable(np.array(3.0))
y = add(square(x1),square(x0))
y.backward()
print(x0.grad,x1.grad)
Now we can automatically differentiate functions with multiple inputs and outputs!
But don't get cocky; there is still a problem in the code. If we compute y = x + x, by hand x.grad should be 2, yet our code gets it wrong:
x = Variable(np.array(1.0))
y = add(x,x)
y.backward()
print(x.grad)
The output is 1. Looking back at the original code, inside Variable's backward function:
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = [self.creator]
while funcs:
f = funcs.pop()
gys = [output.grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                x.grad = gx # the bug is here
if x.creator is not None:
funcs.append(x.creator)
When the same variable appears more than once in a computation, the second write overwrites the first instead of accumulating. We can fix it like this:
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = [self.creator]
while funcs:
f = funcs.pop()
gys = [output.grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
            if x.grad is None: # the first time we compute x's gradient, assign it; after that, accumulate with addition
                x.grad = gx
            else:
                x.grad = gx + x.grad
if x.creator is not None:
funcs.append(x.creator)
Running the code again now gives the correct result.
What if we then want to use the variable x across several computations? For example:
x = Variable(np.array(1.0))
y = add(x,x)
y.backward()
print(x.grad)
y = add(add(x,x),x)
y.backward()
print(x.grad)
We get 2 and 5, but the answers should be 2 and 3 (the second computation is really y = x + x + x). To reuse x we have to clear the gradient left over from the previous computation, so the first run cannot affect the second. Add the following method to the Variable class:
    def cleargrad(self): # clear this variable's grad
self.grad = None
Then test:
x = Variable(np.array(1.0))
y = add(x,x)
y.backward()
print(x.grad)
x.cleargrad()
y = add(add(x,x),x)
y.backward()
print(x.grad)
This gives the correct results.
For scalar automatic differentiation one final problem remains: our code traverses the graph depth-first, because pop() takes elements like a stack. Look at the backward code of the Variable class:
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = [self.creator]
while funcs:
        f = funcs.pop() # always takes the last element
gys = [output.grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
            if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
                funcs.append(x.creator) # appends to the end of the list
This clearly resembles a depth-first traversal, but we want the order to be breadth-first-like.
Here is a simple computation graph with several branches: (figure)
We want the backward passes of b = tanh(a) and c = pow(a) to run first, and only then the backward pass of a = pow(x). As it stands, the code runs the backward pass of one of b = tanh(a) or c = pow(a), then that of a = pow(x), and only afterwards the other of the two.
To get a breadth-first-like order, we can regard Variable and Function instances in the same layer as the same generation: higher-generation instances must finish their backward passes before lower-generation ones may run. Let's put that into code.
The modifications to the Function and Variable classes are as follows:
class Variable:
def __init__(self,data):
if data is not None:
if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
self.data = data
self.grad = None
self.creator = None
        self.generation = 0 # a generation number that records which layer the node is in
def set_creator(self,func):
self.creator = func
        self.generation = func.generation + 1 # a Variable's generation is one more than that of the Function instance that produced it
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
        funcs = [] # start with an empty list
        seen_set = set() # a set that prevents visiting the same function twice
        def add_func(f): # store a Function instance and keep the list sorted by generation, so the highest generation is popped first
            if f not in seen_set:
                funcs.append(f)
                seen_set.add(f)
                funcs.sort(key=lambda x:x.generation)
        add_func(self.creator) # seed the list with the first function instance
while funcs:
f = funcs.pop()
gys = [output.grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
                    add_func(x.creator) # this changes too
def cleargrad(self):
self.grad = None
class Function:
def __call__(self,*inputs):
xs = [x.data for x in inputs]
ys = self.forward(*xs)
if not isinstance(ys,tuple):
ys = (ys,)
outputs = [Variable(as_array(y)) for y in ys]
        self.generation = max([x.generation for x in inputs]) # a Function's generation is the largest among its input Variables' generations
for output in outputs:
output.set_creator(self)
self.inputs = inputs
self.outputs = outputs
return outputs if len(outputs) > 1 else outputs[0]
def forward(self,xs):
raise NotImplementedError()
def backward(self,gys):
raise NotImplementedError()
Verify with the test code:
x = Variable(np.array(2.0))
a = square(x)
y = add(square(a),square(a))
y.backward()
print(x.grad)
The result is 64, matching the hand computation: a = x², y = a² + a² = 2x⁴, so dy/dx = 8x³ = 64 at x = 2. We can now automatically differentiate complex functions.
7. Memory and code optimizations
Our current code contains reference cycles (a Function holds its outputs, and each output holds its creator), which delay garbage collection. We can break them with weakref (remember to import weakref at the top).
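A tiny demo of how weakref.ref behaves (my own addition): the weak reference does not keep its target alive, and calling it after the target has been collected returns None. This is why the code below stores outputs as weakrefs and retrieves them with output().
import weakref
import numpy as np
a = np.array([1, 2, 3])
r = weakref.ref(a)
print(r()) # [1 2 3]
a = None # drop the only strong reference
print(r()) # None (in CPython the array is freed right away)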
class Variable:
def __init__(self,data):
if data is not None:
if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
self.data = data
self.grad = None
self.creator = None
self.generation = 0
def set_creator(self,func):
self.creator = func
self.generation = func.generation + 1
def backward(self):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
            gys = [output().grad for output in f.outputs] # changed: outputs are weakrefs now, so call them to get the Variable
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
def cleargrad(self):
self.grad = None
class Function:
def __call__(self,*inputs):
xs = [x.data for x in inputs]
ys = self.forward(*xs)
if not isinstance(ys,tuple):
ys = (ys,)
outputs = [Variable(as_array(y)) for y in ys]
self.generation = max([x.generation for x in inputs])
for output in outputs:
output.set_creator(self)
self.inputs = inputs
        self.outputs = [weakref.ref(output) for output in outputs] # changed: keep only weak references to the outputs
return outputs if len(outputs) > 1 else outputs[0]
def forward(self,xs):
raise NotImplementedError()
def backward(self,gys):
raise NotImplementedError()
During backpropagation, only the gradients of the leaf variables (the ones the user created) normally need to be kept; the gradients of all intermediate variables can be discarded. This allows a further optimization, changing only Variable's backward function:
def backward(self,retain_grad = False): # retain_grad decides whether non-leaf gradients are kept
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
gys = [output().grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
            if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
        if not retain_grad: # decide whether to drop the gradients
            for y in f.outputs:
                y().grad = None # y is a weakref
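A usage sketch of my own (not in the original post) showing the effect: with the default retain_grad=False, gradients of intermediate variables are released after use, while leaf gradients survive.
x0, x1 = Variable(np.array(1.0)), Variable(np.array(1.0))
t = add(x0, x1)
y = add(x0, t)
y.backward()
print(t.grad) # None: the intermediate gradient was dropped
print(x0.grad, x1.grad) # 2.0 1.0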
Sometimes we only need the forward pass, with no backpropagation. In that case Function's __call__ does not need to keep inputs and outputs on the instance. We can implement this with a flag; to keep the code reusable later, the flag is implemented in an object-oriented way.
You may want to read up on how contextlib works first: link
The newly added helpers and the revised __call__ method of Function are as follows:
import contextlib
class Config: # the flag class
enable_backprop = True
@contextlib.contextmanager
def using_config(name,value): # implemented with contextlib
    old_value = getattr(Config, name) # remember the old flag value
setattr(Config, name, value)
try:
yield
finally:
setattr(Config, name, old_value)
def no_grad():
    return using_config('enable_backprop',False)
class Function:
def __call__(self,*inputs):
xs = [x.data for x in inputs]
ys = self.forward(*xs)
if not isinstance(ys,tuple):
ys = (ys,)
outputs = [Variable(as_array(y)) for y in ys]
self.generation = max([x.generation for x in inputs])
        if Config.enable_backprop: # decide whether the bookkeeping for backprop is needed
for output in outputs:
output.set_creator(self)
self.inputs = inputs
self.outputs = [weakref.ref(output) for output in outputs]
return outputs if len(outputs) > 1 else outputs[0]
Below is the test code.
Testing the Config flag's effect on what the Variable instances record:
Config.enable_backprop = True
x = Variable(np.array(2.0))
a = square(x)
y = add(square(a),square(a))
y.backward()
Config.enable_backprop = False
x = Variable(np.array(2.0))
a = square(x)
y = add(square(a),square(a))
Testing Function's mode switch:
with no_grad():
x = Variable((np.array(2.0)))
y = square(x)
With the memory optimizations done, we want variables to be easier to use: giving each variable a name, supporting numpy-style idioms like x.shape and len(x), and writing a + b instead of add(a, b) for arithmetic. The optimized code is below (this time it is the complete program):
import numpy as np
import weakref
import contextlib
class Config:
enable_backprop = True
@contextlib.contextmanager
def using_config(name,value):
old_value = getattr(Config, name)
setattr(Config, name, value)
try:
yield
finally:
setattr(Config, name, old_value)
def no_grad():
    return using_config('enable_backprop',False)
def as_array(x):
if np.isscalar(x):
return np.array(x)
return x
def as_variable(obj): # turn non-Variable data into a Variable, so Variables and ndarrays can be mixed in Function's computations
if isinstance(obj, Variable):
return obj
return Variable(obj)
class Variable:
    def __init__(self,data,name = None): # add name so each variable can be named
if data is not None:
if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
self.data = data
self.grad = None
self.creator = None
        self.name = name # initialize name
self.generation = 0
    def __len__(self): # support the len(x) operation
return len(self.data)
    def __repr__(self): # customize the string that print(x) produces
if self.data is None:
return 'variable(None)'
p = str(self.data).replace('\n', '\n'+' '*9)
return 'variable('+p+')'
    def __mul__(self,other): # operator overloading: we can now operate on Variable instances with expressions like x1 * x2
return mul(self,other)
def __add__(self,other):
return add(self,other)
def __rmul__(self,other):
return mul(self,other)
    def __radd__(self,other): # we want Variable instances to mix with instances of other types; __add__ only covers a Variable on the left, so an expression with the Variable on the right would fail. Hence we also overload the right-hand version; the other r-operations below work the same way.
return add(self,other)
def __neg__(self):
return neg(self)
def __sub__(self,other):
return sub(self,other)
def __rsub__(self,other):
return rsub(self,other)
def __truediv__(self,other):
return div(self,other)
def __rtruediv__(self,other):
return rdiv(self,other)
def __pow__(self,other):
return pow(self, other)
def set_creator(self,func):
self.creator = func
self.generation = func.generation + 1
def backward(self,retain_grad = False ):
if self.grad is None:
self.grad = np.ones_like(self.data)
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
gys = [output().grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
if not retain_grad:
for y in f.outputs:
y().grad = None
def cleargrad(self):
self.grad = None
    @property # support the x.shape operation
def shape(self):
return self.data.shape
@property
    def size(self): # support the x.size operation
return self.data.size
@property
    def dtype(self): # likewise
return self.data.dtype
@property
    def ndim(self): # likewise
return self.data.ndim
class Function:
def __call__(self,*inputs):
        inputs = [as_variable(x) for x in inputs] # turn every argument into a Variable instance, so ndarrays also enter the computation correctly
xs = [x.data for x in inputs]
ys = self.forward(*xs)
if not isinstance(ys,tuple):
ys = (ys,)
outputs = [Variable(as_array(y)) for y in ys]
self.generation = max([x.generation for x in inputs])
        if Config.enable_backprop: # decide whether the bookkeeping for backprop is needed
for output in outputs:
output.set_creator(self)
self.inputs = inputs
self.outputs = [weakref.ref(output) for output in outputs]
return outputs if len(outputs) > 1 else outputs[0]
def forward(self,xs):
raise NotImplementedError()
def backward(self,gys):
raise NotImplementedError()
class Mul(Function): # multiplication, with forward/backward written in the same style as Add
def forward(self, x0,x1):
return x0 * x1
def backward(self, gy):
x0,x1 = self.inputs[0].data,self.inputs[1].data
return gy * x1,gy * x0
def mul(x0,x1):
x1 = as_array(x1)
return Mul()(x0,x1)
class Div(Function): # division, written in the same style as Add
def forward(self, x0,x1):
return x0 / x1
def backward(self, gy):
x0,x1 = self.inputs[0].data,self.inputs[1].data
gx0 = gy / x1
gx1 = gy * (-x0/x1**2)
return gx0,gx1
def div(x0,x1):
x1 = as_array(x1)
return Div()(x0,x1)
def rdiv(x0,x1):
x1 = as_array(x1)
return Div()(x1,x0)
class Square(Function):
def forward(self,x):
return x ** 2
def backward(self, gy):
x = self.inputs[0].data
gx = 2 * x * gy
return gx
def square(x):
f = Square()
return f(x)
class Exp(Function):
def forward(self,x):
return np.exp(x)
def backward(self, gy):
x = self.inputs[0].data
gx = np.exp(x) * gy
return gx
def exp(x):
f = Exp()
return f(x)
class Add(Function):
def forward(self, x0,x1):
y = x0 + x1
return y
def backward(self,gy):
return gy,gy
def add(x0, x1):
x1 = as_array(x1)
return Add()(x0, x1)
class Neg(Function): # written in the same style as Add
def forward(self, x):
return -x
def backward(self,gy):
return -gy
def neg(x):
return Neg()(x)
class Sub(Function): # written in the same style as Add
def forward(self, x0,x1):
y = x0 - x1
return y
def backward(self,gy):
return gy,-gy
def sub(x0, x1):
x1 = as_array(x1)
return Sub()(x0, x1)
def rsub(x0,x1):
x1 = as_array(x1)
return Sub()(x1,x0)
class Pow(Function): # written in the same style as Add
def __init__(self,c):
self.c = c
def forward(self, x):
y = x ** self.c
return y
def backward(self, gy):
x = self.inputs[0].data
gx = gy * self.c * (x ** (self.c-1))
return gx
def pow(x,c):
return Pow(c)(x)
# test that ndarrays mix with Variable instances and that the operators are overloaded; we can now apply ordinary arithmetic notation to our Variable instances
x = Variable(np.array(2.0))
y = x + np.array(3.0)
z = np.array(3.0) + y
print(y)
Now we can compute the gradients of very complicated functions. Feel free to try gradient computations on all sorts of complex functions.
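For instance, here is a quick example of my own (the Matyas function; not from the original post), built purely from the overloaded operators:
x = Variable(np.array(1.0))
y = Variable(np.array(1.0))
z = 0.26 * (x ** 2 + y ** 2) - 0.48 * x * y # the Matyas function
z.backward()
print(x.grad, y.grad) # both approximately 0.04, since dz/dx = 0.52x - 0.48y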
8. Generating the computation graph automatically
Install the graphviz tool first; we will use it to render computation graphs (tutorial: link). I will just give the code here, since it has little to do with automatic differentiation. Add the following to our previous code.
import os
import subprocess
def _dot_var(v,verbose = False):
dot_var ='{}[label="{}",color=orange,style=filled]\n'
name = ' ' if v.name is None else v.name
if verbose and v.data is not None:
if v.name is not None:
name +=': '
name += str(v.shape) + ' '+str(v.dtype)
return dot_var.format(id(v), name)
def _dot_func(f):
dot_func = '{}[label="{}",color=lightblue,style=filled,shape = box]\n'
txt = dot_func.format(id(f),f.__class__.__name__)
dot_edge = '{}->{}\n'
for x in f.inputs:
txt += dot_edge.format(id(x), id(f))
for y in f.outputs:
txt += dot_edge.format(id(f), id(y()))
return txt
def get_dot_graph(output,verbose=True):
txt = ''
funcs = []
seen_set =set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
add_func(output.creator)
txt += _dot_var(output,verbose)
while funcs:
func = funcs.pop()
txt += _dot_func(func)
for x in func.inputs:
txt += _dot_var(x,verbose)
if x.creator is not None:
add_func(x.creator)
return 'digraph g{\n'+txt +'}'
def plot_dot_graph(output,verbose = True,to_file = 'graph.png'):
dot_graph = get_dot_graph(output,verbose)
tmp_dir= os.path.join(os.path.expanduser('~'),'Jiflow')
if not os.path.exists(tmp_dir):
os.mkdir(tmp_dir)
graph_path = os.path.join(tmp_dir,'tmp_graph.dot')
with open(graph_path,'w') as f:
f.write(dot_graph)
extension = os.path.splitext(to_file)[1][1:]
cmd = 'dot {} -T {} -o {}'.format(graph_path,extension,to_file)
subprocess.run(cmd,shell=True)
Test:
x = Variable(np.array(2.0))
x.name = 'x'
y = x + np.array(3.0)
y.name = 'y'
y.backward()
plot_dot_graph(y)
which yields the graph: (figure)
9. Higher-order derivatives, forward and backward
At the moment we can compute first derivatives of scalars but not higher-order derivatives; let's enable those next. Start with an example of a second-derivative computation, say the computation graph of y = sin x (a Sin function is easy to write yourself following the earlier add/sub/mul/div classes): (figure)
Now look at the computation graph of the first derivative: it is cos x times the constant seed 1. If we treat that gradient computation itself as a graph and backpropagate the resulting gx back to x once more, we obtain the second derivative.
To make this work we only need Variable's grad to itself be a Variable instance, so the graph behind the gradient can be backpropagated through. The function classes must also do their backward computations directly on Variable instances via the overloaded operators, instead of pulling out the raw data. The modified code:
The modified backward function of Variable:
def backward(self,retain_grad = False ):
if self.grad is None:
        self.grad = Variable(np.ones_like(self.data)) # with this one change, grad itself becomes a Variable instance
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
gys = [output().grad for output in f.outputs]
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
            if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
if not retain_grad:
for y in f.outputs:
y().grad = None
The arithmetic operations change in the same spirit (no more raw-data computation; operate on the Variable instances directly):
class Mul(Function):
def forward(self, x0,x1):
return x0 * x1
def backward(self, gy):
        x0,x1 = self.inputs # changed here: use the Variable instances, not their data
return gy * x1,gy * x0
One more memory optimization: when we will not differentiate the gradients again, there is no need to build a graph for the gradient computation. We use the same contextlib mechanism as above. Here is Variable's backward function:
def backward(self,retain_grad = False,create_graph = False): # new create_graph flag: set it to True when the gradient computation itself needs a graph
if self.grad is None:
self.grad = Variable(np.ones_like(self.data))
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
gys = [output().grad for output in f.outputs]
        with using_config('enable_backprop', create_graph): # decide whether to build a graph during the backward computation
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
if not retain_grad:
for y in f.outputs:
y().grad = None
Test code:
x = Variable(np.array(2))
x.name = 'x'
y = x ** 4 - 2 * x ** 2
y.name = 'y'
y.backward(retain_grad = True,create_graph = True)
print(x.grad)
gx = x.grad
x.cleargrad() # clear x's gradient before the second backward pass, so the first derivative does not get added into the second
gx.backward()
print(x.grad)
The results are 24 and 44.
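As a further exercise, here is a minimal sketch of my own (not from the original post) of the Sin and Cos functions mentioned above, written in the same higher-order style, so the second derivative of sin comes out through double backprop:
class Sin(Function):
    def forward(self, x):
        return np.sin(x)
    def backward(self, gy):
        x, = self.inputs # inputs are Variable instances here
        return gy * cos(x) # build a graph so this can be differentiated again
class Cos(Function):
    def forward(self, x):
        return np.cos(x)
    def backward(self, gy):
        x, = self.inputs
        return gy * -sin(x)
def sin(x):
    return Sin()(x)
def cos(x):
    return Cos()(x)
x = Variable(np.array(1.0))
y = sin(x)
y.backward(create_graph=True)
gx = x.grad # cos(1.0)
x.cleargrad()
gx.backward()
print(x.grad) # -sin(1.0), the second derivative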
10. Computing with multi-dimensional arrays (tensors)
We can now operate on scalars, and elementwise on tensors, but operations such as broadcasting, reshaping, and summing all elements of a matrix do not yet backpropagate properly, so automatic differentiation fails for them. Let's implement these here.
First, the reshape function: the backward pass only has to restore the original shape.
class Reshape(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape=x.shape
return x.reshape(self.shape)
def backward(self, gy):
return reshape(gy, self.x_shape)
def reshape(x,shape):
if x.shape == shape:
return as_variable(x)
return Reshape(shape)(x)
Then test:
x = Variable(np.array([[1,2,3],[4,5,6]]))
y = reshape(x, (6,))
y.backward(retain_grad= True)
print(x.grad)
The result: variable([[1 1 1] [1 1 1]])
This is because backward starts from the output y, a shape-(6,) array, generating an all-ones array of that shape as dy/dy; backpropagation then reshapes it back to x's forward shape, giving the result above.
Let's polish the API so shapes can be converted numpy-style, i.e. y = x.reshape(target shape). Add the following method to the Variable class:
def reshape(self,*shape):
if len(shape) == 1 and isinstance(shape[0], (tuple,list)):
shape = shape[0]
return reshape(self, shape)
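A quick usage sketch (my own addition): after this change both calling styles work, mirroring numpy.
x = Variable(np.array([[1, 2, 3], [4, 5, 6]]))
y = x.reshape((3, 2)) # tuple argument
y = x.reshape(3, 2) # variadic arguments, same effect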
Next the transpose operation; the backward pass of a transpose is again a transpose.
First, the transpose operation with its backward pass:
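(This is the same Transpose class that appears in the complete listing of section 11.)
class Transpose(Function):
    def forward(self, x):
        return np.transpose(x)
    def backward(self, gy):
        return transpose(gy)
def transpose(x):
    return Transpose()(x)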
Then give Variable an ndarray-style x.T for transposing:
@property
def T(self):
return transpose(self)
def transpose(self):
return transpose(self)
Now consider broadcasting. The backward pass of broadcasting is a sum. Broadcasting turns (x1, x2) + b into (x1 + b, x2 + b): b has effectively been expanded into (b, b), so during backpropagation b receives a gradient contribution twice. The backward pass of a broadcast therefore restores the original shape and sums the extra copies: the gradients arriving at the two copies of b are (gb, gb), and propagating both back onto b means adding them together. Symmetrically, the backward pass of that shape-reducing sum is a broadcast.
It helps to review the usage of numpy's sum function here: link
Now let's implement the forward and backward passes of broadcasting and summation: first the Sum operation, then BroadcastTo and SumTo.
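Before the classes, a plain-numpy illustration of this duality (my own addition, not part of the original flow): broadcasting copies values in the forward pass, so the backward pass must sum the copies back to the original shape.
import numpy as np
b = np.array([1.0, 2.0])
bb = np.broadcast_to(b, (3, 2)) # forward: shape (2,) -> (3, 2), three copies of b
gbb = np.ones((3, 2)) # pretend upstream gradient for bb
gb = gbb.sum(axis=0) # backward: sum the three copies -> shape (2,)
print(bb.shape, gb) # (3, 2) [3. 3.]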
class Sum(Function):
def __init__(self,axis,keepdims):
self.axis = axis
self.keepdims = keepdims
def forward(self, x):
self.x_shape = x.shape
y = x.sum(axis = self.axis,keepdims = self.keepdims)
return y
def backward(self, gy):
gy = reshape_sum_backward(gy,self.x_shape,self.axis,self.keepdims)
gx = broadcast_to(gy,self.x_shape)
return gx
def sum(x,axis = None,keepdims = False):
return Sum(axis,keepdims)(x)
class BroadcastTo(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape = x.shape
        return np.broadcast_to(x,self.shape)
def backward(self, gy):
return sum_to(gy,self.x_shape)
def broadcast_to(x,shape):
if x.shape == shape:
return as_variable(x)
return BroadcastTo(shape)(x)
class SumTo(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape = x.shape
return sum_to1(x,self.shape)
def backward(self, gy):
return broadcast_to(gy,self.x_shape)
def sum_to(x,shape):
if x.shape == shape:
return as_variable(x)
return SumTo(shape)(x)
# helpers
def reshape_sum_backward(gy, x_shape, axis, keepdims):
"""Reshape gradient appropriately for dezero.functions.sum's backward.
Args:
gy (dezero.Variable): Gradient variable from the output by backprop.
x_shape (tuple): Shape used at sum function's forward.
axis (None or int or tuple of ints): Axis used at sum function's
forward.
keepdims (bool): Keepdims used at sum function's forward.
Returns:
dezero.Variable: Gradient variable which is reshaped appropriately
"""
ndim = len(x_shape)
tupled_axis = axis
if axis is None:
tupled_axis = None
elif not isinstance(axis, tuple):
tupled_axis = (axis,)
if not (ndim == 0 or tupled_axis is None or keepdims):
actual_axis = [a if a >= 0 else a + ndim for a in tupled_axis]
shape = list(gy.shape)
for a in sorted(actual_axis):
shape.insert(a, 1)
else:
shape = gy.shape
gy = gy.reshape(shape) # reshape
return gy
def sum_to1(x, shape):
"""Sum elements along axes to output an array of a given shape.
Args:
x (ndarray): Input array.
shape:
Returns:
ndarray: Output array of the shape.
"""
ndim = len(shape)
lead = x.ndim - ndim
lead_axis = tuple(range(lead))
axis = tuple([i + lead for i, sx in enumerate(shape) if sx == 1])
y = x.sum(lead_axis + axis, keepdims=True)
if lead > 0:
y = y.squeeze(lead_axis)
return y
Add a sum method to the Variable class to mirror the numpy API:
def sum(self,axis = None,keepdims=False):
return sum(self,axis,keepdims)
Test it here:
x = Variable(np.array([[1,2,3],[4,5,6]]))
y = sum(x,axis = 0)
y.backward()
print(y)
print(x.grad)
x = Variable(np.random.randn(2,3,4,5))
y = x.sum(keepdims=True)
print(y.shape)
Since broadcasting may change shapes during these operations, the arithmetic code must account for it. Modify the add, sub, mul, and div classes:
class Add(Function):
def forward(self, x0,x1):
        self.x0_shape,self.x1_shape = x0.shape,x1.shape # remember the input shapes
y = x0 + x1
return y
def backward(self,gy):
gx0 ,gx1 = gy,gy
        if self.x0_shape != self.x1_shape: # if the shapes differ, reduce the gradients back to the input shapes
gx0 = sum_to(gx0, self.x0_shape)
gx1 = sum_to(gx1, self.x1_shape)
return gx0,gx1
class Sub(Function):
    def forward(self, x0,x1):
        self.x0_shape,self.x1_shape = x0.shape,x1.shape # remember the input shapes
        y = x0 - x1
        return y
    def backward(self,gy):
        gx0 = gy
        gx1 = -gy
        if self.x0_shape != self.x1_shape:
            gx0 = sum_to(gx0, self.x0_shape)
            gx1 = sum_to(gx1, self.x1_shape)
        return gx0,gx1
class Div(Function):
def forward(self, x0,x1):
return x0 / x1
def backward(self, gy):
x0,x1 = self.inputs
gx0 = gy / x1
gx1 = gy * (-x0/x1**2)
if x0.shape != x1.shape:
gx0 = sum_to(gx0, x0.shape)
gx1 = sum_to(gx1, x1.shape)
return gx0,gx1
class Mul(Function):
def forward(self, x0,x1):
return x0 * x1
def backward(self, gy):
x0,x1 = self.inputs
gx0 = gy * x1
gx1 = gy * x0
if x0.shape != x1.shape:
gx0 = sum_to(gx0, x0.shape)
gx1 = sum_to(gx1, x1.shape)
return gx0,gx1
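A quick check of my own (not in the original post): addition with broadcasting now backpropagates correctly, with the broadcast copies summed back.
x0 = Variable(np.array([1.0, 2.0, 3.0]))
x1 = Variable(np.array([10.0]))
y = x0 + x1 # x1 is broadcast to shape (3,)
y.backward()
print(x1.grad) # variable([3.]): the three copies sum back into one slot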
Finally, let's implement matrix multiplication.
Say x is an a×b matrix and W is a b×c matrix; then y = x·W is an a×c matrix, so the gradient gy that flows back from y is also a×c. Propagating from y toward x, only gy·Wᵀ ((a×c)·(c×b)) produces a gradient of the required a×b shape; likewise, propagating toward W, only xᵀ·gy ((b×a)·(a×c)) produces the required b×c shape. This gives the following backward pass for matrix multiplication:
class MatMul(Function):
def forward(self, x,W):
y = x.dot(W)
return y
def backward(self, gy):
x,W = self.inputs
gx = matmul(gy,W.T)
gW = matmul(x.T,gy)
return gx,gW
def matmul(x,W):
return MatMul()(x,W)
Test code:
x = Variable(np.random.randn(2,3))
W = Variable(np.random.randn(3,4))
y = matmul(x, W)
y.backward()
print(x.grad.shape)
print(W.grad.shape)
As expected, x.grad comes out as a 2×3 matrix and W.grad as a 3×4 matrix.
11. Complete code
import numpy as np
import weakref
import contextlib
class Config:
enable_backprop = True
@contextlib.contextmanager
def using_config(name,value):
old_value = getattr(Config, name)
setattr(Config, name, value)
try:
yield
finally:
setattr(Config, name, old_value)
def no_grad():
    return using_config('enable_backprop',False)
def as_array(x):
if np.isscalar(x):
return np.array(x)
return x
def as_variable(obj):
if isinstance(obj, Variable):
return obj
return Variable(obj)
class Variable:
def __init__(self,data,name = None):
if data is not None:
if not isinstance(data,np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))
self.data = data
self.grad = None
self.creator = None
self.name = name
self.generation = 0
def __len__(self):
return len(self.data)
def __repr__(self):
if self.data is None:
return 'variable(None)'
p = str(self.data).replace('\n', '\n'+' '*9)
return 'variable('+p+')'
def __mul__(self,other):
return mul(self,other)
def __add__(self,other):
return add(self,other)
def __rmul__(self,other):
return mul(self,other)
def __radd__(self,other):
return add(self,other)
def __neg__(self):
return neg(self)
def __sub__(self,other):
return sub(self,other)
def __rsub__(self,other):
return rsub(self,other)
def __truediv__(self,other):
return div(self,other)
def __rtruediv__(self,other):
return rdiv(self,other)
def __pow__(self,other):
return pow(self, other)
def set_creator(self,func):
self.creator = func
self.generation = func.generation + 1
    def backward(self,retain_grad = False,create_graph = False): # create_graph flag: set it to True when the gradient computation itself needs a graph
if self.grad is None:
self.grad = Variable(np.ones_like(self.data))
funcs = []
seen_set = set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
funcs.sort(key=lambda x:x.generation)
add_func(self.creator)
while funcs:
f = funcs.pop()
gys = [output().grad for output in f.outputs]
            with using_config('enable_backprop', create_graph): # decide whether to build a graph during the backward computation
gxs = f.backward(*gys)
if not isinstance(gxs,tuple):
gxs = (gxs,)
for x,gx in zip(f.inputs,gxs):
                    if x.grad is None:
x.grad = gx
else:
x.grad = gx + x.grad
if x.creator is not None:
add_func(x.creator)
if not retain_grad:
for y in f.outputs:
y().grad = None
def cleargrad(self):
self.grad = None
def reshape(self,*shape):
if len(shape) == 1 and isinstance(shape[0], (tuple,list)):
shape = shape[0]
return reshape(self, shape)
@property
def shape(self):
return self.data.shape
@property
def size(self):
return self.data.size
@property
def dtype(self):
return self.data.dtype
@property
def ndim(self):
return self.data.ndim
@property
def T(self):
return transpose(self)
def transpose(self):
return transpose(self)
def sum(self,axis = None,keepdims=False):
return sum(self,axis,keepdims)
class Function:
def __call__(self,*inputs):
inputs = [as_variable(x) for x in inputs]
xs = [x.data for x in inputs]
ys = self.forward(*xs)
if not isinstance(ys,tuple):
ys = (ys,)
outputs = [Variable(as_array(y)) for y in ys]
self.generation = max([x.generation for x in inputs])
if Config.enable_backprop:
for output in outputs:
output.set_creator(self)
self.inputs = inputs
self.outputs = [weakref.ref(output) for output in outputs]
return outputs if len(outputs) > 1 else outputs[0]
def forward(self,xs):
raise NotImplementedError()
def backward(self,gys):
raise NotImplementedError()
class Mul(Function):
def forward(self, x0,x1):
return x0 * x1
def backward(self, gy):
x0,x1 = self.inputs
gx0 = gy * x1
gx1 = gy * x0
if x0.shape != x1.shape:
gx0 = sum_to(gx0, x0.shape)
gx1 = sum_to(gx1, x1.shape)
return gx0,gx1
def mul(x0,x1):
x1 = as_array(x1)
return Mul()(x0,x1)
class Div(Function):
def forward(self, x0,x1):
return x0 / x1
def backward(self, gy):
x0,x1 = self.inputs
gx0 = gy / x1
gx1 = gy * (-x0/x1**2)
if x0.shape != x1.shape:
gx0 = sum_to(gx0, x0.shape)
gx1 = sum_to(gx1, x1.shape)
return gx0,gx1
def div(x0,x1):
x1 = as_array(x1)
return Div()(x0,x1)
def rdiv(x0,x1):
x1 = as_array(x1)
return Div()(x1,x0)
class Square(Function):
def forward(self,x):
return x ** 2
def backward(self, gy):
x, = self.inputs
gx = 2 * x * gy
return gx
def square(x):
f = Square()
return f(x)
class Exp(Function):
    def forward(self,x):
        return np.exp(x)
    def backward(self, gy):
        x, = self.inputs # inputs are Variable instances now
        gx = exp(x) * gy # use the Variable version exp(x), not np.exp, so higher-order backprop works
        return gx
def exp(x):
f = Exp()
return f(x)
class Add(Function):
def forward(self, x0,x1):
self.x0_shape,self.x1_shape = x0.shape,x1.shape
y = x0 + x1
return y
def backward(self,gy):
gx0 ,gx1 = gy,gy
if self.x0_shape != self.x1_shape:
gx0 = sum_to(gx0, self.x0_shape)
gx1 = sum_to(gx1, self.x1_shape)
return gx0,gx1
def add(x0, x1):
x1 = as_array(x1)
return Add()(x0, x1)
class Neg(Function):
def forward(self, x):
return -x
def backward(self,gy):
return -gy
def neg(x):
return Neg()(x)
class Sub(Function):
    def forward(self, x0,x1):
        self.x0_shape,self.x1_shape = x0.shape,x1.shape
        y = x0 - x1
        return y
    def backward(self,gy):
        gx0 = gy
        gx1 = -gy
        if self.x0_shape != self.x1_shape:
            gx0 = sum_to(gx0, self.x0_shape)
            gx1 = sum_to(gx1, self.x1_shape)
        return gx0,gx1
def sub(x0, x1):
x1 = as_array(x1)
return Sub()(x0, x1)
def rsub(x0,x1):
x1 = as_array(x1)
return Sub()(x1,x0)
class Pow(Function):
def __init__(self,c):
self.c = c
def forward(self, x):
y = x ** self.c
return y
def backward(self, gy):
x, = self.inputs
gx = gy * self.c * (x ** (self.c-1))
return gx
def pow(x,c):
return Pow(c)(x)
class Reshape(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape=x.shape
return x.reshape(self.shape)
def backward(self, gy):
return reshape(gy, self.x_shape)
def reshape(x,shape):
if x.shape == shape:
return as_variable(x)
return Reshape(shape)(x)
class Transpose(Function):
def forward(self, x):
return np.transpose(x)
def backward(self, gy):
return transpose(gy)
def transpose(x):
return Transpose()(x)
class Sum(Function):
def __init__(self,axis,keepdims):
self.axis = axis
self.keepdims = keepdims
def forward(self, x):
self.x_shape = x.shape
y = x.sum(axis = self.axis,keepdims = self.keepdims)
return y
def backward(self, gy):
gy = reshape_sum_backward(gy,self.x_shape,self.axis,self.keepdims)
gx = broadcast_to(gy,self.x_shape)
return gx
def sum(x,axis = None,keepdims = False):
return Sum(axis,keepdims)(x)
class BroadcastTo(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape = x.shape
        return np.broadcast_to(x,self.shape)
def backward(self, gy):
return sum_to(gy,self.x_shape)
def broadcast_to(x,shape):
if x.shape == shape:
return as_variable(x)
return BroadcastTo(shape)(x)
class SumTo(Function):
def __init__(self,shape):
self.shape = shape
def forward(self, x):
self.x_shape = x.shape
return sum_to1(x,self.shape)
def backward(self, gy):
return broadcast_to(gy,self.x_shape)
def sum_to(x,shape):
if x.shape == shape:
return as_variable(x)
return SumTo(shape)(x)
class MatMul(Function):
def forward(self, x,W):
y = x.dot(W)
return y
def backward(self, gy):
x,W = self.inputs
gx = matmul(gy,W.T)
gW = matmul(x.T,gy)
return gx,gW
def matmul(x,W):
return MatMul()(x,W)
# helpers
def reshape_sum_backward(gy, x_shape, axis, keepdims):
"""Reshape gradient appropriately for dezero.functions.sum's backward.
Args:
gy (dezero.Variable): Gradient variable from the output by backprop.
x_shape (tuple): Shape used at sum function's forward.
axis (None or int or tuple of ints): Axis used at sum function's
forward.
keepdims (bool): Keepdims used at sum function's forward.
Returns:
dezero.Variable: Gradient variable which is reshaped appropriately
"""
ndim = len(x_shape)
tupled_axis = axis
if axis is None:
tupled_axis = None
elif not isinstance(axis, tuple):
tupled_axis = (axis,)
if not (ndim == 0 or tupled_axis is None or keepdims):
actual_axis = [a if a >= 0 else a + ndim for a in tupled_axis]
shape = list(gy.shape)
for a in sorted(actual_axis):
shape.insert(a, 1)
else:
shape = gy.shape
gy = gy.reshape(shape) # reshape
return gy
def sum_to1(x, shape):
"""Sum elements along axes to output an array of a given shape.
Args:
x (ndarray): Input array.
shape:
Returns:
ndarray: Output array of the shape.
"""
ndim = len(shape)
lead = x.ndim - ndim
lead_axis = tuple(range(lead))
axis = tuple([i + lead for i, sx in enumerate(shape) if sx == 1])
y = x.sum(lead_axis + axis, keepdims=True)
if lead > 0:
y = y.squeeze(lead_axis)
return y
# graph drawing
import os
import subprocess
def _dot_var(v,verbose = False):
dot_var ='{}[label="{}",color=orange,style=filled]\n'
name = ' ' if v.name is None else v.name
if verbose and v.data is not None:
if v.name is not None:
name +=': '
name += str(v.shape) + ' '+str(v.dtype)
return dot_var.format(id(v), name)
def _dot_func(f):
dot_func = '{}[label="{}",color=lightblue,style=filled,shape = box]\n'
txt = dot_func.format(id(f),f.__class__.__name__)
dot_edge = '{}->{}\n'
for x in f.inputs:
txt += dot_edge.format(id(x), id(f))
for y in f.outputs:
txt += dot_edge.format(id(f), id(y()))
return txt
def get_dot_graph(output,verbose=True):
txt = ''
funcs = []
seen_set =set()
def add_func(f):
if f not in seen_set:
funcs.append(f)
seen_set.add(f)
add_func(output.creator)
txt += _dot_var(output,verbose)
while funcs:
func = funcs.pop()
txt += _dot_func(func)
for x in func.inputs:
txt += _dot_var(x,verbose)
if x.creator is not None:
add_func(x.creator)
return 'digraph g{\n'+txt +'}'
def plot_dot_graph(output,verbose = True,to_file = 'graph.png'):
dot_graph = get_dot_graph(output,verbose)
tmp_dir= os.path.join(os.path.expanduser('~'),'Jiflow')
if not os.path.exists(tmp_dir):
os.mkdir(tmp_dir)
graph_path = os.path.join(tmp_dir,'tmp_graph.dot')
with open(graph_path,'w') as f:
f.write(dot_graph)
extension = os.path.splitext(to_file)[1][1:]
cmd = 'dot {} -T {} -o {}'.format(graph_path,extension,to_file)
subprocess.run(cmd,shell=True)
# test
x = Variable(np.random.randn(2,3))
W = Variable(np.random.randn(3,4))
y = matmul(x, W)
y.backward()
print(x.grad.shape)
print(W.grad.shape)
12. References
[1] Deep Learning from Scratch: Building Your Own Framework, Koki Saito (I recommend buying a copy)