论文研读 (A3C) Asynchronous Methods for Deep Reinforcement Learning

2019-11-04 1,144 阅读1分钟

Abstract & Conclution

Abstract:

没有提出新的算法，提出的是新的算法框架
将四种算法都运用Asynchronous Methods的框架里,效果很好，特别是A3C (asynchronous advantage actor- critic)
A3C对于连续动作控制问题非常有成效，远远甩开DQN
平行多线程训练，只用了一个多核的CPU而没有用GPU,节省了计算资源(DQN用的是GPU)

Conclution:

experience replay 的缺点:

a, 需要大量的计算资源以及存储记忆资源
b,只能用于 off-policy 算法

Asynchronous Methods框架可以适用在RL的各种算法上：on/off-policy, value-base/policy-base, DT/CT
有experience replay的优点：

a,降低数据相关性，让数据变得stationary
很date effcient

experience replay + Asynchronous Methods框架可以大幅度提高date effcient
平行多线程训练，只用了一个多核的CPU而没有用GPU,节省了计算资源(DQN用的是GPU)

3这种超强的date effcient性能可以很好的应用在TORCS 问题 where interactingwith the environment is more expensive than updating the model for the architecture we used.

A3C算法可以通过改变advantage function得到许多变体算法
对于value-base算法，Asynchronous Methods框架可以reducing overestimation bias of Q-values

1.introduction

已经全部囊括在了abstract 和conclution中啦~

2.Related Work

讲了点前生今世，不是很关键的内容

3.Reinforcement Learning Background

见sutton书论文部分帮我粗略复习了一遍sutton书 hhh

[★]4.Asynchronous RL Framework