Hierarchical reinforcement learning

STRAW architecture:

A novel deep recurrent neural network architecture that learns to build implicit plans end to end, purely by interacting with an environment in a reinforcement learning setting.

Advantages:

Performs well on ATARI games
Can be applied to any sequence data
Can be applied to text prediction tasks

Hierarchical reinforcement learning

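Introduction

Definition

Hierarchical reinforcement learning decomposes a complex reinforcement learning problem into sub-problems that are easier to solve. Solving these sub-problems separately ultimately solves the original reinforcement learning problem.

Categories

Option-based reinforcement learning
Hierarchical reinforcement learning based on hierarchies of abstract machines
Hierarchical reinforcement learning based on MaxQ value function decomposition
End-to-end hierarchical reinforcement learning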

Option-based hierarchical reinforcement learning

option: an abstraction over actions (similar to an ADT)

$I$ : initiation set, $I \subseteq S$ (the states in which the option may be started)

$\pi$ : policy, $S \times A \to [0,1]$

$\beta$ : termination condition, $S \to [0,1]$ (the probability that the option ends in a given state)

$\mu$ : policy over options, $S \times O \to [0,1]$

$O$ : the set of all options
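
The symbols above can be read as fields of a small data structure. The following is a minimal sketch, assuming the standard options-framework reading in which $I$ is an initiation set and $β$ a per-state termination probability (these notes do not spell either out). The corridor environment and the names `Option`, `run_option`, `corridor_step`, and `go_right` are hypothetical and exist only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Set

State = Hashable
Action = Hashable


@dataclass
class Option:
    name: str
    initiation_set: Set[State]                       # I: states where the option may start
    policy: Callable[[State], Dict[Action, float]]   # pi: action probabilities in a state
    termination: Callable[[State], float]            # beta: probability of stopping in a state

    def available(self, state: State) -> bool:
        return state in self.initiation_set


def sample(dist: Dict[Action, float], rng: random.Random) -> Action:
    """Draw one action from a {action: probability} mapping."""
    actions = list(dist.keys())
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def run_option(option, state, step, rng, max_steps=100):
    """Follow the option's policy until beta fires; return (final state, steps taken)."""
    assert option.available(state), "option started outside its initiation set"
    steps = 0
    while steps < max_steps:
        action = sample(option.policy(state), rng)
        state = step(state, action)
        steps += 1
        if rng.random() < option.termination(state):
            break
    return state, steps


# Hypothetical toy environment: a corridor with states 0..5.
def corridor_step(state: int, action: str) -> int:
    delta = 1 if action == "right" else -1
    return min(max(state + delta, 0), 5)


# An option that walks right and terminates only at the right end.
go_right = Option(
    name="go_right",
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: {"right": 1.0},
    termination=lambda s: 1.0 if s == 5 else 0.0,
)
```

A policy over options ($μ$) would then choose among the `Option` objects whose `available(state)` is true, and `run_option` would execute the chosen one as if it were a single extended action.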

paper: hierarchical deep reinforcement learning

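One difficult class of reinforcement learning problems has environment feedback that is sparse and delayed. The solution here is to build a two-level algorithm. This closely matches how humans complete a complex task: when facing one, we break it down into a series of small goals and then achieve them one at a time.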
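
The two-level idea can be illustrated with a tabular toy. This is only a sketch under stated assumptions, not the paper's method, which uses deep networks. It assumes a top level that picks a subgoal and learns from the sparse environment reward, and a bottom level that learns to reach the chosen subgoal from an intrinsic reward. The corridor environment, the subgoal list `GOALS`, the intrinsic reward of 1 for reaching a subgoal, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

N_STATES = 6        # hypothetical corridor with states 0..5
ACTIONS = (-1, +1)  # move left / move right
GOALS = (2, 5)      # hypothetical subgoals available to the top level


def env_step(state, action):
    """Sparse, delayed extrinsic reward: 1 only when the last state is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def eps_greedy(q, key, choices, eps, rng):
    """Epsilon-greedy choice over q[(key, choice)], breaking ties at random."""
    if rng.random() < eps:
        return rng.choice(choices)
    best = max(q[(key, c)] for c in choices)
    return rng.choice([c for c in choices if q[(key, c)] == best])


def train(episodes=500, eps=0.2, alpha=0.5, gamma=0.95, seed=0):
    rng = random.Random(seed)
    q_meta = defaultdict(float)  # top level: Q(state, goal), extrinsic reward
    q_ctrl = defaultdict(float)  # bottom level: Q((state, goal), action), intrinsic reward
    for _ in range(episodes):
        state, done, total_steps = 0, False, 0
        while not done and total_steps < 200:
            # Top level: pick the next subgoal.
            options = [g for g in GOALS if g != state]
            goal = eps_greedy(q_meta, state, options, eps, rng)
            start, ext_return, discount, steps = state, 0.0, 1.0, 0
            # Bottom level: act until the subgoal is reached or time runs out.
            while not done and state != goal and steps < 20:
                action = eps_greedy(q_ctrl, (state, goal), list(ACTIONS), eps, rng)
                nxt, r_ext, done = env_step(state, action)
                r_int = 1.0 if nxt == goal else 0.0  # intrinsic reward
                nxt_best = 0.0 if nxt == goal else max(
                    q_ctrl[((nxt, goal), a)] for a in ACTIONS)
                key = ((state, goal), action)
                q_ctrl[key] += alpha * (r_int + gamma * nxt_best - q_ctrl[key])
                ext_return += discount * r_ext
                discount *= gamma
                state, steps = nxt, steps + 1
            # Top-level update over the whole subgoal attempt.
            nxt_best = 0.0 if done else max(
                q_meta[(state, g)] for g in GOALS if g != state)
            key = (start, goal)
            q_meta[key] += alpha * (ext_return + discount * nxt_best - q_meta[key])
            total_steps += steps
    return q_meta, q_ctrl


q_meta, q_ctrl = train()
```

The bottom level never sees the environment reward, only whether it hit the current subgoal, so it receives a learning signal even while the extrinsic reward is still zero. That is the point of splitting the problem into small goals.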