Paper Reading (DPG): Deterministic Policy Gradient Algorithms

Abstract & Conclusion

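DPG is a variant of the (stochastic) policy gradient that uses a deterministic target policy. To keep exploring, this target policy is learned off-policy from an exploratory behaviour policy.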
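For comparison, here is the stochastic policy gradient theorem that DPG starts from, as restated in the paper. In it, π_θ(a|s) is a stochastic policy, Q^π is its action-value function, and ρ^π is its discounted state distribution. The gradient is an integral over both states and actions. DPG replaces π_θ with a deterministic policy a = μ_θ(s), which removes the inner integral over actions.

```latex
\nabla_\theta J(\pi_\theta)
  = \int_{\mathcal{S}} \rho^{\pi}(s) \int_{\mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a)\, \mathrm{d}a\, \mathrm{d}s
  = \mathbb{E}_{s \sim \rho^{\pi},\; a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]
```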

Notably, the deterministic policy gradient has a particularly appealing form. As the paper puts it, "it is the expected gradient of the action-value function."
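The paper's deterministic policy gradient theorem makes "expected gradient of the action-value function" precise:

∇_θ J(μ_θ) = E_{s∼ρ^μ}[ ∇_θ μ_θ(s) ∇_a Q^μ(s,a) |_{a=μ_θ(s)} ]

Below is a minimal numerical sketch under several assumptions, all hypothetical and chosen only for illustration:

- a one-step (bandit-like) setting where the state distribution does not depend on the policy parameters;
- a linear policy μ_W(s) = W s;
- a known quadratic critic Q(s,a) = -‖a − A s‖².

In this simplified setting J(W) = E_s[Q(s, μ_W(s))], so the formula reduces to the chain rule averaged over states. The sketch checks it against finite differences.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): one-step problem, states drawn from a
# fixed distribution, linear deterministic policy mu_W(s) = W @ s, and a known critic
# Q(s, a) = -||a - A @ s||^2 standing in for Q^mu.
rng = np.random.default_rng(0)
state_dim, action_dim, batch = 3, 2, 256
A = rng.normal(size=(action_dim, state_dim))   # hypothetical "ideal" linear controller
W = rng.normal(size=(action_dim, state_dim))   # deterministic policy parameters
states = rng.normal(size=(batch, state_dim))   # samples standing in for s ~ rho^mu


def q_value(s, a):
    """Critic Q(s, a) = -||a - A s||^2, evaluated row-wise over a batch."""
    return -np.sum((a - s @ A.T) ** 2, axis=-1)


def grad_a_q(s, a):
    """Analytic action-gradient of the critic: dQ/da = -2 (a - A s)."""
    return -2.0 * (a - s @ A.T)


def objective(W):
    """J(W) = E_s[Q(s, mu_W(s))], estimated by the batch mean."""
    return q_value(states, states @ W.T).mean()


def dpg(W):
    """Deterministic policy gradient E_s[grad_W mu_W(s) * grad_a Q(s, a)|_{a=mu_W(s)}].

    For mu_W(s) = W s the chain rule gives the outer product grad_a Q(s, a) s^T,
    which is then averaged over the batch of states.
    """
    actions = states @ W.T                  # a = mu_W(s), shape (batch, action_dim)
    g = grad_a_q(states, actions)           # dQ/da at a = mu_W(s), shape (batch, action_dim)
    return g.T @ states / batch             # mean outer product, shape (action_dim, state_dim)


def finite_difference(W, eps=1e-5):
    """Central-difference gradient of J(W), used only as a numerical check."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            grad[i, j] = (objective(W + step) - objective(W - step)) / (2 * eps)
    return grad


# Maximum discrepancy between the DPG formula and finite differences.
# Should be close to 0, since J is quadratic in W and only rounding error remains.
print(np.max(np.abs(dpg(W) - finite_difference(W))))
```

Here the exact critic gradient is used only to make the check possible. In the paper's actor-critic algorithms, Q^μ is replaced by a learned approximation.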

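Because of this simple form, the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. In continuous-action tasks, and especially in high-dimensional action spaces, DPG algorithms perform significantly better than their stochastic counterparts.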
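The efficiency point can be illustrated with a hypothetical 1-D toy, which is not an experiment from the paper. Assume the exact critic gradient is available. The sketch compares two per-sample gradient estimates:

- the deterministic gradient, for the policy μ_θ(s) = θ s;
- the score-function (likelihood-ratio) estimator of the stochastic policy gradient, for a Gaussian policy N(θ s, σ²) centred on the same action.

In this toy both estimate the same value. The stochastic estimator, however, has to average over sampled actions, and its dominant variance term grows as the exploration noise σ shrinks.

```python
import numpy as np

# Hypothetical 1-D toy (not from the paper): s ~ N(0, 1), critic Q(s, a) = -(a - s)^2,
# deterministic policy mu_theta(s) = theta * s, stochastic policy a ~ N(theta * s, sigma^2).
rng = np.random.default_rng(0)
theta, sigma, n = 0.5, 0.1, 100_000
s = rng.normal(size=n)
eps = rng.normal(size=n)          # Gaussian exploration noise for the stochastic policy


def q_value(s, a):
    """Known critic Q(s, a) = -(a - s)^2 (an assumption standing in for a learned critic)."""
    return -(a - s) ** 2


# Deterministic policy gradient, one sample per state:
# grad_theta mu(s) * dQ/da at a = mu(s)  =  s * (-2 * (theta * s - s)).
g_det = s * (-2.0 * (theta * s - s))

# Stochastic policy gradient via the score-function estimator, one sampled action per state:
# grad_theta log pi(a|s) * Q(s, a), where grad_theta log N(a; theta*s, sigma^2) = (a - theta*s) * s / sigma^2.
a = theta * s + sigma * eps
g_sto = (a - theta * s) * s / sigma**2 * q_value(s, a)

# Both estimators target -2 * (theta - 1) * E[s^2] = 1.0 here, but the score-function
# estimator is much noisier: its dominant variance term scales like 1 / sigma^2.
print(f"deterministic: mean={g_det.mean():.3f}, var={g_det.var():.3f}")
print(f"stochastic:    mean={g_sto.mean():.3f}, var={g_sto.var():.3f}")
```

With these settings both sample means should land near 1.0. The stochastic estimator's variance should be more than an order of magnitude larger than the deterministic one. This gap gives one intuition for why fewer samples are needed in the deterministic case. The paper's own experiments compare full actor-critic algorithms with learned critics, not this toy.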