Paper Notes (DPG): Deterministic Policy Gradient Algorithms

Abstract & Conclusion

DPG is a variant of the (stochastic) policy gradient that uses a deterministic target policy.

Notably, DPG has a particularly appealing form: it is the expected gradient of the action-value function.
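To make "expected gradient of the action-value function" concrete, here is the form written out, with the usual stochastic policy gradient for comparison. The notation is an assumption on my part, following common usage: \mu_\theta is the deterministic policy, \pi_\theta the stochastic policy, \rho the discounted state distribution, and Q the action-value function.

```latex
% Deterministic policy gradient: expectation over states only
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)}
    \right]

% Stochastic policy gradient: expectation over states and actions
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{s \sim \rho^{\pi},\, a \sim \pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)
    \right]
```

The deterministic version averages over states only, while the stochastic version also has to average over actions.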

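Because a gradient of this form can be estimated more efficiently, DPG significantly outperforms the stochastic policy gradient on continuous-action tasks, especially when the action space is high-dimensional.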
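A minimal sketch of what estimating such a gradient can look like, under strong assumptions that are mine and not the paper's: a one-parameter linear policy mu_theta(s) = theta * s, a known toy critic Q(s, a) = -(a - 2s)^2 (so the best action is a = 2s), and states drawn from a fixed standard normal instead of from the policy's own state distribution. The names `dpg_estimate` and `train` are hypothetical.

```python
import numpy as np


def dpg_estimate(theta, states):
    """Sample average of grad_theta mu(s) * grad_a Q(s, a) at a = mu(s)."""
    actions = theta * states                  # deterministic policy mu_theta(s) = theta * s
    grad_mu = states                          # d mu_theta(s) / d theta
    grad_q = -2.0 * (actions - 2.0 * states)  # d Q / d a for the toy critic Q = -(a - 2s)^2
    return float(np.mean(grad_mu * grad_q))   # average over states only, no action sampling


def train(theta=0.0, lr=0.1, steps=200, batch=256, seed=0):
    """Plain gradient ascent on theta using the sampled estimate above."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        states = rng.normal(size=batch)       # assumed fixed state distribution
        theta += lr * dpg_estimate(theta, states)
    return theta


theta_star = train()
```

In this toy setup `theta_star` should end up very close to 2, the parameter for which mu_theta(s) = 2s. Note that no action is ever sampled: each state contributes exactly one gradient term, which is the point of the deterministic form.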