Table of Contents

Chapter 1 Introduction ... 1
  1.1 Research Background and Significance ... 1
  1.2 Current State of Research ... 2
  1.3 Main Contributions and Organization of the Thesis ... 3
Chapter 2 Background ... 5
  2.1 Markov Decision Processes ... 5
  2.2 Model-Based Dynamic Programming Methods ... 6
  2.3 Theory of Monte Carlo Methods ... 6
  2.4 Temporal-Difference Reinforcement Learning Methods ... 7
    2.4.1 SARSA Learning ... 8