Algorithms for Reinforcement Learning, my sleek book, was published by Morgan & Claypool in July 2010.

Download the most recent version in PDF (last update: July 8, 2017), or download the original from the publisher's webpage (if you have access). Alternatively, buy a printed copy from Amazon.com for ca. USD 35.00, from Amazon.ca for ca. CDN$ 42.02, or from Amazon.co.uk for GBP 18.99. Faculty: write to info@morganclaypool.com to request your desk copy today!

New! A Japanese translation by Sotetsu Koyamada is available. The translation includes short supplementary material on the equivalence of the forward and backward views of TD(lambda) (by Dr. Koyama) and on deep RL (by Sotetsu Koyamada). Links: Amazon Asia, Kyoritsu pub, errata.

Preface, p. ix
Acknowledgments, p. xiii
1 Markov Decision Processes, p. 1
  1.1 Preliminaries, p. 1
  1.2 Markov Decision Processes, p. 1
  1.3 Value functions, p. 6
  1.4 Dynamic programming algorithms for solving MDPs, p. 10
2 Value Prediction Problems, p. 11
  2.1 Temporal difference learning in finite state spaces, p. 11
    2.1.1 Tabular TD(0), p. 11
    2.1.2 Every-visit Monte-Carlo, p. 14
    2.1.3 TD(lambda): Unifying Monte-Carlo and TD(0), p. 16
  2.2 Algorithms for large state spaces, p. 18
    2.2.1 TD(lambda) with function approximation, p. 22
    2.2.2 Gradient temporal difference learning, p. 25
    2.2.3 Least-squares methods, p. 27
    2.2.4 The choice of the function space, p. 33
3 Control, p. 37
  3.1 A catalog of learning problems, p. 37
  3.2 Closed-loop interactive learning, p. 38
    3.2.1 Online learning in bandits, p. 38
    3.2.2 Active learning in bandits, p. 40
    3.2.3 Active learning in Markov Decision Processes, p. 41
    3.2.4 Online learning in Markov Decision Processes, p. 42
  3.3 Direct methods, p. 47
    3.3.1 Q-learning in finite MDPs, p. 47
    3.3.2 Q-learning with function approximation, p. 49
  3.4 Actor-critic methods, p. 52
    3.4.1 Implementing a critic, p. 54
    3.4.2 Implementing an actor, p. 56
4 For Further Exploration, p. 63
  4.1 Further reading, p. 63
  4.2 Applications, p. 63
  4.3 Software, p. 64
A The Theory of Discounted Markovian Decision Processes, p. 65
  A.1 Contractions and Banacha