Algorithms for Reinforcement Learning


Algorithms for Reinforcement Learning, my sleek book, was published by Morgan & Claypool in July 2010. Download the most recent version in pdf (last update: July 8, 2017), or download the original from the publisher's webpage (if you have access). Or buy a printed copy from Amazon.com for ca. USD 35.00, from Amazon.ca for ca. CDN$ 42.02, or from Amazon.co.uk for GBP 18.99. Faculty: write to info@morganclaypool.com to request your desk copy today!

New! A Japanese translation by Sotetsu Koyamada is ready. The translation includes short supplementary material on the equivalence of the forward and backward views of TD(lambda) (by Dr. Koyama) and on deep RL (by Sotetsu Koyamada). Links for the translation: Amazon Asia, Kyoritsu pub, errata.

Why this book?

There exist a good number of really great books on Reinforcement Learning. So why a new book? I have to confess: the book arose from selfish reasons. I wanted a short book that nevertheless contained the major ideas underlying state-of-the-art RL algorithms, a discussion of their relative strengths and weaknesses, and hints on what is known (and not known, but would be good to know) about these algorithms. Whether I succeeded, time will tell. Or you can, by sending me an e-mail!

Abstract

Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.

Table of contents

Preface
Acknowledgments
1 Markov Decision Processes
  1.1 Preliminaries
  1.2 Markov Decision Processes
  1.3 Value functions
  1.4 Dynamic programming algorithms for solving MDPs
2 Value Prediction Problems
  2.1 Temporal difference learning in finite state spaces
    2.1.1 Tabular TD(0)
    2.1.2 Every-visit Monte-Carlo
    2.1.3 TD(lambda): Unifying Monte-Carlo and TD(0)
  2.2 Algorithms for large state spaces
    2.2.1 TD(lambda) with function approximation
    2.2.2 Gradient temporal difference learning
    2.2.3 Least-squares methods
    2.2.4 The choice of the function space
3 Control
  3.1 A catalog of learning problems
  3.2 Closed-loop interactive learning
    3.2.1 Online learning in bandits
    3.2.2 Active learning in bandits
    3.2.3 Active learning in Markov Decision Processes
    3.2.4 Online learning in Markov Decision Processes
  3.3 Direct methods
    3.3.1 Q-learning in finite MDPs
    3.3.2 Q-learning with function approximation
  3.4 Actor-critic methods
    3.4.1 Implementing a critic
    3.4.2 Implementing an actor
4 For Further Exploration
  4.1 Further reading
  4.2 Applications
  4.3 Software
A The Theory of Discounted Markovian Decision Processes
  A.1 Contractions and Banacha
