Algorithms for Reinforcement Learning

Algorithms for Reinforcement Learning, my sleek book, was published by Morgan & Claypool in July 2010. Download the most recent version in PDF (last update: July 8, 2017), or download the original from the publisher's webpage (if you have access). Or buy a printed copy from Amazon.com for ca. USD 35.00, from Amazon.ca for ca. CDN$ 42.02, or from Amazon.co.uk for GBP 18.99. Faculty: write to info@morganclaypool.com to request your desk copy today! New! A Japanese translation by Sotetsu Koyamada is ready. The translation includes short supplementary material on the equivalence of the forward and backward views of TD(lambda) (by Dr. Koyama) and on deep RL (by Sotetsu Koyamada). Amazon Asia link, Kyoritsu pub, errata.

Why this book?

There exist a good number of really great books on reinforcement learning. So why a new book? I have to confess that the book arose from selfish reasons: I wanted a short book that nevertheless contained the major ideas underlying state-of-the-art RL algorithms, a discussion of their relative strengths and weaknesses, and hints on what is known (and not known, but would be good to know) about these algorithms. Whether I succeeded, time will tell. Or you can tell me, by sending me an e-mail!

Abstract

Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.

Table of contents

Preface
Acknowledgments
1 Markov Decision Processes
1.1 Preliminaries
1.2 Markov Decision Processes
1.3 Value functions
1.4 Dynamic programming algorithms for solving MDPs
2 Value Prediction Problems
2.1 Temporal difference learning in finite state spaces
2.1.1 Tabular TD(0)
2.1.2 Every-visit Monte-Carlo
2.1.3 TD(lambda): Unifying Monte-Carlo and TD(0)
2.2 Algorithms for large state spaces
2.2.1 TD(lambda) with function approximation
2.2.2 Gradient temporal difference learning
2.2.3 Least-squares methods
2.2.4 The choice of the function space
3 Control
3.1 A catalog of learning problems
3.2 Closed-loop interactive learning
3.2.1 Online learning in bandits
3.2.2 Active learning in bandits
3.2.3 Active learning in Markov Decision Processes
3.2.4 Online learning in Markov Decision Processes
3.3 Direct methods
3.3.1 Q-learning in finite MDPs
3.3.2 Q-learning with function approximation
3.4 Actor-critic methods
3.4.1 Implementing a critic
3.4.2 Implementing an actor
4 For Further Exploration
4.1 Further reading
4.2 Applications
4.3 Software
A The Theory of Discounted Markovian Decision Processes
A.1 Contractions and Banacha
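
To make the abstract's description concrete (learning to control a system so as to maximize a long-term objective from partial feedback), here is a minimal sketch of tabular Q-learning, the topic listed under 3.3.1 in the contents. It is not taken from the book: the five-state chain, the names `step`, `q_learning`, and `greedy_policy`, and all parameter values are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy problem (not from the book): a chain of states 0..4.
# Action 0 moves left, action 1 moves right. Reaching state 4 pays
# reward 1 and ends the episode; every other transition pays 0.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when the two estimates
            # tie); otherwise take the action with the larger estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Partial feedback: only the reward of the chosen action is
            # observed. The target bootstraps from the current estimates.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action in each non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
```

Calling `greedy_policy(q_learning())` on this toy chain should select action 1 (move right) in every non-terminal state, since moving right is the only way to reach the rewarding state. The discount factor `gamma` is what turns the single delayed reward into a long-term objective for the earlier states.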