Deep Reinforcement Learning
By: Hyeong In Choi (최형인)
Venue: 129-104
Time: 7:30 p.m., Fridays
Lecture 1 (April 1, 2016)
- Reinforcement Learning: What and Why
- AlphaGo: Design and How It Works
- Fundamentals of Markov Decision Processes (MDPs)
Lecture 2 (April 8, 2016)
- Bellman Equation
- Value Iteration
- Dynamic Programming
- Monte Carlo
- TD(λ)
Lecture 3 (April 22, 2016)
- Stochastic Approximation Algorithm: Examples and Basic Theory
Lecture 4 (April 29, 2016)
- Forward and Backward Views of TD(λ)
- Model-Free Control
  - Policy Iteration Theorem
  - Greedy Policy
  - ε-Greedy Policy
- SARSA(λ)
Lecture 5 (May 13, 2016)
- (Review) Looking at MDPs from a Bayesian Network Viewpoint
- Q-learning
Lecture 6 (May 20, 2016)
- Function Approximation
  - State Value Functions and Action Value Functions
- On-Policy Learning and Off-Policy Learning
- DQN and Its Application to Atari Games
References
Richard S. Sutton and Andrew G. Barto:
Reinforcement Learning: An Introduction
Second edition, in progress
http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf
David Silver's UCL Reinforcement Learning Course
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
Csaba Szepesvari:
Algorithms for Reinforcement Learning
https://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs-lecture.pdf
Dimitri P. Bertsekas and John N. Tsitsiklis (1996):
Neuro-Dynamic Programming
Athena Scientific, Belmont, MA