Deep Reinforcement Learning

 

 

By: Hyeong In Choi

Venue: 129-104

Time: 7:30 p.m. Fridays

      

Lecture 1 (April 1, 2016)

-       Reinforcement Learning: What and Why

-       AlphaGo: Design and how it works

-       Fundamentals of Markov Decision Processes (MDP)

 

Lecture 2 (April 8, 2016)

-       Bellman Equation

-       Value Iteration

-       Dynamic Programming

-       Monte Carlo

-       TD(λ)

 

Lecture 3 (April 22, 2016)

-       Stochastic Approximation Algorithm: Examples and Basic Theory

 

Lecture 4 (April 29, 2016)

-       Forward and Backward Views of TD(λ)

-       Model-Free Control
   Policy Iteration Theorem
   Greedy Policy
   ε-Greedy Policy

-       SARSA(λ)

 

Lecture 5 (May 13, 2016)

-       (Review) Looking at MDP from Bayesian Network Viewpoint

-       Q-learning

 

Lecture 6 (May 20, 2016)

-       Function Approximation
   State Value Functions and Action Value Functions

-       On-Policy Learning and Off-Policy Learning

-       DQN and its Application to Atari Games

 

 

 

References

 

Richard S. Sutton and Andrew G. Barto

Reinforcement Learning: An Introduction, Second Edition (in progress)

http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf

 

David Silver's UCL Reinforcement Learning Course

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

 

Csaba Szepesvari

Algorithms for Reinforcement Learning

https://www.ualberta.ca/~szepesva/papers/RLAlgsInMDPs-lecture.pdf

 

Dimitri P. Bertsekas and John N. Tsitsiklis (1996)

Neuro-Dynamic Programming

Athena Scientific, Belmont MA