Seminar 2016

Deep Reinforcement Learning

By: Hyeong In Choi (최형인)

Venue: 129-104

Time: 7:30 p.m. Fridays

Lecture 1 (April 1, 2016)

- Reinforcement Learning: What and Why

- AlphaGo: Design and how it works

- Fundamentals of Markov Decision Processes (MDP)

Lecture 2 (April 8, 2016)

- Bellman Equation

- Value Iteration

- Dynamic Programming

- Monte Carlo

- TD(λ)

Lecture 3 (April 22, 2016)

- Stochastic Approximation Algorithm: Examples and Basic Theory

Lecture 4 (April 29, 2016)

- Forward and Backward Views of TD(λ)

- Model-Free Control

  - Policy Iteration Theorem

  - Greedy Policy

  - ε-Greedy Policy

- SARSA(λ)

Lecture 5 (May 13, 2016)

- (Review) Looking at MDP from Bayesian Network Viewpoint

- Q-learning

Lecture 6 (May 20, 2016)

- Function Approximation

  - State Value Functions and Action Value Functions

- On-Policy Learning and Off-Policy Learning

- DQN and its Application to Atari Games

References

Richard S. Sutton and Andrew G. Barto: Reinforcement Learning: An Introduction, second edition (in progress)

David Silver's UCL Reinforcement Learning Course

Csaba Szepesvari: Algorithms for Reinforcement Learning

Dimitri P. Bertsekas and John N. Tsitsiklis: Neuro-Dynamic Programming, Athena Scientific, Belmont, MA (1996)