Studying Average Reward Reinforcement Learning via Anchored Value Iteration
김한나
Building 27, Room 220
05.26 14:25
Category | Ph.D. thesis presentation |
---|---|
Schedule | 2025-06-19 (Thu) 12:00~13:00 |
Seminar room | Building 27, Room 220 |
Speaker | 이종민 (Seoul National University) |
Professor in charge | 강명주 |
Other |
Average-reward Markov decision processes (MDPs) provide a fundamental framework for long-term, steady-state decision-making. As reinforcement learning becomes central to deep learning and large language model research, interest in the average-reward setting has grown. However, average-reward MDPs are harder to analyze than their discounted-reward counterparts, and the literature remains sparse. This thesis advances the study of average-reward reinforcement learning through Anchored Value Iteration (Anc-VI), presenting three main contributions, in the tabular, generative-model, and offline RL settings.