Studying Average Reward Reinforcement Learning via Anchored Value Iteration

Posted by: 김한나
Category: Ph.D. thesis presentation
Date: 2025-06-19 (Thu) 12:00~13:00
Seminar room: Building 27, Room 220
Speaker: 이종민 (Seoul National University)
Host professor: 강명주

Average-reward Markov decision processes (MDPs) provide a fundamental framework for long-term, steady-state decision-making. As reinforcement learning becomes central to deep learning and large-language-model research, interest in the average-reward setting has grown. However, average-reward MDPs are harder to analyze than their discounted-reward counterparts, and the literature remains sparse. This thesis advances the study of average-reward reinforcement learning through Anchored Value Iteration (Anc-VI), presenting three main contributions: one in the tabular setting, one in the generative-model setting, and one in the offline RL setting.

