450

Date | 2024-08-23 |
---|---|

Speaker | 이다빈 |

Dept. | KAIST |

Room | 27-220 |

Time | 16:00-17:00 |

We study online reinforcement learning (RL) in Markov decision processes. In particular, we consider the notion of regret of a learning algorithm defined with respect to an optimal policy, which is fundamental in online learning theory. In this talk, we discuss three different settings: safe RL, RL with linear function approximation, and RL with multinomial logistic function approximation. We develop several algorithms and optimization frameworks, by which we improve upon the best known regret upper bounds for the three regimes. Moreover, we provide improved regret lower bounds for RL with multinomial logistic function approximation.

TEL 02-880-5857,6530,6531 / FAX 02-887-4694