Reinforcement learning (RL) with function approximation has achieved remarkable success in a wide range of areas, including video games, Go, robotics, and autonomous driving. This empirical progress has stimulated efforts to expand our theoretical understanding of RL with function approximation. As a first step toward establishing theoretical foundations, linear function approximation frameworks have received significant attention. Among them, there has been a plethora of work on linear Markov decision processes (MDPs) and linear mixture MDPs, where the underlying transition kernel and the reward function are assumed to be linear in given feature mappings over state-action pairs or state-action-state triplets. In this talk, we develop the first provably efficient algorithm that guarantees the tightest known regret upper bound for learning linear infinite-horizon average-reward MDPs. Moreover, we present a minimax-optimal algorithm for learning infinite-horizon linear mixture MDPs under the Bellman optimality condition. As an extension, we also consider a concrete non-linear function class, the multinomial logistic (MNL) function approximation framework, for which we provide tight regret upper and lower bounds in both the finite-horizon and infinite-horizon settings. This talk is based on joint work with Woojin Chae, Junyeop Kwon, and Jaehyun Park (KAIST) and Kihyuk Hong, Yufan Zhang, and Ambuj Tewari (Michigan).
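For readers unfamiliar with these models, the following is a minimal sketch of the standard parameterizations referenced above; the notation (\phi, \psi, \mu, \theta^{*}) is assumed here for illustration and does not appear in the abstract itself.

```latex
% Linear MDP: the transition kernel and reward are linear in a known feature
% map \phi over state-action pairs, with unknown measures \mu and vector \theta:
\[
  P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle,
  \qquad
  r(s, a) = \langle \phi(s, a), \theta \rangle.
\]
% Linear mixture MDP: the kernel is linear in a known feature map \psi over
% state-action-state triplets, with a single unknown parameter vector \theta^*:
\[
  P(s' \mid s, a) = \langle \psi(s, a, s'), \theta^{*} \rangle.
\]
% MNL function approximation: next-state probabilities follow a multinomial
% logistic (softmax) model over the set \mathcal{S}(s,a) of reachable next states:
\[
  P(s' \mid s, a)
  = \frac{\exp\bigl(\phi(s, a, s')^{\top} \theta^{*}\bigr)}
         {\sum_{s'' \in \mathcal{S}(s,a)} \exp\bigl(\phi(s, a, s'')^{\top} \theta^{*}\bigr)}.
\]
```

In the linear MDP, the unknown quantities \mu and \theta scale with the feature dimension rather than the size of the state space, which is what makes provably efficient learning possible in these settings.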
Dabeen Lee is an assistant professor in the Department of Industrial and Systems Engineering (ISE) at the Korea Advanced Institute of Science and Technology (KAIST). Before joining KAIST, he was a post-doctoral researcher in the Discrete Mathematics Group at the Institute for Basic Science (IBS) in Korea. Dabeen received his Ph.D. from the Algorithms, Combinatorics, and Optimization (ACO) program at the Tepper School of Business of Carnegie Mellon University. His research interests lie in designing algorithms and mathematical programming frameworks for broad areas of optimization theory, spanning discrete, combinatorial, integer, convex, online, stochastic, robust, and distributionally robust optimization. His current research projects focus on sequential decision-making via online learning frameworks.