Lecture title: Fast Linear Programming Based Approaches for Solving Markov Decision Processes
Speaker: Caihua Chen, Associate Professor
Time: November 18, 14:30
Venue: Lecture Hall 206, Xinyuan Building Area II, South Campus
About the speaker:
Caihua Chen is an Associate Professor. He received his Ph.D. from Nanjing University with joint doctoral training at the National University of Singapore, and has studied at and visited the National University of Singapore and the Chinese University of Hong Kong. He has led or completed grants including General and Young Scientist projects of the National Natural Science Foundation of China and General and Young Scientist projects of the Jiangsu Provincial Natural Science Foundation, and has participated in a Key Project of the National Natural Science Foundation of China. His representative work has appeared in Mathematical Programming, SIAM Journal on Optimization, and SIAM Journal on Imaging Sciences, and at internationally renowned conferences such as CVPR and NIPS, with several papers selected as ESI Highly Cited Papers. He received the Best Paper Award of the International Consortium of Chinese Mathematicians (2017 and 2018, two consecutive years), the Youth Science and Technology Award of the Operations Research Society of China (2018), and the Nanjing University May Fourth Youth Medal (2019), and was selected into the first cohort of Nanjing University Zhongying Young Scholars (10 selected university-wide, 2018).
Abstract:
Markov Decision Processes (MDPs) are widely applied in the study of sequential decision making, with nearly endless applications. In practice, MDPs are often solved by dynamic programming methods such as policy iteration and value iteration. Compared with dynamic programming, the linear programming based approach for MDPs is powerful in theory but often not very efficient in practice as a solution method. To improve its practicability, this paper studies the linear programming based approach for MDPs from a computational perspective. Specifically, we consider a discrete-stage, infinite-horizon discounted MDP, which has a nice property ensuring the existence of deterministic optimal policies. In this setting, we introduce a weakly and a strongly constrained sparse formulation for MDPs to find optimal deterministic policies approximately. By exploiting the sparse structure of these formulations, we propose a class of multi-block alternating direction methods of multipliers (ADMM) to solve MDPs efficiently. The numerical study shows that ADMM applied to the strongly constrained formulation achieves a runtime comparable to policy iteration and beats other methods by a large margin. To the best of our knowledge, this is the first linear programming based method that achieves performance similar to policy iteration. Theoretically, we also prove the convergence of the multi-block ADMM using recent techniques from convex and nonconvex optimization. These findings are encouraging: we take a first step toward showing that linear programming based methods can be efficient for solving MDPs, which brings a different perspective, including an elegant theory and a rich toolbox, to the study of sequential decision making.
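As background for the linear programming viewpoint discussed above, here is a minimal sketch, not the speaker's sparse formulations or ADMM solver, of the classic primal LP for a discounted MDP, solved with SciPy and cross-checked against value iteration. The MDP data (`P`, `r`) is random and purely illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# Small random discounted MDP (illustrative data only):
# S states, A actions; P[a] is the S x S transition matrix for action a,
# r[s, a] is the one-step reward, gamma is the discount factor.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)      # make each row a probability distribution
r = rng.random((S, A))

# Classic primal LP for the optimal value function v*:
#   minimize  sum_s v(s)
#   s.t.      v(s) >= r(s, a) + gamma * sum_{s'} P(s'|s, a) v(s')  for all s, a
# Rearranged into linprog's "A_ub @ v <= b_ub" form:
#   -(I - gamma * P[a]) @ v <= -r[:, a]
A_ub = np.vstack([-(np.eye(S) - gamma * P[a]) for a in range(A)])
b_ub = np.concatenate([-r[:, a] for a in range(A)])
res = linprog(c=np.ones(S), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * S)
v_lp = res.x

# Cross-check: value iteration on the same MDP converges to the same v*.
v = np.zeros(S)
for _ in range(2000):
    q = r + gamma * np.einsum("ast,t->sa", P, v)   # q[s, a] = Bellman backup
    v = q.max(axis=1)
print(np.allclose(v_lp, v, atol=1e-5))
```

This dense LP has S*A inequality constraints, which illustrates why a direct LP solve scales poorly on large MDPs and why exploiting structure, as the talk proposes, matters in practice.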
Organizer: School of Mathematics and Statistics