

A Causal View for Reward Redistribution in Reinforcement Learning

Source: School of Electronic Engineering
Speaker: Yali Du, Assistant Professor    Time: July 9, 14:30
Venue: Room 306, Building 100, North Campus



Speaker bio:

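Yali Du is an Assistant Professor at King's College London. She was previously a postdoctoral research fellow at the UCL Centre for Artificial Intelligence, University College London. Her main research interests are reinforcement learning and multi-agent learning and cooperation. Her work has been published in leading conferences and journals, including ICLR, ICML, NeurIPS, and the AI Journal. She has presented tutorials on cooperative multi-agent learning at ACML 2022 and AAAI 2023. She has repeatedly served as an editor for well-known international journals and as a reviewer or program committee member for conferences, including the AAMAS 2023 organizing committee. She is a special issue editor for the Journal of AAMAS (CCF rank B), an associate editor of IEEE Transactions on AI, and a senior program committee member for AAAI 2022/2023. For her contributions to cooperative reinforcement learning, she was selected for the AAAI New Faculty Highlights programme (2023) and Rising Stars in AI (KAUST 2023), and received the WAIC Yunfan Award (2023) and the KCL annual award for academic contribution (2022). Her research is also funded by the UK Engineering and Physical Sciences Research Council (UKRI EPSRC).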


Abstract:

In reinforcement learning, a significant challenge is identifying the state-action pairs responsible for delayed future rewards. Return decomposition addresses this challenge by redistributing rewards from observed sequences while preserving policy invariance. However, existing approaches lack interpretability. In this talk, we propose a novel framework called Generative Return Decomposition (GRD), which explicitly models the contributions of states and actions from a causal perspective. GRD uses causal generative models to characterize how Markovian rewards and the trajectory-wise long-term return are generated. By identifying the causal relations and the unobservable Markovian rewards, GRD provides a compact representation for policy optimization within the most favorable subspace of the agent's state space. Theoretical analysis establishes the identifiability of both the unobservable Markovian reward function and the causal structure. Experimental results show that GRD outperforms state-of-the-art methods, and visualizations illustrate the interpretability of the approach.
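To make the basic idea of return decomposition concrete, here is a minimal sketch of a generic least-squares variant. It is not GRD: it omits the causal generative models and causal-structure learning described above. It assumes a hypothetical toy setting with a few discrete states and actions, where only each trajectory's episodic return is observed, and fits a per-step reward table r(s, a) whose sum along each trajectory matches that return. The function names `fit_markovian_rewards` and `redistribute` and the toy data are illustrative assumptions.

```python
from typing import Dict, List, Tuple

StateAction = Tuple[int, int]


def fit_markovian_rewards(
    trajectories: List[List[StateAction]],
    returns: List[float],
    n_states: int,
    n_actions: int,
    lr: float = 0.2,
    steps: int = 20000,
) -> Dict[StateAction, float]:
    """Generic least-squares return decomposition (illustrative, not GRD).

    Minimizes mean_i (sum_t r(s_t, a_t) - R_i)^2 over a reward table r
    using full-batch gradient descent; lr must be small enough for stability.
    """
    keys = [(s, a) for s in range(n_states) for a in range(n_actions)]
    index = {k: j for j, k in enumerate(keys)}
    # A trajectory's predicted return is linear in its (s, a) visit counts.
    counts = []
    for traj in trajectories:
        row = [0.0] * len(keys)
        for sa in traj:
            row[index[sa]] += 1.0
        counts.append(row)

    w = [0.0] * len(keys)
    n = len(trajectories)
    for _ in range(steps):
        grad = [0.0] * len(keys)
        for row, ret in zip(counts, returns):
            err = sum(c * wj for c, wj in zip(row, w)) - ret
            for j, c in enumerate(row):
                grad[j] += 2.0 * err * c / n
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return {k: w[index[k]] for k in keys}


def redistribute(
    traj: List[StateAction], reward_table: Dict[StateAction, float]
) -> List[float]:
    """Dense per-step rewards for one trajectory under the fitted table."""
    return [reward_table[sa] for sa in traj]


# Hypothetical toy data: 2 states x 2 actions with hidden true rewards
# r(0,0)=0, r(0,1)=1, r(1,0)=0, r(1,1)=2; only episodic returns are observed.
trajs = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 1), (0, 1), (1, 0)],
    [(1, 0), (1, 1), (0, 0)],
    [(0, 0), (0, 0), (1, 0)],
    [(1, 1), (1, 1), (0, 1)],
]
rets = [3.0, 2.0, 2.0, 0.0, 5.0]

table = fit_markovian_rewards(trajs, rets, n_states=2, n_actions=2)
print(redistribute(trajs[0], table))  # approximately [0, 1, 2]
```

On this toy data the fitted per-step rewards sum, up to numerical error, to each observed return, and the hidden table is recovered only because the per-trajectory visit-count matrix has full column rank. That is a far simpler condition than the causal identifiability result the talk refers to.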


Organizer: School of Electronic Engineering


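South Campus address: No. 266 Xinglong Section, Xifeng Road, Xi'an, Shaanxi (postal code 710126)

North Campus address: No. 2 South Taibai Road, Xi'an, Shaanxi (postal code 710071)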


Copyright: Xidian University