Combinatorial Multivariant Multi-Armed Bandits and Their Applications to Episodic Reinforcement Learning and Beyond - Shuai Li

Abstract:

Shuai Li (https://shuaili8.github.io/)
Shanghai Jiao Tong University
This work is published at ICML 2024
Oct 13, 2024
https://arxiv.org/abs/2406.01386

Driving question
Can combinatorial MAB handle decision-making systems with states? What is the relationship between RL and combinatorial MAB?

Main technical result
● Episodic RL is an instance of CMAB
● Plug-in CMAB algorithms/analysis that achieve near-optimal regret
● Integrating RL structure achieves minimax optimal regret
Connection, Plug-in Solution, Minimax Optimal

• Background of multi-armed bandit (MAB) and combinatorial MAB (CMAB)
• Episodic RL and its connection with CMAB
• CMAB view of solving episodic RL

Maximize the reward/utility of a decision, subject to the decision belonging to [m]
● Reward/utility: defined for each decision
● [m]: m candidate decisions
Driving, RecSys, LLM selection

Objective: maximize the expected total reward, summed over all rounds
● An agent and a finite number of arms
● Each arm has a reward distribution with an unknown mean
● In each ro
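The basic MAB setting outlined above (an agent, arms whose reward distributions have unknown means, and the goal of maximizing expected total reward) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the slides: the rewards are Bernoulli, the learning rule is the classical UCB1 index, and the names `ucb1`, `means`, and `horizon` are hypothetical. It does not implement the CMAB or episodic RL algorithms of the paper.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative assumption, not from the slides).

    `means` holds the true mean of each arm; the agent never reads it directly,
    it is only used to sample rewards. Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of times each arm has been pulled
    sums = [0.0] * n_arms   # cumulative reward observed for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so that each empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest index:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Sample a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts
```

Calling `ucb1([0.1, 0.9], horizon=2000)` should concentrate most pulls on the second arm, since the exploration bonus of a clearly worse arm shrinks as the round index grows only logarithmically.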
