Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond - Shuai Li

2025-05-14 · PDF, 47 pages, 6322 KB
Abstract:

Shuai Li (https://shuaili8.github.io/), Shanghai Jiao Tong University
This work was published at ICML 2024. Oct 13, 2024. https://arxiv.org/abs/2406.01386

Driving question: Can combinatorial MAB handle decision-making systems with states? What is the relationship between RL and combinatorial MAB?

Main technical result:
● Episodic RL is an instance of CMAB
● Plug-in CMAB algorithms/analysis achieve near-optimal regret
● Integrating RL structure achieves minimax optimal regret
Connection | Plug-in solution | Minimax optimal

Outline:
• Background of multi-armed bandit (MAB) and combinatorial MAB (CMAB)
• Episodic RL and its connection with CMAB
• CMAB view of solving episodic RL

$\max_{i} r(i) \quad \text{s.t. } i \in [m]$
● $r(i)$: reward/utility for decision $i$
● $[m]$: $m$ candidate decisions
Examples: driving, recommender systems (RecSys), LLM selection

Objective: maximize the expected total reward $\mathbb{E}\left[\sum_{t=1}^{T} X_t\right]$
● An agent and $m$ arms
● Each arm $i \in [m]$ has a reward distribution $P_i$ with unknown mean $r_i$
● In each round …
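The stochastic bandit setup above (an agent, $m$ arms, arm $i$ with reward distribution $P_i$ and unknown mean $r_i$, objective $\mathbb{E}[\sum_t X_t]$) and its combinatorial extension can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: Bernoulli rewards, a super arm made of $k$ base arms whose reward is the sum of their rewards, and semi-bandit feedback (every chosen arm's reward is observed). It uses a standard CUCB-style upper-confidence index from the CMAB literature, not the algorithm proposed in this work, and the function name `cucb_topk` is hypothetical. With `k = 1` it reduces to a UCB1-style single-arm bandit.

```python
import math
import random


def cucb_topk(means, k, horizon, seed=0):
    """Hypothetical CUCB-style learner on Bernoulli arms (illustrative sketch).

    means   -- true arm means r_i; used only to sample rewards and score regret
    k       -- super-arm size (k = 1 is the ordinary MAB)
    horizon -- number of rounds T
    Returns (realized total reward, pseudo-regret).
    """
    rng = random.Random(seed)
    m = len(means)
    counts = [0] * m       # observations per base arm
    totals = [0.0] * m     # summed observed rewards per base arm
    total_reward = 0.0     # realized sum of X_t
    expected_reward = 0.0  # sum of true means of the chosen super arms

    for t in range(1, horizon + 1):
        # Optimistic index: empirical mean + confidence radius; unseen arms get +inf.
        indices = [
            float("inf") if counts[i] == 0
            else totals[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            for i in range(m)
        ]
        # Oracle for the linear top-k problem: take the k largest indices.
        super_arm = sorted(range(m), key=lambda i: indices[i], reverse=True)[:k]

        # Semi-bandit feedback: observe a Bernoulli reward from every chosen arm.
        for i in super_arm:
            x = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            totals[i] += x
            total_reward += x
            expected_reward += means[i]

    best = sum(sorted(means, reverse=True)[:k])  # best per-round expected reward
    regret = horizon * best - expected_reward
    return total_reward, regret


reward, regret = cucb_topk([0.1, 0.2, 0.8, 0.9], k=2, horizon=5000)
print(reward, regret)
```

Pseudo-regret here is measured against always playing the best super arm, i.e. $T \cdot \max_{|S| = k} \sum_{i \in S} r_i$. The optimistic index lets rarely tried arms be explored while the top-k oracle keeps the combinatorial choice simple. In the example the learner should settle on arms 2 and 3, and the regret should stay far below that of uniform random play.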
