Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond - Shuai Li

Abstract:

Shuai Li (https://shuaili8.github.io/)
Shanghai Jiao Tong University
This work is published at ICML 2024
Oct 13, 2024
https://arxiv.org/abs/2406.01386

Driving question
Can combinatorial MAB handle decision-making systems with states? What is the relationship between RL and combinatorial MAB?

Main technical result
● Episodic RL is an instance of CMAB
● Plug-in CMAB algorithms/analysis that achieve near-optimal regret
● Integrating RL structure achieves minimax optimal regret
Connection · Plug-in Solution · Minimax Optimal

Outline
• Background of multi-armed bandit (MAB) and combinatorial MAB (CMAB)
• Episodic RL and its connection with CMAB
• CMAB view of solving episodic RL

$\max f(x) \quad \text{s.t.} \quad x \in [m]$
● $f(x)$: reward/utility for decision $x$
● $[m]$: $m$ candidate decisions
Examples: driving, RecSys, LLM selection

Objective: maximize expected total rewards $\mathbb{E}\left[\sum_{t=1}^{T} X_t\right]$
● An agent and $m$ arms
● Each arm $i \in [m]$ has a reward distribution $D_i$ with unknown mean $\mu_i$
● In each ro…
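The MAB setting outlined above (an agent, m arms, each with an unknown mean, and the goal of maximizing expected total reward) can be illustrated with a small simulation. The sketch below is not taken from the slides: it assumes Bernoulli rewards and uses the standard textbook UCB1 rule as the agent, and the function name `ucb1` and the example means are hypothetical choices made only for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run the textbook UCB1 rule on a simulated Bernoulli bandit.

    `means` are the true arm means; the agent never reads them directly,
    it only sees the sampled 0/1 rewards. (Assumed setup, not from the slides.)
    """
    rng = random.Random(seed)
    m = len(means)
    counts = [0] * m   # number of times each arm has been pulled
    sums = [0.0] * m   # cumulative observed reward of each arm
    total = 0.0        # total reward collected by the agent

    for t in range(1, horizon + 1):
        if t <= m:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(m),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Sample a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


# Hypothetical instance: three arms with means unknown to the agent.
counts, total = ucb1([0.2, 0.5, 0.8], horizon=2000)
print("pulls per arm:", counts)
print("total reward:", total)
```

The exploration bonus shrinks as an arm is pulled more often, so over time the agent concentrates its pulls on arms whose empirical means stay high.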

Properties: 47 pages · Size: 6322 KB · Format: PDF · Date: 2025-05-14
