DeepSeek-R1 Technical Report

Date: 2025-05-13 | Size: 1351 KB | 22 pages | Format: PDF
Summary:

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
research@deepseek.com

Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

[Figure: benchmark results on AIME 2024 (Pass@1), Codeforces (Percentile), GPQA Diamond (Pass@1), MATH-500 (Pass@1), MMLU (Pass@1), SWE-bench Verified (…)]

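Notice: All resources in the 企商查 report library are uploaded and shared by users for study and exchange only. Do not use them commercially without the uploader's written authorization.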