DeepSeek-V3 Technical Report

Format: PDF · 53 pages · 1778 KB · Date: 2025-05-13
DeepSeek-AI
research@deepseek.com

Abstract

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable …
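The abstract only names the auxiliary-loss-free load-balancing strategy; it does not describe it. The Python sketch below illustrates one way such a balancer can work in a sparse MoE router, where each token activates only k experts (for scale, 37B of 671B parameters is roughly 5.5% of the model per token). The mechanics are assumptions for illustration, not the report's code: each expert carries a bias that is added to its sigmoid routing score only when choosing the top-k experts, the gating weights use the raw scores, and after each batch the bias is lowered by a fixed step `gamma` for overloaded experts and raised for underloaded ones. The functions `route`, `update_bias`, `imbalance`, and `simulate`, the skewed router, and every hyperparameter are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(affinity_logits, bias, k):
    """Pick the top-k experts for one token (hypothetical router).

    Selection ranks experts by score + bias, but the gating weights are
    normalized raw scores, so the bias steers load without changing how
    the selected experts' outputs would be mixed.
    """
    scores = [sigmoid(x) for x in affinity_logits]
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, counts, gamma):
    """Nudge each bias toward balanced load; no auxiliary loss term is used."""
    mean_load = sum(counts) / len(counts)
    return [(b - gamma) if c > mean_load else ((b + gamma) if c < mean_load else b)
            for b, c in zip(bias, counts)]


def imbalance(counts):
    """Max expert load divided by mean load (1.0 means perfectly balanced)."""
    return max(counts) / (sum(counts) / len(counts))


def simulate(num_experts=8, k=2, tokens=256, steps=200, gamma=0.01, seed=0):
    rng = random.Random(seed)
    # Hypothetical skew: the router systematically prefers expert 0.
    skew = [2.0] + [0.0] * (num_experts - 1)
    bias = [0.0] * num_experts
    history = []
    for _ in range(steps):
        counts = [0] * num_experts
        for _ in range(tokens):
            logits = [rng.gauss(0.0, 1.0) + s for s in skew]
            for expert in route(logits, bias, k):
                counts[expert] += 1
        history.append(imbalance(counts))
        bias = update_bias(bias, counts, gamma)
    return history, bias


if __name__ == "__main__":
    history, bias = simulate()
    print(f"first-batch imbalance: {history[0]:.2f}")
    print(f"last-batch imbalance:  {history[-1]:.2f}")
```

In this toy setup, the favored expert's bias drifts downward until the selection rate evens out. Because the bias never enters the gating weights, balancing is achieved without adding a loss term that competes with the language-modeling objective, which is the motivation the name "auxiliary-loss-free" suggests.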
