Yifan Zhang: undergrad at Peking University's Yuanpei College, master's from Tsinghua's Yao Class, now an AI Lab Fellow at Princeton University; his research focuses on large language model reasoning and reinforcement learning.
He is not a DeepSeek employee, but he likely has close ties to DeepSeek; several of his earlier predictions have turned out to be accurate.
What he disclosed this time:
V4 1.6T, V4-Lite 285B
Attention: DSA2 (NSA + DSA), head-dim 512, Sparse MQA + SWA
MoE: fused MoE mega-kernel, 6 active of 384 experts
Residual: Hyper-Connections
Optimizer: Muon
Pretrain context length: 32K
RL: GRPO with corrected KL
Final context length: 1M
Modality: Text only
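The "6 active of 384 experts" line describes standard top-k MoE routing: a router scores every expert per token, keeps the 6 best, and renormalizes their gates. A minimal NumPy sketch under those assumptions (function name, shapes, and softmax router are illustrative, not DeepSeek's actual kernel):

```python
import numpy as np

def route_top_k(hidden, router_weight, k=6):
    """Pick the top-k experts per token and return their indices and gate weights.

    hidden:        [tokens, d_model] token activations
    router_weight: [d_model, n_experts] router projection (hypothetical names)
    """
    logits = hidden @ router_weight                            # [tokens, n_experts]
    # Numerically stable softmax over the expert dimension.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Indices of the k highest-probability experts for each token.
    idx = np.argpartition(probs, -k, axis=-1)[:, -k:]
    gate = np.take_along_axis(probs, idx, axis=-1)
    gate = gate / gate.sum(axis=-1, keepdims=True)             # renormalize over chosen experts
    return idx, gate

# Toy usage: 4 tokens, d_model 16, an expert pool of 384 (the rumored size).
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 16))
w = rng.standard_normal((16, 384))
idx, gate = route_top_k(h, w)
print(idx.shape, gate.shape)  # (4, 6) (4, 6)
```

Only the 6 selected experts run their FFNs for a given token, which is how a 1.6T-parameter model keeps per-token compute far below dense cost; the "fused mega-kernel" claim would be about executing that gather-plus-expert-FFN step in one GPU kernel.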
DeepSeek V4 should be released fairly soon; we can verify these claims then.