The ultimate showpiece: with the new draft model, gemma-4-31B reaches 123 tokens/s, but the context…
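I paired Google's newly released draft model gemma-4-31B-it-assistant with gemma-4-31B-it-4bit-W4A16-AWQ, deployed on vLLM with draft tokens set to 5. Throughput was 123 tokens/s for coding tasks and 67 tokens/s for knowledge Q&A (for prose-style tasks, lowering the number of draft tokens a bit might…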
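
One plausible reading of the gap between the coding and Q&A numbers is that the draft model's guesses are accepted more often on code than on prose. The sketch below uses the standard speculative-decoding estimate, under the simplifying assumption that each draft token is accepted independently with probability `alpha`. The acceptance rates and the relative draft cost `draft_cost` are hypothetical illustration values, not measurements from this setup, and the function names are made up for this example.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model verification step.

    alpha: assumed per-token acceptance probability (hypothetical, i.i.d.).
    gamma: number of draft tokens proposed per step (5 in the post).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the target model.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Rough speedup over plain decoding.

    draft_cost: assumed cost of one draft-model forward pass relative to one
    target-model forward pass (hypothetical value, e.g. 0.1).
    """
    if draft_cost < 0.0:
        raise ValueError("draft_cost must be non-negative")
    # Each step costs gamma draft passes plus one target pass.
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost + 1.0)


if __name__ == "__main__":
    # Hypothetical acceptance rates: high for code, lower for prose.
    for label, alpha in (("code-like", 0.9), ("prose-like", 0.5)):
        for gamma in (2, 3, 5):
            speedup = estimated_speedup(alpha, gamma, draft_cost=0.1)
            print(f"{label:10s} alpha={alpha} gamma={gamma} speedup~{speedup:.2f}x")
```

Under these assumed numbers, a longer draft helps when acceptance is high, while a shorter draft comes out ahead when acceptance is low. Whether that holds for this deployment would need to be checked by measuring the acceptance rate and rerunning the benchmark with different draft lengths.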