The ultimate showpiece: paired with the new draft model, gemma-4-31B reaches 123 tokens/s, but the context……
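I used Google's newly released draft model gemma-4-31B-it-assistant together with gemma-4-31B-it-4bit-W4A16-AWQ, deployed on vLLM with draft tokens set to 5. Coding workloads reach 123 tokens/s and knowledge Q&A reaches 67 tokens/s (for text-heavy tasks, lowering the number of predicted tokens a bit might…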
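For anyone who wants to try a similar setup, here is a rough sketch of how the draft model and the draft-token count might be passed to vLLM from Python. It assumes a recent vLLM release where `LLM` accepts a `speculative_config` dict with `model` and `num_speculative_tokens` keys; older releases used different argument names, so check your version. The model strings are just the names from the post, not verified repository paths, and the value 3 for text workloads is a hypothetical choice for illustration, not a measured optimum.

```python
def build_engine_kwargs(target_model: str, draft_model: str,
                        num_draft_tokens: int = 5) -> dict:
    """Assemble keyword arguments for vllm.LLM with a draft model attached."""
    if num_draft_tokens < 1:
        raise ValueError("num_draft_tokens must be at least 1")
    return {
        "model": target_model,
        # Assumed to match the speculative_config dict of recent vLLM releases.
        "speculative_config": {
            "model": draft_model,
            "num_speculative_tokens": num_draft_tokens,
        },
    }


# Names copied from the post; the real repository paths are not given there.
TARGET = "gemma-4-31B-it-4bit-W4A16-AWQ"
DRAFT = "gemma-4-31B-it-assistant"

# Setting reported in the post for coding workloads.
code_kwargs = build_engine_kwargs(TARGET, DRAFT, num_draft_tokens=5)
# Hypothetical lower setting to experiment with on text-heavy workloads.
text_kwargs = build_engine_kwargs(TARGET, DRAFT, num_draft_tokens=3)


def launch(kwargs: dict):
    """Start the engine; vLLM is imported lazily so the sketch loads without it."""
    from vllm import LLM
    return LLM(**kwargs)
```

Calling `launch(code_kwargs)` would start the engine with 5 draft tokens. Swapping in `text_kwargs` is one way to test whether a smaller draft length helps on Q&A-style prompts.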