The ultimate showpiece: with the new draft model, gemma-4-31B can reach 123 tokens/s, but the context...
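I used Google's newly released draft model gemma-4-31B-it-assistant together with gemma-4-31B-it-4bit-W4A16-AWQ, deployed on vLLM with draft tokens set to 5. Coding tasks run at 123 tokens/s and knowledge Q&A at 67 tokens/s (for prose-style tasks, lowering the number of predicted tokens a little might...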
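For readers who want to try a similar setup, here is a minimal sketch of how such a launch command could be assembled. It is not the poster's actual command: the model identifiers are copied from the post as bare names (the real repo paths or local directories are unknown), and it assumes a vLLM version that accepts the JSON-valued `--speculative-config` option with `model` and `num_speculative_tokens` keys. The helper `build_vllm_command` is a hypothetical name used only for illustration.

```python
import json
import shlex

# Names taken from the post; the full repo paths or local directories
# are assumptions and must be replaced with real locations.
TARGET_MODEL = "gemma-4-31B-it-4bit-W4A16-AWQ"
DRAFT_MODEL = "gemma-4-31B-it-assistant"


def build_vllm_command(num_draft_tokens: int = 5) -> list[str]:
    """Build (but do not run) a `vllm serve` command with a draft model.

    Assumes vLLM's JSON-valued --speculative-config option; check the
    documentation of your installed version before relying on it.
    """
    speculative_config = {
        "model": DRAFT_MODEL,
        "num_speculative_tokens": num_draft_tokens,
    }
    return [
        "vllm", "serve", TARGET_MODEL,
        "--speculative-config", json.dumps(speculative_config),
    ]


if __name__ == "__main__":
    # Print a shell-safe command line for the setting described in the post.
    print(shlex.join(build_vllm_command(5)))
```

Keeping the draft-token count as a parameter makes it easy to compare settings, for example a lower value for prose-style prompts as the post speculates.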