The ultimate showpiece: with the new draft model, gemma-4-31B reaches 123 tokens/s, but the context……
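I used Google's newly released draft model gemma-4-31B-it-assistant together with gemma-4-31B-it-4bit-W4A16-AWQ, deployed on vLLM with draft tokens set to 5. Coding tasks run at 123 tokens/s and knowledge Q&A at 67 tokens/s (for text-heavy tasks, lowering the number of predicted tokens a bit might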
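
One plausible reading of the gap between the coding and Q&A numbers is that draft tokens get accepted more often on code than on free-form text. Here is a rough sketch of the standard speculative-decoding estimate, assuming each draft token is accepted independently with probability `alpha` (a simplification). The `alpha` values below are made up for illustration and are not measured from this setup.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes each of the k draft tokens is accepted independently with
    probability alpha (a simplifying assumption, not a measurement).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the target model.
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates: higher for code, lower for prose.
    for label, alpha in (("code-like", 0.8), ("prose-like", 0.5)):
        for k in (2, 3, 5):
            tokens = expected_tokens_per_step(alpha, k)
            print(f"{label:10s} alpha={alpha} k={k} -> {tokens:.2f} tokens/step")
```

Under this model the gain from extra draft tokens flattens quickly when `alpha` is low, while the drafting cost keeps growing with `k`, so trying a smaller draft-token count for text-heavy workloads seems worth a test.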