Running Qwen 3.6 35B A3B locally on a 2080 Ti 11 GB with 128K context at 67 tps
I deployed it on Windows with llama.cpp; results are shown in the screenshots first. The model I used is unsloth's Qwen3.6-35B-A3B-UD-IQ1_M quantization. Thanks to this aggressive quantization, the entire model fits in the 2080 Ti's 11 GB of VRAM, and with a q4-quantized KV cache the context window can reach 128K.
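As a rough sketch, the setup described above corresponds to a llama-server invocation along these lines. The GGUF filename matches the unsloth quant named in the post, but the path and exact flag spellings are assumptions (flag names vary somewhat across llama.cpp versions), so treat this as illustrative rather than the author's exact command:

```shell
# Sketch: serve the IQ1_M quant fully offloaded to the GPU, with a
# q4_0-quantized KV cache so a 128K context fits in 11 GB of VRAM.
llama-server \
  -m Qwen3.6-35B-A3B-UD-IQ1_M.gguf \
  -ngl 99 \
  -c 131072 \
  --flash-attn \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```

Flash attention is generally required for quantized KV cache types to take effect; without `--cache-type-k/--cache-type-v q4_0`, the default f16 KV cache for a 128K context would not fit alongside the weights in 11 GB.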