Qwen 3.6 35B A3B deployed locally on a 2080 Ti 11 GB: 128k context, 67 tps
I deployed it with llama.cpp on Windows; results first (see screenshots). The model I used is unsloth's Qwen3.6-35B-A3B-UD-IQ1_M quant. Thanks to that aggressive quantization, the entire model fits comfortably in the 2080 Ti's 11 GB of VRAM, and with a Q4-quantized KV cache the context window can reach 128k.
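The setup described above boils down to a single llama-server invocation. A minimal sketch for a Windows command prompt, assuming a recent llama.cpp build: the flag names (`-ngl`, `-c`, `-ctk`/`-ctv`, `-fa`) are real llama.cpp options, but the model filename and exact values here are assumptions based on the post, and the `-fa` argument syntax varies between builds.

```shell
:: Sketch only -- filename and values are assumptions, not the author's exact command.
llama-server -m Qwen3.6-35B-A3B-UD-IQ1_M.gguf ^
  -ngl 99 ^
  -c 131072 ^
  -ctk q4_0 -ctv q4_0 ^
  -fa on
```

`-ngl 99` offloads all layers to the GPU, `-c 131072` requests the 128k context window, and `-ctk q4_0 -ctv q4_0` quantizes the KV cache to Q4 so that context fits in 11 GB; flash attention (`-fa`) is generally required for a quantized V cache.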