Full-precision Qwen 27B on an 8xA40 server: high time to first token (about 10 s) and slow generation (90-100 tokens/s). What is the fix?
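I'm using my university's 8xA40 PCIe server to deploy full-precision Qwen 27B with vLLM. Time to first token is high, about 10 s, and generation is slow at 90-100 tokens/s. I've seen plenty of people deploy on 3090s without it being this slow. Why is that? Also, is there a model you would recommend deploying instead? A 27B model leaves a lot of VRAM unused on this machine, though the PCIe bandwidth is fairly low.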
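For anyone who wants to sanity-check the numbers, here is a rough back-of-envelope sketch of the decode-speed ceiling set by memory bandwidth. It rests on assumptions the post does not state: "full precision" is taken to mean BF16 (2 bytes per parameter), the model is split evenly across all 8 GPUs with tensor parallelism, a single request is being decoded, and each A40 has roughly 696 GB/s of memory bandwidth (the published spec figure). It ignores KV-cache reads and the inter-GPU communication over PCIe, so real throughput would be lower. The function name is made up for illustration.

```python
def decode_tokens_per_second_bound(params_billion, bytes_per_param,
                                   num_gpus, mem_bandwidth_gb_s):
    """Rough upper bound on single-request decode speed.

    Assumes every generated token requires reading each GPU's share of
    the weights once from GPU memory, and nothing else costs time.
    """
    total_weight_gb = params_billion * bytes_per_param   # e.g. 27 * 2 = 54 GB
    per_gpu_weight_gb = total_weight_gb / num_gpus       # even tensor-parallel split
    seconds_per_token = per_gpu_weight_gb / mem_bandwidth_gb_s
    return 1.0 / seconds_per_token


# Assumed values: 27B parameters, BF16, 8 GPUs, ~696 GB/s per A40.
bound = decode_tokens_per_second_bound(27, 2, 8, 696)
print(f"bandwidth-bound ceiling: {bound:.1f} tokens/s")
```

If those assumptions hold, the ceiling works out to about 103 tokens/s for one request, so 90-100 tokens/s would already be near it. The post does not say whether the measured speed is for one request or many, so treat this only as a starting point for the comparison.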