OK - 钛刻 - Tech Trend Banner - Charting technology trends in depth, leading the digital future - Page 105 - 钛刻科技

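[Local LLM] I built a tool that takes a 30B model on an 8GB GPU from 3 tok/s to 21 tok/s; notes on the technical findings

I have been tinkering with local LLMs lately and ran into one core problem: Ollama and LM Studio will get a model running, but every parameter is guesswork: context length, KV cache type, where the MoE experts are placed, how large ubatch should be, and so on. Running with the default parameters basically wastes the GPU. So I built a tool that finds the optimal configuration automatically. I hit quite a few pitfalls along the way, so here I am recording a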

tech www.v2ex.com 2026-04-25 05:39:55+08:00

What is Grok up to? It reports "high demand" every day

Free users can hardly use it at all. 6 posts - 6 participants

tech linux.do 2026-04-25 01:13:46+08:00

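「Advice taken, rich as a nation」 OneToken.sh GPT-5.5: RMB 2 per 1 million tokens, 100% compensation for users who topped up on the 23rd-24th, and a draw for 10 winners of 5 million platform tokens

Continuing the discussion from 「OneToken.sh」 "This site now officially supports GPT-5.5, RMB 2 per 1 million tokens, 10 winners drawn for 5 million tokens [advice taken]": Official site: OneToken.sh. 1M = 1 million tokens. Input price: RMB 3 per 1M tokens. Output price: RMB 12 per 1M tokens. Cached input: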

tech linux.do 2026-04-25 00:34:09+08:00

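After switching to GPT 5.5, my token quota actually drains more slowly

The biggest reason is the 256k context, so I start new sessions frequently. Efficiency is also better: it rarely stops to ask the user questions and lets tasks run smoothly. Today I made more than 2,000 calls and used only 140M. 7 posts - 5 participants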

tech linux.do 2026-04-25 00:22:49+08:00

gpt-5.5: how do I fix "PreToolUse hook (failed)"?

PreToolUse hook (failed) error: hook exited with code 1. Platform: Windows, official codex cli. 1 post - 1 participant

tech linux.do 2026-04-25 00:17:30+08:00

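Grok has been acting up lately; how is it for everyone else?

It has been in this state every day for the past few days. Can anyone use a regular account on the web normally, and if so, from which region's IP? 4 posts - 4 participants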

tech linux.do 2026-04-24 23:42:47+08:00

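From 9 a.m. until now, I have used 120 million tokens

I wrote code all day and my back is aching. I checked cpa and found I had used 120 million tokens today, so the one-day team plan I bought has paid for itself, hahaha. 14 posts - 13 participants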

tech linux.do 2026-04-24 23:39:27+08:00

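Log in quick! Free! deepseek4 is up on Alibaba Cloud Bailian (阿里云百炼), and it is blazing fast!

There is not a lot of it, but the speed really is fast! flash: 175 tokens per second. pro: 81 tokens per second. I think the speed can climb further once everyone has refined their configurations. 12 posts - 9 participants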

tech linux.do 2026-04-24 23:24:54+08:00
