A quick disclaimer first: I have never used the official APIs of the domestic model vendors; every problem below came up while using relay/proxy APIs. If the official APIs don't have these problems, feel free to ignore this post.
The first issue is the format of tool calls.
Some tools in my agent need to accept parameters of complex types, such as an array of strings.
For example, the ask_user tool defined in my agent project:
@built_in_tool(validate=True, defaults=BuiltInToolDefaults(needs_user_interaction=True))
def ask_user(self,
             question: Annotated[str,
                 """
                 The clear and concise question to ask the user.
                 IMPORTANT:
                 If this is a multiple-choice question, DO NOT list options here (no A/B/C, no 1/2/3, no bullet points).
                 The options must be passed separately via the 'options' parameter.
                 """],
             options: Annotated[list[str] | None,
                 """
                 Specific options for the user to choose from.
                 If provided, the user is forced to choose one option.
                 """] = None
             ) -> str:
"""
Ask the user for missing information.
When SHOULD to use:
- The following action is **irreversible** (file deletion, sending messages, making payments, system configuration changes, etc.) and the user has not explicitly confirmed intent
- A required parameter cannot be inferred from context and has no safe default
- A necessary tool use has failed 3 consecutive times
- The task has two (or more) valid interpretations with meaningfully different outcomes, and assumption cost is high
When NOT to use:
- Simple conversation or quick factual questions
- The user already provided clear, detailed requirements
- Stylistic preferences when a reasonable default exists
- The task has been already clarified earlier in the conversation
CORRECT & INCORRECT Usage Examples:
[WANT: Ask for clarification on format]
INCORRECT: ask_user(question="Should the report be PDF, Word, or HTML?")
CORRECT: ask_user(question="What format would you like for the report?", options=["PDF", "Word", "HTML"])
[WANT: Open-ended question]
CORRECT: ask_user(question="What specific time period should I search for?")
"""
...
When I tested glm-5 and qwen3.6-plus, both models, when calling tools like this, almost always passed a single string containing a JSON-encoded array of strings, i.e.:
ask_user("question", "[\"option1\", \"option2\"]")
instead of the correct format:
ask_user("question", ["option1", "option2"])
Even after the model sees the validation error, and even when I tell it directly how the parameter should be passed, it still sends the wrong format (which is why I suspect the root cause may not be the model itself).
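As a workaround on the tool-executor side (not a fix for the models themselves), arguments like this can be coerced before validation: if a string arrives where a list is expected and it decodes to a JSON array, unwrap it. The sketch below is a minimal, hypothetical helper, not something my framework actually ships:

```python
import json

def coerce_stringified_list(value):
    """If the model passed a JSON-encoded array as a string
    (e.g. '["a", "b"]' instead of ["a", "b"]), decode it;
    otherwise return the value unchanged."""
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError:
            return value
        if isinstance(decoded, list):
            return decoded
    return value

coerce_stringified_list('["option1", "option2"]')  # → ["option1", "option2"]
coerce_stringified_list(["option1", "option2"])    # unchanged
coerce_stringified_list("plain text")              # unchanged
```

Running a helper like this over each argument whose annotated type is a list makes the tool tolerant of this failure mode without changing the schema the model sees.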
The second issue came up today while I was adding a new feature to my agent.
In short, I added a semantic-reading capability to my agent's read_file tool: the agent can call a flash model to extract just the content it needs from a file instead of reading the whole file.
@built_in_tool(validate=True, defaults=BuiltInToolDefaults(auto_approve=True))
async def read_file(self,
                    path: Annotated[str,
                        "The path of the file to read (relative to the current working directory)."],
                    offset: Annotated[int, "The line number to start reading from (1-based)."] = 1,
                    max_lines: Annotated[int, "The maximum number of lines to read."] = 2000,
                    semantic: Annotated[str | None,
                        """
                        When provided, this parameter is used as the prompt for another model that semantically analyses the ENTIRE file content (offset and max_lines are ignored).
                        PREFER this over normal reading when:
                        - The file is likely to be long (e.g. PDF, Word documents, reports)
                        - The goal is to understand, summarize, or extract information from the file
                        NOTE: If the analysis fails, the tool falls back to normal reading mode and returns the raw file content instead — check the `mode` attribute to know which mode was used.
                        """] = None,
                    ) -> str:
"""
Read the contents of a file at the specified path.
For text files, this tool will directly return the file content;
for .pdf, .docx, .pptx, .xlsx, .epub files, this tool will convert the file to markdown format and return the markdown text.
Returns:
A XML string with the file content and metadata as attributes:
- start_line: The line number of the first line returned (1-based), only exist for "normal" mode
- end_line: The line number of the last line returned (1-based), only exist for "normal" mode
- total_lines: The total number of lines in the file
- mode: The actual mode used, can be one of "normal" and "semantic"
Normal mode example:
<file_content start_line="1" end_line="50" total_lines="312" mode="normal">
def foo():
...
</file_content>
Semantic mode example (read_file(path="report.pdf", semantic="Summarize the key findings")):
<file_content total_lines="312" mode="semantic">
The report identifies three main findings: ...
</file_content>
To read the next chunk in normal mode, pass end_line + 1 as the offset in the next call.
"""
I tested glm-5 and gpt-5.2: gpt used the semantic parameter proactively on my very first test, while glm never used it across several runs. My test prompt was simple:
Take a look at "requirements/Individual Project Handbook.pdf" and tell me what the requirements for the project report are