Chinese-made models still feel a step behind on tool calling


A disclaimer up front: I have never used the Chinese model vendors' official APIs; every problem described here was hit through a relay (proxy) API. If the official APIs don't have these issues, feel free to ignore this post.


The first issue is tool-call argument formatting.
Some tools in my agent take parameters with complex types, such as an array of strings.
For example, the ask_user tool defined in my agent project:

    @built_in_tool(validate=True, defaults=BuiltInToolDefaults(needs_user_interaction=True))
    def ask_user(self,
                 question: Annotated[str,
                    """
                    The clear and concise question to ask the user.

                    IMPORTANT:
                    If this is a multiple-choice question, DO NOT list options here (no A/B/C, no 1/2/3, no bullet points).
                    The options must be passed separately via the 'options' parameter.
                    """],
                 options: Annotated[list[str] | None,
                    """
                    Specific options for the user to choose from.
                    If provided, the user is forced to choose one option.
                    """] = None
                 ) -> str:
        """
        Ask the user for missing information.

        When you SHOULD use this:
            - The following action is **irreversible** (file deletion, sending messages, making payments, system configuration changes, etc.) and the user has not explicitly confirmed intent
            - A required parameter cannot be inferred from context and has no safe default
            - A necessary tool use has failed 3 consecutive times
            - The task has two (or more) valid interpretations with meaningfully different outcomes, and assumption cost is high

        When NOT to use:
            - Simple conversation or quick factual questions
            - The user already provided clear, detailed requirements
            - Stylistic preferences when a reasonable default exists
            - The task has already been clarified earlier in the conversation

        CORRECT & INCORRECT Usage Examples:
            [WANT: Ask for clarification on format]
            INCORRECT: ask_user(question="Should the report be PDF, Word, or HTML?")
            CORRECT:   ask_user(question="What format would you like for the report?", options=["PDF", "Word", "HTML"])

            [WANT: Open-ended question]
            CORRECT:   ask_user(question="What specific time period should I search for?")
        """
        ...

When I use glm-5 and qwen3.6-plus, both models almost always pass a single string containing a JSON-encoded string array when calling tools like this, i.e.:

    ask_user("question", "[\"option1\", \"option2\"]")

instead of the correct format:

    ask_user("question", ["option1", "option2"])

Even after the model sees the validation error, and even when I tell it directly how the argument should be passed, it still sends the wrong format (which is why I suspect the root cause may not be the model itself).
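As a workaround on my side, JSON-string arrays can be decoded back into real lists before validation. This is only a sketch, assuming tool arguments arrive as a dict; `coerce_json_string_args` is a name I made up, not part of my actual project:

```python
import json
from typing import Any


def coerce_json_string_args(args: dict[str, Any], list_params: set[str]) -> dict[str, Any]:
    """If a list-typed parameter arrives as a JSON-encoded string
    (e.g. '["option1", "option2"]'), decode it back into a real list."""
    fixed = dict(args)
    for name in list_params:
        value = fixed.get(name)
        if isinstance(value, str):
            try:
                decoded = json.loads(value)
            except json.JSONDecodeError:
                continue  # not JSON at all: leave the string untouched
            if isinstance(decoded, list):
                fixed[name] = decoded
    return fixed


# The mis-formatted call from above becomes valid after coercion:
raw = {"question": "question", "options": "[\"option1\", \"option2\"]"}
fixed = coerce_json_string_args(raw, {"options"})
# fixed["options"] == ["option1", "option2"]
```

Silently repairing model output like this hides the problem rather than fixing it, but it keeps the agent loop moving when a retry with the validation error doesn't help.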


The second issue came up today while I was adding a new feature to my agent.
In short, I added a semantic-reading mode to my agent's read_file tool: the agent can call a flash model to pull just the content it needs from a file instead of reading the whole thing.

    @built_in_tool(validate=True, defaults=BuiltInToolDefaults(auto_approve=True))
    async def read_file(self,
                        path: Annotated[str,
                          "The path of the file to read (relative to the current working directory)."],
                        offset: Annotated[int, "The line number to start reading from (1-based)."] = 1,
                        max_lines: Annotated[int, "The maximum number of lines to read."] = 2000,
                        semantic: Annotated[str | None,
                            """
                            When provided, this parameter is used as a prompt for another model that semantically analyses the ENTIRE file content (offset and max_lines are ignored).
                            PREFER this over normal reading when:
                            - The file is likely to be long (e.g. PDF, Word documents, reports)
                            - The goal is to understand, summarize, or extract information from the file
                            NOTE: If the analysis fails, this tool falls back to normal reading mode and returns the raw file content instead; check the `mode` attribute to know which mode was used.
                            """] = None,
                        ) -> str:
        """
        Read the contents of a file at the specified path.
        For text files, this tool will directly return the file content;
        for .pdf, .docx, .pptx, .xlsx, .epub files, this tool will convert the file to markdown format and return the markdown text.

        Returns:
            An XML string with the file content and metadata as attributes:
            - start_line: The line number of the first line returned (1-based); only present in "normal" mode
            - end_line: The line number of the last line returned (1-based); only present in "normal" mode
            - total_lines: The total number of lines in the file
            - mode: The actual mode used: either "normal" or "semantic"

            Normal mode example:
                <file_content start_line="1" end_line="50" total_lines="312" mode="normal">
                def foo():
                    ...
                </file_content>

            Semantic mode example (read_file(path="report.pdf", semantic="Summarize the key findings")):
                <file_content total_lines="312" mode="semantic">
                The report identifies three main findings: ...
                </file_content>

        To read the next chunk in normal mode, pass end_line + 1 as the offset in the next call.
        """

I tested glm-5 and gpt-5.2. gpt used the semantic parameter proactively on my very first test, while glm never did in any of my runs. And my test prompt was simple:

Help me check what requirements "requirements/Individual Project Handbook.pdf" lists for the project report
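
For concreteness, here are the two behaviors as tool-call payloads. The argument wording is my own paraphrase for illustration, not a captured transcript:

```python
# What gpt-5.2 did on its first attempt: it opted into semantic mode.
semantic_call = {
    "name": "read_file",
    "arguments": {
        "path": "requirements/Individual Project Handbook.pdf",
        "semantic": "What are the requirements for the project report?",
    },
}

# What glm-5 did every time: a plain full read of a long PDF,
# never touching the `semantic` parameter.
plain_call = {
    "name": "read_file",
    "arguments": {"path": "requirements/Individual Project Handbook.pdf"},
}
```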

3 posts - 3 participants


Source: linux.do