Validating the Tools Design of Qwen3-VL
2025-12-12
Deep Learning

Table of Contents

Tools workflow overview
What the request and response JSON look like
Calling multiple tools at once
What the model-readable string looks like
How vLLM parses the incoming JSON into a model-readable string

Tools workflow overview

The tools workflow:

  • The client sends an OpenAI-style JSON request.
  • vLLM parses the JSON into a string the model can understand (the chat-template prompt).
  • The model runs inference and produces a result string.
  • vLLM parses the result string back into JSON.
  • vLLM returns an OpenAI-style JSON response.
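The client side of this loop can be sketched as follows. The snippet only builds the OpenAI-style request body; the model name and tool schema mirror the ones used later in this post, while `build_chat_request` is a helper name invented for illustration.

```python
import json

def build_chat_request(user_text: str, tools: list) -> dict:
    """Build an OpenAI-style /v1/chat/completions request body with tool definitions."""
    return {
        "model": "gpt",
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "tool_choice": "auto",   # let the model decide whether to call a tool
        "temperature": 0.7,
        "max_tokens": 2048,
    }

# One tool definition in OpenAI function-calling format (trimmed for brevity).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

body = build_chat_request("What's the weather in Beijing today?", [WEATHER_TOOL])
# This body would be POSTed to http://<host>:8000/v1/chat/completions
# with header "Authorization: Bearer <api-key>".
print(json.dumps(body, ensure_ascii=False, indent=2))
```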

What the request and response JSON look like

Deploy a Qwen3-VL model:

bash
python -m vllm.entrypoints.openai.api_server \
  --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
  --served-model-name gpt \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 1 \
  --api-key "123" \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

Request payload:

python
# Address of the deployed model service
base_url = "http://100.96.168.186:8000/v1"

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer 123"  # the api-key set at deployment time
}

# Build the request body, including the tool definitions
request_data = {
    "model": "gpt",
    "messages": [
        {"role": "user", "content": "请查询北京今天的天气情况"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "获取指定城市的天气信息",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "城市名称,例如:北京、上海"},
                        "date": {"type": "string", "description": "日期,格式:YYYY-MM-DD,默认为今天"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "温度单位,摄氏度或华氏度"}
                    },
                    "required": ["city"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "search_web",
                "description": "在互联网上搜索信息",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "搜索关键词"},
                        "max_results": {"type": "integer", "description": "最大返回结果数量", "default": 5}
                    },
                    "required": ["query"]
                }
            }
        }
    ],
    "tool_choice": "auto",  # let the model decide which tool to use
    "temperature": 0.7,
    "max_tokens": 2048
}

Response:

json
{
  "id": "chatcmpl-ac7bc87341614d479f5e5bca93a9df4e",
  "object": "chat.completion",
  "created": 1765514379,
  "model": "gpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-5f1efff241e04159939d7a8e10312d67",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"北京\", \"date\": \"2023-11-14\", \"unit\": \"celsius\"}"
            }
          }
        ],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 311,
    "total_tokens": 353,
    "completion_tokens": 42,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}

This response means the model understood the request and decided to call a tool:

text
ID: chatcmpl-tool-5f1efff241e04159939d7a8e10312d67
Type: function
Function name: get_weather
Arguments: {"city": "北京", "date": "2023-11-14", "unit": "celsius"}
Parsed arguments: {
  "city": "北京",
  "date": "2023-11-14",
  "unit": "celsius"
}
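To consume such a response, a client reads `choices[0].message.tool_calls` and `json.loads` each `arguments` field, which is a JSON-encoded string rather than an object. A minimal sketch, using a trimmed copy of the response shown above (`extract_tool_calls` is a helper name invented here):

```python
import json

# Trimmed copy of the response above; only the fields the client needs.
response = {
    "choices": [{
        "message": {
            "content": None,
            "tool_calls": [{
                "id": "chatcmpl-tool-5f1efff241e04159939d7a8e10312d67",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"city\": \"北京\", \"date\": \"2023-11-14\", \"unit\": \"celsius\"}"
                }
            }]
        },
        "finish_reason": "tool_calls"
    }]
}

def extract_tool_calls(resp: dict) -> list:
    """Return (name, parsed-arguments) pairs when the model asked to call tools."""
    choice = resp["choices"][0]
    if choice["finish_reason"] != "tool_calls":
        return []  # ordinary text answer; nothing to dispatch
    return [
        (tc["function"]["name"], json.loads(tc["function"]["arguments"]))
        for tc in choice["message"]["tool_calls"]
    ]

calls = extract_tool_calls(response)
print(calls)  # [('get_weather', {'city': '北京', 'date': '2023-11-14', 'unit': 'celsius'})]
```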

Calling multiple tools at once

text
================================================================================
Test 2: multiple tool calls (can the model call several tools in one response?)
================================================================================
Request JSON:
{
  "model": "gpt",
  "messages": [
    {"role": "user", "content": "请帮我查询北京和上海的天气,并且搜索一下这两个城市的最新新闻"}
  ],
  "tools": [
    {"type": "function", "function": {"name": "get_weather", "description": "获取指定城市的天气信息", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "城市名称,例如:北京、上海"}, "date": {"type": "string", "description": "日期,格式:YYYY-MM-DD,默认为今天"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "温度单位,摄氏度或华氏度"}}, "required": ["city"]}}},
    {"type": "function", "function": {"name": "search_web", "description": "在互联网上搜索信息", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "搜索关键词"}, "max_results": {"type": "integer", "description": "最大返回结果数量", "default": 5}}, "required": ["query"]}}},
    {"type": "function", "function": {"name": "get_time", "description": "获取当前时间或指定时区的时间", "parameters": {"type": "object", "properties": {"timezone": {"type": "string", "description": "时区,例如:Asia/Shanghai, UTC"}, "format": {"type": "string", "enum": ["iso", "timestamp", "readable"], "description": "时间格式"}}, "required": []}}}
  ],
  "tool_choice": "auto",
  "temperature": 0.7,
  "max_tokens": 2048
}
================================================================================
Model response JSON:
================================================================================
{
  "id": "chatcmpl-2f9ac7a00bd442f5862ff50d49cd5790",
  "object": "chat.completion",
  "created": 1765514787,
  "model": "gpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {"id": "chatcmpl-tool-c0f0319735664ccd9fe8a16fa94a3990", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}},
          {"id": "chatcmpl-tool-6c0c2211ffd741b9a6a64548caa5c875", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"上海\"}"}},
          {"id": "chatcmpl-tool-43c31954d2f040a3a4c33eff6ecdf783", "type": "function", "function": {"name": "search_web", "arguments": "{\"query\": \"北京最新新闻\"}"}},
          {"id": "chatcmpl-tool-f109a0eff4364a9581276fae7678984f", "type": "function", "function": {"name": "search_web", "arguments": "{\"query\": \"上海最新新闻\"}"}}
        ],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {"prompt_tokens": 422, "total_tokens": 506, "completion_tokens": 84, "prompt_tokens_details": null},
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
================================================================================
Response analysis:
================================================================================
Role: assistant
Content: None
Finish reason: tool_calls
Number of tool calls: 4
✅ Success! The model issued 4 tool calls in a single response.

Tool call 1:
  ID: chatcmpl-tool-c0f0319735664ccd9fe8a16fa94a3990
  Type: function
  Function name: get_weather
  Arguments: {"city": "北京"}
Tool call 2:
  ID: chatcmpl-tool-6c0c2211ffd741b9a6a64548caa5c875
  Type: function
  Function name: get_weather
  Arguments: {"city": "上海"}
Tool call 3:
  ID: chatcmpl-tool-43c31954d2f040a3a4c33eff6ecdf783
  Type: function
  Function name: search_web
  Arguments: {"query": "北京最新新闻"}
Tool call 4:
  ID: chatcmpl-tool-f109a0eff4364a9581276fae7678984f
  Type: function
  Function name: search_web
  Arguments: {"query": "上海最新新闻"}

Token usage:
  Input tokens: 422
  Output tokens: 84
  Total tokens: 506
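After the four calls come back, the client would execute each one locally and return the results as `role: "tool"` messages so the model can compose a final answer. The toy tool implementations and the `run_tool_calls` helper below are hypothetical; only the message shape (`role`, `tool_call_id`, `content`) follows the OpenAI convention.

```python
import json

# Hypothetical local implementations; a real client would hit a weather API / search engine.
def get_weather(city, date=None, unit="celsius"):
    return {"city": city, "temp": 20, "unit": unit}

def search_web(query, max_results=5):
    return {"query": query, "results": []}

TOOL_REGISTRY = {"get_weather": get_weather, "search_web": search_web}

def run_tool_calls(tool_calls: list) -> list:
    """Execute every tool call and package each result as a 'tool' message
    that can be appended to `messages` for the follow-up request."""
    follow_up = []
    for tc in tool_calls:
        fn = TOOL_REGISTRY[tc["function"]["name"]]
        args = json.loads(tc["function"]["arguments"])
        result = fn(**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": tc["id"],  # ties the result back to the originating call
            "content": json.dumps(result, ensure_ascii=False),
        })
    return follow_up

tool_calls = [
    {"id": "call-1", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}},
    {"id": "call-2", "function": {"name": "search_web", "arguments": "{\"query\": \"上海最新新闻\"}"}},
]
msgs = run_tool_calls(tool_calls)
print([m["tool_call_id"] for m in msgs])  # ['call-1', 'call-2']
```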

What the model-readable string looks like

Save the following code as a .py file; the launch command is given afterwards:

python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Purpose:
    Without modifying the vLLM package source, print the prompt string that the
    OpenAI api_server actually feeds to the model, as well as the raw output
    string the model generates (before the tool parser runs).

Usage:
    1) Launch this script in place of the usual api_server command (same arguments):
       python /mnt/s3fs/code_xd/X_28qwen3vl的tools设计验证/inspect_vllm_prompt.py \
           --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
           --served-model-name gpt \
           --host 0.0.0.0 --port 8000 \
           --trust-remote-code --max-model-len 8192 \
           --gpu-memory-utilization 0.9 --tensor-parallel-size 1 \
           --api-key 123 --enable-auto-tool-choice --tool-call-parser hermes
    2) Send /v1/chat/completions requests as usual.
    3) The server stdout will show two blocks:
       - 【VLLM_PROMPT】...            (the final prompt)
       - 【VLLM_RAW_MODEL_OUTPUT】...  (the raw model output, before the parser)
"""
from __future__ import annotations

import importlib
import importlib.util
import inspect
import json
import os
import pkgutil
import runpy
import sys
from typing import Any


def _print_block(tag: str, payload: str) -> None:
    sep = "=" * 30
    print(f"\n{sep} {tag} {sep}")
    print(payload)
    print(f"{sep} END_{tag} {sep}\n")


def _patch_transformers_apply_chat_template() -> None:
    """When vLLM serving builds the prompt, it always goes through
    tokenizer.apply_chat_template(..., tokenize=False, ...).
    Monkeypatch that method and print the returned prompt -- this is the
    actual string fed to the model."""
    try:
        from transformers import PreTrainedTokenizerBase  # type: ignore
    except Exception as e:
        print(f"[inspect_vllm_prompt] cannot import transformers.PreTrainedTokenizerBase: {e!r}")
        return

    if getattr(PreTrainedTokenizerBase.apply_chat_template, "__vllm_inspect_patched__", False):
        return

    orig = PreTrainedTokenizerBase.apply_chat_template

    def wrapped(self, conversation, *args, **kwargs):  # type: ignore[no-untyped-def]
        out = orig(self, conversation, *args, **kwargs)
        # Only print when tokenize=False and the result is a str
        # (i.e. the final prompt string).
        try:
            tokenize = kwargs.get("tokenize", None)
            if tokenize is False and isinstance(out, str):
                try:
                    conv_json = json.dumps(conversation, ensure_ascii=False, indent=2)
                except Exception:
                    conv_json = str(conversation)
                _print_block("VLLM_MESSAGES", conv_json)
                _print_block("VLLM_PROMPT", out)
        except Exception as e:
            print(f"[inspect_vllm_prompt] failed to print prompt: {e!r}")
        return out

    wrapped.__vllm_inspect_patched__ = True  # type: ignore[attr-defined]
    PreTrainedTokenizerBase.apply_chat_template = wrapped  # type: ignore[assignment]
    print("[inspect_vllm_prompt] monkeypatched transformers.PreTrainedTokenizerBase.apply_chat_template")


def _patch_hermes_tool_parser() -> None:
    """With --tool-call-parser hermes, vLLM hands the raw model output string
    to the Hermes parser. Monkeypatch the parser's entry points and print the
    text before it is parsed. Because the vLLM version/layout may vary, locate
    the hermes parser class dynamically and patch defensively."""
    try:
        import vllm  # type: ignore
    except Exception as e:
        print(f"[inspect_vllm_prompt] cannot import vllm: {e!r}")
        return

    patched = 0

    def try_patch_obj(obj: Any) -> None:
        nonlocal patched
        # Common method names: parse / parse_tool_calls / extract_tool_calls
        for method_name in ("parse_tool_calls", "extract_tool_calls", "parse"):
            if hasattr(obj, method_name) and callable(getattr(obj, method_name)):
                fn = getattr(obj, method_name)
                if getattr(fn, "__vllm_inspect_patched__", False):
                    continue

                def make_wrapper(_fn):  # type: ignore[no-untyped-def]
                    def _wrapped(self, text: str, *args, **kwargs):  # type: ignore[no-untyped-def]
                        try:
                            _print_block("VLLM_RAW_MODEL_OUTPUT", text)
                        except Exception:
                            pass
                        return _fn(self, text, *args, **kwargs)

                    _wrapped.__vllm_inspect_patched__ = True  # type: ignore[attr-defined]
                    return _wrapped

                try:
                    setattr(obj, method_name, make_wrapper(fn))
                    patched += 1
                    print(f"[inspect_vllm_prompt] monkeypatched {obj.__name__}.{method_name}")  # type: ignore[attr-defined]
                except Exception:
                    # Some attributes are C extensions / frozen / properties; skip.
                    pass

    # 1) Try a few common module paths first (these differ between versions).
    common_mods = [
        "vllm.entrypoints.openai.tool_parsers",
        "vllm.entrypoints.openai.tool_parsers.hermes",
        "vllm.entrypoints.openai.tool_parsers.hermes_parser",
        "vllm.entrypoints.openai.tool_parsers.hermes_tool_parser",
        "vllm.entrypoints.openai.tool_parsers.utils",
    ]
    for mn in common_mods:
        try:
            m = importlib.import_module(mn)
            for name, val in vars(m).items():
                if "hermes" in name.lower() and inspect.isclass(val):
                    try_patch_obj(val)
        except Exception:
            pass

    # 2) Walk the vllm package for modules whose name contains "hermes".
    try:
        for modinfo in pkgutil.walk_packages(vllm.__path__, prefix=vllm.__name__ + "."):  # type: ignore[attr-defined]
            if "hermes" not in modinfo.name.lower():
                continue
            if "tool" not in modinfo.name.lower() and "parser" not in modinfo.name.lower():
                continue
            try:
                m = importlib.import_module(modinfo.name)
            except Exception:
                continue
            for name, val in vars(m).items():
                if inspect.isclass(val) and "hermes" in val.__name__.lower():
                    try_patch_obj(val)
    except Exception:
        pass

    if patched == 0:
        print("[inspect_vllm_prompt] no patchable Hermes parser found "
              "(the prompt will still print; the raw output may not)")


def _find_api_server_module_name() -> str:
    """Locate vLLM's OpenAI api_server module: try the common paths first,
    then fall back to walk_packages."""
    candidates = [
        "vllm.entrypoints.openai.api_server",
        "vllm.entrypoints.api_server",
        "vllm.entrypoints.openai.server",
    ]
    for mn in candidates:
        try:
            # Probe the spec only; do not import (avoids runpy warnings / side effects).
            if importlib.util.find_spec(mn) is not None:
                return mn
        except Exception:
            pass

    import vllm  # type: ignore
    for modinfo in pkgutil.walk_packages(vllm.__path__, prefix=vllm.__name__ + "."):  # type: ignore[attr-defined]
        if modinfo.name.endswith("api_server"):
            try:
                if importlib.util.find_spec(modinfo.name) is not None:
                    return modinfo.name
            except Exception:
                continue
    raise RuntimeError("cannot find the vLLM api_server module (is vllm installed correctly?)")


def _run_api_server_inprocess() -> None:
    """vLLM 0.11.0's api_server usually does not expose a main().
    The safest approach is to run the module as `python -m ...`, triggering its
    __main__ logic so the native argparse path reads sys.argv."""
    mn = _find_api_server_module_name()
    print(f"[inspect_vllm_prompt] using api_server module: {mn}")
    # Equivalent to: python -m vllm.entrypoints.openai.api_server <args...>
    runpy.run_module(mn, run_name="__main__", alter_sys=True)


if __name__ == "__main__":
    # Print to stdout only; nothing is written to files.
    os.environ.setdefault("VLLM_LOG_LEVEL", os.environ.get("VLLM_LOG_LEVEL", "INFO"))

    # Important: pass the original api_server arguments (at least --model/--port/--host/...)
    # unchanged; otherwise vLLM falls back to its default model (usually a
    # HuggingFace repo) and tries to download its config over the network.
    if not any(a.startswith("--model") for a in sys.argv[1:]):
        print("[inspect_vllm_prompt] no --model=... given; api_server will use the "
              "default model and may fetch its config from huggingface.co.")

    print("[inspect_vllm_prompt] current sys.argv =")
    print("  " + " ".join(sys.argv))

    # Crucial: patch first, then import/start the api_server.
    _patch_transformers_apply_chat_template()
    _patch_hermes_tool_parser()
    _run_api_server_inprocess()

Launch command:

bash
python /mnt/s3fs/code_xd/X_28qwen3vl的tools设计验证/inspect_vllm_prompt.py \
  --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
  --served-model-name gpt \
  --host 0.0.0.0 --port 8000 \
  --trust-remote-code \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 1 \
  --api-key 123 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

How vLLM parses the incoming JSON into a model-readable string

VLLM_PROMPT

text
<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "获取指定城市的天气信息", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "城市名称,例如:北京、上海"}, "date": {"type": "string", "description": "日期,格式:YYYY-MM-DD,默认为今天"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "温度单位,摄氏度或华氏度"}}, "required": ["city"]}}}
{"type": "function", "function": {"name": "search_web", "description": "在互联网上搜索信息", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "搜索关键词"}, "max_results": {"type": "integer", "description": "最大返回结果数量", "default": 5}}, "required": ["query"]}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
请查询北京今天的天气情况<|im_end|>
<|im_start|>assistant
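Conceptually, that system block is produced by serializing each tool's JSON schema into a `<tools>` section plus fixed instructions. A rough sketch of the templating step (the real string comes from the model's Jinja chat template, so the wording and `render_system_block` helper below are approximations):

```python
import json

TOOL_HEADER = (
    "# Tools\n\n"
    "You may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n"
)
TOOL_FOOTER = (
    "\nFor each function call, return a json object with function name and "
    "arguments within <tool_call></tool_call> XML tags:\n"
    "<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>"
)

def render_system_block(tools: list) -> str:
    """Serialize each tool schema as one JSON line inside <tools>...</tools>."""
    lines = "\n".join(json.dumps(t, ensure_ascii=False) for t in tools)
    return TOOL_HEADER + "<tools>\n" + lines + "\n</tools>\n" + TOOL_FOOTER

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    },
}]
prompt = render_system_block(tools)
print(prompt)
```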

The model's reply:

text
<tool_call>
{"name": "get_weather", "arguments": {"city": "北京", "date": "2023-11-14", "unit": "celsius"}}
</tool_call>

With multiple tools, the model replies:

text
<tool_call>
{"name": "get_weather", "arguments": {"city": "北京"}}
</tool_call>
<tool_call>
{"name": "get_weather", "arguments": {"city": "上海"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "北京最新新闻"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "上海最新新闻"}}
</tool_call>
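On the way back, this is essentially what the hermes tool-call parser does: scan the raw output for `<tool_call>...</tool_call>` spans and `json.loads` each body. A simplified sketch, not vLLM's actual implementation:

```python
import json
import re

# Non-greedy match across newlines; backtracking extends the span
# until a closing brace immediately precedes </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_hermes_output(raw: str) -> list:
    """Extract every <tool_call> block and decode its JSON body."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(raw)]

raw = """<tool_call>
{"name": "get_weather", "arguments": {"city": "北京"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "上海最新新闻"}}
</tool_call>"""

calls = parse_hermes_output(raw)
print(len(calls))  # 2
```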

Author: Dong

Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC (the Creative Commons Attribution-NonCommercial 4.0 International license). You may freely repost and adapt them for non-commercial use, provided you credit the source and link to the original author. Please cite the source when reposting!