Tool-calling workflow:
Deploy a Qwen3-VL model:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
    --served-model-name gpt \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 1 \
    --api-key "123" \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```
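Once the server is up, it can be sanity-checked by listing models through the OpenAI-compatible API before sending any tool-call requests. A minimal sketch using only the standard library (host, port, and api-key follow the launch command above; the helper names are mine, adjust to your deployment):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"
API_KEY = "123"  # must match --api-key from the launch command

def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET /v1/models request carrying the Bearer token header."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(base_url: str = BASE_URL, api_key: str = API_KEY) -> list:
    """Return the ids of the served models."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]
```

Against the running server, `list_models()` should include the served model name `gpt`.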
Request data:
```python
# Address of the deployed model service
base_url = "http://100.96.168.186:8000/v1"

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer 123"  # the api-key set at deployment time
}

# Build the request body, including the tool definitions
request_data = {
    "model": "gpt",
    "messages": [
        {
            "role": "user",
            "content": "请查询北京今天的天气情况"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "获取指定城市的天气信息",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "城市名称,例如:北京、上海"
                        },
                        "date": {
                            "type": "string",
                            "description": "日期,格式:YYYY-MM-DD,默认为今天"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "温度单位,摄氏度或华氏度"
                        }
                    },
                    "required": ["city"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "search_web",
                "description": "在互联网上搜索信息",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "搜索关键词"
                        },
                        "max_results": {
                            "type": "integer",
                            "description": "最大返回结果数量",
                            "default": 5
                        }
                    },
                    "required": ["query"]
                }
            }
        }
    ],
    "tool_choice": "auto",  # let the model choose tools automatically
    "temperature": 0.7,
    "max_tokens": 2048
}
```
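With `base_url`, `headers`, and `request_data` defined as above, sending the request is a plain HTTP POST to `/chat/completions`. A minimal sketch using the standard library (the helper names are mine; the `requests` package would work equally well):

```python
import json
import urllib.request

def build_chat_request(base_url: str, headers: dict, payload: dict) -> urllib.request.Request:
    """Serialize the payload and build a POST request to /chat/completions."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def chat_completion(base_url: str, headers: dict, payload: dict) -> dict:
    """POST the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_chat_request(base_url, headers, payload)) as resp:
        return json.load(resp)

# Against the running server:
# response = chat_completion(base_url, headers, request_data)
# for call in response["choices"][0]["message"]["tool_calls"]:
#     print(call["function"]["name"], call["function"]["arguments"])
```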
Response data:
```json
{
  "id": "chatcmpl-ac7bc87341614d479f5e5bca93a9df4e",
  "object": "chat.completion",
  "created": 1765514379,
  "model": "gpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-5f1efff241e04159939d7a8e10312d67",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"北京\", \"date\": \"2023-11-14\", \"unit\": \"celsius\"}"
            }
          }
        ],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 311,
    "total_tokens": 353,
    "completion_tokens": 42,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
This response shows that the model understood the request and decided to call a tool:

```
ID: chatcmpl-tool-5f1efff241e04159939d7a8e10312d67
Type: function
Function name: get_weather
Arguments: {"city": "北京", "date": "2023-11-14", "unit": "celsius"}
Parsed arguments: {
  "city": "北京",
  "date": "2023-11-14",
  "unit": "celsius"
}
```
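Note that `arguments` comes back as a JSON-encoded *string*, not an object, so it must be decoded before dispatching to a real implementation. A minimal sketch of that step (the `TOOLS` registry and its lambda are hypothetical stand-ins for your own tool implementations):

```python
import json

# Hypothetical local implementations keyed by function name.
TOOLS = {
    "get_weather": lambda city, date=None, unit="celsius": f"{city}: 20 {unit}",
}

def dispatch(tool_call: dict) -> str:
    """Decode the arguments string and invoke the matching local function."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # "arguments" is a JSON string
    return TOOLS[fn["name"]](**args)

call = {
    "id": "chatcmpl-tool-5f1efff241e04159939d7a8e10312d67",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"北京\", \"date\": \"2023-11-14\", \"unit\": \"celsius\"}",
    },
}
print(dispatch(call))  # → 北京: 20 celsius
```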
================================================================================
Test 2: multiple tool calls (does the model support invoking several tools at once?)
================================================================================
Testing multiple tool calls...
================================================================================
Request JSON:
```json
{
  "model": "gpt",
  "messages": [
    {
      "role": "user",
      "content": "请帮我查询北京和上海的天气,并且搜索一下这两个城市的最新新闻"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "获取指定城市的天气信息",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "城市名称,例如:北京、上海"
            },
            "date": {
              "type": "string",
              "description": "日期,格式:YYYY-MM-DD,默认为今天"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "温度单位,摄氏度或华氏度"
            }
          },
          "required": ["city"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "search_web",
        "description": "在互联网上搜索信息",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string",
              "description": "搜索关键词"
            },
            "max_results": {
              "type": "integer",
              "description": "最大返回结果数量",
              "default": 5
            }
          },
          "required": ["query"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_time",
        "description": "获取当前时间或指定时区的时间",
        "parameters": {
          "type": "object",
          "properties": {
            "timezone": {
              "type": "string",
              "description": "时区,例如:Asia/Shanghai, UTC"
            },
            "format": {
              "type": "string",
              "enum": ["iso", "timestamp", "readable"],
              "description": "时间格式"
            }
          },
          "required": []
        }
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.7,
  "max_tokens": 2048
}
```
================================================================================
Model response JSON:
================================================================================
```json
{
  "id": "chatcmpl-2f9ac7a00bd442f5862ff50d49cd5790",
  "object": "chat.completion",
  "created": 1765514787,
  "model": "gpt",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [
          {
            "id": "chatcmpl-tool-c0f0319735664ccd9fe8a16fa94a3990",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"北京\"}"
            }
          },
          {
            "id": "chatcmpl-tool-6c0c2211ffd741b9a6a64548caa5c875",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"上海\"}"
            }
          },
          {
            "id": "chatcmpl-tool-43c31954d2f040a3a4c33eff6ecdf783",
            "type": "function",
            "function": {
              "name": "search_web",
              "arguments": "{\"query\": \"北京最新新闻\"}"
            }
          },
          {
            "id": "chatcmpl-tool-f109a0eff4364a9581276fae7678984f",
            "type": "function",
            "function": {
              "name": "search_web",
              "arguments": "{\"query\": \"上海最新新闻\"}"
            }
          }
        ],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 422,
    "total_tokens": 506,
    "completion_tokens": 84,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
================================================================================
Response analysis:
================================================================================
Role: assistant
Content: None
Finish reason: tool_calls
Number of tool calls: 4
✅ Success: the model invoked 4 tools in a single response
Tool call 1:
  ID: chatcmpl-tool-c0f0319735664ccd9fe8a16fa94a3990
  Type: function
  Function name: get_weather
  Arguments: {"city": "北京"}
  Parsed arguments: {"city": "北京"}
Tool call 2:
  ID: chatcmpl-tool-6c0c2211ffd741b9a6a64548caa5c875
  Type: function
  Function name: get_weather
  Arguments: {"city": "上海"}
  Parsed arguments: {"city": "上海"}
Tool call 3:
  ID: chatcmpl-tool-43c31954d2f040a3a4c33eff6ecdf783
  Type: function
  Function name: search_web
  Arguments: {"query": "北京最新新闻"}
  Parsed arguments: {"query": "北京最新新闻"}
Tool call 4:
  ID: chatcmpl-tool-f109a0eff4364a9581276fae7678984f
  Type: function
  Function name: search_web
  Arguments: {"query": "上海最新新闻"}
  Parsed arguments: {"query": "上海最新新闻"}
Token usage:
  Input tokens: 422
  Output tokens: 84
  Total tokens: 506
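The tests above only cover the first round. To complete the loop, each tool is executed locally and its result is appended as a `role: "tool"` message, matched to its call via `tool_call_id`, before the conversation is sent back for the model's final answer. A sketch of the message assembly (the ids and tool outputs here are placeholders, not from the real run):

```python
def append_tool_results(messages: list, assistant_message: dict, results: dict) -> list:
    """Append the assistant's tool_calls message plus one 'tool' message per call.

    `results` maps tool_call id -> the string output of running that tool locally.
    """
    messages = messages + [assistant_message]
    for call in assistant_message["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": results[call["id"]],
        })
    return messages

assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call-1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}},
        {"id": "call-2", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"上海\"}"}},
    ],
}
results = {"call-1": "北京: 晴, 20°C", "call-2": "上海: 多云, 22°C"}
messages = append_tool_results(
    [{"role": "user", "content": "请帮我查询北京和上海的天气"}],
    assistant_message,
    results,
)
print([m["role"] for m in messages])  # → ['user', 'assistant', 'tool', 'tool']
```

The extended `messages` list then goes into a second `/chat/completions` request, and the model replies with normal text content.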
Save the following code as a .py file; the launch command is given afterwards:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
[Purpose]
Without modifying the vLLM package source, print the prompt string that the
OpenAI api_server actually feeds to the model, and the raw model output string
(before the tool parser runs).

[Usage]
1) Launch this script in place of the original api_server (same arguments):
   python /mnt/s3fs/code_xd/X_28qwen3vl的tools设计验证/inspect_vllm_prompt.py \
       --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
       --served-model-name gpt \
       --host 0.0.0.0 --port 8000 \
       --trust-remote-code --max-model-len 8192 \
       --gpu-memory-utilization 0.9 --tensor-parallel-size 1 \
       --api-key 123 --enable-auto-tool-choice --tool-call-parser hermes
2) Send requests to /v1/chat/completions as usual.
3) Two blocks will appear on the server's stdout:
   - [VLLM_PROMPT] ...            (the final prompt)
   - [VLLM_RAW_MODEL_OUTPUT] ...  (the raw model output string, pre-parser)
"""
from __future__ import annotations

import importlib
import importlib.util
import inspect
import json
import os
import pkgutil
import runpy
import sys
from typing import Any


def _print_block(tag: str, payload: str) -> None:
    sep = "=" * 30
    print(f"\n{sep} {tag} {sep}")
    print(payload)
    print(f"{sep} END_{tag} {sep}\n")


def _patch_transformers_apply_chat_template() -> None:
    """
    When vLLM serving builds the prompt it goes through
    tokenizer.apply_chat_template(..., tokenize=False, ...).
    We monkeypatch that method and print the returned prompt -- this is the
    exact string fed to the model.
    """
    try:
        from transformers import PreTrainedTokenizerBase  # type: ignore
    except Exception as e:
        print(f"[inspect_vllm_prompt] cannot import transformers.PreTrainedTokenizerBase: {e!r}")
        return
    if getattr(PreTrainedTokenizerBase.apply_chat_template, "__vllm_inspect_patched__", False):
        return
    orig = PreTrainedTokenizerBase.apply_chat_template

    def wrapped(self, conversation, *args, **kwargs):  # type: ignore[no-untyped-def]
        out = orig(self, conversation, *args, **kwargs)
        # Only print when tokenize=False and a str is returned
        # (i.e. the final prompt string).
        try:
            tokenize = kwargs.get("tokenize", None)
            if tokenize is False and isinstance(out, str):
                try:
                    conv_json = json.dumps(conversation, ensure_ascii=False, indent=2)
                except Exception:
                    conv_json = str(conversation)
                _print_block("VLLM_MESSAGES", conv_json)
                _print_block("VLLM_PROMPT", out)
        except Exception as e:
            print(f"[inspect_vllm_prompt] failed to print prompt: {e!r}")
        return out

    wrapped.__vllm_inspect_patched__ = True  # type: ignore[attr-defined]
    PreTrainedTokenizerBase.apply_chat_template = wrapped  # type: ignore[assignment]
    print("[inspect_vllm_prompt] monkeypatched transformers.PreTrainedTokenizerBase.apply_chat_template")


def _patch_hermes_tool_parser() -> None:
    """
    With --tool-call-parser hermes, vLLM hands the raw model output string to
    the Hermes parser. We monkeypatch the parser's entry points and print the
    text before it is parsed. Because the module layout varies across vLLM
    versions, the parser class is located by a tolerant dynamic search.
    """
    try:
        import vllm  # type: ignore
    except Exception as e:
        print(f"[inspect_vllm_prompt] cannot import vllm: {e!r}")
        return
    patched = 0

    def try_patch_obj(obj: Any) -> int:
        nonlocal patched
        # Common method names: parse / parse_tool_calls / extract_tool_calls
        for method_name in ("parse_tool_calls", "extract_tool_calls", "parse"):
            if hasattr(obj, method_name) and callable(getattr(obj, method_name)):
                fn = getattr(obj, method_name)
                if getattr(fn, "__vllm_inspect_patched__", False):
                    continue

                def make_wrapper(_fn):  # type: ignore[no-untyped-def]
                    def _wrapped(self, text: str, *args, **kwargs):  # type: ignore[no-untyped-def]
                        try:
                            _print_block("VLLM_RAW_MODEL_OUTPUT", text)
                        except Exception:
                            pass
                        return _fn(self, text, *args, **kwargs)
                    _wrapped.__vllm_inspect_patched__ = True  # type: ignore[attr-defined]
                    return _wrapped

                try:
                    setattr(obj, method_name, make_wrapper(fn))
                    patched += 1
                    print(f"[inspect_vllm_prompt] monkeypatched {obj.__name__}.{method_name}")  # type: ignore[attr-defined]
                except Exception:
                    # Some attributes are C extensions / frozen / properties; skip them.
                    pass
        return patched

    # 1) Try a few common module paths first (they differ across versions).
    common_mods = [
        "vllm.entrypoints.openai.tool_parsers",
        "vllm.entrypoints.openai.tool_parsers.hermes",
        "vllm.entrypoints.openai.tool_parsers.hermes_parser",
        "vllm.entrypoints.openai.tool_parsers.hermes_tool_parser",
        "vllm.entrypoints.openai.tool_parsers.utils",
    ]
    for mn in common_mods:
        try:
            m = importlib.import_module(mn)
            for name, val in vars(m).items():
                if "hermes" in name.lower() and inspect.isclass(val):
                    try_patch_obj(val)
        except Exception:
            pass

    # 2) Walk the vllm package for module names containing "hermes" and keep trying.
    try:
        for modinfo in pkgutil.walk_packages(vllm.__path__, prefix=vllm.__name__ + "."):  # type: ignore[attr-defined]
            if "hermes" not in modinfo.name.lower():
                continue
            if "tool" not in modinfo.name.lower() and "parser" not in modinfo.name.lower():
                continue
            try:
                m = importlib.import_module(modinfo.name)
            except Exception:
                continue
            for name, val in vars(m).items():
                if inspect.isclass(val) and "hermes" in val.__name__.lower():
                    try_patch_obj(val)
    except Exception:
        pass

    if patched == 0:
        print("[inspect_vllm_prompt] no patchable Hermes parser found "
              "(the prompt will still print; the raw output may not)")


def _find_api_server_module_name() -> str:
    """
    Locate vLLM's OpenAI api_server module.
    Try the common paths first, then fall back to walk_packages.
    """
    candidates = [
        "vllm.entrypoints.openai.api_server",
        "vllm.entrypoints.api_server",
        "vllm.entrypoints.openai.server",
    ]
    for mn in candidates:
        try:
            # Do not import (avoids runpy warnings / side effects); probe the spec only.
            if importlib.util.find_spec(mn) is not None:
                return mn
        except Exception:
            pass
    import vllm  # type: ignore
    for modinfo in pkgutil.walk_packages(vllm.__path__, prefix=vllm.__name__ + "."):  # type: ignore[attr-defined]
        if modinfo.name.endswith(".api_server") or modinfo.name.endswith("api_server"):
            try:
                if importlib.util.find_spec(modinfo.name) is not None:
                    return modinfo.name
            except Exception:
                continue
    raise RuntimeError("cannot find the vLLM api_server module (is vllm fully installed?)")


def _run_api_server_inprocess() -> None:
    """
    vLLM 0.11.0's api_server usually does not expose a main().
    The most robust approach: run the module as `python -m ...`, triggering its
    __main__ logic so the native argparse path reads sys.argv as usual.
    """
    mn = _find_api_server_module_name()
    print(f"[inspect_vllm_prompt] using api_server module: {mn}")
    # Equivalent to: python -m vllm.entrypoints.openai.api_server <args...>
    runpy.run_module(mn, run_name="__main__", alter_sys=True)


if __name__ == "__main__":
    # Per the "write no files" requirement, everything goes to stdout only.
    os.environ.setdefault("VLLM_LOG_LEVEL", os.environ.get("VLLM_LOG_LEVEL", "INFO"))
    # Important: pass the original api_server arguments (at least --model/--host/--port/...)
    # unchanged. Otherwise vLLM falls back to its default model (typically a
    # HuggingFace repo) and may try to fetch its config from the network.
    if not any(a.startswith("--model") for a in sys.argv[1:]):
        print("[inspect_vllm_prompt] no --model=... given; api_server will use the default model and may fetch config from huggingface.co.")
    print("[inspect_vllm_prompt] current sys.argv =")
    print("  " + " ".join(sys.argv))
    # Key point: patch first, then import/start the api_server.
    _patch_transformers_apply_chat_template()
    _patch_hermes_tool_parser()
    _run_api_server_inprocess()
```
Launch command:

```bash
python /mnt/s3fs/code_xd/X_28qwen3vl的tools设计验证/inspect_vllm_prompt.py \
    --model /mnt/jfs6/model/Qwen3-VL-8B-Instruct \
    --served-model-name gpt \
    --host 0.0.0.0 --port 8000 \
    --trust-remote-code \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 1 \
    --api-key 123 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```
The printed VLLM_PROMPT:

```
<|im_start|>system
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "获取指定城市的天气信息", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "城市名称,例如:北京、上海"}, "date": {"type": "string", "description": "日期,格式:YYYY-MM-DD,默认为今天"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "温度单位,摄氏度或华氏度"}}, "required": ["city"]}}}
{"type": "function", "function": {"name": "search_web", "description": "在互联网上搜索信息", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "搜索关键词"}, "max_results": {"type": "integer", "description": "最大返回结果数量", "default": 5}}, "required": ["query"]}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
请查询北京今天的天气情况<|im_end|>
<|im_start|>assistant
```
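As the dump shows, the `<tools>` section is simply each tool schema serialized as one JSON object per line. A toy sketch of that serialization step (an illustration of the observed format, not Qwen's actual chat template):

```python
import json

def render_tools_block(tools: list) -> str:
    """Render tool schemas one JSON object per line inside <tools> tags,
    mirroring the <tools> section of the dumped prompt above."""
    body = "\n".join(json.dumps(t, ensure_ascii=False) for t in tools)
    return f"<tools>\n{body}\n</tools>"

print(render_tools_block([
    {"type": "function", "function": {"name": "get_weather"}},
]))
```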
Model reply:
```
<tool_call>
{"name": "get_weather", "arguments": {"city": "北京", "date": "2023-11-14", "unit": "celsius"}}
</tool_call>
```
With multiple tools, the model replies:
```
<tool_call>
{"name": "get_weather", "arguments": {"city": "北京"}}
</tool_call>
<tool_call>
{"name": "get_weather", "arguments": {"city": "上海"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "北京最新新闻"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "上海最新新闻"}}
</tool_call>
```
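The hermes parser's job is essentially to pull those `<tool_call>` blocks out of the raw text and decode each JSON body into the structured `tool_calls` seen in the API response. A simplified sketch of that behavior (not vLLM's actual implementation, which also handles streaming and malformed output):

```python
import json
import re

# Match each <tool_call>...</tool_call> block and capture the JSON object inside.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list:
    """Extract every <tool_call> block and decode its JSON body."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

raw = """<tool_call>
{"name": "get_weather", "arguments": {"city": "北京"}}
</tool_call>
<tool_call>
{"name": "search_web", "arguments": {"query": "北京最新新闻"}}
</tool_call>"""

for call in parse_tool_calls(raw):
    print(call["name"], call["arguments"])
```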


Author: Dong
Link:
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC (Creative Commons Attribution-NonCommercial 4.0 International). You may freely repost and adapt them for non-commercial use, provided you credit the source and link to the original author. Please credit the source when reposting!