Tool Call Check
The Tool Call Check metric evaluates whether a model’s generated output accurately reflects the usage and results of tools such as APIs or function calls. This metric is designed for tool-using LLM agents and ensures that the final response honors what was actually executed in the tool trace.
This evaluation is critical in tool-augmented systems where the model must not fabricate tool-based information or ignore tool call results. It helps verify that tool usage aligns with the user query and that results are reflected faithfully in the output.
When to Use
Use this metric when:
- You want to check whether the tool selected matches the user's intent.
- Your system relies on tools (functions, APIs, plugins) to respond to user queries.
- You need to ensure the model includes accurate tool results in its final response.
- You want to prevent hallucinated tool usage or misreporting of tool execution status.
Labels Evaluated
The API uses a set of instruction checks to evaluate the alignment of tool usage and tool output. These include:
- Tool relevance: Ensures the tool used is appropriate for the user query.
- Result fidelity: Checks that key results from successful tool calls are reflected in the output.
- Status consistency: Ensures that failed or pending tools are not treated as successful in the response.
- No hallucination: Verifies that the model does not refer to tools that were never called.
Required Fields
To evaluate this metric, the following fields must be present:
- user_query: The original user request that triggered the tool use.
- generated_text: The model’s final response to the user.
- tool_trace: An array representing the trace of tool calls made during the interaction. Each entry should include the tool name, execution status, and timestamps, and optionally payloads and retry counts.
Response Format
The response contains a tool_call_check object with the following structure:
- score (float, 0.0 to 1.0): A final score that reflects tool alignment. It is equal to the lowest follow_probability across all evaluated instructions.
- instructions_list (array): A list of individual checks applied to the generated output. Each entry contains:
  - instruction: The behavioral rule being evaluated.
  - label: Whether the output followed this rule (true) or violated it (false).
  - follow_probability: Confidence that the instruction was followed.
  - explanation: Human-readable explanation for why the rule was or was not followed.
Schema for tool_trace
{
"tool_trace": [
{
"order": 1,
"tool_name": "tool.name",
"tool_version": "v1.0.0",
"start_time": "2025-01-09T17:23:46Z",
"end_time": "2025-01-09T17:23:47Z",
"status": "success",
"error": null,
"request_payload": {},
"response_payload": {},
"retries": 0
}
]
}
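Before sending a request, it can help to sanity-check each tool_trace entry client-side. The helper below is a minimal sketch, not part of the AIMon SDK; the set of required keys and the allowed status values are assumptions inferred from the schema and instruction descriptions on this page.

```python
# Minimal client-side sanity check for tool_trace entries.
# Not part of the AIMon SDK; the required keys and status values below
# are assumptions based on the schema shown above.
REQUIRED_KEYS = {"order", "tool_name", "status", "start_time", "end_time"}
KNOWN_STATUSES = {"success", "error", "timeout", "cancelled", "pending"}

def validate_trace_entry(entry: dict) -> list:
    """Return a list of problems found in a single tool_trace entry."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    status = entry.get("status")
    if status is not None and status not in KNOWN_STATUSES:
        problems.append(f"unexpected status: {status!r}")
    return problems

entry = {
    "order": 1,
    "tool_name": "weather_api",
    "status": "success",
    "start_time": "2024-01-15T10:30:00Z",
    "end_time": "2024-01-15T10:30:02Z",
}
print(validate_trace_entry(entry))  # []
```

Catching a malformed trace before the API call gives a clearer error than debugging an unexpected evaluation result afterward.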
API Request & Response Example
- Request
- Response
[
{
"user_query": "What is the current weather in New York?",
"generated_text": "I'll check the current weather in New York for you. Let me use a weather tool to get the latest information.\n\nThe current weather in New York is 72°F with partly cloudy skies. The humidity is 65% and there's a light breeze from the west at 8 mph.",
"tool_trace": [
{
"order": 1,
"tool_name": "weather_api",
"status": "success",
"start_time": "2024-01-15T10:30:00Z",
"end_time": "2024-01-15T10:30:02Z",
"retries": 0,
"request_payload": {
"location": "New York, NY",
"units": "imperial"
},
"response_payload": {
"temperature": 72,
"condition": "partly cloudy",
"humidity": 65,
"wind_speed": 8,
"wind_direction": "west"
}
}
],
"config": {
"tool_call_check": {
"detector_name": "default",
"explain": true
}
}
}
]
[
{
"tool_call_check": {
"instructions_list": [
{
"explanation": "The response uses a weather tool ('weather tool') which matches the intent of querying New York weather.",
"follow_probability": 0.9924,
"instruction": "Do not use a tool whose purpose does not match the intent of the user query.",
"label": true
},
{
"explanation": "It includes all relevant results ('72°F', 'partly cloudy', '65%', 'light breeze from the west at 8 mph').",
"follow_probability": 0.9985,
"instruction": "Do not generate an output that omits or ignores relevant results from successful tool calls.",
"label": true
},
{
"explanation": "The tool call is treated as successful since the output reflects the provided successful tool trace.",
"follow_probability": 0.974,
"instruction": "Do not treat tool calls as successful in the output if the tool_trace shows a failure status such as 'error', 'timeout', or 'cancelled'.",
"label": true
},
{
"explanation": "The output mentions tool-based facts ('72°F', 'partly cloudy', etc.) which aligns with the actual tool call.",
"follow_probability": 0.7982,
"instruction": "Do not mention tool-based facts or refer to tool results in the output if no such tool was actually called.",
"label": true
}
],
"score": 0.7982
}
}
]
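Once a response like the one above comes back, a common follow-up is to surface only the checks that failed. A minimal sketch, assuming the response has already been deserialized into Python dicts with the fields shown in the example (the failing "No hallucination" entry below is hypothetical, for illustration):

```python
# Pull out violated instructions (label == False) from a parsed response.
# Assumes the response has been deserialized into dicts with the fields
# shown in the example above; the failing entry here is hypothetical.
def violated_instructions(result: dict) -> list:
    return [
        item
        for item in result["tool_call_check"]["instructions_list"]
        if not item["label"]
    ]

result = {
    "tool_call_check": {
        "score": 0.41,
        "instructions_list": [
            {"instruction": "Tool relevance", "label": True, "follow_probability": 0.9924},
            {"instruction": "No hallucination", "label": False, "follow_probability": 0.41},
        ],
    }
}
print([v["instruction"] for v in violated_instructions(result)])  # ['No hallucination']
```

In a monitoring pipeline, the violated checks (with their explanations) are usually what you log or alert on, rather than the raw score alone.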
Code Examples
- Python (Sync)
- Python (Async)
- Python (Decorator)
- TypeScript
# Synchronous example
import os
from aimon import Client
import json
# Initialize client
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")
# Construct payload
payload = [
{
"user_query": "What is the current weather in New York?",
"generated_text": "I'll check the current weather in New York for you. Let me use a weather tool to get the latest information.\n\nThe current weather in New York is 72°F with partly cloudy skies. The humidity is 65% and there's a light breeze from the west at 8 mph.",
"tool_trace": [
{
"order": 1,
"tool_name": "weather_api",
"status": "success",
"start_time": "2024-01-15T10:30:00Z",
"end_time": "2024-01-15T10:30:02Z",
"retries": 0,
"request_payload": {
"location": "New York, NY",
"units": "imperial"
},
"response_payload": {
"temperature": 72,
"condition": "partly cloudy",
"humidity": 65,
"wind_speed": 8,
"wind_direction": "west"
}
}
],
"config": {
"tool_call_check": {
"detector_name": "default",
"explain": True
}
}
}
]
# Call sync detect
response = client.inference.detect(body=payload)
# Print result
print(json.dumps(response[0].tool_call_check, indent=2))
# Asynchronous example
import os
import json
from aimon import AsyncClient
# Read the AIMon API key from environment
aimon_api_key = os.environ["AIMON_API_KEY"]
# Construct async payload
aimon_payload = {
"user_query": "What is the current weather in New York?",
"generated_text": "I'll check the current weather in New York for you. Let me use a weather tool to get the latest information.\n\nThe current weather in New York is 72°F with partly cloudy skies. The humidity is 65% and there's a light breeze from the west at 8 mph.",
"tool_trace": [
{
"order": 1,
"tool_name": "weather_api",
"status": "success",
"start_time": "2024-01-15T10:30:00Z",
"end_time": "2024-01-15T10:30:02Z",
"retries": 0,
"request_payload": {
"location": "New York, NY",
"units": "imperial"
},
"response_payload": {
"temperature": 72,
"condition": "partly cloudy",
"humidity": 65,
"wind_speed": 8,
"wind_direction": "west"
}
}
],
"config": {
"tool_call_check": {
"detector_name": "default",
"explain": True
}
},
"publish": True,
"async_mode": True,
"application_name": "async_tool_check_example",
"model_name": "async_tool_check_example"
}
data_to_send = [aimon_payload]
# Async call to AIMon
async def call_aimon():
async with AsyncClient(auth_header=f"Bearer {aimon_api_key}") as aimon:
resp = await aimon.inference.detect(body=data_to_send)
return resp
# Await and confirm
resp = await call_aimon()
print(json.dumps(resp, indent=2))
print("View results at: https://www.app.aimon.ai/llmapps?source=sidebar&stage=production")
import os
from aimon import Detect
import json
# Initialize the Detect decorator
detect = Detect(
values_returned=["user_query", "generated_text", "tool_trace"],
config={"tool_call_check": {"detector_name": "default", "explain": True}},
api_key=os.getenv("AIMON_API_KEY"),
application_name="tool_check_app",
model_name="tool_check_model"
)
# Define a function that returns the values needed for the metric
@detect
def check_tool_call(user_query, generated_text, tool_trace):
return user_query, generated_text, tool_trace
# Prepare inputs
user_query = "What is the current weather in New York?"
generated_text = (
"I'll check the current weather in New York for you. Let me use a weather tool to get the latest information.\n\n"
"The current weather in New York is 72°F with partly cloudy skies. The humidity is 65% and there's a light breeze from the west at 8 mph."
)
tool_trace = [
{
"order": 1,
"tool_name": "weather_api",
"status": "success",
"start_time": "2024-01-15T10:30:00Z",
"end_time": "2024-01-15T10:30:02Z",
"retries": 0,
"request_payload": {
"location": "New York, NY",
"units": "imperial"
},
"response_payload": {
"temperature": 72,
"condition": "partly cloudy",
"humidity": 65,
"wind_speed": 8,
"wind_direction": "west"
}
}
]
# Run the function and retrieve the result
_, _, _, aimon_result = check_tool_call(user_query, generated_text, tool_trace)
# Print the metric result
print(json.dumps(aimon_result.detect_response.tool_call_check, indent=2))
import Client from "aimon";
import dotenv from "dotenv";
dotenv.config();
const aimon = new Client({
authHeader: `Bearer ${process.env.AIMON_API_KEY}`,
});
const run = async () => {
const response = await aimon.detect({
userQuery: "What is the current weather in New York?",
generatedText:
"I'll check the current weather in New York for you. Let me use a weather tool to get the latest information.\n\nThe current weather in New York is 72°F with partly cloudy skies. The humidity is 65% and there's a light breeze from the west at 8 mph.",
toolTrace: [
{
order: 1,
tool_name: "weather_api",
status: "success",
start_time: "2024-01-15T10:30:00Z",
end_time: "2024-01-15T10:30:02Z",
retries: 0,
request_payload: {
location: "New York, NY",
units: "imperial",
},
response_payload: {
temperature: 72,
condition: "partly cloudy",
humidity: 65,
wind_speed: 8,
wind_direction: "west",
},
},
],
config: {
tool_call_check: {
detector_name: "default",
explain: true,
},
},
});
console.log("Tool Call Check result:", JSON.stringify(response, null, 2));
};
run();