Instruction Adherence
The Instruction Adherence metric evaluates whether a model’s response follows a set of provided or implied instructions. It is especially useful when testing prompt engineering patterns, instruction-tuned models, or fine-grained control over generated output.
This metric analyzes both explicit instructions (manually provided) and implicit instructions (automatically extracted from the `user_query`, if enabled). For each instruction, it returns:
- Whether it was followed (`label`)
- A confidence score (`follow_probability`)
- An optional explanation (if enabled)

In addition, the metric computes an overall instruction adherence `score`.
Configuration Options
To enable or customize Instruction Adherence evaluation, pass the following fields inside the `instruction_adherence` object in the `config`:
- `detector_name` (string, required): Must be `"default"`.
- `explain` (boolean or `"negatives_only"`): Controls whether textual explanations are included for each instruction.
  - `false` (default): No explanations.
  - `true`: Explanations for all instructions.
  - `"negatives_only"`: Only for instructions that were not followed (`label: false`).
- `extract_from_system` (boolean): If `true`, the metric will try to extract implicit instructions from the `user_query` in addition to any explicitly provided instructions.
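Putting these options together, a typical `config` fragment combining all three fields might look like this (the specific values shown are illustrative choices, not defaults):

```json
{
  "instruction_adherence": {
    "detector_name": "default",
    "explain": "negatives_only",
    "extract_from_system": true
  }
}
```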
API Request & Response Example
**Request**
```json
[
  {
    "context": "Baking Class 101",
    "user_query": "How do I make chocolate chip cookies? Write up to 200 words.",
    "generated_text": "To make chocolate chip cookies, you need flour, sugar, butter, eggs, chocolate chips, and vanilla extract. Preheat oven to 375°F. Cream butter and sugar, add eggs and vanilla. Mix dry ingredients separately, then combine. Fold in chocolate chips. Drop spoonfuls onto baking sheet and bake for 9-11 minutes.",
    "instructions": ["Answer in English only"],
    "config": {
      "instruction_adherence": {
        "detector_name": "default",
        "explain": true,
        "extract_from_system": true
      }
    },
    "application_name": "ia_results",
    "model_name": "ia_results",
    "publish": true
  }
]
```
**Response**

```json
[
  {
    "avg_context_doc_length": 16.0,
    "instruction_adherence": {
      "extractions": [
        {
          "explanation": "The response is 68 words long, well under the 200-word limit.",
          "follow_probability": 0.7549,
          "instruction": "Write up to 200 words.",
          "label": true
        }
      ],
      "instructions_list": [
        {
          "explanation": "The answer is entirely in English without any non-English elements.",
          "follow_probability": 0.9987,
          "instruction": "Answer in English only",
          "label": true
        }
      ],
      "score": 1.0
    }
  }
]
```
Response Fields Explained
- `instruction_adherence`: Main field containing the adherence evaluation results.
  - `score`: A float between `0.0` and `1.0` indicating the fraction of instructions (explicit + extracted) that were followed (`label: true`). For example, a score of `1.0` means all instructions were followed.
  - `extractions`: A list of instructions automatically extracted from the `user_query`, if `extract_from_system` is `true`.
    - `instruction`: The extracted instruction as interpreted from the prompt.
    - `label`: `true` if the response adhered to the instruction.
    - `follow_probability`: A confidence score between `0.0` and `1.0`. Higher values indicate greater confidence that the instruction was followed.
    - `explanation` (optional): Explanation for the instruction, if enabled.
  - `instructions_list`: List of explicit instructions provided in the request under the `instructions` field.
    - `instruction`: The user-supplied instruction.
    - `label`: `true` if the response adhered to the instruction.
    - `follow_probability`: A confidence score between `0.0` and `1.0`. Higher values indicate greater confidence that the instruction was followed.
    - `explanation` (optional): Explanation for the instruction, if enabled.
- `avg_context_doc_length`: Average word length of `context` documents (if provided). If no documents are present, this is `NaN`.
Code Examples
**Python (Sync)**

```python
# Synchronous example
import os
import json

from aimon import Client

# Initialize client
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")

# Construct payload
payload = [{
    "user_query": "Say Hello!",
    "generated_text": "Bonjour!",
    "instructions": ["The response must be in English only."],
    "config": {
        "instruction_adherence": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": False
}]

# Call sync detect
response = client.inference.detect(body=payload)

# Print result
print(json.dumps(response[0].instruction_adherence, indent=2))
```
**Python (Async)**

```python
# Asynchronous example
import asyncio
import os
import json

from aimon import AsyncClient

aimon_api_key = os.environ["AIMON_API_KEY"]

aimon_payload = {
    "user_query": "Say Hello!",
    "generated_text": "Bonjour!",
    "instructions": ["The response must be in English only."],
    "config": {
        "instruction_adherence": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": True,
    "async_mode": True,
    "application_name": "async_metric_example",
    "model_name": "async_metric_example"
}

data_to_send = [aimon_payload]

async def call_aimon():
    async with AsyncClient(auth_header=f"Bearer {aimon_api_key}") as aimon:
        return await aimon.inference.detect(body=data_to_send)

# Run the coroutine and confirm
resp = asyncio.run(call_aimon())
print(json.dumps(resp, indent=2))
print("View results at: https://www.app.aimon.ai/llmapps?source=sidebar&stage=production")
```
**Python (Decorator)**

```python
import os

from aimon import Detect

# Configure the detector
detect = Detect(
    values_returned=['user_query', 'instructions', 'generated_text'],
    config={
        "instruction_adherence": {
            "detector_name": "default",
            "explain": True,
            "extract_from_system": False,
        }
    },
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_llm_app_ia_example",
    model_name="my_model_v1"
)

# Decorate your LLM application function
@detect
def my_llm_app(query: str, explicit_instructions: list[str]):
    # Example: Generate a response based on query and instructions
    # Replace with your actual LLM call
    response_text = (
        f"Based on your request '{query}', and considering the instructions provided, "
        "here is a poem: A quick brown fox jumps high. So very fast it goes by now."
    )
    # Return values matching the order in 'values_returned'
    return query, explicit_instructions, response_text

# Example usage
user_query = "Write a 10-word poem about a fox."
instructions_to_follow = [
    'The poem must not contain the letter "e".',
    "The poem must contain exactly 10 words."
]

# Call the decorated function
# The return value includes the original outputs plus the Aimon results
query_out, instructions_out, response_out, aimon_res = my_llm_app(user_query, instructions_to_follow)
```
**TypeScript**

```typescript
import Client from "aimon";
import dotenv from "dotenv";

dotenv.config();

const aimon = new Client({
  authHeader: `Bearer ${process.env.AIMON_API_KEY}`,
});

const run = async () => {
  const response = await aimon.detect({
    userQuery: "Summarize the plot of Romeo and Juliet.",
    generatedText:
      "Romeo Montague and Juliet Capulet fall in love despite their families' bitter feud. Their passionate romance brings their families together.",
    instructions: [
      "Mention the names 'Romeo' and 'Juliet'.",
      "State the main conflict (feuding families).",
      "Mention the tragic ending explicitly (e.g., death).",
    ],
    config: {
      instruction_adherence: {
        detector_name: "default",
        explain: "negatives_only",
        extract_from_system: false,
      },
    },
  });
  console.log("AIMon response:", JSON.stringify(response, null, 2));
};

run();
```
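Whichever client you use, the returned `instruction_adherence` object can be post-processed the same way, for example to surface instructions that failed or that passed with low confidence. A hypothetical helper sketch (the `flag_instructions` name and the `0.8` threshold are our choices, not part of the API), operating on a dict shaped like the example response:

```python
# Flag instructions that were not followed, or were followed with low confidence.
# `ia` is the "instruction_adherence" object from a detect response.
def flag_instructions(ia: dict, min_confidence: float = 0.8) -> list[str]:
    flagged = []
    for item in ia.get("extractions", []) + ia.get("instructions_list", []):
        if not item["label"]:
            flagged.append(f"NOT FOLLOWED: {item['instruction']}")
        elif item["follow_probability"] < min_confidence:
            flagged.append(f"LOW CONFIDENCE: {item['instruction']}")
    return flagged

# Values taken from the example response above.
ia = {
    "extractions": [
        {"instruction": "Write up to 200 words.", "follow_probability": 0.7549, "label": True}
    ],
    "instructions_list": [
        {"instruction": "Answer in English only", "follow_probability": 0.9987, "label": True}
    ],
}
print(flag_instructions(ia))  # → ['LOW CONFIDENCE: Write up to 200 words.']
```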
Example notebook

[Run this example in Google Colab](https://colab.research.google.com/drive/1fXXqmGBVJeTnoBla_A-30gy8eUouOuOw)