Context Query Relevance
Context Query Relevance is a critical aspect of Language Model (LM) evaluation, especially in retrieval-augmented generation (RAG) pipelines.
It measures how closely each retrieved context document aligns with a given user query. High context query relevance ensures that the most useful documents are prioritized for the LM’s reasoning, improving accuracy, reliability, and efficiency.
Evaluating context relevance can be challenging due to subjectivity and inconsistency. AIMon provides a purpose-built context query relevance evaluator that improves the accuracy and consistency of retrieval in LLM-based applications.
Challenges with Traditional LLM Evaluation Methods
Traditional methods of evaluating context relevance in LLM-based applications have several limitations:
- Variance and inconsistency in scoring results.
- High subjectivity in human evaluations.
- Cost inefficiency of relying solely on large, general-purpose LLMs.
Read more about the pros and cons of LLM judges here.
AIMon's Approach to Context Query Relevance
AIMon’s context_query_relevance metric uses a custom-built relevance grader that evaluates each context document against explicit scoring rules.
The custom-built grader runs at low latency, produces consistent scores, and is much cheaper than an off-the-shelf LLM.
These rules ensure that only the most relevant, specific, and useful passages are prioritized for downstream use. The grader checks for:
- Presence of key information needed to answer the query.
- Strong topical alignment with the query intent.
- Avoidance of generic background, tangents, or keyword-only mentions.
- Absence of contradictory, vague, or off-topic information.
Each document is scored individually, and an overall score is computed as the average of all individual document scores. This makes it easy to assess both per-document relevance and the overall quality of retrieved context.
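As a minimal sketch of the aggregation described above, using the per-document scores from the example response on this page, the overall score is the arithmetic mean of the individual document scores:

```python
# Per-document relevance scores, as returned in "individual_scores"
# (values taken from the example response on this page).
individual_scores = [1.0, 0.2857142857142857, 1.0, 1.0]

# The overall score is the average of the individual document scores.
overall_score = sum(individual_scores) / len(individual_scores)
print(round(overall_score, 4))  # 0.8214
```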
The task_definition parameter is optional and can be used to tailor the evaluation to a specific domain or use case (e.g., summarization, QA, fact-checking, or RAG).
Task Definition
When provided, task_definition lets you describe the intended use case or domain focus for the relevance evaluation. This helps the grader align document scoring with your specific objectives, further improving retrieval quality and application performance.
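For illustration, here are some hypothetical task_definition strings for common use cases. The wording below is an assumption, not prescribed by the API; task_definition is free-form text:

```python
# Hypothetical task_definition strings for different use cases.
# The exact wording is illustrative; any free-text description works.
task_definitions = {
    "qa": "Evaluate whether each document contains the facts needed to answer the question.",
    "summarization": "Evaluate whether each document covers the main points to be summarized.",
    "fact_checking": "Evaluate whether each document provides evidence for or against the claim.",
}

print(task_definitions["qa"])
```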
Examples
Prerequisites
Before running, ensure that you have an AIMon API key. Refer to the Quickstart guide for more information.
API Request & Response Example
- Request
- Response
[
{
"context": [
"Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
"Cooking recipes for Italian pasta dishes including carbonara, bolognese, and pesto.",
"Machine learning applications in healthcare include diagnostic imaging, drug discovery, and patient outcome prediction.",
"Machine learning algorithms are computational methods that can learn patterns from data without being explicitly programmed."
],
"user_query": "Tell me about machine learning algorithms",
"task_definition": "Evaluate the relevance of each context document to the user query about machine learning algorithms.",
"config": {
"context_query_relevance": {
"detector_name": "default",
"explain": true
}
},
"publish": false,
"async_mode": false,
"application_name": "context_query_relevance_test",
"model_name": "context_query_relevance"
}
]
[
{
"context_query_relevance": {
"document_results": [
{
"document_index": 0,
"document_text": "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
"raw_response": {
"extractions": [],
"instructions_list": [
{
"explanation": "The selected passage directly explains machine learning, e.g., 'Machine learning is a subset of artificial intelligence', which contains key info.",
"follow_probability": 0.9579,
"instruction": "Do not select context passages that fail to contain key information needed to answer the user query.",
"label": true
},
{
"explanation": "The context is highly relevant, focusing solely on machine learning algorithms without extraneous topics.",
"follow_probability": 0.9797,
"instruction": "Do not include context that is topically unrelated or only vaguely connected to the query intent.",
"label": true
},
{
"explanation": "The passage provides a direct fact ('enables computers to learn and improve') rather than generic background.",
"follow_probability": 0.9903,
"instruction": "Do not prioritize generic background information over directly relevant facts that align with the query.",
"label": true
},
{
"explanation": "The document addresses the core subject by explaining machine learning algorithms, not just mentioning terms.",
"follow_probability": 0.9325,
"instruction": "Do not select documents that merely mention terms from the query without addressing its core subject.",
"label": true
},
{
"explanation": "The response provides a concise definition ('Machine learning is a subset...') without extraneous tangents.",
"follow_probability": 0.9669,
"instruction": "Do not include context that introduces tangents or distracts from the main focus of the query.",
"label": true
},
{
"explanation": "The provided context directly addresses machine learning algorithms and does not contradict the query.",
"follow_probability": 0.9972,
"instruction": "Do not include documents that contradict or misrepresent the key question asked by the user.",
"label": true
},
{
"explanation": "The context is specific and clearly supports the query, offering a precise definition of machine learning.",
"follow_probability": 0.982,
"instruction": "Do not choose context that lacks specificity or fails to support the user's query in a clear and meaningful way.",
"label": true
}
],
"score": 1.0
},
"score": 1.0
},
{
"document_index": 1,
"document_text": "Cooking recipes for Italian pasta dishes including carbonara, bolognese, and pesto.",
"raw_response": {
"extractions": [],
"instructions_list": [
{
"explanation": "The selected context 'Cooking recipes for Italian pasta dishes' contains no key information on machine learning algorithms.",
"follow_probability": 0.0067,
"instruction": "Do not select context passages that fail to contain key information needed to answer the user query.",
"label": false
},
{
"explanation": "The context is entirely topically unrelated, focusing solely on cooking recipes instead of machine learning.",
"follow_probability": 0.5,
"instruction": "Do not include context that is topically unrelated or only vaguely connected to the query intent.",
"label": false
},
{
"explanation": "The response prioritizes irrelevant background (cooking recipes) over any directly relevant facts about machine learning.",
"follow_probability": 0.3208,
"instruction": "Do not prioritize generic background information over directly relevant facts that align with the query.",
"label": false
},
{
"explanation": "The document only mentions terms like 'pasta dishes' without addressing the core subject of machine learning algorithms.",
"follow_probability": 0.0851,
"instruction": "Do not select documents that merely mention terms from the query without addressing its core subject.",
"label": false
},
{
"explanation": "The response includes irrelevant context ('Cooking recipes for Italian pasta dishes') which introduces a tangent.",
"follow_probability": 0.1067,
"instruction": "Do not include context that introduces tangents or distracts from the main focus of the query.",
"label": false
},
{
"explanation": "The provided document does not contradict the user query; it simply misrepresents the topic.",
"follow_probability": 0.6225,
"instruction": "Do not include documents that contradict or misrepresent the key question asked by the user.",
"label": true
},
{
"explanation": "The context document lacks specificity regarding machine learning algorithms and fails to support the query.",
"follow_probability": 0.0474,
"instruction": "Do not choose context that lacks specificity or fails to support the user's query in a clear and meaningful way.",
"label": false
}
],
"score": 0.2857142857142857
},
"score": 0.2857142857142857
},
{
"document_index": 2,
"document_text": "Machine learning applications in healthcare include diagnostic imaging, drug discovery, and patient outcome prediction.",
"raw_response": {
"extractions": [],
"instructions_list": [
{
"explanation": "The selected passage discusses machine learning applications, which directly relates to the query about algorithms.",
"follow_probability": 0.6225,
"instruction": "Do not select context passages that fail to contain key information needed to answer the user query.",
"label": true
},
{
"explanation": "The context is highly relevant, focusing on machine learning rather than being topically unrelated.",
"follow_probability": 0.7773,
"instruction": "Do not include context that is topically unrelated or only vaguely connected to the query intent.",
"label": true
},
{
"explanation": "The passage provides specific examples (diagnostic imaging, drug discovery) instead of generic background info.",
"follow_probability": 0.9669,
"instruction": "Do not prioritize generic background information over directly relevant facts that align with the query.",
"label": true
},
{
"explanation": "The document addresses machine learning algorithms by mentioning concrete applications, not just mentioning terms.",
"follow_probability": 0.6514,
"instruction": "Do not select documents that merely mention terms from the query without addressing its core subject.",
"label": true
},
{
"explanation": "The response provides a single context document without extra tangential details, e.g., 'Machine learning applications in healthcare'.",
"follow_probability": 0.8355,
"instruction": "Do not include context that introduces tangents or distracts from the main focus of the query.",
"label": true
},
{
"explanation": "The provided document does not contradict the query; it directly relates to machine learning algorithms.",
"follow_probability": 0.9914,
"instruction": "Do not include documents that contradict or misrepresent the key question asked by the user.",
"label": true
},
{
"explanation": "The document is specific and clearly supports the query, mentioning concrete applications like 'diagnostic imaging'.",
"follow_probability": 0.852,
"instruction": "Do not choose context that lacks specificity or fails to support the user's query in a clear and meaningful way.",
"label": true
}
],
"score": 1.0
},
"score": 1.0
},
{
"document_index": 3,
"document_text": "Machine learning algorithms are computational methods that can learn patterns from data without being explicitly programmed.",
"raw_response": {
"extractions": [],
"instructions_list": [
{
"explanation": "The selected passage directly explains machine learning algorithms ('computational methods that can learn patterns from data') which meets the query.",
"follow_probability": 0.9924,
"instruction": "Do not select context passages that fail to contain key information needed to answer the user query.",
"label": true
},
{
"explanation": "The context is highly relevant and not topically unrelated, as it precisely addresses machine learning algorithms.",
"follow_probability": 0.9975,
"instruction": "Do not include context that is topically unrelated or only vaguely connected to the query intent.",
"label": true
},
{
"explanation": "The passage provides a direct fact ('learn patterns from data') rather than generic background, aligning with the query.",
"follow_probability": 0.9964,
"instruction": "Do not prioritize generic background information over directly relevant facts that align with the query.",
"label": true
},
{
"explanation": "The document clearly addresses the core subject by explaining machine learning algorithms, not just mentioning terms.",
"follow_probability": 0.9914,
"instruction": "Do not select documents that merely mention terms from the query without addressing its core subject.",
"label": true
},
{
"explanation": "The response provides a direct answer ('Machine learning algorithms are computational methods...') without introducing tangential context.",
"follow_probability": 0.9797,
"instruction": "Do not include context that introduces tangents or distracts from the main focus of the query.",
"label": true
},
{
"explanation": "The provided context document directly addresses the query and does not contradict or misrepresent the key question.",
"follow_probability": 0.9981,
"instruction": "Do not include documents that contradict or misrepresent the key question asked by the user.",
"label": true
},
{
"explanation": "The context document is specific and clearly supports the user's query about machine learning algorithms.",
"follow_probability": 0.989,
"instruction": "Do not choose context that lacks specificity or fails to support the user's query in a clear and meaningful way.",
"label": true
}
],
"score": 1.0
},
"score": 1.0
}
],
"evaluated_documents": 4,
"individual_scores": [
1.0,
0.2857142857142857,
1.0,
1.0
],
"num_documents": 4,
"score": 0.8214285714285714
}
}
]
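One common way to use these fields downstream is to drop low-scoring documents before passing context to the LLM. A minimal sketch, assuming a result shaped like the response above (the 0.5 threshold is an arbitrary choice for illustration):

```python
# Result fields copied from the example response above.
result = {
    "individual_scores": [1.0, 0.2857142857142857, 1.0, 1.0],
    "score": 0.8214285714285714,
}

# Keep only documents whose per-document score clears the threshold.
THRESHOLD = 0.5
kept_indices = [i for i, s in enumerate(result["individual_scores"]) if s >= THRESHOLD]
print(kept_indices)  # [0, 2, 3]
```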
Code Examples
- Python (Sync)
- Python (Async)
- Python (Decorator)
- TypeScript
# Synchronous example
import os
from aimon import Client
import json
# Initialize client
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")
# Construct payload
payload = [{
    "user_query": "Tell me about machine learning algorithms",
    "context": [
        "Machine learning is a subset of AI that enables systems to learn from data.",
        "Cooking recipes for Italian pasta dishes including carbonara and pesto.",
        "Machine learning algorithms are computational methods that learn patterns from data."
    ],
    "task_definition": "Evaluate how relevant each context document is to the user's query about machine learning algorithms.",
    "config": {
        "context_query_relevance": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": False
}]
# Call sync detect
response = client.inference.detect(body=payload)
# Print result
print(json.dumps(response[0].context_query_relevance, indent=2))
# Asynchronous example
# Imports and environment
import os
import json
from aimon import AsyncClient
# Read the AIMon API key from environment
aimon_api_key = os.environ["AIMON_API_KEY"]
# Payload for context query relevance
aimon_payload = {
    "user_query": "Tell me about machine learning algorithms",
    "context": [
        "Machine learning is a subset of AI that enables systems to learn from data.",
        "Cooking recipes for Italian pasta dishes including carbonara and pesto.",
        "Machine learning algorithms are computational methods that learn patterns from data."
    ],
    "task_definition": "Evaluate how relevant each context document is to the user's query about machine learning algorithms.",
    "config": {
        "context_query_relevance": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": True,
    "async_mode": True,
    "application_name": "async_metric_example",
    "model_name": "async_metric_example"
}
data_to_send = [aimon_payload]
# Async call to AIMon
async def call_aimon():
    async with AsyncClient(auth_header=f"Bearer {aimon_api_key}") as aimon:
        resp = await aimon.inference.detect(body=data_to_send)
        return resp

# Run the coroutine from synchronous code and print the result
import asyncio

resp = asyncio.run(call_aimon())
print(json.dumps(resp, indent=2))
print("View results at: https://www.app.aimon.ai/llmapps?source=sidebar&stage=production")
import os
from aimon import Detect
detect = Detect(
    values_returned=["context", "user_query", "task_definition"],
    config={
        "context_query_relevance": {
            "detector_name": "default",
            "explain": True
        }
    },
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="application_name",
    model_name="model_name"
)
@detect
def context_query_relevance_test(context, user_query, task_definition):
    return context, user_query, task_definition
context, user_query, task_definition, aimon_result = context_query_relevance_test(
    [
        "Machine learning is a subset of AI that enables systems to learn from data.",
        "Cooking recipes for Italian pasta dishes including carbonara and pesto.",
        "Machine learning algorithms are computational methods that learn patterns from data."
    ],
    "Tell me about machine learning algorithms",
    "Evaluate how relevant each context document is to the user's query about machine learning algorithms."
)
print(aimon_result)
import Client from "aimon";
import dotenv from "dotenv";
dotenv.config();
const aimon = new Client({
  authHeader: `Bearer ${process.env.AIMON_API_KEY}`,
});

const run = async () => {
  const response = await aimon.detect({
    context: [
      "Machine learning is a subset of AI that enables systems to learn from data.",
      "Cooking recipes for Italian pasta dishes including carbonara and pesto.",
      "Machine learning algorithms are computational methods that learn patterns from data."
    ],
    userQuery: "Tell me about machine learning algorithms",
    taskDefinition: "Evaluate how relevant each context document is to the user's query about machine learning algorithms.",
    config: {
      context_query_relevance: {
        detector_name: "default",
        explain: true
      },
    },
  });
  console.log("AIMon response:", JSON.stringify(response, null, 2));
};

run();