Context Query Relevance

Context Query Relevance is a critical aspect of Language Model (LM) evaluation, especially in retrieval-augmented generation (RAG) pipelines.
It measures how closely each retrieved context document aligns with a given user query. High context query relevance ensures that the most useful documents are prioritized for the LM’s reasoning, improving accuracy, reliability, and efficiency.

Evaluating context relevance can be challenging due to its subjectivity and the inconsistency of ad-hoc scoring. AIMon provides a purpose-built context query relevance evaluator to improve the accuracy and consistency of retrieval in LLM-based applications.

Challenges with Traditional LLM Evaluation Methods

Traditional methods of evaluating context relevance in LLM-based applications have several limitations:

  • Variance and inconsistency in scoring results.
  • High subjectivity in human evaluations.
  • Cost inefficiency of relying solely on large, general-purpose LLMs.

Read more about the pros and cons of LLM judges here.

AIMon's Approach to Context Query Relevance

AIMon’s context_query_relevance metric uses a custom-built relevance grader that evaluates each context document against explicit scoring rules. The grader runs at low latency, scores consistently, and is much cheaper than an off-the-shelf LLM. Its rules ensure that only the most relevant, specific, and useful passages are prioritized for downstream use. The grader checks for:

  • Presence of key information needed to answer the query.
  • Strong topical alignment with the query intent.
  • Avoidance of generic background, tangents, or keyword-only mentions.
  • Absence of contradictory, vague, or off-topic information.

Each document is scored individually, and an overall score is computed as the average of all individual document scores. This makes it easy to assess both per-document relevance and the overall quality of retrieved context.
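The aggregation described above can be sketched in a few lines of Python. This is illustrative only: the scores below are made-up numbers, not output from the AIMon API.

```python
# Illustrative sketch: the overall context_query_relevance score is the
# mean of the per-document relevance scores (example values, not real output).
doc_scores = [0.92, 0.10, 0.85, 0.95]  # one relevance score per context document

overall_score = sum(doc_scores) / len(doc_scores)
print(round(overall_score, 3))  # 0.705
```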

The task_definition parameter is optional and can be used to guide evaluation according to a specific domain or use case (e.g., summarization, QA, fact-checking, or RAG).

Task Definition

When provided, task_definition allows you to describe the intended use case or domain focus for the relevance evaluation. This helps the grader align document scoring with your specific objectives, further improving retrieval quality and application performance.
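For instance, the wording of task_definition can be tailored to the use case. The strings below are hypothetical examples, not prescribed values; adapt them to your own domain and retrieval objectives.

```python
# Hypothetical task_definition strings for different use cases.
# These are examples only -- adjust the wording to your own application.
task_definitions = {
    "qa": "Evaluate how well each document directly answers the user's question.",
    "summarization": "Evaluate whether each document contains content worth including in a summary of the query topic.",
    "fact_checking": "Evaluate whether each document provides evidence that supports or refutes the claim in the query.",
}

print(task_definitions["qa"])
```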

Examples

Pre-requisites

Before running, ensure that you have an AIMon API key. Refer to the Quickstart guide for more information.
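For example, you can install the SDK and export the key as an environment variable before running the snippets below (the PyPI package name is assumed to be aimon, matching the import in the code; confirm against the Quickstart guide):

```shell
# Install the AIMon Python SDK (package name assumed; see the Quickstart guide)
pip install aimon

# Make the API key available to the client code below
export AIMON_API_KEY="your-aimon-api-key"
```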

Code Examples

API Request Example

[
  {
    "context": [
      "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
      "Cooking recipes for Italian pasta dishes including carbonara, bolognese, and pesto.",
      "Machine learning applications in healthcare include diagnostic imaging, drug discovery, and patient outcome prediction.",
      "Machine learning algorithms are computational methods that can learn patterns from data without being explicitly programmed."
    ],
    "user_query": "Tell me about machine learning algorithms",
    "task_definition": "Evaluate the relevance of each context document to the user query about machine learning algorithms.",
    "config": {
      "context_query_relevance": {
        "detector_name": "default",
        "explain": true
      }
    },
    "publish": false,
    "async_mode": false,
    "application_name": "context_query_relevance_test",
    "model_name": "context_query_relevance"
  }
]

Python SDK Example

# Synchronous example

import json
import os

from aimon import Client

# Initialize the AIMon client using the API key from the environment
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")

# Construct the request payload
payload = [{
    "user_query": "Tell me about machine learning algorithms",
    "context": [
        "Machine learning is a subset of AI that enables systems to learn from data.",
        "Cooking recipes for Italian pasta dishes including carbonara and pesto.",
        "Machine learning algorithms are computational methods that learn patterns from data."
    ],
    "task_definition": "Evaluate how relevant each context document is to the user's query about machine learning algorithms.",
    "config": {
        "context_query_relevance": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": False
}]

# Call detect synchronously
response = client.inference.detect(body=payload)

# Print the context_query_relevance result for the first payload item
print(json.dumps(response[0].context_query_relevance, indent=2))
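The printed result contains the per-document scores, the overall average, and (since explain is enabled) per-document explanations. The exact schema may differ from the illustration below; the field names and values here are hypothetical, so consult the API response reference for the authoritative shape.

```json
{
  "score": 0.705,
  "individual_scores": [0.92, 0.1, 0.85, 0.95],
  "explanations": [
    "Strong topical alignment with the query intent.",
    "Off-topic: unrelated to machine learning.",
    "Relevant but application-focused rather than algorithm-focused.",
    "Directly answers the query about machine learning algorithms."
  ]
}
```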