Context Relevance

Context relevance is a critical aspect of evaluating applications built on Large Language Models (LLMs) such as GPT-4o. The relevance of the retrieved context data largely determines the accuracy and quality of the responses an LLM generates. Evaluating that relevance is challenging, however, because the process is complex and inherently subjective. To address these challenges, AIMon has developed purpose-built tools: a customizable context relevance evaluator and a re-ranker model that provide a more efficient and accurate way to assess and improve the relevance of context data in LLM-based applications.

Challenges with Traditional LLM Evaluation Methods

Traditional methods of evaluating context relevance in LLM-based applications have several limitations, including:

  • Variance and inconsistency in results.
  • Subjectiveness of evaluations.
  • Cost inefficiency of using large off-the-shelf LLMs.

Read more about the pros and cons of LLM Judges here.

AIMon's Approach to Context Relevance

AIMon's purpose-built and customizable relevance graders are designed to address the limitations of traditional evaluation methods. These tools let users assess the relevance of context data more accurately and efficiently, improving the performance and reliability of LLM-based applications. Users can tailor the evaluation criteria to their specific needs, resulting in more precise and consistent evaluations.

Moreover, customizations made during evaluation can be applied immediately to improve the re-ranking phase of the LLM application. See more about the re-ranker model here.

Example

Pre-requisites

Before running, ensure that you have an AIMon API key. Refer to the Quickstart guide for more information.
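The example below reads the API key from the `AIMON_API_KEY` environment variable, so it is worth confirming the key is visible to your Python process before running it. This is a minimal sketch; `aimon_key_status` is a hypothetical helper written for illustration, not part of the AIMon SDK:

```python
import os

def aimon_key_status() -> str:
    """Report whether the AIMon API key is available in the environment."""
    # The SDK example fetches the key with os.getenv("AIMON_API_KEY"),
    # so the same lookup is used here.
    if os.getenv("AIMON_API_KEY"):
        return "AIMON_API_KEY found"
    return "AIMON_API_KEY is not set; export it before running the example"

print(aimon_key_status())
```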

Code Example

As with any other evaluation, context relevance can be evaluated using the AIMon SDK. Here is an example of how to evaluate the relevance of context data:

import os
from aimon import Detect

# Configure the retrieval relevance evaluator here
aimon_config = {
    "retrieval_relevance": {"detector_name": "default"}
}

# Set up the AIMon decorator
rr_eval = Detect(
    values_returned=["user_query", "context", "generated_text", "task_definition"],
    api_key=os.getenv("AIMON_API_KEY"),
    config=aimon_config,
    async_mode=True,
    publish=True,
    application_name="summarization_app_jan_20_2025",
    model_name="gpt-4o-mini"
)

@rr_eval
def my_llm_app(query, context):
    task_def = "Your task is to grade the relevance of context document against a specified user query."
    return query, context, "The capital of France is Paris", task_def

_, _, _, _, aimon_res = my_llm_app(
    "What is the capital of France?",
    ["The city of Paris is the capital of France",
     "San Francisco is one of the most iconic cities in the United States."]
)
if aimon_res.status == 200:
    print(f"\U0001F7E2 {aimon_res.detect_response['message']}\n")

# Output:
# 🟢 Data successfully sent to AIMon

Here is an example of how the metrics appear in the AIMon UI: retrieval_relevance.png