Output Relevance

The Output Relevance metric evaluates how well a generated response aligns with the core intent of a user's query. It identifies responses that include unrelated or tangential information, miss the main point, or misinterpret the query. This is particularly useful in systems where precision and intent matching are critical, such as search engines, chatbots, and enterprise assistants.

Response Format

The API returns a list of detection results, each containing an output_relevance object with these key fields:

  • score (float, 0.0 – 1.0):
    Represents how relevant the response is to the user query:

    • 0.0–0.2: Poor alignment with intent.
    • 0.2–0.7: Partial relevance; contains some irrelevant or misunderstood content.
    • 0.7–1.0: Strongly aligned; focused and relevant.
  • instructions_list (array):
    A list of individual guideline evaluations:

    • instruction: A rule representing one aspect of output relevance (e.g., "Response should directly address the core intent").
    • label: Indicates whether the response followed the instruction (true) or violated it (false).
    • follow_probability: A confidence score indicating the likelihood the instruction was followed.
    • explanation: A natural-language rationale explaining why the instruction was marked true or false.

Note: The score is an overall judgment, but the instruction-level breakdown provides insight into specific strengths or issues in the response.
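For illustration, a single detection result with these fields might look like the following (the values here are hypothetical):

[
  {
    "output_relevance": {
      "score": 0.85,
      "instructions_list": [
        {
          "instruction": "Response should directly address the core intent of the query.",
          "label": true,
          "follow_probability": 0.92,
          "explanation": "The response answers the question that was asked and stays on topic."
        },
        {
          "instruction": "Response should not include unrelated or tangential information.",
          "label": false,
          "follow_probability": 0.31,
          "explanation": "The response includes biographical details that the query did not ask for."
        }
      ]
    }
  }
]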

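The payload below shows the inputs this detector evaluates: the retrieved context, the model's generated_text, and a config selecting the default output relevance detector.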
[
  {
    "context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "output_relevance": {
        "detector_name": "default"
      }
    }
  }
]

Code Example

The example below demonstrates how to use the output relevance metric synchronously.

from aimon import Detect
import os

# This is a synchronous example.
# Use async=True to run the detector asynchronously.
# Use publish=True to publish results to the AIMon UI.

# Configure the detector; the decorated function must return values
# in the order listed in values_returned.
detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"output_relevance": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model"
)

@detect
def my_llm_app(context, query):
    # A toy stand-in for a real LLM call; it deliberately echoes extra
    # information so the relevance detector has something to flag.
    my_llm_model = lambda context, query: f'''I am a LLM trained to answer your questions.
But I often include too much information.
The query you passed is: {query}.
The context you passed is: {context}.'''
    generated_text = my_llm_model(context, query)
    return context, generated_text

context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")

print(aimon_res)
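
Once you have a detection result (for example, one element of the list shown under Response Format), you can inspect the instruction-level breakdown. Below is a minimal sketch, assuming the result has already been parsed into a Python dict named detection (a hypothetical variable; the field names follow the Response Format section above):

# Minimal sketch: walk one detection result parsed into a dict named
# `detection` (hypothetical variable; field names are documented in
# the Response Format section above).
relevance = detection["output_relevance"]
print(f"Overall relevance score: {relevance['score']:.2f}")

for item in relevance["instructions_list"]:
    status = "followed" if item["label"] else "violated"
    print(f"- {item['instruction']}: {status} "
          f"(p={item['follow_probability']:.2f})")
    print(f"  {item['explanation']}")

Combining the overall score with the per-instruction labels makes it easier to see not just whether a response drifted off-topic, but which specific relevance guideline it violated.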