Output Relevance

The Output Relevance metric evaluates how well a generated response aligns with the core intent of a user's query. It identifies responses that include unrelated or tangential information, miss the main point, or misinterpret the query. This is particularly useful in systems where precision and intent matching are critical, such as search engines, chatbots, and enterprise assistants.

Response Format

The API returns a list of detection results, each containing an output_relevance object with these key fields:

  • score (float, 0.0 – 1.0):
    Represents how relevant the response is to the user query:

    • 0.0–0.2: Poor alignment with intent.
    • 0.2–0.7: Partial relevance; contains some irrelevant or misunderstood content.
    • 0.7–1.0: Strongly aligned; focused and relevant.
  • instructions_list (array):
    A list of individual guideline evaluations:

    • instruction: A rule representing one aspect of output relevance (e.g., "Response should directly address the core intent").
    • label: Indicates whether the response followed the instruction (true) or violated it (false).
    • follow_probability: A confidence score indicating the likelihood the instruction was followed.
    • explanation: A natural-language rationale explaining why the instruction was marked true or false.

Note: The score is an overall judgment, but the instruction-level breakdown provides insight into specific strengths or issues in the response.
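For illustration, a single detection result with these fields might look like the following (the values here are hypothetical):

[
  {
    "output_relevance": {
      "score": 0.85,
      "instructions_list": [
        {
          "instruction": "Response should directly address the core intent of the query.",
          "label": true,
          "follow_probability": 0.92,
          "explanation": "The response answers the question that was asked and stays on topic."
        },
        {
          "instruction": "Response should not include unrelated or tangential information.",
          "label": false,
          "follow_probability": 0.31,
          "explanation": "The response includes biographical details that the query did not ask for."
        }
      ]
    }
  }
]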

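The payload below shows the inputs this detector evaluates: the retrieved context, the model's generated_text, and a config selecting the default output relevance detector.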
[
  {
    "context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "output_relevance": {
        "detector_name": "default"
      }
    }
  }
]

Code Example

The example below demonstrates how to use the output relevance metric synchronously.

from aimon import Detect
import os

# This is a synchronous example.
# Use async=True to run the detector asynchronously.
# Use publish=True to publish results to the AIMon UI.

# Configure the detector; the decorated function must return values
# in the order listed in values_returned.
detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"output_relevance": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model"
)

@detect
def my_llm_app(context, query):
    # A toy stand-in for a real LLM call; it deliberately echoes extra
    # information so the relevance detector has something to flag.
    my_llm_model = lambda context, query: f'''I am a LLM trained to answer your questions.
But I often include too much information.
The query you passed is: {query}.
The context you passed is: {context}.'''
    generated_text = my_llm_model(context, query)
    return context, generated_text

context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")

print(aimon_res)
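
Once you have a detection result (for example, one element of the list shown under Response Format), you can inspect the instruction-level breakdown. Below is a minimal sketch, assuming the result has already been parsed into a Python dict named detection (a hypothetical variable; the field names follow the Response Format section above):

# Minimal sketch: walk one detection result parsed into a dict named
# `detection` (hypothetical variable; field names are documented in
# the Response Format section above).
relevance = detection["output_relevance"]
print(f"Overall relevance score: {relevance['score']:.2f}")

for item in relevance["instructions_list"]:
    status = "followed" if item["label"] else "violated"
    print(f"- {item['instruction']}: {status} "
          f"(p={item['follow_probability']:.2f})")
    print(f"  {item['explanation']}")

Combining the overall score with the per-instruction labels makes it easier to see not just whether a response drifted off-topic, but which specific relevance guideline it violated.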