Conciseness

The Conciseness API assesses how directly and efficiently a generated response answers a user query, given a specific context. This context generally includes both the original user query and any background documents or inputs passed to the language model. The evaluation determines whether the output stays tightly focused on the user’s information need or drifts into unnecessary, redundant, or irrelevant content. This is particularly useful in applications where clarity and brevity matter, such as customer support, summarization, or response generation, and it helps ensure that the output respects the user’s intent without introducing noise or filler.

Response Format

The API returns a list of detection results, each containing a conciseness object with key evaluation fields:

  • score (float, 0.0 – 1.0):

    Indicates how concise the response is:

    • 0.0–0.2: Highly verbose and unfocused.
    • 0.2–0.7: Partially concise; some unnecessary content.
    • 0.7–1.0: Well-focused and efficient.
  • instructions_list (array):

    A breakdown of the individual instructions evaluated, where each item includes:

    • instruction: A rule representing one aspect of conciseness (e.g., "Response should not be verbose").
    • label: Indicates whether the response followed the instruction (true) or violated it (false).
    • follow_probability: A confidence score indicating the likelihood the instruction was followed.
    • explanation: A natural-language rationale explaining why the instruction was marked true or false.

Note: The score reflects the overall conciseness, but detailed reasoning is broken down across these instruction-level explanations.
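The fields above can be consumed programmatically. Below is a minimal sketch of pulling the overall score and any violated instructions out of a single detection result; the helper name and the sample values are hypothetical and only mirror the documented schema.

```python
def summarize_conciseness(detection):
    """Return the overall score and the instructions marked as violated."""
    result = detection["conciseness"]
    violations = [
        item["instruction"]
        for item in result["instructions_list"]
        if not item["label"]  # label is False when the instruction was violated
    ]
    return result["score"], violations

# Illustrative payload shaped like the documented response (values made up)
sample = {
    "conciseness": {
        "score": 0.46,
        "instructions_list": [
            {
                "instruction": "Response should not be verbose",
                "label": False,
                "follow_probability": 0.31,
                "explanation": "The response repeats background details.",
            },
            {
                "instruction": "Response should address the query directly",
                "label": True,
                "follow_probability": 0.88,
                "explanation": "The response answers the question asked.",
            },
        ],
    }
}

score, violations = summarize_conciseness(sample)
print(score, violations)
```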

A sample request payload:

[
  {
    "context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "conciseness": {
        "detector_name": "default"
      }
    }
  }
]
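If you build the payload in code rather than by hand, it is just a list of records matching the fields shown above. A small sketch, using only the field names from the example (the endpoint and transport are not shown here):

```python
import json

# Build the request body as plain Python data, matching the documented fields
payload = [
    {
        "context": "Paul Graham is an English-born computer scientist.",
        "generated_text": "Paul Graham has worked in several key areas.",
        "config": {"conciseness": {"detector_name": "default"}},
    }
]

# Serialize for sending to the API
body = json.dumps(payload)
print(body)
```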

Code Example

The example below demonstrates how to use the conciseness metric synchronously.

from aimon import Detect
import os

# This is a synchronous example.
# Use async=True to use it asynchronously.
# Use publish=True to publish results to the AIMon UI.

detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"conciseness": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model"
)

@detect
def my_llm_app(context, query):
    # Stand-in model that deliberately produces a verbose response
    my_llm_model = lambda context, query: f'''I am a LLM trained to answer your questions.
But I often include too much information.
The query you passed is: {query}.
The context you passed is: {context}.'''
    generated_text = my_llm_model(context, query)
    return context, generated_text

# The decorator appends the AIMon detection result to the function's return values
context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")

print(aimon_res)