Conciseness
The Conciseness API assesses how directly and efficiently a generated response answers a user query, given a specific context. This context generally includes both the original user query and any background documents or inputs that were passed to the language model. The evaluation determines whether the output stays tightly focused on the user's information need or diverges into unnecessary, redundant, or irrelevant content. This is particularly useful in applications where clarity and brevity matter, such as customer support, summarization, or response generation, and it helps ensure that the output respects the user's intent without introducing noise or filler.
Response Format
The API returns a list of detection results, each containing a conciseness object with the following fields:

- score (float, 0.0–1.0): Indicates how concise the response is:
  - 0.0–0.2: Highly verbose and unfocused.
  - 0.2–0.7: Partially concise; some unnecessary content.
  - 0.7–1.0: Well-focused and efficient.
- instructions_list (array): A breakdown of the individual instructions evaluated, where each item includes:
  - instruction: A rule representing one aspect of conciseness (e.g., "Response should not be verbose").
  - label: Indicates whether the response followed the instruction (true) or violated it (false).
  - follow_probability: A confidence score indicating the likelihood that the instruction was followed.
  - explanation: A natural-language rationale explaining why the instruction was marked true or false.

Note: The score reflects the overall conciseness, but the detailed reasoning is broken down across these instruction-level explanations.
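As a quick illustration of how the score bands above might be applied downstream, here is a minimal sketch (the helper name is hypothetical and not part of the AIMon SDK):

    def describe_conciseness(score: float) -> str:
        # Map a conciseness score (0.0-1.0) to the bands described above.
        # Illustrative helper only; not part of the AIMon API.
        if score < 0.2:
            return "highly verbose and unfocused"
        if score < 0.7:
            return "partially concise; some unnecessary content"
        return "well-focused and efficient"

    print(describe_conciseness(0.9))  # "well-focused and efficient"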
- Request

[
  {
    "context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "conciseness": {
        "detector_name": "default"
      }
    }
  }
]
- Response

[
  {
    "conciseness": {
      "instructions_list": [
        {
          "explanation": "The response is lengthy and verbose, e.g., 'He began programming on the IBM 1401 during his school years, specifically in 9th grade.'",
          "follow_probability": 0.4688,
          "instruction": "Response should not be verbose",
          "label": false
        },
        {
          "explanation": "It sticks to the query and context without extra details, focusing solely on Paul Graham's career.",
          "follow_probability": 0.5927,
          "instruction": "Response should not contain a lot of un-necessary information that is not relevant to the user query and the provided context documents",
          "label": true
        },
        {
          "explanation": "The answer is concise and not overly detailed, though it includes more detail than strictly necessary.",
          "follow_probability": 0.5312,
          "instruction": "Response should not be overly detailed and not focused",
          "label": true
        },
        {
          "explanation": "Key details are included, such as his work on the IBM 1401 and essay writing, ensuring no key points are missed.",
          "follow_probability": 0.9149,
          "instruction": "Response should not miss any key details",
          "label": true
        },
        {
          "explanation": "The response focuses solely on Paul Graham’s career without extraneous details, e.g., 'Paul Graham has worked in several key areas'.",
          "follow_probability": 0.7311,
          "instruction": "Response should not have any irrelevant information",
          "label": true
        },
        {
          "explanation": "It provides concise details without excessive elaboration, as seen in the brief mention of 'IBM 1401' and essay writing.",
          "follow_probability": 0.7773,
          "instruction": "Response should not have unnecessary elaboration",
          "label": true
        },
        {
          "explanation": "There are no contradictions; the narrative is clear and consistent, e.g., 'He began programming on the IBM 1401 during his school years'.",
          "follow_probability": 0.9988,
          "instruction": "Response should not lead to confusion due to contradictions",
          "label": true
        },
        {
          "explanation": "The answer makes good use of the available information, citing specific facts like '9th grade' and 'IBM 1401'.",
          "follow_probability": 0.9399,
          "instruction": "Response should make the best use of the available information",
          "label": true
        },
        {
          "explanation": "The response focuses on key aspects by mentioning 'IBM 1401' and 'writing essays', directly addressing the query.",
          "follow_probability": 0.6225,
          "instruction": "Response should focus on the most critical aspects of the user's question",
          "label": true
        },
        {
          "explanation": "The answer explains clearly with detailed phrases like 'He began programming on the IBM 1401 during his school years'.",
          "follow_probability": 0.9399,
          "instruction": "Response should explain information clearly",
          "label": true
        }
      ],
      "score": 0.9
    }
  }
]
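In practice you may want to act on the instruction-level detail rather than the aggregate score. The sketch below (plain Python; the function name and threshold are illustrative, while the field names come straight from the payload above) collects instructions that were violated or followed with low confidence:

    import json

    def flag_weak_instructions(response_json: str, min_probability: float = 0.7):
        # Collect instructions that were violated (label false) or whose
        # follow_probability falls below the chosen threshold.
        flagged = []
        for detection in json.loads(response_json):
            for item in detection["conciseness"]["instructions_list"]:
                if not item["label"] or item["follow_probability"] < min_probability:
                    flagged.append((item["instruction"], item["follow_probability"]))
        return flagged

Applied to the sample response above, this flags "Response should not be verbose" (label false) plus the three instructions followed with probability below 0.7.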
Code Example
The example below demonstrates how to run the conciseness detector synchronously.
- Python

from aimon import Detect
import os

# This is a synchronous example.
# Use async_mode=True to run it asynchronously.
# Use publish=True to publish results to the AIMon UI.
detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"conciseness": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model"
)

@detect
def my_llm_app(context, query):
    my_llm_model = lambda context, query: f'''I am a LLM trained to answer your questions.
    But I often include too much information.
    The query you passed is: {query}.
    The context you passed is: {context}.'''

    generated_text = my_llm_model(context, query)
    return context, generated_text

# The decorator runs detection on the returned values and appends the
# AIMon result, so the call site unpacks three values.
context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")
print(aimon_res)
- TypeScript

import Client from "aimon";

// Create the AIMon client using an API key (retrievable from your user profile in the UI).
const aimon = new Client({ authHeader: "Bearer API_KEY" });

const runDetect = async () => {
  const generatedText = "your_generated_text";
  const context = ["your_context"];
  const userQuery = "your_user_query";
  const config = { conciseness: { detector_name: "default" } };

  // Analyze the quality of the generated output using AIMon
  const response = await aimon.detect(
    generatedText,
    context,
    userQuery,
    config,
  );

  console.log("Response from detect:", response);
};

runDetect();