Conciseness
The Conciseness API assesses how directly and efficiently a generated response answers a user query, given a specific context. This context generally includes both the original user query and any background documents or inputs that were passed to the language model. The evaluation determines whether the output stays tightly focused on the user's information need or drifts into unnecessary, redundant, or irrelevant content. This is particularly useful in applications where clarity and brevity matter, such as customer support, summarization, or response generation, and helps ensure that the output respects the user's intent without introducing noise or filler.
Response Format
The API returns a list of detection results, each containing a `conciseness` object with key evaluation fields:

- `score` (`float`, 0.0–1.0): Indicates how concise the response is:
  - 0.0–0.2: Highly verbose and unfocused.
  - 0.2–0.7: Partially concise; some unnecessary content.
  - 0.7–1.0: Well-focused and efficient.
- `instructions_list` (`array`): A breakdown of individual instructions adhered to, where each item includes:
  - `instruction`: A rule representing one aspect of conciseness (e.g., "Response should not be verbose").
  - `label`: Indicates whether the response followed the instruction (`true`) or violated it (`false`).
  - `follow_probability`: A confidence score indicating the likelihood that the instruction was followed.
  - `explanation`: A natural-language rationale explaining why the instruction was marked `true` or `false`.

Note: The `score` reflects the overall conciseness, but detailed reasoning is broken down across these instruction-level explanations.
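As a minimal sketch of how a caller might interpret the score bands above, the following hypothetical helper (`conciseness_band` is not part of the AIMon SDK) maps a score to its band:

```python
# Hypothetical helper: map a 0.0-1.0 conciseness score to the bands
# described above. Not part of the AIMon SDK; purely illustrative.
def conciseness_band(score: float) -> str:
    """Return a human-readable band for a conciseness score."""
    if score < 0.2:
        return "highly verbose"
    if score < 0.7:
        return "partially concise"
    return "well-focused"

print(conciseness_band(0.9))  # well-focused
```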
API Request & Response Example
- Request
- Response
[
{
"context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
"generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
"config": {
"conciseness": {
"detector_name": "default",
"explain": true
}
}
}
]
[
{
"conciseness": {
"instructions_list": [
{
"explanation": "The response is lengthy and verbose, e.g., 'He began programming on the IBM 1401 during his school years, specifically in 9th grade.'",
"follow_probability": 0.4688,
"instruction": "Response should not be verbose",
"label": false
},
{
"explanation": "It sticks to the query and context without extra details, focusing solely on Paul Graham's career.",
"follow_probability": 0.5927,
"instruction": "Response should not contain a lot of un-necessary information that is not relevant to the user query and the provided context documents",
"label": true
},
{
"explanation": "The answer is concise and not overly detailed, though it includes more detail than strictly necessary.",
"follow_probability": 0.5312,
"instruction": "Response should not be overly detailed and not focused",
"label": true
},
{
"explanation": "Key details are included, such as his work on the IBM 1401 and essay writing, ensuring no key points are missed.",
"follow_probability": 0.9149,
"instruction": "Response should not miss any key details",
"label": true
},
{
"explanation": "The response focuses solely on Paul Graham’s career without extraneous details, e.g., 'Paul Graham has worked in several key areas'.",
"follow_probability": 0.7311,
"instruction": "Response should not have any irrelevant information",
"label": true
},
{
"explanation": "It provides concise details without excessive elaboration, as seen in the brief mention of 'IBM 1401' and essay writing.",
"follow_probability": 0.7773,
"instruction": "Response should not have unnecessary elaboration",
"label": true
},
{
"explanation": "There are no contradictions; the narrative is clear and consistent, e.g., 'He began programming on the IBM 1401 during his school years'.",
"follow_probability": 0.9988,
"instruction": "Response should not lead to confusion due to contradictions",
"label": true
},
{
"explanation": "The answer makes good use of the available information, citing specific facts like '9th grade' and 'IBM 1401'.",
"follow_probability": 0.9399,
"instruction": "Response should make the best use of the available information",
"label": true
},
{
"explanation": "The response focuses on key aspects by mentioning 'IBM 1401' and 'writing essays', directly addressing the query.",
"follow_probability": 0.6225,
"instruction": "Response should focus on the most critical aspects of the user's question",
"label": true
},
{
"explanation": "The answer explains clearly with detailed phrases like 'He began programming on the IBM 1401 during his school years'.",
"follow_probability": 0.9399,
"instruction": "Response should explain information clearly",
"label": true
}
],
"score": 0.9
}
}
]
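Once a response like the one above is parsed, the violated instructions can be pulled out by filtering on `label`. This is an illustrative post-processing sketch, not an SDK feature; the `result` dict is an abridged version of the example response:

```python
# Illustrative post-processing: collect the instructions the response
# violated, together with their explanations. The `result` dict mirrors
# (in abridged form) the example response shown above.
result = {
    "conciseness": {
        "score": 0.9,
        "instructions_list": [
            {"instruction": "Response should not be verbose",
             "label": False, "follow_probability": 0.4688,
             "explanation": "The response is lengthy and verbose."},
            {"instruction": "Response should not miss any key details",
             "label": True, "follow_probability": 0.9149,
             "explanation": "Key details are included."},
        ],
    }
}

violations = [
    item for item in result["conciseness"]["instructions_list"]
    if not item["label"]
]
for v in violations:
    print(f'{v["instruction"]} (p={v["follow_probability"]}): {v["explanation"]}')
```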
Code Examples
- Python (Sync)
- Python (Async)
- Python (Decorator)
- TypeScript
# Synchronous example
import os
from aimon import Client
import json
# Initialize client
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")
# Construct payload
payload = [{
"generated_text": "Paris is the capital and largest city of France, known for many attractions.",
"context": ["Paris is the capital and largest city of France."],
"config": {
"conciseness": {
"detector_name": "default",
"explain": True
}
},
"publish": False
}]
# Call sync detect
response = client.inference.detect(body=payload)
# Print result
print(json.dumps(response[0].conciseness, indent=2))
# Asynchronous example
import os
import json
from aimon import AsyncClient
aimon_api_key = os.environ["AIMON_API_KEY"]
aimon_payload = {
"generated_text": "Paris is the capital and largest city of France, known for many attractions.",
"context": ["Paris is the capital and largest city of France."],
"config": {
"conciseness": {
"detector_name": "default",
"explain": True
}
},
"publish": True,
"async_mode": True,
"application_name": "async_metric_example",
"model_name": "async_metric_example"
}
data_to_send = [aimon_payload]
async def call_aimon():
async with AsyncClient(auth_header=f"Bearer {aimon_api_key}") as aimon:
return await aimon.inference.detect(body=data_to_send)
# Run the coroutine (inside a notebook, use `resp = await call_aimon()` instead)
import asyncio
resp = asyncio.run(call_aimon())
print(json.dumps(resp, indent=2))
print("View results at: https://www.app.aimon.ai/llmapps?source=sidebar&stage=production")
from aimon import Detect
import os
detect = Detect(
values_returned=['context', 'generated_text'],
config={"conciseness": {"detector_name": "default", "explain": True}},
publish=True,
api_key=os.getenv("AIMON_API_KEY"),
application_name="my_awesome_llm_app",
model_name="my_awesome_llm_model"
)
@detect
def my_llm_app(context, query):
my_llm_model = lambda context, query: f'''I am a LLM trained to answer your questions.
But I often include too much information.
The query you passed is: {query}.
The context you passed is: {context}.'''
generated_text = my_llm_model(context, query)
return context, generated_text
context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")
print(aimon_res)
import Client from "aimon";
import dotenv from "dotenv";
dotenv.config();
const aimon = new Client({
authHeader: `Bearer ${process.env.AIMON_API_KEY}`,
});
const run = async () => {
const context = "Photosynthesis is the process by which green plants make their food using sunlight.";
const generatedText = "Photosynthesis is the biochemical process that occurs in the chloroplasts of green plants, allowing them to convert light energy, carbon dioxide, and water into glucose and oxygen.";
const config = {
conciseness: { detector_name: "default", explain: true },
};
const response = await aimon.detect({ context, generatedText, config });
console.log("Conciseness result:", JSON.stringify(response, null, 2));
};
run();