Toxicity

The Toxicity API evaluates whether a generated response contains harmful, unsafe, or offensive language. Given a context, generated text, and optionally a user query or reference text, this API returns a structured assessment based on common categories of toxic behavior. It helps detect whether the content is appropriate for a general audience or contains language that may be harmful or offensive.

This evaluation is useful in applications like content moderation, user-facing chatbots, or public summarization systems where safe language is critical.

Labels Evaluated

The API assesses toxicity using the following labels:

  • identity_hate: Hateful language targeting a person or group based on identity.
  • toxic: Generally unsafe or inappropriate content.
  • severe_toxic: Extremely harmful or abusive language.
  • obscene: Content that is offensive by moral or societal standards.
  • threat: Language containing explicit or implied threats.
  • insult: Disrespectful, scornful, or abusive language.
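
These checks run whenever the toxicity detector is enabled. In both the request payload and the SDK example later on this page, the detector is selected through a config block of the following shape (shown here as a standalone Python snippet; the variable name is for illustration only):

# Selects the toxicity detector with its default model; the same structure is
# passed in the request payload and to Detect(config=...).
config = {"toxicity": {"detector_name": "default"}}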

Response Format

The response contains a toxicity object with the following fields:

  • score (float, 0.0 – 1.0):

    The mean risk level calculated across all detected labels. A score of 0.0 indicates the response is entirely safe.

  • instructions_list (array):

    A list of individual safety checks applied to the generated output:

    • instruction: The toxic behavior being checked.
    • label: Indicates whether the behavior was detected (true) or not (false).
    • follow_probability: The model's confidence that the behavior occurred.
    • explanation: A short, human-readable justification of the label.
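
For illustration, a response with this structure might look like the sketch below. The values, instruction wording, and explanations are illustrative only, not actual API output:

{
  "toxicity": {
    "score": 0.0,
    "instructions_list": [
      {
        "instruction": "Do not use hateful language targeting a person or group based on identity.",
        "label": false,
        "follow_probability": 0.02,
        "explanation": "The response contains no identity-based attacks."
      },
      {
        "instruction": "Do not include explicit or implied threats.",
        "label": false,
        "follow_probability": 0.01,
        "explanation": "No threatening language is present."
      }
    ]
  }
}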

Example Request

The example payload below is a list containing a single record. Each record provides the context, the generated_text to evaluate, and a config block that selects the toxicity detector:

[
  {
    "context": "Paul Graham is an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known for his work on Lisp, his former startup Viaweb (later renamed Yahoo! Store), co-founding the influential startup accelerator and seed capital firm Y Combinator, his blog, and Hacker News.",
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "toxicity": {
        "detector_name": "default"
      }
    }
  }
]

Code Example

The example below demonstrates how to use the toxicity detector synchronously.

from aimon import Detect
import os

# This is a synchronous example.
# Use async=True to use it asynchronously.
# Use publish=True to publish to the AIMon UI.

detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"toxicity": {"detector_name": "default"}},
    publish=True,
    api_key=os.getenv("AIMON_API_KEY"),
    application_name="my_awesome_llm_app",
    model_name="my_awesome_llm_model"
)

@detect
def my_llm_app(context, query):
    # Stand-in for a real LLM call.
    my_llm_model = lambda context, query: f'''I am an LLM trained to answer your questions.
    But I often don't fully answer your questions.
    The query you passed is: {query}.
    The context you passed is: {context}.'''
    generated_text = my_llm_model(context, query)
    return context, generated_text

context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")

print(aimon_res)
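
aimon_res carries the detector output described in the Response Format section above. As a minimal sketch for working with that output (the summarize_toxicity helper below is illustrative, not part of the AIMon SDK), here is one way to report the overall score and any flagged checks once the toxicity object is available as a plain dictionary:

# Illustrative helper, not part of the AIMon SDK: takes the "toxicity" object
# from the Response Format section as a plain dict and prints a short summary.
def summarize_toxicity(toxicity):
    print(f"Overall toxicity score: {toxicity['score']:.2f}")
    for check in toxicity.get("instructions_list", []):
        if check.get("label"):
            # label == true means the toxic behavior was detected.
            print(f"- Flagged: {check['instruction']} "
                  f"(confidence {check['follow_probability']:.2f}): {check['explanation']}")

If no checks are flagged and the score is 0.0, the generated text is considered safe for the categories listed above.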