Toxicity

The Toxicity API evaluates whether a generated response contains harmful, unsafe, or offensive language. Given a context, a generated text, and optionally a user query or reference text, it returns a structured assessment based on common categories of toxic behavior, indicating whether the content is appropriate for a general audience.

This evaluation is useful in applications like content moderation, user-facing chatbots, or public summarization systems where safe language is critical.

Labels Evaluated

The API assesses toxicity using the following labels:

  • identity_hate: Hateful language targeting a person or group based on identity.
  • toxic: Generally unsafe or inappropriate content.
  • severe_toxic: Extremely harmful or abusive language.
  • obscene: Content that is offensive by moral or societal standards.
  • threat: Language containing explicit or implied threats.
  • insult: Disrespectful, scornful, or abusive language.

Response Format

The response contains a toxicity object with the following fields:

  • score (float, 0.0 – 1.0):

    The mean risk level calculated across all detected labels. A score of 0.0 indicates the response is entirely safe.

  • instructions_list (array):

    A list of individual safety checks applied to the generated output:

    • instruction: The toxic behavior being checked.
    • label: Indicates whether the behavior was detected (true) or not (false).
    • follow_probability: The model's confidence that the behavior occurred.
    • explanation: A short, human-readable justification of the label.
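
As a rough illustration, downstream code might gate content on these fields. The sketch below assumes the toxicity object has been parsed into a Python dict; the function name and threshold are ours for illustration, not part of the API:

def is_safe(toxicity: dict, threshold: float = 0.5) -> bool:
    # Treat the output as unsafe if the mean risk score exceeds the
    # threshold, or if any individual safety check detected the behavior.
    if toxicity["score"] > threshold:
        return False
    return not any(check["label"] for check in toxicity["instructions_list"])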

API Request & Response Example

[
  {
    "generated_text": "Paul Graham has worked in several key areas throughout his career: IBM 1401: He began programming on the IBM 1401 during his school years, specifically in 9th grade. In addition, he has also been involved in writing essays and sharing his thoughts on technology, startups, and programming.",
    "config": {
      "toxicity": {
        "detector_name": "default",
        "explain": true
      }
    }
  }
]
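
A successful call returns one result per input item. For a benign input like the one above, the response has the general shape shown below; the score, instruction wording, and probability values are illustrative, not actual API output, and only one instructions_list entry is shown for brevity:

[
  {
    "toxicity": {
      "score": 0.0,
      "instructions_list": [
        {
          "instruction": "Detect hateful language targeting a person or group based on identity.",
          "label": false,
          "follow_probability": 0.02,
          "explanation": "The text describes Paul Graham's programming career and contains no hateful language."
        }
      ]
    }
  }
]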

Code Examples

# Synchronous example

import json
import os

from aimon import Client

# Initialize client
client = Client(auth_header=f"Bearer {os.environ['AIMON_API_KEY']}")

# Construct payload
payload = [{
    "generated_text": "You're so dumb, nobody wants to hear your opinion.",
    "config": {
        "toxicity": {
            "detector_name": "default",
            "explain": True
        }
    },
    "publish": False
}]

# Call sync detect
response = client.inference.detect(body=payload)

# Print result
print(json.dumps(response[0].toxicity, indent=2))
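
To act on the result rather than just printing it, you might surface only the checks that fired. This follow-on sketch assumes the toxicity payload is the dict structure described under Response Format:

# Collect the safety checks the detector flagged
flagged = [
    check for check in response[0].toxicity["instructions_list"]
    if check["label"]
]
for check in flagged:
    print(f"{check['instruction']}: {check['explanation']}")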