Glossary

The following definitions will help you develop a mental model of how things are structured in AIMon.

Model: The LLM that generates text based on your user input query, retrieved context, and instructions. The model can be a vanilla LLM, a fine-tuned model, or a prompt-engineered model.

LLM Apps: An LLM App is an application that you develop, identified uniquely by its name, the underlying model, and an app version.
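
As a minimal sketch of what "identified uniquely by its name, the underlying model, and an app version" implies, the three fields can be treated as a composite key. The `AppIdentity` class and the example values below are hypothetical and are not part of the AIMon SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppIdentity:
    """Hypothetical composite key for an LLM App (not an AIMon SDK type)."""
    name: str
    model: str
    version: str


# Same name and version, different underlying model: two distinct LLM Apps.
baseline = AppIdentity(name="support-bot", model="model-a", version="1.0")
candidate = AppIdentity(name="support-bot", model="model-b", version="1.0")

print(baseline == candidate)  # False
```

Under this reading, changing any one of the three fields, for example swapping the model during evaluation, yields a separate app.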

Evaluation: This is the development phase, when you as a builder are experimenting with your LLM App. For instance, you might use AIMon to select the best LLM for your use case.

Continuous Monitoring: This is the phase when you have opened up your LLM App to testers or users who may submit arbitrary queries. You can use AIMon to run online evaluations that continuously monitor your apps in this phase.

Metrics: AIMon offers advanced metrics to identify and quantify unwanted characteristics in the responses generated by your LLM-powered applications. Our innovations include the state-of-the-art HDM-2 model, which outperforms GPT-4 Turbo and GPT-4o-mini on industry-standard benchmarks such as RAGTruth and TruthfulQA, as well as on our proprietary HDM-Bench.

Hallucination: We define Hallucination as the presence of any statement in the LLM output that contradicts or violates the facts given to your LLM in your context. These could be factual inaccuracies or fabrications of new information. This differs from the "Faithfulness" metric, which is defined as the total number of correct facts in the output divided by the total number of facts in the context. Faithfulness is useful when you are interested in the percentage of truthful "facts" in the output compared to the facts in the input context. The Hallucination score, on the other hand, is a calibrated probability that gives you a better understanding of the magnitude of the hallucination problem in your LLM outputs. A score closer to 0.0 indicates a low probability of hallucination, and a score closer to 1.0 indicates a high probability of hallucination.
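
The Faithfulness ratio described above can be sketched in a few lines. This is an illustration of the arithmetic only: the `faithfulness` function and the fact counts are hypothetical, and in practice the counts would come from a fact-extraction step that is not shown here. The Hallucination score is not computed this way; it is a probability returned by the detector.

```python
def faithfulness(correct_facts_in_output: int, facts_in_context: int) -> float:
    """Correct facts in the output divided by the facts in the context.

    Hypothetical helper for illustration; not an AIMon SDK function.
    """
    if facts_in_context <= 0:
        raise ValueError("facts_in_context must be positive")
    if correct_facts_in_output < 0:
        raise ValueError("correct_facts_in_output must not be negative")
    return correct_facts_in_output / facts_in_context


# Assumed counts: the context states 4 facts and the output
# correctly reproduces 3 of them.
print(faithfulness(3, 4))  # 0.75
```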

Detector Threshold: AIMon provides a classification (e.g., hallucinated = True/False) based on a default threshold applied to the detector score. AIMon also lets you choose your own thresholds for a three-color scheme (red, amber, green) that indicates the likelihood of Hallucination or the level of other metrics.
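
A rough sketch of how a detector score could map to a boolean classification and to a red/amber/green band follows. The threshold values (0.5, 0.4, 0.7) and the function names are assumptions chosen for illustration; they are not AIMon's defaults or API.

```python
# Hypothetical cutoffs for illustration only, not AIMon defaults.
DEFAULT_THRESHOLD = 0.5


def is_hallucinated(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Binary classification: True when the score reaches the threshold."""
    return score >= threshold


def color_band(score: float, amber_at: float = 0.4, red_at: float = 0.7) -> str:
    """Map a score in [0.0, 1.0] to 'green', 'amber', or 'red'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if amber_at > red_at:
        raise ValueError("amber_at must not exceed red_at")
    if score >= red_at:
        return "red"
    if score >= amber_at:
        return "amber"
    return "green"


for s in (0.1, 0.5, 0.9):
    print(s, is_hallucinated(s), color_band(s))
```

Keeping the two cutoffs as parameters mirrors the idea that the bands are user-configurable while the boolean classification falls back to a default.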

Instruction Adherence: LLMs are often provided multiple instructions such as "Only use Asian date formats" and it is critical for developers to catch any deviations from such instructions. The Instruction Adherence score is the total number of correctly followed instructions divided by the total number of instructions.
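
The Instruction Adherence ratio can be illustrated as follows. The per-instruction pass/fail results are assumed inputs here, since deciding whether an instruction was followed is the detector's job; `instruction_adherence` is a hypothetical helper, not an SDK call.

```python
from typing import Sequence


def instruction_adherence(followed: Sequence[bool]) -> float:
    """Correctly followed instructions divided by total instructions."""
    if not followed:
        raise ValueError("at least one instruction is required")
    return sum(followed) / len(followed)


# Assumed results for four instructions, one of which was violated
# (for example, "Only use Asian date formats").
results = [True, True, False, True]
print(instruction_adherence(results))  # 0.75
```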

SDK: AIMon provides two SDKs, one for Python and one for TypeScript. You can use them to instrument your application and send the data AIMon needs to detect unwanted characteristics in your LLM Apps, such as hallucinations or failures of instruction adherence. You can configure which detections AIMon performs.