📄️ Metrics Overview
Evaluating LLM and RAG applications often involves trade-offs between accuracy, reliability, and speed. Several frameworks exist, such as RAGAs, TruLens, and DeepEval, but they can be overwhelming because of inconsistent metrics, complex setup, and reliance on subjective LLM-based evaluation.
🗃️ Output Quality
5 items
🗃️ Safety Metrics
9 items
🗃️ RAG & Data
2 items
📄️ Custom Metrics
Custom Metrics let you define your own evaluation criteria for AI-generated responses. These metrics work just like AIMon’s built-in ones (e.g., Conciseness, Toxicity), but are fully customizable by your team.
📄️ Deprecated Metrics
This page lists AIMon metrics that have been deprecated in favor of more accurate or modular replacements. These are no longer actively maintained but may still be available for legacy users.