📄️ Metrics Overview
Evaluating LLM and RAG applications often involves trade-offs between accuracy, reliability, and speed. Several frameworks exist, such as RAGAs, TruLens, and DeepEval, but they can be overwhelming because of inconsistent metrics, complex setup, and reliance on subjective LLM-based evaluation.
🗃️ Output Quality
6 items
🗃️ Output Safety
9 items
🗃️ RAG & Data
2 items
📄️ Custom Metrics
Custom Metrics let you define your own evaluation criteria for AI-generated responses. These metrics work just like AIMon’s built-in ones (e.g., Conciseness, Toxicity), but are fully customizable by your team.