Continuous Monitoring
AIMon can be used to monitor the quality of your model outputs continuously. This is made possible by AIMon's low-latency, low-cost detection models. When used in asynchronous mode, AIMon adds minimal overhead to your model's inference time.
By adding continuous monitoring to your model, you can:
- Track the quality of your model outputs over time.
- Monitor the quality of your model outputs in near real-time.
- Identify and address any issues with your model outputs as they arise.
- Ensure that every inference is monitored for quality.
How it works
AIMon continuously monitors the quality of your model outputs by evaluating them against a set of predefined quality metrics. These metrics are configured by you and can include hallucination, toxicity, and more.
When you make an inference request to your model, the AIMon SDK asynchronously sends the request and output to the AIMon API. The AIMon API evaluates the output against the configured quality metrics and returns the results, which are displayed in the AIMon App UI and can also be accessed using the Metrics API in the AIMon SDK.
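For illustration, the metric configuration you pass to the SDK is simply a mapping from metric name to the detector that should score it. The sketch below is not the full integration; the "hallucination" entry mirrors the examples later on this page, while the "toxicity" entry is illustrative and assumes a detector with that name is available.

// A minimal configuration sketch. "hallucination" mirrors the examples below;
// "toxicity" is illustrative and assumes a detector with that name is available.
const config = {
  hallucination: { detector_name: "default" },
  toxicity: { detector_name: "default" },
};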
How to use it
To use continuous monitoring with AIMon, you need to:
- Configure the AIMon SDK to monitor the quality metrics you care about.
- Integrate the AIMon SDK with your model.
Here is an example of how to use the AIMon SDK to monitor the quality of your model outputs:
Python
from aimon import Detect
import os

# Configure the Detect decorator: which values your function returns,
# which metrics to compute, and where to publish the results.
detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"hallucination": {"detector_name": "default"}},
    api_key=os.getenv("AIMON_API_KEY"),
    async_mode=True,  # send data to AIMon asynchronously, off the inference critical path
    publish=True,     # publish results to the AIMon App UI
    application_name='my_awesome_llm_app',
    model_name='my_awesome_llm_model'  # name of your LLM that generates the responses to be scored
)

@detect
def my_llm_app(context, query):
    # Stand-in for your real LLM call.
    my_llm_model = lambda context, query: f'I am a LLM trained to answer your questions. But I hallucinate often. The query you passed is: {query}. The context you passed is: {context}.'
    generated_text = my_llm_model(context, query)
    return context, generated_text

context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")
print(aimon_res)
# DetectResult(status=200, detect_response=AnalyzeCreateResponse(message='Data successfully sent to AIMon.', status=200), publish_response=[])
TypeScript
import Client from "aimon";
import { OpenAI } from "@langchain/openai";
import { loadSummarizationChain } from "langchain/chains";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Create the AIMon client. You will need an API key (it can be retrieved from your user profile in the AIMon UI).
const aimon = new Client({
  authHeader: 'Bearer: <AIMON_API_KEY>',
});

// Initialize the OpenAI configuration. Replace with your OpenAI API key.
const openaiApiKey = "OPENAI_API_KEY";

const runApplication: any = async (
  applicationName: string,
  modelName: string,
  sourceText: any,
  prompt: string | null = null,
  userQuery: string | null = null,
) => {
  // Split the source text into chunks
  const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
  const docs = await textSplitter.createDocuments([sourceText]);
  const contextDocs = docs.map((doc) => doc.pageContent);

  // Summarize the chunks with an LLM
  const llm = new OpenAI({ temperature: 0, openAIApiKey: openaiApiKey });
  const chain = loadSummarizationChain(llm, { type: "map_reduce" });
  const output = await chain.invoke({
    input_documents: docs,
  });

  // Build the payload that AIMon will evaluate
  const payload = {
    context_docs: contextDocs,
    output: String(output.text),
    prompt: prompt ?? "",
    user_query: userQuery ?? "",
    instructions: "These are the instructions",
  };

  // Metrics to compute for this output
  const config = {
    hallucination: { detector_name: "default" },
    conciseness: { detector_name: "default" },
    completeness: { detector_name: "default" },
    instruction_adherence: { detector_name: "default" },
  };

  // Analyze the quality of the generated output using AIMon
  const response: Client.AnalyzeCreateResponse = await aimon.analyze.production(
    applicationName,
    modelName,
    payload,
    config,
  );

  return response;
};
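As a usage sketch, you could then call runApplication with your own application and model names. The argument values below are placeholders for illustration, not names defined elsewhere on this page.

// Illustrative invocation with placeholder arguments.
runApplication(
  "summarization_app",        // your AIMon application name
  "gpt-3.5-turbo-instruct",   // name of the model that generated the output
  "Some long source document text to summarize...",
).then((response) => console.log(response));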