Continuous Monitoring

AIMon can continuously monitor the quality of your model outputs. This is made possible by AIMon's low-latency, low-cost models. When used in asynchronous mode, AIMon adds minimal overhead to your model's inference time.

By adding continuous monitoring to your model, you can:

  • Track the quality of your model outputs over time.
  • Monitor the quality of your model outputs in near real-time.
  • Identify and address any issues with your model outputs as they arise.
  • Ensure that every inference is monitored for quality.

How it works

AIMon continuously monitors the quality of your model outputs by comparing them against a set of predefined quality metrics. You choose which metrics to apply; they can include hallucination, toxicity, and more.
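For example, the config object passed to the SDK is a dictionary keyed by metric name. The hallucination entry below matches the full example later in this section; the toxicity entry and its detector name are assumptions, shown only to illustrate monitoring more than one metric at once:

# Sketch of a multi-metric configuration. The "hallucination" entry matches the
# full example below; the "toxicity" entry is an assumption for illustration.
config = {
    "hallucination": {"detector_name": "default"},
    "toxicity": {"detector_name": "default"},
}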

When you make an inference request to your model, the AIMon SDK asynchronously sends the request and output to the AIMon API, which evaluates the output against the predefined quality metrics and returns the results. The results are displayed in the AIMon App UI and can also be accessed using the Metrics API in the AIMon SDK.
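The snippet below sketches programmatic retrieval of stored results. The Client construction mirrors the pattern the AIMon SDK uses elsewhere, but the client.metrics.list method and its parameter are hypothetical placeholders for the Metrics API; check the SDK reference for the actual method names:

from aimon import Client
import os

# Low-level AIMon client; auth uses the same API key as the Detect example below.
client = Client(auth_header=f"Bearer {os.getenv('AIMON_API_KEY')}")

# Hypothetical call: the method name and parameter here are placeholders for
# whatever the Metrics API in your SDK version actually exposes.
results = client.metrics.list(application_name="my_awesome_llm_app")
print(results)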

How to use it

To use continuous monitoring with AIMon, you need to:

  1. Configure the AIMon SDK to monitor the quality metrics you care about.
  2. Integrate the AIMon SDK with your model.

Here is an example of how to use the AIMon SDK to monitor the quality of your model outputs:

from aimon import Detect
import os

detect = Detect(
    values_returned=['context', 'generated_text'],
    config={"hallucination": {"detector_name": "default"}},
    api_key=os.getenv("AIMON_API_KEY"),
    async_mode=True,
    publish=True,
    application_name='my_awesome_llm_app',
    model_name='my_awesome_llm_model'  # name of your LLM which generates the responses to be scored
)

@detect
def my_llm_app(context, query):
    # Stand-in for your real LLM; it deliberately produces text that may hallucinate.
    my_llm_model = lambda context, query: f'I am an LLM trained to answer your questions. But I hallucinate often. The query you passed is: {query}. The context you passed is: {context}.'
    generated_text = my_llm_model(context, query)
    # Return values in the order declared in values_returned above.
    return context, generated_text


# The @detect decorator appends the AIMon result to the function's return values.
context, gen_text, aimon_res = my_llm_app("This is a context", "This is a query")

print(aimon_res)
# DetectResult(status=200, detect_response=AnalyzeCreateResponse(message='Data successfully sent to AIMon.', status=200), publish_response=[])
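
Because async_mode is True, the DetectResult above only acknowledges that the data was accepted for analysis; the computed metric scores are not returned inline. They appear in the AIMon App UI and, as described earlier, can also be retrieved through the Metrics API.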