
Evaluate

Dataset

A dataset is a collection of records used for evaluations. It must be provided as a CSV file. The supported columns are:

  • "prompt": This is the system prompt used for the LLM
  • "user_query": This is the query specified by the user
  • "context_docs": These are context documents, either retrieved by a RAG pipeline or obtained through other methods. For tasks like summarization, these documents could be directly specified by the user.
  • "output": This is the generated text by the LLM
  • "instructions": These are the instructions provided to the LLM
  • "metadata": This is a dictionary of additional metadata associated with the record.
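
The columns above can be written to a CSV file with any CSV library. As a minimal sketch, the helper below serializes rows by hand, quoting fields in the RFC 4180 style. The names `toCsv`, `escapeField`, `Row`, and `COLUMNS` are hypothetical and not part of the SDK, and the sketch assumes every value has already been turned into a string; how structured values such as "metadata" should be encoded is not specified here.

```typescript
// Column names taken from the list above.
const COLUMNS = [
  "prompt",
  "user_query",
  "context_docs",
  "output",
  "instructions",
  "metadata",
] as const;

type Column = (typeof COLUMNS)[number];

// Hypothetical row shape: every value is assumed to be a string already.
type Row = Partial<Record<Column, string>>;

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function escapeField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build the CSV text: one header line, then one line per row.
// Columns missing from a row are written as empty fields.
function toCsv(rows: Row[], columns: readonly Column[]): string {
  const header = columns.join(",");
  const lines = rows.map((row) =>
    columns.map((column) => escapeField(row[column] ?? "")).join(",")
  );
  return [header, ...lines].join("\n");
}

const csv = toCsv(
  [
    {
      context_docs: "Paris is the capital of France, a country in Europe.",
      user_query: "What is the capital of France?",
      output: 'The capital is "Paris".',
    },
  ],
  ["context_docs", "user_query", "output"]
);
```

The resulting string can be saved with `fs.writeFileSync` and the file path passed to the dataset creation call.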

Create dataset:

import Client from "aimon";
import { fileFromPath } from "formdata-node/file-from-path";

// Replace API_KEY with your API key
const aimon = new Client({
  authHeader: `Bearer API_KEY`,
});

// Creates a new dataset from a CSV file at a local path
const createDataset = async (
  path: string,
  datasetName: string,
  description: string
): Promise<Client.Dataset> => {
  const file = await fileFromPath(path);
  // The dataset name and description are sent as a JSON string
  const json_data = JSON.stringify({
    name: datasetName,
    description: description,
  });

  const params = {
    file: file,
    json_data: json_data,
  };

  const dataset: Client.Dataset = await aimon.datasets.create(params);
  return dataset;
};

const dataset1 = await createDataset(
  "/path/to/file/filename_1.csv",
  "filename1.csv",
  "description"
);

const dataset2 = await createDataset(
  "/path/to/file/filename_2.csv",
  "filename2.csv",
  "description"
);

DatasetCollection

A dataset collection is a collection of one or more datasets that can be used for evaluations.

aimon.datasets.collection.create

This function creates a new dataset collection.

Args:

  • name (string): The name of the dataset collection.
  • description (string): The description of the dataset collection.
  • dataset_ids (string[]): A list of dataset IDs to include in the collection.

Returns:

  • CollectionCreateResponse: The created dataset collection.

Example:

let datasetCollection: Client.Datasets.CollectionCreateResponse | undefined;

// Ensures that dataset1.sha and dataset2.sha are defined
if (dataset1.sha && dataset2.sha) {
  // Creates dataset collection
  datasetCollection = await aimon.datasets.collection.create({
    name: "my_first_dataset_collection",
    dataset_ids: [dataset1.sha, dataset2.sha],
    description: "This is a collection of two datasets.",
  });
} else {
  throw new Error("Dataset sha is undefined");
}
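
When a collection has more than a couple of datasets, the same check can be factored into a small helper. The sketch below is only an illustration: `collectDatasetIds` and `DatasetLike` are hypothetical names, and the only assumption about a dataset object is the optional `sha` field used in the example above.

```typescript
// Minimal shape assumed for illustration: only the optional `sha` field is used.
interface DatasetLike {
  name?: string;
  sha?: string;
}

// Returns the sha of every dataset, or throws on the first one without a sha.
function collectDatasetIds(datasets: DatasetLike[]): string[] {
  return datasets.map((dataset, index) => {
    if (!dataset.sha) {
      throw new Error(`Dataset at index ${index} has no sha`);
    }
    return dataset.sha;
  });
}

// Placeholder values standing in for real datasets.
const datasetIds = collectDatasetIds([{ sha: "sha-1" }, { sha: "sha-2" }]);
```

The returned array can be passed as `dataset_ids` when creating the collection.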

Function: evaluate

Run an evaluation on a dataset collection using the Aimon API.

Parameters:

  • applicationName (string): The name of the application to run the evaluation on.
  • modelName (string): The name of the model to run the evaluation on.
  • datasetCollectionName (string): The name of the dataset collection to run the evaluation on.
  • evaluationName (string): The name of the evaluation to be created.
  • headers (string[]): A list of column names in the dataset to be used for the evaluation. Must include 'context_docs'.
  • config (object, optional): An object of configuration options for the evaluation.

Returns:

  • EvaluateResponse[]: A list of EvaluateResponse objects containing the output and response for each record in the dataset collection.

Raises:

  • ValueError: If headers is empty or doesn't contain 'context_docs', or if required fields are missing from the dataset records.

Note:

The dataset records must contain 'context_docs' and all fields specified in the 'headers' argument. The 'prompt', 'output', and 'instructions' fields are optional.
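
The rules in this note can be checked locally before an evaluation is started. The following is a minimal sketch, not the SDK's own validation: `validateHeaders` and `DatasetRecord` are hypothetical names, and records are assumed to be plain objects keyed by column name.

```typescript
// Hypothetical record shape: a plain object keyed by column name.
type DatasetRecord = Record<string, unknown>;

// Throws if headers is empty, lacks 'context_docs', or if any record
// is missing one of the listed headers.
function validateHeaders(headers: string[], records: DatasetRecord[]): void {
  if (headers.length === 0) {
    throw new Error("headers must not be empty");
  }
  if (!headers.includes("context_docs")) {
    throw new Error("headers must include 'context_docs'");
  }
  records.forEach((record, index) => {
    const missing = headers.filter((header) => !(header in record));
    if (missing.length > 0) {
      throw new Error(`record ${index} is missing fields: ${missing.join(", ")}`);
    }
  });
}

// Passes: every header is present in the record.
validateHeaders(
  ["context_docs", "user_query", "output"],
  [{ context_docs: "doc text", user_query: "a question", output: "an answer" }]
);
```

Running a check like this first surfaces a malformed dataset before any request is sent.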

Example:

const headers = ["context_docs", "user_query", "output"];
const config = {
  hallucination: { detector_name: "default" },
  instruction_adherence: { detector_name: "default" },
};
const results = await aimon.evaluate(
  "my_application_name", // Application name
  "my_model_name", // Model name
  // This dataset collection must already exist in the Aimon platform
  "my_first_dataset_collection",
  "my_evaluation_name", // Evaluation name
  headers,
  config
);