DeepEval is the open-source LLM evaluation framework. It provides 20+ research-backed metrics to help you evaluate and pick the best hyperparameters for your LLM system. When building a RAG system, you can use DeepEval to pick the best parameters for your Chroma retriever for optimal retrieval performance and accuracy:
n_results, distance_function, embedding_model, chunk_size, etc.
For more information on how to use DeepEval, see the DeepEval docs.
Getting Started
Step 1: Installation
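DeepEval and Chroma are both published on PyPI, so `pip install -U deepeval chromadb` is the usual route. As a minimal sketch, the snippet below checks that both packages can be imported before you run anything else. The helper name `missing_packages` is made up for this example and is not part of either library.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the packages from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["deepeval", "chromadb"])
    if missing:
        # Suggest the install command for whatever is absent.
        print("Missing packages, install with: pip install -U " + " ".join(missing))
    else:
        print("deepeval and chromadb are available")
```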
Step 2: Preparing a Test Case
Prepare a query, generate a response using your RAG pipeline, and store the retrieval context from your Chroma retriever to create an LLMTestCase for evaluation.
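A minimal sketch of this step, assuming an in-memory Chroma collection with its default embedding function (which may download a small model on first use). The collection name, sample documents, and the `generate_answer` placeholder are illustrative assumptions; swap in your own retriever and generator. `expected_output` is included because the contextual precision and recall metrics compare the retrieved context against it.

```python
def retrieval_context_from(query_result: dict) -> list[str]:
    """Flatten a Chroma query result for a single query into a list of chunk texts."""
    # Chroma returns one list of documents per query text; we send one query.
    documents = query_result.get("documents") or [[]]
    return [doc for doc in documents[0] if doc]


def generate_answer(query: str, context: list[str]) -> str:
    """Hypothetical stand-in for your RAG generator: replace with a real LLM call."""
    return context[0] if context else "I don't know."


def build_test_case(query: str, expected_output: str, n_results: int = 2):
    """Query Chroma, generate an answer, and wrap everything in an LLMTestCase."""
    # Imported here so the helpers above work even without the packages installed.
    import chromadb
    from deepeval.test_case import LLMTestCase

    client = chromadb.Client()  # in-memory client, for illustration only
    collection = client.get_or_create_collection(name="docs_demo")
    collection.upsert(
        ids=["1", "2", "3"],
        documents=[
            "Chroma is an open-source embedding database.",
            "DeepEval is an open-source LLM evaluation framework.",
            "Chunk size controls how much text each embedded document holds.",
        ],
    )

    result = collection.query(query_texts=[query], n_results=n_results)
    context = retrieval_context_from(result)

    return LLMTestCase(
        input=query,
        actual_output=generate_answer(query, context),
        expected_output=expected_output,
        retrieval_context=context,
    )
```

Calling `build_test_case("What is Chroma?", "Chroma is an open-source embedding database.")` would then give you a test case whose `retrieval_context` holds exactly what the retriever returned for that query.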
Step 3: Evaluation
Define retriever metrics like Contextual Precision, Contextual Recall, and Contextual Relevancy to evaluate test cases. Recall ensures enough vectors are retrieved, while relevancy reduces noise by filtering out irrelevant ones.
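As a sketch of how these three metrics might be wired up with DeepEval's `evaluate` function: the threshold of 0.5 is an arbitrary example value, and `build_retriever_metrics` and `run_retriever_eval` are hypothetical helper names. The metrics rely on an LLM judge, so this assumes an evaluation model is configured (by default, an OpenAI API key in your environment).

```python
def build_retriever_metrics(threshold: float = 0.5) -> list:
    """Create the three contextual metrics with a shared passing threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    # Imported lazily so the argument check above runs without deepeval installed.
    from deepeval.metrics import (
        ContextualPrecisionMetric,
        ContextualRecallMetric,
        ContextualRelevancyMetric,
    )

    return [
        ContextualPrecisionMetric(threshold=threshold),
        ContextualRecallMetric(threshold=threshold),
        ContextualRelevancyMetric(threshold=threshold),
    ]


def run_retriever_eval(test_cases: list, threshold: float = 0.5):
    """Evaluate a list of LLMTestCase objects against the retriever metrics."""
    metrics = build_retriever_metrics(threshold)

    from deepeval import evaluate

    return evaluate(test_cases=test_cases, metrics=metrics)
```

Pass in the test cases you built from your Chroma retriever, for example `run_retriever_eval([test_case])`.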
Balancing recall and relevancy is key.
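One way to explore that balance is to sweep a single retriever parameter, such as n_results, and compare the scores. The sketch below is an assumption about how you might do this, not a documented DeepEval workflow: `make_test_case` is a hypothetical callable you supply that queries your Chroma retriever with the given n_results and returns an LLMTestCase, and picking the setting whose weakest score is highest is just one simple selection rule among many.

```python
from typing import Callable


def most_balanced(scores_by_n: dict[int, dict[str, float]]) -> int:
    """Return the n_results value whose weakest metric score is highest."""
    if not scores_by_n:
        raise ValueError("no scores to compare")
    return max(scores_by_n, key=lambda n: min(scores_by_n[n].values()))


def sweep_n_results(
    make_test_case: Callable[[int], object],
    candidates: list[int],
) -> dict[int, dict[str, float]]:
    """Score recall and relevancy for each candidate n_results value."""
    # Imported lazily so most_balanced() stays usable without deepeval installed.
    from deepeval.metrics import ContextualRecallMetric, ContextualRelevancyMetric

    scores: dict[int, dict[str, float]] = {}
    for n in candidates:
        test_case = make_test_case(n)  # hypothetical: retrieves with n_results=n
        recall = ContextualRecallMetric()
        relevancy = ContextualRelevancyMetric()
        recall.measure(test_case)
        relevancy.measure(test_case)
        scores[n] = {"recall": recall.score, "relevancy": relevancy.score}
    return scores
```

With your own factory in place, `most_balanced(sweep_n_results(make_test_case, [1, 3, 5, 10]))` would return the candidate that keeps both scores as high as possible. A single test case is a noisy signal, so in practice you would likely average scores over many queries before choosing.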
distance_function and embedding_model affect recall, while n_results and chunk_size impact relevancy.
Step 4: Visualize and Optimize
To visualize evaluation results, log in to Confident AI (the DeepEval platform) by running the deepeval login command in your terminal.
Once you are logged in, evaluate will automatically send evaluation results to Confident AI, where you can visualize and analyze performance metrics, identify failing retriever hyperparameters, and optimize your Chroma retriever for better accuracy.
To learn more about how to use the platform, please see this Quickstart Guide.