Braintrust is an enterprise-grade stack for building AI products, covering evaluations, a prompt playground, dataset management, tracing, and more.
Braintrust provides TypeScript and Python libraries for running and logging evaluations, and it integrates well with Chroma.
Example evaluation script in Python (refer to the tutorial above for the full implementation):
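The following is a minimal, self-contained sketch of the three pieces an evaluation is built from: test cases (`data`), a task under test (a retrieval step followed by an answer step), and a scoring function. Every name here (`DOCUMENTS`, `retrieve`, `answer_question`, `similarity`, `run_eval`) is hypothetical. Several pieces are stand-ins. A keyword-overlap retriever replaces a Chroma query, a placeholder answer step replaces an LLM call, and a `difflib` similarity ratio replaces a Levenshtein-style scorer. `run_eval` is a local loop, not Braintrust's `Eval`, and it logs nothing.

```python
import difflib

# Hypothetical toy corpus standing in for documents stored in a Chroma collection.
DOCUMENTS = [
    "The user's eye color is brown.",
    "The user's favorite food is pizza.",
]


def retrieve(question: str) -> str:
    """Toy retriever: return the document sharing the most words with the question.

    Stand-in for a Chroma call such as collection.query(query_texts=[question], n_results=1).
    """
    q_words = set(question.lower().strip("?").split())

    def overlap(doc: str) -> int:
        return len(q_words & set(doc.lower().strip(".").split()))

    return max(DOCUMENTS, key=overlap)


def answer_question(question: str) -> str:
    """Task under evaluation: retrieve context, then produce an answer.

    A real pipeline would send the retrieved text to an LLM; this placeholder
    just returns the last word of the retrieved document.
    """
    context = retrieve(question)
    return context.rstrip(".").split()[-1].capitalize()


def similarity(input: str, output: str, expected: str) -> float:
    """Scorer: case-insensitive string similarity in [0, 1] via difflib.

    `input` is unused; it is kept so the scorer takes input/output/expected
    keyword arguments, the shape assumed here for Braintrust-style scorers.
    """
    return difflib.SequenceMatcher(None, output.lower(), expected.lower()).ratio()


def data() -> list[dict]:
    """Test cases: each pairs an input question with its expected answer."""
    return [
        {"input": "What is my eye color?", "expected": "Brown"},
        {"input": "What is my favorite food?", "expected": "Pizza"},
    ]


def run_eval(data, task, scores) -> list[dict]:
    """Hypothetical local runner: apply the task to each case and score the output."""
    results = []
    for case in data():
        output = task(case["input"])
        case_scores = {
            scorer.__name__: scorer(input=case["input"], output=output, expected=case["expected"])
            for scorer in scores
        }
        results.append({"input": case["input"], "output": output, "scores": case_scores})
    return results


if __name__ == "__main__":
    for row in run_eval(data, answer_question, [similarity]):
        print(row)
```

In a real pipeline you would likely replace `retrieve` with a query against your Chroma collection and pass `data`, the task, and the scorer to Braintrust's `Eval` (roughly `Eval("project name", data=..., task=..., scores=[...])`) instead of `run_eval`, so that results are logged. Check the Braintrust SDK reference for the exact signature.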
Learn more in the Braintrust docs.