Multimodal support is currently available only in Python. JavaScript/TypeScript support is coming soon!
Multi-modal Embedding Functions
Chroma supports multi-modal embedding functions, which can be used to embed data from multiple modalities into a single embedding space. Chroma ships with the OpenCLIP embedding function built in, which supports both text and images.

Adding Multimodal Data and Data Loaders
You can add data of modalities other than text directly to Chroma. For now, images are supported. Chroma also ships with a data loader, ImageLoader, for loading images from a local filesystem by URI. We can create a collection set up with the ImageLoader:
You can then use the .add method to add records to this collection. The collection's data loader will fetch the images using their URIs, embed them with the OpenCLIPEmbeddingFunction, and store the embeddings in Chroma.
Because the collection's embedding function is multi-modal (the OpenCLIPEmbeddingFunction), you can also add text to the same collection:
Querying
You can query a multi-modal collection with any of the modalities that it supports. For example, you can query with images. When retrieving results, uris are also available as an include field.
Updating
You can update a multi-modal collection by specifying the data modality, in the same way as .add. For now, images are supported: