Chroma provides a convenient wrapper around JinaAI’s embedding API. This embedding function runs remotely on JinaAI’s servers and requires an API key. You can get an API key by signing up for an account at JinaAI.
This embedding function takes an optional model_name argument, which lets you choose which Jina model to use. By default, Chroma uses jina-embeddings-v2-base-en.
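A minimal usage sketch, assuming chromadb is installed and a Jina API key is available in a JINA_API_KEY environment variable. The remote call runs on Jina's servers, so it is shown but left commented out:

```python
import os

# Chroma's default Jina model; pass a different model_name to override it.
model_name = "jina-embeddings-v2-base-en"

# from chromadb.utils.embedding_functions import JinaEmbeddingFunction
#
# jina_ef = JinaEmbeddingFunction(
#     api_key=os.environ["JINA_API_KEY"],
#     model_name=model_name,
# )
# embeddings = jina_ef(input=["A first document", "A second document"])
```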
Jina has added new attributes on embedding functions, including task, late_chunking, truncate, dimensions, embedding_type, and normalized. See JinaAI’s documentation for which models support these attributes.

Late Chunking Example
jina-embeddings-v3 supports Late Chunking, a technique that leverages the model’s long-context capabilities to generate contextual chunk embeddings. Include late_chunking=True in your request to enable it. When set to true, the Jina AI API concatenates all strings in the input field and feeds them to the model as a single string. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list.
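To make the concatenate-then-chunk behavior concrete, here is a sketch of the request body such a call would send to Jina's embedding endpoint; the field names follow Jina's REST API, and the HTTP call itself is omitted so only the payload shape is shown:

```python
def build_late_chunking_request(chunks):
    """Build an embedding request with late chunking enabled.

    With late_chunking=True, the API concatenates all chunks, embeds the
    long string once, and returns one contextual embedding per input chunk.
    """
    return {
        "model": "jina-embeddings-v3",
        "late_chunking": True,
        "input": list(chunks),
    }

payload = build_late_chunking_request([
    "Berlin is the capital of Germany.",
    "It has a population of about 3.8 million.",
])
# The response would contain exactly len(payload["input"]) embeddings.
```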
Task parameter
jina-embeddings-v3 has been trained with 5 task-specific adapters for different embedding uses. Include task in your request to optimize your downstream application:
retrieval.query: Used to encode user queries or questions in retrieval tasks.
retrieval.passage: Used to encode large documents in retrieval tasks at indexing time.
classification: Used to encode text for text classification tasks.
text-matching: Used to encode text for similarity matching, such as measuring similarity between two sentences.
separation: Used for clustering or reranking tasks.
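The two retrieval tasks above are typically used asymmetrically: documents are embedded with retrieval.passage at indexing time and queries with retrieval.query at search time. A sketch of the two request payloads, following Jina's REST API field names, with the actual HTTP call omitted:

```python
def embedding_request(texts, task):
    """Build a jina-embeddings-v3 request for a given task adapter."""
    return {
        "model": "jina-embeddings-v3",
        "task": task,
        "input": list(texts),
    }

# At indexing time: embed documents with the passage adapter.
index_payload = embedding_request(
    ["Chroma is an open-source vector database."],
    task="retrieval.passage",
)

# At query time: embed the user question with the query adapter.
query_payload = embedding_request(
    ["What is Chroma?"],
    task="retrieval.query",
)
```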