model_name argument, which lets you choose which Jina model to use. By default, Chroma uses jina-embeddings-v2-base-en.
Jina has added new attributes on embedding functions, including task, late_chunking, truncate, dimensions, embedding_type, and normalized. See the Jina AI documentation for which models support each of these attributes.
Late Chunking Example
jina-embeddings-v3 supports Late Chunking, a technique that leverages the model’s long-context capabilities to generate contextual chunk embeddings. Include late_chunking=True in your request to enable contextual chunked representations. When it is set to true, the Jina AI API concatenates all sentences in the input field and feeds them to the model as a single string. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings whose length matches that of the input list.
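The request described above can be sketched as a plain JSON body. This is a minimal illustration, not Chroma's implementation: the helper name build_late_chunking_request and the sample sentences are hypothetical, and the body is only built and printed, not sent. It assumes the attribute names listed earlier (late_chunking, plus the usual model and input fields) map one-to-one onto request keys.

```python
import json


def build_late_chunking_request(chunks, model="jina-embeddings-v3"):
    """Build a request body with late chunking enabled (hypothetical helper)."""
    if not chunks:
        raise ValueError("chunks must be a non-empty list of strings")
    return {
        "model": model,
        # Ask the API to embed the concatenated input, then split the
        # result back into one embedding per input item.
        "late_chunking": True,
        "input": list(chunks),
    }


# The second sentence only makes sense in the context of the first,
# which is the situation late chunking is meant to help with.
chunks = [
    "Chroma stores embeddings alongside your documents.",
    "It retrieves them by similarity at query time.",
]
payload = build_late_chunking_request(chunks)
print(json.dumps(payload, indent=2))
```

Because the response contains one embedding per input item, the order of chunks in the list is the order you should expect in the result.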
Task parameter
jina-embeddings-v3 has been trained with 5 task-specific adapters for different embedding uses. Include task in your request to optimize your downstream application:
retrieval.query: Used to encode user queries or questions in retrieval tasks.
retrieval.passage: Used to encode large documents in retrieval tasks at indexing time.
classification: Used to encode text for text classification tasks.
text-matching: Used to encode text for similarity matching, such as measuring similarity between two sentences.
separation: Used for clustering or reranking tasks.
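One way to use the task values above is to pick a different task for indexing and for querying. The sketch below only builds request bodies and checks the task name against the five values listed here. The helper build_task_request and the constant JINA_V3_TASKS are hypothetical names introduced for illustration, and the sample texts are made up.

```python
# The five task values listed above for jina-embeddings-v3.
JINA_V3_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "classification",
    "text-matching",
    "separation",
}


def build_task_request(texts, task, model="jina-embeddings-v3"):
    """Build a request body for a given task (hypothetical helper)."""
    if task not in JINA_V3_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return {"model": model, "task": task, "input": list(texts)}


# Documents are encoded with retrieval.passage at indexing time ...
index_request = build_task_request(
    ["Chroma is an open-source embedding database."],
    task="retrieval.passage",
)

# ... and user questions with retrieval.query at search time.
query_request = build_task_request(
    ["What is Chroma?"],
    task="retrieval.query",
)

print(index_request["task"], query_request["task"])
```

Validating the task name locally is a design choice in this sketch, so that a typo fails before any request is made.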