Chroma provides a built-in BM25 sparse embedding function. BM25 (Best Matching 25) is a ranking function used to estimate the relevance of documents to a given search query. This embedding function runs locally and does not require any external API keys. Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than on semantic similarity.
This embedding function uses snowballstemmer to tokenize documents.

You can customize the BM25 parameters:
- k: Controls term frequency saturation (default: 1.2)
- b: Controls document length normalization (default: 0.75)
- avg_doc_length: Average document length in tokens (default: 256.0)
- token_max_length: Maximum token length (default: 40)
- stopwords: Optional list of stopwords to exclude
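
To make the role of each parameter concrete, here is a minimal pure-Python sketch of the textbook BM25 term-frequency weight. It is an illustration only, not Chroma's implementation: the function names `bm25_term_weight` and `sketch_sparse_weights` are hypothetical, tokenization is a naive whitespace split with no stemming, and the IDF factor is left out. Only the parameter names and defaults come from the list above.

```python
from collections import Counter


def bm25_term_weight(tf, doc_len, k=1.2, b=0.75, avg_doc_length=256.0):
    """Textbook BM25 term-frequency component (no IDF factor).

    k caps how much repeated occurrences of a term can add (saturation);
    b controls how strongly long documents are penalized.
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_length)
    return tf * (k + 1.0) / (tf + k * length_norm)


def sketch_sparse_weights(text, stopwords=(), token_max_length=40, **bm25_params):
    """Hypothetical helper: map each kept token to its BM25 weight.

    Assumption: lowercase + whitespace split stands in for real tokenization.
    Tokens longer than token_max_length and tokens in stopwords are dropped.
    """
    tokens = [
        t for t in text.lower().split()
        if len(t) <= token_max_length and t not in stopwords
    ]
    counts = Counter(tokens)
    doc_len = len(tokens)
    return {term: bm25_term_weight(tf, doc_len, **bm25_params)
            for term, tf in counts.items()}


if __name__ == "__main__":
    weights = sketch_sparse_weights(
        "the quick brown fox jumps over the lazy dog",
        stopwords={"the", "over"},
    )
    for term, weight in sorted(weights.items()):
        print(f"{term}: {weight:.3f}")
```

In this sketch, a term that occurs once in a document whose length equals avg_doc_length gets a weight of exactly 1.0. Raising the term frequency pushes the weight toward the ceiling of k + 1, and a document longer than avg_doc_length lowers the weight in proportion to b.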