This embedding function uses snowballstemmer to stem tokens when tokenizing documents. You can customize the BM25 parameters:
- k: Controls term frequency saturation (default: 1.2)
- b: Controls document length normalization (default: 0.75)
- avg_doc_length: Average document length in tokens (default: 256.0)
- token_max_length: Maximum token length (default: 40)
- stopwords: Optional list of stopwords to exclude
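
To make the role of each parameter concrete, here is a minimal sketch of the standard BM25 term-frequency weighting in plain Python. It is not this library's implementation: the function name `bm25_term_weights` is hypothetical, stemming is omitted, and it assumes that document length is counted after stopwords and over-long tokens are dropped.

```python
from collections import Counter


def bm25_term_weights(
    tokens,
    k=1.2,
    b=0.75,
    avg_doc_length=256.0,
    token_max_length=40,
    stopwords=None,
):
    """Hypothetical helper: BM25 term-frequency weights for one document."""
    stop = set(stopwords or [])
    # Drop stopwords and tokens longer than token_max_length.
    kept = [t for t in tokens if len(t) <= token_max_length and t not in stop]
    # Assumption: document length is measured after filtering.
    doc_len = len(kept)
    # b scales how strongly the length ratio affects the weight (b=0 disables it).
    norm = k * (1.0 - b + b * doc_len / avg_doc_length)
    counts = Counter(kept)
    # k controls saturation: each weight approaches k + 1 as tf grows.
    return {term: tf * (k + 1.0) / (tf + norm) for term, tf in counts.items()}


if __name__ == "__main__":
    doc = "the cat sat on the mat with the other cat".split()
    print(bm25_term_weights(doc, stopwords=["the", "on", "with"]))
```

Raising k lets repeated terms keep gaining weight for longer before saturating, while raising b penalizes documents that are longer than avg_doc_length more heavily.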