How Ranking Works
A ranking expression determines which documents are scored and how they’re ordered.
Expression Evaluation Process
- No ranking (rank=None): Documents are returned in index order (typically insertion order).
- With a ranking expression:
  - The expression must contain at least one Knn expression.
  - Documents must appear in at least one Knn’s top-k results to be considered.
  - Documents must also appear in ALL Knn results where default=None.
  - Documents missing from a Knn with a default value get that default score.
  - Each Knn considers its top limit candidates (default: 16).
  - Documents are sorted by score in ascending order (lower scores first).
  - The final results are based on Search.limit().
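The selection rules above can be modeled in a few lines of plain Python. The sketch below is a hypothetical toy, not Chroma's implementation. Each "Knn result" is a dict of precomputed distances for that Knn's top-limit hits, paired with its default. The combining expression is assumed to be a plain sum of the Knn scores. The helper name select_and_score is illustrative only.

```python
# Toy model of the documented candidate-selection rules. This is NOT Chroma's
# implementation: Knn results are plain dicts of precomputed distances, and
# the ranking expression is assumed to be a plain sum of the Knn scores.
from typing import Dict, List, Optional, Tuple

# (doc_id -> distance for this Knn's top-`limit` hits, this Knn's default)
KnnResult = Tuple[Dict[str, float], Optional[float]]


def select_and_score(knns: List[KnnResult], search_limit: int) -> List[Tuple[str, float]]:
    # Candidates: documents that appear in at least one Knn's top-k.
    candidates = set()
    for hits, _default in knns:
        candidates.update(hits)

    results = []
    for doc_id in candidates:
        total = 0.0
        for hits, default in knns:
            if doc_id in hits:
                total += hits[doc_id]
            elif default is not None:
                # Missing from a Knn that has a default: use the default score.
                total += default
            else:
                # Missing from a Knn with default=None: the document is dropped.
                break
        else:
            # Only reached when the loop did not break.
            results.append((doc_id, total))

    # Lower scores first, then keep only the Search-level limit.
    results.sort(key=lambda pair: pair[1])
    return results[:search_limit]


dense = ({"a": 0.1, "b": 0.4, "c": 0.3}, None)  # default=None: required
sparse = ({"b": 0.2, "d": 0.05}, 1.0)           # default=1.0 for missing docs
top = select_and_score([dense, sparse], search_limit=2)
print([doc_id for doc_id, _ in top])  # ['b', 'a']
```

Document d has the best sparse distance, but it is dropped because it is missing from the required dense Knn. Document c is scored using the sparse default (0.3 + 1.0), but it falls outside search_limit=2.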
Document Selection and Scoring
When combining multiple Knn expressions, documents must appear in at least one Knn’s results AND must appear in every Knn where default=None. To avoid excluding documents, set default values on your Knn expressions.
The Knn Class
The Knn class performs k-nearest neighbor search to find similar vectors. It’s the primary way to add vector similarity scoring to your searches.
Sparse embeddings: To search custom sparse embedding fields, you must first configure a sparse vector index in your collection schema. See Sparse Vector Search Setup for configuration instructions.
Knn Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str, List[float], SparseVector, or np.ndarray | Required | The query text or vector to search with |
| key | str | "#embedding" | Field to search: "#embedding" for dense embeddings, or a metadata field name for sparse embeddings |
| limit | int | 16 | Maximum number of candidates this Knn considers (not the number of results returned) |
| default | float or None | None | Score assigned to documents not in this Knn’s results; with None, such documents are excluded |
| return_rank | bool | False | If True, return the rank position (0, 1, 2…) instead of the distance |

"#embedding" (or K.EMBEDDING) refers to the default embedding field where Chroma stores dense embeddings. Sparse embeddings must be stored in metadata under a consistent key.
Query Formats
Text Queries
Dense Vectors
Sparse Vectors
Embedding Fields
Chroma currently supports:
- Dense embeddings: Stored in the default embedding field ("#embedding" or K.EMBEDDING)
- Sparse embeddings: Can be stored in metadata under a consistent key
Currently, dense embeddings can only be stored in the default embedding field (#embedding). Only sparse vector embeddings can be stored in metadata, and they must be stored under the same key across all documents. Additionally, only one sparse vector index is allowed per collection.
Support for multiple dense embedding fields and multiple sparse vector indices is coming in a future release. This will allow you to store and query multiple embeddings per document, with optimized indexing for each field.
Arithmetic Operations
Supported operators:
- Addition (+)
- Subtraction (-)
- Multiplication (*)
- Division (/)
- Negation (unary -)
Numbers in expressions are automatically converted to Val constants. For example, Knn(query=v) * 0.5 is equivalent to Knn(query=v) * Val(0.5).
Mathematical Functions
Supported functions:
- exp(): Exponential (e^x)
- log(): Natural logarithm
- abs(): Absolute value
- min(): Minimum of two values
- max(): Maximum of two values
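To build intuition for what each function does to a score, the sketch below applies the same operations to toy per-document distances using Python's math module. It is a plain-Python illustration, not Chroma code. The sample scores are made up, and the 1 + v inside log is an illustrative choice: the natural logarithm is undefined for non-positive values.

```python
# Plain-Python illustration (not Chroma code) of what each supported
# function does to per-document scores; the scores are toy distances.
import math

scores = {"a": 0.2, "b": 1.5, "c": 4.0}


def ascending(d):
    # Document ids ordered from lowest to highest score.
    return sorted(d, key=d.get)


# exp() and log() are monotonically increasing: applied to one score they keep
# the ascending order and only change the spacing between values, which
# matters once the transformed score is combined with other terms.
exp_scores = {k: math.exp(v) for k, v in scores.items()}
# Distances are non-negative, so 1 + v keeps the log argument positive.
log_scores = {k: math.log(1.0 + v) for k, v in scores.items()}

# abs() measures how far a score is from a target value (here 1.0).
dist_from_one = {k: abs(v - 1.0) for k, v in scores.items()}

# min() with a constant caps large values; max() with a constant sets a floor.
capped = {k: min(v, 2.0) for k, v in scores.items()}
floored = {k: max(v, 0.5) for k, v in scores.items()}

print(ascending(scores), ascending(exp_scores), ascending(log_scores))
print(ascending(dist_from_one))  # ['b', 'a', 'c']
print(capped)   # {'a': 0.2, 'b': 1.5, 'c': 2.0}
print(floored)  # {'a': 0.5, 'b': 1.5, 'c': 4.0}
```

Capping with min() is one way to keep a single outlier term from dominating a sum of Knn scores. Whether that helps depends on your score ranges.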
Val for Constant Values
The Val class represents constant values in ranking expressions. Numbers are automatically converted to Val, but you can use Val explicitly for clarity.
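Automatic conversion of numbers to constants is the kind of behavior Python operator overloading makes possible. The sketch below is a hypothetical, stripped-down expression tree and is NOT Chroma's classes. ToyVal plays the role of Val, and ToyKnn stands in for a Knn whose score is read from a precomputed table. BinOp and as_expr are illustrative names. The sketch shows that dense * 0.5 and dense * ToyVal(0.5) build the same tree.

```python
# Hypothetical, stripped-down expression tree (NOT Chroma's classes) showing
# how operator overloading can wrap bare numbers as constant nodes.
import operator
from typing import Callable, Dict, Union


class Expr:
    def evaluate(self, scores: Dict[str, float]) -> float:
        raise NotImplementedError

    # Every operator routes its other operand through as_expr(), so plain
    # numbers become ToyVal constants automatically.
    def __add__(self, other): return BinOp(operator.add, self, as_expr(other))
    def __radd__(self, other): return BinOp(operator.add, as_expr(other), self)
    def __sub__(self, other): return BinOp(operator.sub, self, as_expr(other))
    def __rsub__(self, other): return BinOp(operator.sub, as_expr(other), self)
    def __mul__(self, other): return BinOp(operator.mul, self, as_expr(other))
    def __rmul__(self, other): return BinOp(operator.mul, as_expr(other), self)
    def __truediv__(self, other): return BinOp(operator.truediv, self, as_expr(other))
    def __rtruediv__(self, other): return BinOp(operator.truediv, as_expr(other), self)
    # Unary negation is modeled as 0 - x.
    def __neg__(self): return BinOp(operator.sub, ToyVal(0.0), self)


class ToyVal(Expr):
    """A constant, playing the role Val plays in ranking expressions."""

    def __init__(self, value: float):
        self.value = float(value)

    def evaluate(self, scores: Dict[str, float]) -> float:
        return self.value


class ToyKnn(Expr):
    """Stands in for a Knn; its score is read from a precomputed table."""

    def __init__(self, name: str):
        self.name = name

    def evaluate(self, scores: Dict[str, float]) -> float:
        return scores[self.name]


class BinOp(Expr):
    def __init__(self, op: Callable[[float, float], float], left: Expr, right: Expr):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, scores: Dict[str, float]) -> float:
        return self.op(self.left.evaluate(scores), self.right.evaluate(scores))


def as_expr(value: Union[Expr, int, float]) -> Expr:
    # Expressions pass through unchanged; numbers are wrapped as constants.
    return value if isinstance(value, Expr) else ToyVal(value)


dense = ToyKnn("dense")
implicit = dense * 0.5           # 0.5 is wrapped automatically
explicit = dense * ToyVal(0.5)   # same tree, written out
scores = {"dense": 0.8}
print(isinstance(implicit.right, ToyVal))                      # True
print(implicit.evaluate(scores) == explicit.evaluate(scores))  # True
```

The reflected methods (__radd__, __rmul__, and so on) are what let a number appear on the left, as in 2 * dense + 1.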
Combining Ranking Expressions
You can combine multiple Knn searches using arithmetic operations for custom scoring strategies. For advanced hybrid search that combines multiple ranking strategies, consider using RRF (Reciprocal Rank Fusion), which is designed specifically for this purpose.
Dictionary Syntax
You can also construct ranking expressions using dictionary syntax, which is useful when building them programmatically. Supported dictionary operators:
- $knn: K-nearest neighbor search
- $val: Constant value
- $sum: Addition of multiple ranks
- $sub: Subtraction (left - right)
- $mul: Multiplication of multiple ranks
- $div: Division (left / right)
- $abs: Absolute value
- $exp: Exponential
- $log: Natural logarithm
- $max: Maximum of multiple ranks
- $min: Minimum of multiple ranks
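To show how a nested dictionary expression is read, the sketch below is a hypothetical evaluator for the operators listed above. It is NOT Chroma code, and the operand layouts are assumptions: lists for $sum, $mul, $max and $min; {"left": ..., "right": ...} for $sub and $div; a single child node for $abs, $exp and $log. Each $knn node is resolved from a table of precomputed scores keyed by its "query" value instead of a real index. Check Chroma's reference for the exact schema.

```python
# Hypothetical evaluator for dictionary-form ranking expressions (NOT Chroma
# code). Operand layouts are assumptions: lists for $sum/$mul/$max/$min,
# {"left": ..., "right": ...} for $sub/$div, a single node for $abs/$exp/$log.
# "$knn" nodes are resolved from precomputed scores keyed by their "query".
import math
from typing import Any, Dict


def evaluate(node: Dict[str, Any], knn_scores: Dict[str, float]) -> float:
    if len(node) != 1:
        raise ValueError("each node must have exactly one operator key")
    op, arg = next(iter(node.items()))

    if op == "$knn":
        return knn_scores[arg["query"]]
    if op == "$val":
        return float(arg)
    if op in ("$sum", "$mul", "$max", "$min"):
        values = [evaluate(child, knn_scores) for child in arg]
        reducers = {"$sum": sum, "$mul": math.prod, "$max": max, "$min": min}
        return float(reducers[op](values))
    if op in ("$sub", "$div"):
        left = evaluate(arg["left"], knn_scores)
        right = evaluate(arg["right"], knn_scores)
        return left - right if op == "$sub" else left / right
    unary = {"$abs": abs, "$exp": math.exp, "$log": math.log}
    if op in unary:
        return unary[op](evaluate(arg, knn_scores))
    raise ValueError(f"unsupported operator: {op}")


# 0.7 * dense + 0.3 * sparse, built as plain data.
expr = {
    "$sum": [
        {"$mul": [{"$val": 0.7}, {"$knn": {"query": "dense query"}}]},
        {"$mul": [{"$val": 0.3}, {"$knn": {"query": "sparse query"}}]},
    ]
}
knn_scores = {"dense query": 0.2, "sparse query": 0.5}
print(round(evaluate(expr, knn_scores), 6))  # 0.29
```

Because the expression is ordinary nested data, it can be generated from configuration, for example by looping over a list of (weight, query) pairs.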
Understanding Scores
- Lower scores = better matches: Chroma uses distance-based scoring.
- Score range: Depends on your embedding model and distance metric.
- No ranking: When rank=None, results are returned in natural storage order.
- Distance vs. similarity: Scores represent distance. With the cosine metric, similarity is 1 - score. For normalized embeddings, the same holds for inner-product distance, while squared L2 distance gives a similarity of 1 - score / 2.
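The relationship between a distance score and cosine similarity depends on the metric. The plain-Python check below is not Chroma code. It computes both distances for two unit-length vectors, assuming the usual definitions: cosine distance is 1 - cos(a, b) and squared L2 is the sum of squared differences. Squared L2 is assumed here to be what an l2 space returns, as in hnswlib. For unit vectors the inner-product distance 1 - a·b equals the cosine distance, so it is not computed separately.

```python
# Plain-Python check (not Chroma code) of how distance scores relate to cosine
# similarity for unit-length vectors. Assumed metric definitions:
#   cosine distance = 1 - cos(a, b)
#   squared L2      = sum((a_i - b_i)^2)
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])

cos_sim = sum(x * y for x, y in zip(a, b))  # dot product = cosine for unit vectors
cosine_distance = 1.0 - cos_sim
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))

# Cosine distance: similarity = 1 - score.
sim_from_cosine = 1.0 - cosine_distance
# Squared L2 on unit vectors equals 2 - 2 * cos, so similarity = 1 - score / 2.
sim_from_l2 = 1.0 - squared_l2 / 2.0

print(round(cos_sim, 6), round(sim_from_cosine, 6), round(sim_from_l2, 6))  # 0.888889 0.888889 0.888889
```

Applying 1 - score to a squared L2 distance here would give about 0.778 instead of 0.889. Check which metric your collection uses before converting.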
Edge Cases and Important Behavior
Default Ranking
When no ranking is specified (rank=None), results are returned in index order (typically insertion order). This is useful when you only need filtering without scoring.
Combining Knn Expressions with default=None
Documents must appear in at least one Knn’s results to be candidates, AND must appear in ALL Knn results where default=None.
Vector Dimension Mismatch
Query vectors must match the dimension of the indexed embeddings. Mismatched dimensions will result in an error.
The return_rank Parameter
Set return_rank=True when using Knn with RRF to get rank positions (0, 1, 2…) instead of distances.
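Rank positions are what reciprocal rank fusion consumes. The sketch below applies the standard RRF formula, score(d) = sum over rankings of w_i / (k + rank_i), to 0-based ranks like those return_rank=True produces. It is NOT Chroma's Rrf class, and several details are assumptions: the helper name rrf, the constant k=60 (a value commonly used with RRF), treating a document missing from a ranking as contributing nothing, and negating the fused value so that lower is better.

```python
# Hypothetical Reciprocal Rank Fusion over 0-based rank positions (NOT Chroma's
# Rrf class). Each input maps doc_id -> rank, the kind of value Knn returns with
# return_rank=True. k=60 is the constant commonly used in RRF write-ups.
from typing import Dict, List, Optional


def rrf(rankings: List[Dict[str, int]], k: int = 60,
        weights: Optional[List[float]] = None) -> List[str]:
    if weights is None:
        weights = [1.0] * len(rankings)
    fused: Dict[str, float] = {}
    for ranks, weight in zip(rankings, weights):
        for doc_id, rank in ranks.items():
            # A document absent from a ranking simply gets no contribution from it.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    # Higher fused value = better; negate it so lower is better, matching an
    # ascending sort like the one applied to distance scores.
    scores = {doc_id: -value for doc_id, value in fused.items()}
    return sorted(scores, key=scores.get)


dense_ranks = {"a": 0, "b": 1, "c": 2}
sparse_ranks = {"c": 0, "a": 1, "d": 2}
print(rrf([dense_ranks, sparse_ranks]))  # ['a', 'c', 'b', 'd']
```

Ranks are used instead of raw distances because distances from different embedding spaces (for example dense and sparse) are not on comparable scales, while rank positions are.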
The limit Parameter
The limit parameter in Knn controls how many candidates are considered, not the final result count. Use Search.limit() to control the number of results returned.
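The difference between a Knn's limit and the final result count is easier to see on a single toy Knn. The helper toy_knn below is hypothetical and not part of Chroma. It sorts precomputed distances, keeps the nearest limit candidates (default 16, as in the parameter table), and optionally replaces distances with 0-based rank positions, mirroring what return_rank=True is described as doing. Any later truncation to the number of returned results would be a separate step, which is the role Search.limit() plays.

```python
# Hypothetical toy model of one Knn's output (not Chroma code): keep the
# `limit` nearest candidates and report either distance or 0-based rank.
from typing import Dict


def toy_knn(distances: Dict[str, float], limit: int = 16,
            return_rank: bool = False) -> Dict[str, float]:
    # Sort every document by distance (nearest first) and keep the top `limit`.
    nearest = sorted(distances.items(), key=lambda item: item[1])[:limit]
    if return_rank:
        # Rank positions 0, 1, 2, ... replace the raw distances.
        return {doc_id: float(rank) for rank, (doc_id, _dist) in enumerate(nearest)}
    return dict(nearest)


distances = {"a": 0.42, "b": 0.10, "c": 0.77, "d": 0.25}
print(toy_knn(distances, limit=3))                    # {'b': 0.1, 'd': 0.25, 'a': 0.42}
print(toy_knn(distances, limit=3, return_rank=True))  # {'b': 0.0, 'd': 1.0, 'a': 2.0}
```

Document c is not among this Knn's candidates. In a combined expression it would either receive the Knn's default score or be excluded when default=None.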
Complete Example
Here’s a practical example combining different ranking features:
Tips and Best Practices
- Normalize your vectors - Ensure consistent scoring by normalizing query vectors
- Use appropriate limit values - Higher limit values in Knn consider more candidates, which can improve result quality at the cost of speed
- Set return_rank=True for RRF - Essential when using Reciprocal Rank Fusion
- Test score ranges - Understand your model’s typical score ranges for better thresholding
- Combine strategies wisely - Linear combinations work well for similar score ranges
Next Steps
- Learn about Group By & Aggregation to diversify search results by category
- Learn about hybrid search with RRF for advanced ranking strategies
- See practical examples of ranking in real-world scenarios
- Explore batch operations for multiple searches