How Ranking Works
A ranking expression determines which documents are scored and how they’re ordered:
Expression Evaluation Process
- No ranking (rank=None): Documents are returned in index order (typically insertion order)
- With ranking expression:
  - Must contain at least one Knn expression
  - Documents must appear in at least one Knn’s top-k results to be considered
  - Documents must also appear in ALL Knn results where default=None
  - Documents missing from a Knn with a default value get that default score
  - Each Knn considers its top limit candidates (default: 16)
  - Documents are sorted by score (ascending, lower scores first)
  - Final results are based on Search.limit()
Document Selection and Scoring
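The selection rules can be modeled in a few lines of plain Python. This is only a sketch of the behavior described above, not Chroma's implementation: the `select_and_score` helper, the `KnnResult` tuple, and the weighted-sum combination are assumptions made for illustration, and each Knn is represented by a dictionary of precomputed distances.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one Knn's output: doc id -> distance for its
# top-`limit` candidates, plus the `default` used for documents it missed.
KnnResult = Tuple[Dict[str, float], Optional[float]]


def select_and_score(knns: List[KnnResult], weights: List[float],
                     final_limit: int) -> List[Tuple[str, float]]:
    """Model the selection rules for a weighted sum of Knn scores."""
    # A document is a candidate if at least one Knn returned it.
    candidates = set()
    for hits, _default in knns:
        candidates.update(hits)

    scored = []
    for doc_id in candidates:
        total = 0.0
        keep = True
        for (hits, default), weight in zip(knns, weights):
            if doc_id in hits:
                total += weight * hits[doc_id]
            elif default is None:
                # Missing from a Knn with default=None: drop the document.
                keep = False
                break
            else:
                # Missing from a Knn that has a default: use the default.
                total += weight * default
        if keep:
            scored.append((doc_id, total))

    # Ascending order: lower scores first. Ties are broken by id so the
    # output is deterministic (a choice made for this sketch).
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:final_limit]


dense_knn = ({"a": 0.2, "b": 0.5}, None)     # default=None: membership required
sparse_knn = ({"b": 0.1, "c": 0.3}, 10.0)    # default=10.0 for missing documents
ranked = select_and_score([dense_knn, sparse_knn], [0.7, 0.3], final_limit=2)
print(ranked)
```

Here document "c" is dropped because it is missing from the Knn with default=None, while "a" stays in the results but is penalized with the sparse default of 10.0.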
The Knn Class
The Knn class performs k-nearest-neighbor search to find similar vectors. It’s the primary way to add vector similarity scoring to your searches.
Knn Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str, List[float], SparseVector, or np.ndarray | Required | The query text or vector to search with |
| key | str | "#embedding" | Field to search - "#embedding" for dense embeddings, or a metadata field name for sparse embeddings |
| limit | int | 16 | Maximum number of candidates to consider |
| default | float or None | None | Score for documents not in KNN results |
| return_rank | bool | False | If True, return rank position (0, 1, 2…) instead of distance |
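To make the limit and return_rank parameters concrete, the following brute-force sketch mimics what a single nearest-neighbor lookup returns. It is not how Chroma searches its index: the `brute_force_knn` function is hypothetical, and squared Euclidean distance is an assumption, since the real metric depends on how the collection is configured.

```python
from typing import Dict, List, Tuple


def brute_force_knn(query: List[float], vectors: Dict[str, List[float]],
                    limit: int = 16,
                    return_rank: bool = False) -> List[Tuple[str, float]]:
    """Return up to `limit` nearest documents as (id, score) pairs."""
    # Query and stored vectors must share the same dimension.
    if any(len(vec) != len(query) for vec in vectors.values()):
        raise ValueError("query dimension does not match stored vectors")

    # Squared Euclidean distance (assumed metric for this sketch).
    distances = [
        (doc_id, sum((q - v) ** 2 for q, v in zip(query, vec)))
        for doc_id, vec in vectors.items()
    ]
    distances.sort(key=lambda pair: (pair[1], pair[0]))
    top = distances[:limit]

    if return_rank:
        # Replace each distance with its 0-based position in the ordering.
        return [(doc_id, float(rank)) for rank, (doc_id, _dist) in enumerate(top)]
    return top


docs = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
by_distance = brute_force_knn([2.0, 0.0], docs, limit=2)
by_rank = brute_force_knn([2.0, 0.0], docs, limit=2, return_rank=True)
print(by_distance)
print(by_rank)
```

With limit=2, document "c" never becomes a candidate, which is why a ranking expression that needs scores for more documents has to raise limit or supply a default.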
Query Formats
Text Queries
Dense Vectors
Sparse Vectors
Embedding Fields
Chroma currently supports:
- Dense embeddings - Stored in the default embedding field ("#embedding" or K.EMBEDDING)
- Sparse embeddings - Can be stored in metadata under a consistent key
Arithmetic Operations
Supported operators:
- `+` - Addition
- `-` - Subtraction
- `*` - Multiplication
- `/` - Division
- `-` (unary) - Negation
Mathematical Functions
Supported functions:
- `exp()` - Exponential (e^x)
- `log()` - Natural logarithm
- `abs()` - Absolute value
- `min()` - Minimum of two values
- `max()` - Maximum of two values
Val for Constant Values
The Val class represents constant values in ranking expressions. Numbers are automatically converted to Val, but you can use it explicitly for clarity.
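One way to see how arithmetic on ranking expressions and automatic constant wrapping can work is a toy expression tree built with Python operator overloading. The classes below (`Expr`, `Const`, `Score`) are hypothetical and exist only for this sketch; they are not Chroma's Knn or Val classes, and their method names are illustrative choices.

```python
import math
import operator
from typing import Callable, Dict

Inputs = Dict[str, float]  # per-document input scores, keyed by name


class Expr:
    """Toy ranking expression; evaluate() maps inputs to a single score."""

    def __init__(self, fn: Callable[[Inputs], float]) -> None:
        self._fn = fn

    def evaluate(self, inputs: Inputs) -> float:
        return self._fn(inputs)

    def _binary(self, other, op, swap: bool = False) -> "Expr":
        # Plain numbers are wrapped as constants, mirroring the automatic
        # number-to-constant conversion described in the text.
        rhs = other if isinstance(other, Expr) else Const(other)
        left, right = (rhs, self) if swap else (self, rhs)
        return Expr(lambda d: op(left.evaluate(d), right.evaluate(d)))

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __radd__(self, other):
        return self._binary(other, operator.add, swap=True)

    def __sub__(self, other):
        return self._binary(other, operator.sub)

    def __rsub__(self, other):
        return self._binary(other, operator.sub, swap=True)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __rmul__(self, other):
        return self._binary(other, operator.mul, swap=True)

    def __truediv__(self, other):
        return self._binary(other, operator.truediv)

    def __rtruediv__(self, other):
        return self._binary(other, operator.truediv, swap=True)

    def __neg__(self):
        return Expr(lambda d: -self.evaluate(d))

    def __abs__(self):
        return Expr(lambda d: abs(self.evaluate(d)))

    def exp(self):
        return Expr(lambda d: math.exp(self.evaluate(d)))

    def log(self):
        return Expr(lambda d: math.log(self.evaluate(d)))

    def min(self, other):
        # Inside a method body, `min` resolves to the builtin function.
        return self._binary(other, min)

    def max(self, other):
        return self._binary(other, max)


class Const(Expr):
    """A constant value, playing the role of an explicit constant."""

    def __init__(self, value: float) -> None:
        super().__init__(lambda _d: float(value))


class Score(Expr):
    """Looks up a named input score for the current document."""

    def __init__(self, name: str) -> None:
        super().__init__(lambda d: d[name])


combined = 0.7 * Score("dense") + Score("sparse") * 0.3
score = combined.evaluate({"dense": 0.2, "sparse": 1.0})
clamped = (Score("dense") - 1).max(0).evaluate({"dense": 0.2})
print(score, clamped)
```

The reflected methods (`__radd__`, `__rmul__`, and so on) are what allow a bare number to appear on the left-hand side of an operator.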
Combining Ranking Expressions
You can combine multiple Knn searches using arithmetic operations for custom scoring strategies.
Understanding Scores
- Lower scores = better matches - Chroma uses distance-based scoring
- Score range - Depends on your embedding model and distance metric
- No ranking - When rank=None, results are returned in natural storage order
- Distance vs similarity - Scores represent distance; for similarity, use 1 - score (for normalized embeddings)
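The distance-to-similarity conversion can be checked by hand. The sketch below assumes a cosine distance metric and vectors that are already unit length; both are assumptions, since the actual metric depends on your collection, and `cosine_distance_unit` is a hypothetical helper.

```python
from typing import Sequence


def cosine_distance_unit(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance for vectors that are already unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot


query = (1.0, 0.0)
doc = (0.6, 0.8)  # unit length: 0.36 + 0.64 = 1
distance = cosine_distance_unit(query, doc)
similarity = 1.0 - distance  # recovers the dot product of the unit vectors
print(distance, similarity)
```

For unnormalized embeddings or other metrics, 1 - score does not have this interpretation.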
Edge Cases and Important Behavior
Default Ranking
When no ranking is specified (rank=None), results are returned in index order (typically insertion order). This is useful when you only need filtering without scoring.
Combining Knn Expressions with default=None
Documents must appear in at least one Knn’s results to be candidates, AND must appear in ALL Knn results where default=None.
Vector Dimension Mismatch
Query vectors must match the dimension of the indexed embeddings. Mismatched dimensions will result in an error.
The return_rank Parameter
Set return_rank=True when using Knn with RRF to get rank positions (0, 1, 2…) instead of distances.
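Rank positions matter because Reciprocal Rank Fusion works on ranks rather than raw distances. The sketch below shows a common form of the formula using 0-based ranks. The `rrf_scores` helper is hypothetical, k=60 is the conventional smoothing constant, and negating the sum so that lower fused scores are better is an assumption made to match the ascending sort order described in this document.

```python
from typing import Dict, List


def rrf_scores(rankings: List[Dict[str, int]], weights: List[float],
               k: int = 60) -> Dict[str, float]:
    """Fuse 0-based rank positions; more negative means a better match."""
    fused: Dict[str, float] = {}
    for ranks, weight in zip(rankings, weights):
        for doc_id, rank in ranks.items():
            contribution = weight / (k + rank)
            # Subtract so that documents ranked highly by many lists end up
            # with the lowest (best) fused score.
            fused[doc_id] = fused.get(doc_id, 0.0) - contribution
    return fused


dense_ranks = {"a": 0, "b": 1}    # rank positions from one Knn
sparse_ranks = {"b": 0, "c": 1}   # rank positions from another Knn
fused = rrf_scores([dense_ranks, sparse_ranks], weights=[1.0, 1.0])
order = sorted(fused, key=fused.get)
print(order)
```

In this simplified model a document missing from one ranking receives no contribution from it; document "b" ranks first because both lists place it near the top.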
The limit Parameter
The limit parameter in Knn controls how many candidates are considered, not the final result count. Use Search.limit() to control the number of results returned.
Complete Example
Here’s a practical example combining different ranking features:
Tips and Best Practices
- Normalize your vectors - Ensure consistent scoring by normalizing query vectors
- Use appropriate limit values - Higher limits in Knn mean more accurate but slower results
- Set return_rank=True for RRF - Essential when using Reciprocal Rank Fusion
- Test score ranges - Understand your model’s typical score ranges for better thresholding
- Combine strategies wisely - Linear combinations work well for similar score ranges
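For the first tip, normalizing a query vector is a short operation. The `l2_normalize` helper below is a hypothetical sketch using only the standard library; it assumes L2 (Euclidean) normalization is what your embedding model and distance metric expect.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)
```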
Next Steps
- Learn about Group By & Aggregation to diversify search results by category
- Learn about hybrid search with RRF for advanced ranking strategies
- See practical examples of ranking in real-world scenarios
- Explore batch operations for multiple searches