GroupBy currently requires a ranking expression to be specified. Support for grouping without ranking is planned for a future release.
How Grouping Works
GroupBy organizes ranked results into groups based on metadata keys, then performs aggregation on each group. Currently, aggregation supports MinK and MaxK, which select the top k results from each group based on the specified sorting keys.
After grouping and aggregation, results from all groups are flattened and sorted by score. The limit() method operates on this flattened list.
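The flow described above can be mimicked in plain Python. This is only an illustrative sketch of the described semantics, not Chroma's implementation or API: the record dicts, the group_top_k helper, and the sample data are hypothetical, and scores are assumed to be distances where lower is better.

```python
from collections import defaultdict

def group_top_k(records, group_key, k, limit):
    """Hypothetical helper: group, keep k best per group, flatten, sort, limit."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(group_key)].append(rec)

    kept = []
    for members in groups.values():
        # MinK-style aggregation on score: keep the k lowest scores per group.
        kept.extend(sorted(members, key=lambda r: r["score"])[:k])

    # Flatten across groups, order by score, then apply the final limit.
    kept.sort(key=lambda r: r["score"])
    return kept[:limit]

records = [
    {"id": "a", "category": "ml", "score": 0.10},
    {"id": "b", "category": "ml", "score": 0.20},
    {"id": "c", "category": "ml", "score": 0.30},
    {"id": "d", "category": "db", "score": 0.25},
    {"id": "e", "category": "db", "score": 0.40},
]

result = group_top_k(records, "category", k=2, limit=3)
print([r["id"] for r in result])  # ['a', 'b', 'd']
```

Record c is dropped by the per-group step (it is the third-best in its group), while record e survives aggregation but is cut by the final limit.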
The GroupBy Class
The GroupBy class specifies how to partition results and which records to keep from each partition.
GroupBy Parameters
| Parameter | Type | Description |
|---|---|---|
| keys | Key or List[Key] | Metadata key(s) to group by |
| aggregate | MinK or MaxK | Aggregation function to select top k records within each group |
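To make the difference between a single key and a list of keys concrete, here is a plain-Python sketch (not Chroma code). It assumes that a list of keys partitions records by the combination of their values; the partition helper and the sample records are hypothetical.

```python
from collections import defaultdict

def partition(records, keys):
    """Hypothetical helper: bucket records by one key or a combination of keys."""
    if isinstance(keys, str):
        keys = [keys]
    groups = defaultdict(list)
    for rec in records:
        # The group identity is the tuple of values for all grouping keys.
        group_id = tuple(rec.get(key) for key in keys)
        groups[group_id].append(rec["id"])
    return dict(groups)

records = [
    {"id": "a", "category": "ml", "year": 2023},
    {"id": "b", "category": "ml", "year": 2024},
    {"id": "c", "category": "db", "year": 2024},
    {"id": "d", "category": "ml", "year": 2024},
]

by_category = partition(records, "category")
by_category_and_year = partition(records, ["category", "year"])
print(len(by_category), len(by_category_and_year))  # 2 3
```

Adding a second key splits the "ml" group by year, which is why multi-key grouping gives finer-grained partitions.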
Aggregation Functions
MinK
Keeps the k records with the smallest values for the specified keys. Use MinK when lower values are better (e.g., distance scores, prices, priorities).
| Parameter | Type | Description |
|---|---|---|
| keys | Key or List[Key] | Key(s) to sort by in ascending order |
| k | int | Number of records to keep from each group |
MaxK
Keeps the k records with the largest values for the specified keys. Use MaxK when higher values are better (e.g., ratings, relevance scores, dates).
| Parameter | Type | Description |
|---|---|---|
| keys | Key or List[Key] | Key(s) to sort by in descending order |
| k | int | Number of records to keep from each group |
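The selection behavior of the two aggregations can be sketched with the standard library's heapq module. This is a hedged illustration rather than Chroma's implementation: min_k and max_k are hypothetical helpers, the sample group is made up, and the sketch assumes that when several keys are given, later keys only break ties on earlier ones.

```python
import heapq

def min_k(records, keys, k):
    """Hypothetical MinK: the k records with the smallest key values."""
    return heapq.nsmallest(k, records, key=lambda r: tuple(r[key] for key in keys))

def max_k(records, keys, k):
    """Hypothetical MaxK: the k records with the largest key values."""
    return heapq.nlargest(k, records, key=lambda r: tuple(r[key] for key in keys))

# One group of records, already partitioned.
group = [
    {"id": "a", "score": 0.30, "rating": 4, "year": 2021},
    {"id": "b", "score": 0.10, "rating": 5, "year": 2022},
    {"id": "c", "score": 0.20, "rating": 5, "year": 2024},
]

closest = min_k(group, ["score"], k=2)              # lowest distances first
best_rated = max_k(group, ["rating", "year"], k=2)  # year breaks the rating tie
everything = min_k(group, ["score"], k=10)          # k larger than the group

print([r["id"] for r in closest])     # ['b', 'c']
print([r["id"] for r in best_rated])  # ['c', 'b']
print([r["id"] for r in everything])  # ['b', 'c', 'a']
```

Records b and c tie on rating, so the second key (year) decides their order. The last call also shows that asking for more records than the group contains simply returns the whole group.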
Key References
Use K.SCORE to reference the search score, or K("field_name") for metadata fields.
Common Patterns
Single Key Grouping
Group by one metadata field and keep the top results from each group.
Multiple Key Grouping
Group by combinations of metadata fields for finer-grained control.
Multiple Ranking Keys with Tiebreakers
Sort within groups by multiple criteria when the primary key has ties.
Dictionary Syntax
You can also construct GroupBy using dictionary syntax for programmatic query building.
Edge Cases and Important Behavior
Groups with Fewer Records
If a group has fewer records than the requested k, all records from that group are returned.
Missing Metadata Keys
Documents missing the grouping key are treated as having a null/None value for that key, and are grouped together.
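A small plain-Python sketch of this behavior, using hypothetical records rather than Chroma's API: looking up a missing key yields None, so all documents without the key land in the same group.

```python
from collections import defaultdict

records = [
    {"id": "a", "category": "ml"},
    {"id": "b"},                    # no "category" metadata
    {"id": "c", "category": "db"},
    {"id": "d"},                    # no "category" metadata
]

groups = defaultdict(list)
for rec in records:
    # dict.get returns None for a missing key, so such records share a group.
    groups[rec.get("category")].append(rec["id"])

print(groups[None])  # ['b', 'd']
```

The None group takes part in aggregation like any other group, so it also contributes up to k records to the final results.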
Limit Still Applies
Search.limit() still controls the final number of results returned after grouping. Set it high enough to include results from all groups.
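Because the limit is applied to the flattened, score-sorted list, a low limit can remove entire groups. The sketch below illustrates this in plain Python with made-up rows; apply_limit and groups_in are hypothetical helpers, not part of Chroma, and scores are assumed to be distances where lower is better.

```python
# Hypothetical per-group results after aggregation with k=2: (id, group, score).
kept = [
    ("a", "ml", 0.10), ("b", "ml", 0.15),
    ("c", "db", 0.30), ("d", "db", 0.35),
    ("e", "web", 0.50),
]

def apply_limit(rows, limit):
    """Sort the flattened rows by score (lower is better) and truncate."""
    return sorted(rows, key=lambda row: row[2])[:limit]

def groups_in(rows):
    """Return the set of group names present in the rows."""
    return {group for _, group, _ in rows}

small = apply_limit(kept, limit=3)  # only the three best-scoring rows survive
large = apply_limit(kept, limit=6)  # 3 groups x k=2 is an upper bound on rows

print(sorted(groups_in(small)))  # ['db', 'ml']
print(sorted(groups_in(large)))  # ['db', 'ml', 'web']
```

With a limit of 3, the "web" group disappears because its only record has the worst score. A limit of at least (number of groups) x k guarantees that no group is cut.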
Complete Example
Here’s a practical example showing diversified search results across categories:
Tips and Best Practices
- Set Knn limit high enough - The Knn limit determines the candidate pool before grouping. Set it high enough to include candidates from all groups you want represented.
- Use MinK with scores - Since Chroma uses distance-based scoring (lower is better), use MinK with K.SCORE to get the most relevant results per group.
- Use MaxK for user-defined metrics - For metadata fields where higher is better (ratings, popularity), use MaxK.
- Combine with filtering - Use .where() to filter before grouping to reduce the candidate pool to relevant documents.
- Account for group size variance - Groups may return fewer than k results if they don’t have enough matching documents.
Next Steps
- Learn about ranking expressions to control how documents are scored before grouping
- See Filtering with Where to narrow down candidates before grouping
- Explore batch operations to run multiple grouped searches at once