Pagination & Field Selection

Control how many results to return and which fields to include in your search results.

Pagination with Limit

Use limit() to control how many results are returned, and its offset parameter to skip results for pagination.

    from chromadb import Search

    # Limit results
    search = Search().limit(10)  # Return top 10 results

    # Pagination with offset
    search = Search().limit(10, offset=20)  # Skip first 20, return next 10

    # No limit - returns all matching results
    search = Search()  # Be careful with large collections!

Limit Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| limit | int or None | None | Maximum results to return (None = no limit) |
| offset | int | 0 | Number of results to skip (for pagination) |

For Chroma Cloud users: The actual number of results returned will be capped by your quota limits, regardless of the limit value specified. This applies even when no limit is set.
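One way to reason about how limit, offset, and a quota cap interact is to model them as list slicing. The sketch below is a mental model in plain Python, not Chroma's implementation: apply_limit and its quota argument are hypothetical names, and the real cap depends on your Chroma Cloud plan.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_limit(
    ranked: Sequence[T],
    limit: Optional[int] = None,
    offset: int = 0,
    quota: Optional[int] = None,
) -> list:
    """Model of limit/offset semantics (hypothetical helper, not a Chroma API)."""
    # Skip the first `offset` results.
    remaining = list(ranked[offset:])
    # limit=None means "no limit"; otherwise keep at most `limit` results.
    if limit is not None:
        remaining = remaining[:limit]
    # Assumed quota cap, applied regardless of the requested limit.
    if quota is not None:
        remaining = remaining[:quota]
    return remaining


matches = list(range(1, 101))  # stand-in for 100 ranked results
first_ten = apply_limit(matches, limit=10)
third_page = apply_limit(matches, limit=10, offset=20)
capped = apply_limit(matches, quota=25)  # no limit set, quota still applies
```

In this model, an offset near the end of the result set simply yields a shorter final page rather than an error.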

Pagination Patterns

    # Page through results (0-indexed)
    page_size = 10

    # Page 0: Results 1-10
    page_0 = Search().limit(page_size, offset=0)

    # Page 1: Results 11-20
    page_1 = Search().limit(page_size, offset=10)

    # Page 2: Results 21-30
    page_2 = Search().limit(page_size, offset=20)

    # General formula
    def get_page(page_number, page_size=10):
        return Search().limit(page_size, offset=page_number * page_size)

Pagination uses 0-based indexing. The first page is page 0, not page 1.
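Because pages are 0-indexed, a UI that shows 1-based page numbers needs a small conversion before building the search. The helpers below are an illustrative sketch, not part of chromadb: the names offset_for_page, offset_for_display_page, and page_count are made up for this example, and the total result count is assumed to be known from elsewhere.

```python
import math


def offset_for_page(page_number: int, page_size: int = 10) -> int:
    """Offset for a 0-based page number."""
    if page_number < 0 or page_size <= 0:
        raise ValueError("page_number must be >= 0 and page_size must be > 0")
    return page_number * page_size


def offset_for_display_page(display_page: int, page_size: int = 10) -> int:
    """Offset for a 1-based page number as shown in a UI."""
    return offset_for_page(display_page - 1, page_size)


def page_count(total_results: int, page_size: int = 10) -> int:
    """Number of pages needed to show total_results items."""
    return math.ceil(total_results / page_size)
```

The returned offset would then be passed along as Search().limit(page_size, offset=...).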

Field Selection with Select

Control which fields are returned in your results to optimize data transfer and processing.

    from chromadb import Search, K

    # Default - returns IDs only
    search = Search()

    # Select specific fields
    search = Search().select(K.DOCUMENT, K.SCORE)

    # Select metadata fields
    search = Search().select("title", "author", "date")

    # Mix predefined and metadata fields
    search = Search().select(K.DOCUMENT, K.SCORE, "title", "author")

    # Select all available fields
    search = Search().select_all()
    # Returns: IDs, documents, embeddings, metadata, scores

Selectable Fields

| Field | Internal Key | Usage | Description |
|-------|--------------|-------|-------------|
| IDs | #id | Always included | Document IDs are always returned |
| K.DOCUMENT | #document | .select(K.DOCUMENT) | Full document text |
| K.EMBEDDING | #embedding | .select(K.EMBEDDING) | Vector embeddings |
| K.METADATA | #metadata | .select(K.METADATA) | All metadata fields as a dict |
| K.SCORE | #score | .select(K.SCORE) | Search scores (when ranking is used) |
| "field_name" | (user-defined) | .select("title", "author") | Specific metadata fields |

Field constants: K.* constants (e.g., K.DOCUMENT, K.EMBEDDING, K.ID) correspond to internal keys with # prefix (e.g., #document, #embedding, #id). Use the K.* constants in queries. Internal keys like #document and #embedding are used in schema configuration, while #metadata and #score are query-only fields not used in schema.

When selecting specific metadata fields (e.g., "title"), they appear directly in the metadata dict. Using K.METADATA returns ALL metadata fields at once.
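The difference between selecting individual metadata fields and selecting K.METADATA can be illustrated with a plain-Python model. This is only a sketch of the behavior described above, not Chroma's code: project_row is a hypothetical helper, the string keys mirror the internal keys from the table, and the exact shape of rows returned by the server may differ.

```python
from typing import Any, Dict, Iterable

# Internal keys from the table above, mapped to result-row field names.
BUILTIN_FIELDS = {
    "#document": "document",
    "#embedding": "embedding",
    "#score": "score",
}


def project_row(record: Dict[str, Any], selected: Iterable[str]) -> Dict[str, Any]:
    """Build a result row from a stored record (hypothetical model)."""
    row: Dict[str, Any] = {"id": record["id"]}  # IDs are always included
    metadata: Dict[str, Any] = {}
    for key in selected:
        if key == "#metadata":
            # Selecting all metadata copies every stored field.
            metadata.update(record.get("metadata", {}))
        elif key in BUILTIN_FIELDS:
            name = BUILTIN_FIELDS[key]
            row[name] = record.get(name)
        elif key in record.get("metadata", {}):
            # A named metadata field lands inside the metadata dict.
            metadata[key] = record["metadata"][key]
        # Metadata fields that do not exist are silently omitted.
    if metadata:
        row["metadata"] = metadata
    return row


record = {
    "id": "doc-1",
    "document": "Full text",
    "score": 0.12,
    "metadata": {"title": "Intro", "author": "Ada", "year": 2024},
}
ids_only = project_row(record, [])
some_fields = project_row(record, ["#score", "title", "missing"])
all_metadata = project_row(record, ["#metadata"])
```

Here some_fields carries only the score and a metadata dict containing "title", while all_metadata carries every stored metadata field.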

Performance Considerations

Selecting fewer fields improves performance by reducing data transfer:

  • Minimal: IDs only (default) - fastest queries
  • Moderate: Add scores and specific metadata fields
  • Heavy: Including documents and embeddings - larger payloads
  • Maximum: select_all() - returns everything

    from chromadb import Search, K

    # Fast - minimal data
    search = Search().limit(100)  # IDs only

    # Moderate - just what you need
    search = Search().limit(100).select(K.SCORE, "title", "date")

    # Slower - large fields
    search = Search().limit(100).select(K.DOCUMENT, K.EMBEDDING)

    # Slowest - everything
    search = Search().limit(100).select_all()
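To get a rough feel for why documents and embeddings dominate payload size, you can serialize a sample row and compare byte counts. The numbers depend entirely on the assumed sizes below (a 1536-dimension embedding and a 2,000-character document), which are illustrative and not measurements of Chroma.

```python
import json

# Assumed sample row; sizes are illustrative only.
row = {
    "id": "doc-1",
    "score": 0.1234,
    "metadata": {"title": "Intro to vectors", "date": "2024-01-01"},
    "document": "x" * 2000,
    "embedding": [0.123456] * 1536,
}


def payload_bytes(fields):
    """Approximate JSON size of the row restricted to `fields` (the ID is always kept)."""
    subset = {key: row[key] for key in ("id", *fields)}
    return len(json.dumps(subset).encode("utf-8"))


sizes = {
    "ids_only": payload_bytes([]),
    "score_and_metadata": payload_bytes(["score", "metadata"]),
    "document_and_embedding": payload_bytes(["document", "embedding"]),
    "everything": payload_bytes(["score", "metadata", "document", "embedding"]),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes per row")
```

Multiply the per-row figure by your page size to estimate the response size for a given selection.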

Edge Cases

No Limit Specified

Without a limit, the search attempts to return all matching results, but will be capped by quota limits in Chroma Cloud.

    from chromadb import Search, K

    # Attempts to return ALL matching documents
    search = Search().where(K("status") == "active")  # No limit()

    # Chroma Cloud: Results capped by quota

Empty Results

When no documents match, results will have empty lists/arrays.

Non-existent Fields

Selecting non-existent metadata fields simply omits them from the results - they won't appear in the metadata dict.

    # If "non_existent_field" doesn't exist
    search = Search().select("title", "non_existent_field")

    # Result metadata will only contain "title" if it exists
    # "non_existent_field" will not appear in the metadata dict at all
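Since missing fields are omitted rather than returned as placeholders, code that reads rows can use dict.get with defaults instead of direct indexing. The sketch below assumes each row is a dict shaped like the ones used in this page's examples (an id, plus optional metadata); summarize_row and summarize_rows are hypothetical helpers, not chromadb functions.

```python
from typing import Any, Dict, List


def summarize_row(row: Dict[str, Any]) -> str:
    """Format a row without assuming optional fields are present."""
    metadata = row.get("metadata") or {}  # key may be absent or None
    title = metadata.get("title", "(untitled)")
    author = metadata.get("author", "(unknown author)")
    return f"{row['id']}: {title} by {author}"


def summarize_rows(rows: List[Dict[str, Any]]) -> List[str]:
    """Empty result lists simply produce an empty summary list."""
    return [summarize_row(row) for row in rows]
```

The same helpers cover the empty-results case: an empty list of rows yields an empty list of summaries.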

Complete Example

Here's a practical example combining pagination with field selection:

    from chromadb import Search, K, Knn

    # Paginated search with field selection
    def search_with_pagination(collection, query_text, page_size=20):
        current_page = 0

        while True:
            search = (
                Search()
                .where(K("status") == "published")
                .rank(Knn(query=query_text))
                .limit(page_size, offset=current_page * page_size)
                .select(K.DOCUMENT, K.SCORE, "title", "author", "date")
            )

            results = collection.search(search)
            rows = results.rows()[0]  # Get first (and only) search results

            if not rows:  # No more results
                break

            print(f"\n--- Page {current_page + 1} ---")
            for i, row in enumerate(rows, 1):
                print(f"{i}. {row['metadata']['title']} by {row['metadata']['author']}")
                print(f"   Score: {row['score']:.3f}, Date: {row['metadata']['date']}")
                print(f"   Preview: {row['document'][:100]}...")

            # Check if we want to continue
            user_input = input("\nPress Enter for next page, or 'q' to quit: ")
            if user_input.lower() == 'q':
                break

            current_page += 1
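For batch processing, the same loop can be written without the interactive prompt. The sketch below separates the paging logic from the Chroma call. fetch_page is a hypothetical callable that you would implement with collection.search(...) and results.rows()[0]; here it is replaced by an in-memory stand-in so the example runs on its own.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def iter_rows(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = 100,
) -> Iterator[Row]:
    """Yield rows page by page until a page comes back empty or short."""
    offset = 0
    while True:
        rows = fetch_page(page_size, offset)
        yield from rows
        if len(rows) < page_size:  # last page reached
            return
        offset += page_size


# In-memory stand-in for a collection with 250 matching rows.
FAKE_ROWS = [{"id": f"doc-{i}"} for i in range(250)]


def fake_fetch(limit: int, offset: int) -> List[Row]:
    return FAKE_ROWS[offset:offset + limit]


all_ids = [row["id"] for row in iter_rows(fake_fetch, page_size=100)]
```

Stopping on a short page saves one request, but it assumes the server always fills a page when more results exist. If a quota cap could shorten pages, stopping only on an empty page is the safer choice.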

Tips and Best Practices

  • Select only what you need - Reduces network transfer and memory usage
  • Use appropriate page sizes - 10-50 for UI, 100-500 for batch processing
  • Consider bandwidth - Avoid selecting embeddings unless necessary
  • IDs are always included - No need to explicitly select them
  • Use select_all() sparingly - Only when you truly need all fields

Next Steps