Filtering with Where

The Key/K Class

The Key class (aliased as K for brevity) provides a fluent interface for building filter expressions. Use K to reference document content, document IDs, and metadata fields.

from chromadb import K

# K is an alias for Key - use K for more concise code
# Filter by metadata field
K("status") == "active"

# Filter by document content
K.DOCUMENT.contains("machine learning")

# Filter by document IDs
K.ID.is_in(["doc1", "doc2", "doc3"])

Filterable Fields

Field	Usage	Description
`K.ID`	`K.ID.is_in(["id1", "id2"])`	Filter by document IDs
`K.DOCUMENT`	`K.DOCUMENT.contains("text")`	Filter by document content
`K("field_name")`	`K("status") == "active"`	Filter by any metadata field

Comparison Operators

Supported operators:

== - Equality (all types: string, numeric, boolean)
!= - Inequality (all types: string, numeric, boolean)
> - Greater than (numeric only)
>= - Greater than or equal (numeric only)
< - Less than (numeric only)
<= - Less than or equal (numeric only)

# Equality and inequality (all types)
K("status") == "published"     # String equality
K("views") != 0                # Numeric inequality
K("featured") == True          # Boolean equality

# Numeric comparisons (numbers only)
K("price") > 100               # Greater than
K("rating") >= 4.5             # Greater than or equal
K("stock") < 10                # Less than
K("discount") <= 0.25          # Less than or equal

Chroma supports three data types for metadata: strings, numbers (int/float), and booleans. Order comparison operators (>, <, >=, <=) currently only work with numeric types.
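Because Python's bool is a subclass of int, a client-side check that an operand is numeric has to exclude booleans explicitly. The sketch below is a hypothetical helper (check_operand and is_numeric are illustrative names, not part of Chroma) that rejects non-numeric operands before you build an order comparison, assuming the type rules stated above.

```python
ORDER_OPERATORS = {">", ">=", "<", "<="}


def is_numeric(value) -> bool:
    """True for int/float; bool is excluded even though it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_operand(op: str, value) -> None:
    """Hypothetical guard: raise if an order comparison gets a non-numeric operand."""
    if op in ORDER_OPERATORS and not is_numeric(value):
        raise TypeError(f"{op} only supports numbers, got {type(value).__name__}")


check_operand(">=", 4.5)          # fine: numeric operand
check_operand("==", "published")  # fine: equality works on all types
try:
    check_operand(">", True)      # rejected: bool is not treated as a number
except TypeError as err:
    print(err)  # > only supports numbers, got bool
```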

Set and String Operators

Supported operators:

is_in() - Value matches any in the list
not_in() - Value doesn’t match any in the list
contains() - On K.DOCUMENT: substring search (case-sensitive). On metadata fields: checks if an array contains a scalar value.
not_contains() - On K.DOCUMENT: excludes by substring. On metadata fields: checks that an array does not contain a scalar value.
regex() - String matches regex pattern (currently K.DOCUMENT only)
not_regex() - String doesn’t match regex pattern (currently K.DOCUMENT only)

# Set membership operators (work on all fields)
K.ID.is_in(["doc1", "doc2", "doc3"])           # Match any ID in list
K("category").is_in(["tech", "science"])       # Match any category
K("status").not_in(["draft", "deleted"])       # Exclude specific values

# String content operators (K.DOCUMENT only)
K.DOCUMENT.contains("machine learning")        # Substring search in document
K.DOCUMENT.not_contains("deprecated")          # Exclude documents with text
K.DOCUMENT.regex(r"\bAPI\b")                   # Match whole word "API" in document

# Array membership operators (metadata fields)
K("tags").contains("action")                   # Array contains value
K("tags").not_contains("draft")                # Array does not contain value
K("scores").contains(42)                       # Works with numbers
K("flags").contains(True)                      # Works with booleans

# Note: String pattern matching on metadata scalar fields not yet supported
# K("title").regex(r".*Python.*")              # NOT YET SUPPORTED

String operations like contains() and regex() on K.DOCUMENT are case-sensitive. When used on metadata fields, contains() checks array membership rather than substring matching. The is_in() operator is efficient even with large lists.
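To make the two meanings of contains() concrete, here is a pure-Python model of the behavior described above. It is an illustration only, not Chroma's implementation, and model_contains is a hypothetical name: document text is searched by case-sensitive substring, while an array field is checked for an exact element of the same type.

```python
def model_contains(value, needle, *, is_document: bool) -> bool:
    """Illustrative model of contains(); not Chroma's implementation."""
    if is_document:
        # K.DOCUMENT.contains(...): case-sensitive substring search
        return needle in value
    if isinstance(value, list):
        # K("field").contains(...): exact membership; the element type must match
        return any(type(item) is type(needle) and item == needle for item in value)
    # Substring matching on scalar metadata fields is not supported yet
    raise TypeError("contains() on metadata requires an array field")


doc = "An introduction to Machine Learning"
print(model_contains(doc, "Machine Learning", is_document=True))        # True
print(model_contains(doc, "machine learning", is_document=True))        # False (case differs)
print(model_contains(["action", "comedy"], "action", is_document=False))  # True
print(model_contains(["action", "comedy"], "act", is_document=False))     # False (not a substring test)
```

The type check in the array branch matters in Python because `1 == True`, so plain `in` would treat an integer array as containing `True`.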

Array Metadata

Chroma supports storing arrays of values in metadata fields. You can use contains() / not_contains() (or $contains / $not_contains in dictionary syntax) to filter records based on whether an array includes a specific scalar value.

Storing Array Metadata

Arrays can contain strings, numbers, or booleans. All elements in an array must be the same type. Empty arrays are not allowed.
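A client-side check of these rules can catch bad arrays before calling add(). The sketch below is a hypothetical validator, not a Chroma function. It treats int and float as distinct element types, mirroring the separate Integer and Float array types in the Supported Array Types table; whether Chroma itself rejects a mixed `[1, 2.5]` array is an assumption.

```python
ALLOWED_ELEMENT_TYPES = (str, int, float, bool)


def validate_array_metadata(values: list) -> type:
    """Hypothetical check of the array rules above; returns the shared element type."""
    if not values:
        raise ValueError("empty arrays are not allowed")
    first_type = type(values[0])
    if first_type not in ALLOWED_ELEMENT_TYPES:
        # Also rejects nested arrays, since list is not an allowed element type
        raise TypeError(f"unsupported element type: {first_type.__name__}")
    for item in values:
        # type() rather than isinstance(): True must not pass as an int
        if type(item) is not first_type:
            raise TypeError("all elements in an array must share one type")
    return first_type


print(validate_array_metadata(["action", "comedy"]).__name__)  # str
print(validate_array_metadata([True, False]).__name__)         # bool
```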

collection.add(
    ids=["m1", "m2", "m3"],
    embeddings=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    metadatas=[
        {"genres": ["action", "comedy"], "year": 2020},
        {"genres": ["drama"], "year": 2021},
        {"genres": ["action", "thriller"], "year": 2022},
    ],
)

Filtering Arrays

Use contains() to check if a metadata array includes a value, and not_contains() to check that it does not.

from chromadb import Search, K

# Find all records where genres contains "action"
search = Search().where(K("genres").contains("action"))

# Exclude records with a specific tag
search = Search().where(K("tags").not_contains("draft"))

# Works with numbers and booleans too
search = Search().where(K("scores").contains(42))

# Combine with other filters
search = Search().where(
    K("genres").contains("action") &
    (K("year") >= 2021)
)
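To see which records the combined filter should return, the following pure-Python sketch applies the same logic to the three records stored in the earlier add() call. It mimics the documented semantics with plain dictionaries and does not call Chroma.

```python
records = {
    "m1": {"genres": ["action", "comedy"], "year": 2020},
    "m2": {"genres": ["drama"], "year": 2021},
    "m3": {"genres": ["action", "thriller"], "year": 2022},
}


def matches(metadata: dict) -> bool:
    # Plain-Python equivalent of K("genres").contains("action") & (K("year") >= 2021)
    has_action = "action" in metadata.get("genres", [])
    recent = "year" in metadata and metadata["year"] >= 2021
    return has_action and recent


matching_ids = [record_id for record_id, md in records.items() if matches(md)]
print(matching_ids)  # ['m3']
```

m1 contains "action" but fails the year condition, and m2 fails the genre condition, so only m3 survives the AND.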

Supported Array Types

Type	Python	TypeScript	Rust
String	`["a", "b"]`	`["a", "b"]`	`MetadataValue::StringArray(...)`
Integer	`[1, 2, 3]`	`[1, 2, 3]`	`MetadataValue::IntArray(...)`
Float	`[1.5, 2.5]`	`[1.5, 2.5]`	`MetadataValue::FloatArray(...)`
Boolean	`[True, False]`	`[true, false]`	`MetadataValue::BoolArray(...)`

The value passed to contains() / $contains must be a scalar that matches the array’s element type. All elements in an array must be the same type, and nested arrays are not supported.

Logical Operators

Supported operators:

& - Logical AND (all conditions must match)
| - Logical OR (any condition can match)

Combine multiple conditions using these operators. Always use parentheses to ensure correct precedence.

# AND operator (&) - all conditions must match
(K("status") == "published") & (K("year") >= 2020)

# OR operator (|) - any condition can match
(K("category") == "tech") | (K("category") == "science")

# Combining with document and ID filters
(K.DOCUMENT.contains("machine learning")) & (K("author") == "Smith")
(K.ID.is_in(["id1", "id2"])) | (K("featured") == True)

# Complex nesting - use parentheses for clarity
(
    (K("status") == "published") &
    ((K("category") == "tech") | (K("category") == "science")) &
    (K("rating") >= 4.0)
)

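Always wrap each condition in parentheses when using logical operators. In Python, & and | bind more tightly than comparison operators such as == and >=, so K("a") == 1 & K("b") == 2 is parsed as K("a") == (1 & K("b")) == 2 rather than as two conditions joined by AND.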
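The precedence problem can be reproduced with a small stand-in class. ToyKey below is hypothetical and only records each operation as text; Chroma's real K objects may behave differently in the unparenthesized case (for example by raising an error). The point is how Python itself groups the operators.

```python
class ToyKey:
    """Hypothetical stand-in for K that records each operation as text."""

    def __init__(self, text: str):
        self.text = text

    def __eq__(self, other):
        return ToyKey(f"({self.text} == {_fmt(other)})")

    def __and__(self, other):
        return ToyKey(f"({self.text} & {_fmt(other)})")

    def __rand__(self, other):
        # Called for `1 & ToyKey(...)`, because int does not know how to handle ToyKey
        return ToyKey(f"({_fmt(other)} & {self.text})")

    def __repr__(self):
        return self.text


def _fmt(value) -> str:
    return value.text if isinstance(value, ToyKey) else repr(value)


with_parens = (ToyKey("a") == 1) & (ToyKey("b") == 2)
without_parens = ToyKey("a") == 1 & ToyKey("b") == 2

print(with_parens)     # ((a == 1) & (b == 2))
print(without_parens)  # ((1 & b) == 2)  <- the condition on "a" is silently lost
```

Without parentheses, Python evaluates `1 & ToyKey("b")` first and then treats the rest as a chained comparison, which returns only the last comparison because the first one is truthy.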

Dictionary Syntax (MongoDB-style)

You can also use dictionary syntax instead of K expressions. This is useful when building filters programmatically. Supported dictionary operators:

Direct value - Shorthand for equality
$eq - Equality
$ne - Not equal
$gt - Greater than (numeric only)
$gte - Greater than or equal (numeric only)
$lt - Less than (numeric only)
$lte - Less than or equal (numeric only)
$in - Value in list
$nin - Value not in list
$contains - On #document: substring search. On metadata fields: array contains value.
$not_contains - On #document: excludes by substring. On metadata fields: array does not contain value.
$regex - Regex match (currently #document only)
$not_regex - Regex doesn’t match (currently #document only)
$and - Logical AND
$or - Logical OR

# Direct equality (shorthand)
{"status": "active"}                        # Same as K("status") == "active"

# Comparison operators
{"status": {"$eq": "published"}}            # Same as K("status") == "published"
{"count": {"$ne": 0}}                       # Same as K("count") != 0
{"price": {"$gt": 100}}                     # Same as K("price") > 100 (numbers only)
{"rating": {"$gte": 4.5}}                   # Same as K("rating") >= 4.5 (numbers only)
{"stock": {"$lt": 10}}                      # Same as K("stock") < 10 (numbers only)
{"discount": {"$lte": 0.25}}                # Same as K("discount") <= 0.25 (numbers only)

# Set membership operators
{"#id": {"$in": ["id1", "id2"]}}            # Same as K.ID.is_in(["id1", "id2"])
{"category": {"$in": ["tech", "ai"]}}       # Same as K("category").is_in(["tech", "ai"])
{"status": {"$nin": ["draft", "deleted"]}}  # Same as K("status").not_in(["draft", "deleted"])

# String operators (#document only)
{"#document": {"$contains": "API"}}         # Same as K.DOCUMENT.contains("API")
# {"email": {"$regex": ".*@example\\.com"}} # Not yet supported - metadata fields
# {"version": {"$not_regex": "^beta"}}      # Not yet supported - metadata fields

# Array membership operators (metadata fields)
{"genres": {"$contains": "action"}}         # Same as K("genres").contains("action")
{"genres": {"$not_contains": "draft"}}      # Same as K("genres").not_contains("draft")
{"scores": {"$contains": 42}}               # Works with numbers

# Logical operators
{"$and": [
    {"status": "published"},
    {"year": {"$gte": 2020}},
    {"#document": {"$contains": "machine learning"}}
]}                                          # Combines multiple conditions with AND

{"$or": [
    {"category": "tech"},
    {"category": "science"},
    {"featured": True}
]}                                          # Combines multiple conditions with OR

# Complex nested example
{
    "$and": [
        {"$or": [
            {"category": "tech"},
            {"category": "science"}
        ]},
        {"status": "published"},
        {"quality_score": {"$gte": 0.8}}
    ]
}

Each dictionary can only contain one field or one logical operator ($and/$or). For field dictionaries, only one operator is allowed per field.
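Because each dictionary may hold only one field or one logical operator, filters assembled in code need every condition in its own dictionary, combined with $and. A minimal sketch of such a builder follows; build_where is a hypothetical name, and returning a lone condition unwrapped (rather than a one-element $and) is a conservative assumption, not a documented requirement. It takes (field, condition) pairs instead of a dict so that one field can appear twice, as a price range needs.

```python
from typing import Any, Iterable, Optional


def build_where(conditions: Iterable[tuple[str, Any]]) -> Optional[dict]:
    """Hypothetical helper: put each (field, condition) pair in its own dict."""
    clauses = [{field: condition} for field, condition in conditions]
    if not clauses:
        return None  # nothing to filter on
    if len(clauses) == 1:
        return clauses[0]  # a single clause needs no $and wrapper
    return {"$and": clauses}


# A price range needs two clauses on the same field, hence pairs rather than a dict
where = build_where([
    ("status", "published"),
    ("price", {"$gte": 100}),
    ("price", {"$lte": 500}),
])
print(where)
# {'$and': [{'status': 'published'}, {'price': {'$gte': 100}}, {'price': {'$lte': 500}}]}
```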

Common Filtering Patterns

# Filter by specific document IDs
search = Search().where(K.ID.is_in(["doc_001", "doc_002", "doc_003"]))

# Exclude already processed documents
processed_ids = ["doc_100", "doc_101"]
search = Search().where(K.ID.not_in(processed_ids))

# Full-text search in documents
search = Search().where(K.DOCUMENT.contains("quantum computing"))

# Combine document search with metadata
search = Search().where(
    K.DOCUMENT.contains("machine learning") &
    (K("language") == "en")
)

# Price range filtering
search = Search().where(
    (K("price") >= 100) &
    (K("price") <= 500)
)

# Multi-field filtering
search = Search().where(
    (K("status") == "active") &
    (K("category").is_in(["tech", "ai", "ml"])) &
    (K("score") >= 0.8)
)

Edge Cases and Important Behavior

Missing Keys

When filtering on a metadata field that doesn’t exist for a document:

==, >, <, >=, <=, and is_in() evaluate to false - the document won’t match
!= evaluates to true - documents without the field are considered “not equal” to any value
not_in() evaluates to true - documents without the field are not in any list

# If a document doesn't have a "category" field:
K("category") == "tech"         # false - won't match
K("category") != "tech"         # true - will match
K("category").is_in(["tech"])   # false - won't match
K("category").not_in(["tech"])  # true - will match
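The rules above fit in a small pure-Python model. This illustrates the documented behavior and is not Chroma's code; evaluate is a hypothetical helper. One practical consequence: != and not_in() also match records that lack the field entirely, so store the field on every record if that is not what you want.

```python
import operator

MISSING = object()  # sentinel meaning "this record has no such key"
ORDER_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate(metadata: dict, key: str, op: str, operand) -> bool:
    """Hypothetical model of the missing-key rules above (not Chroma's code)."""
    value = metadata.get(key, MISSING)
    if value is MISSING:
        # Only the negative operators match a record that lacks the key
        return op in ("!=", "not_in")
    if op == "==":
        return value == operand
    if op == "!=":
        return value != operand
    if op == "is_in":
        return value in operand
    if op == "not_in":
        return value not in operand
    if op in ORDER_OPS:
        return ORDER_OPS[op](value, operand)
    raise ValueError(f"operator not modeled: {op}")


record = {"status": "published"}  # no "category" or "rating" key
print(evaluate(record, "category", "==", "tech"))        # False
print(evaluate(record, "category", "!=", "tech"))        # True
print(evaluate(record, "category", "is_in", ["tech"]))   # False
print(evaluate(record, "category", "not_in", ["tech"]))  # True
print(evaluate(record, "rating", ">=", 4.0))             # False
```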

Mixed Types

Avoid storing different data types under the same metadata key across documents. Query behavior is undefined when comparing values of different types.

# DON'T DO THIS - undefined behavior
# Document 1: {"score": 95}      (numeric)
# Document 2: {"score": "95"}    (string)
# Document 3: {"score": True}    (boolean)

K("score") > 90  # Undefined results when mixed types exist

# DO THIS - consistent types
# All documents: {"score": <numeric>} or all {"score": <string>}
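One way to enforce consistent types is to audit metadata before adding it. The sketch below is a hypothetical helper, not a Chroma API. It groups int and float together as "number", following the three metadata types listed earlier; that Chroma compares ints and floats as one numeric type is an assumption of this sketch.

```python
from collections import defaultdict


def type_category(value) -> str:
    """Map a value to a metadata type category (bool is checked before int)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def find_mixed_type_keys(metadatas: list) -> dict:
    """Hypothetical audit: keys whose values fall into more than one category."""
    seen = defaultdict(set)
    for metadata in metadatas:
        for key, value in metadata.items():
            seen[key].add(type_category(value))
    return {key: cats for key, cats in seen.items() if len(cats) > 1}


batch = [{"score": 95}, {"score": "95"}, {"score": True}, {"score": 88.5, "lang": "en"}]
print(sorted(find_mixed_type_keys(batch)["score"]))  # ['boolean', 'number', 'string']
```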

String Pattern Matching Limitations

regex() and not_regex() only work on K.DOCUMENT. These operators do not yet support metadata fields. contains() and not_contains() have different behavior depending on the field:

On K.DOCUMENT: substring search (patterns with fewer than 3 literal characters may return incorrect results)
On metadata fields: array membership check (see Array Metadata above)

Substring matching on metadata scalar fields (e.g. checking if a string field contains a substring) is not yet supported.

# Substring search on K.DOCUMENT - works
K.DOCUMENT.contains("API")              # Works
K.DOCUMENT.regex(r"v\d\.\d\.\d")       # Works

# Array membership on metadata fields - works
K("tags").contains("action")            # Works - checks if array contains value

# Substring/regex on metadata scalar fields - NOT YET SUPPORTED
# K("title").regex(r".*Python.*")       # Not supported yet

# Pattern length requirements (for K.DOCUMENT substring search)
K.DOCUMENT.contains("API")              # 3 characters - good
K.DOCUMENT.contains("AI")               # Only 2 characters - may give incorrect results
K.DOCUMENT.regex(r"\d+")                # No literal characters - may give incorrect results

Keep K.DOCUMENT patterns to at least 3 literal characters: shorter patterns, or patterns built only from character classes such as \d+, may return incorrect results.

Full support for substring and regex matching on metadata scalar fields is planned for a future release, which will let users opt in to additional indexes for string pattern matching on specific metadata fields.
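To flag short patterns before sending a query, you could count literal characters client-side. This page does not define "literal character" precisely, so the sketch below is a rough heuristic under that uncertainty: count_literal_chars is a hypothetical helper that counts characters matching only themselves, skipping escapes such as \d and \b, bracket expressions, quantifiers, and regex metacharacters. It does not parse full regex syntax and is not Chroma's actual rule.

```python
REGEX_META = set(".^$*+?|()")
CLASS_ESCAPES = set("dDwWsSbBAZ")  # escapes that match classes or positions


def count_literal_chars(pattern: str) -> int:
    """Rough heuristic: count characters in a pattern that match only themselves."""
    count = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # \. or \$ is an escaped literal; \d, \b, ... are not
            if pattern[i + 1] not in CLASS_ESCAPES:
                count += 1
            i += 2
        elif ch in "[{":
            # Skip a bracket expression like [a-z] or a quantifier like {2,3}
            end = pattern.find("]" if ch == "[" else "}", i + 1)
            i = len(pattern) if end == -1 else end + 1
        else:
            if ch not in REGEX_META:
                count += 1
            i += 1
    return count


print(count_literal_chars("API"))           # 3
print(count_literal_chars("AI"))            # 2
print(count_literal_chars(r"\d+"))          # 0
print(count_literal_chars(r"v\d\.\d\.\d"))  # 3
```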

Complete Example

Here’s a practical example combining different filter types:

from chromadb import Search, K, Knn

# Complex filter combining IDs, document content, and metadata
search = (Search()
    .where(
        # Exclude specific documents
        K.ID.not_in(["excluded_001", "excluded_002"]) &

        # Must contain specific content
        K.DOCUMENT.contains("artificial intelligence") &

        # Metadata conditions
        (K("status") == "published") &
        (K("quality_score") >= 0.75) &
        (
            (K("category") == "research") |
            (K("category") == "tutorial")
        ) &
        (K("year") >= 2023)
    )
    .rank(Knn(query="latest AI research developments"))
    .limit(10)
    .select(K.DOCUMENT, "title", "author", "year")
)

results = collection.search(search)  # collection: an existing Chroma collection
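For comparison, here is a hand translation of the same .where(...) filter into the dictionary syntax, built from the operator mappings in the Dictionary Syntax section. Pairing "#id" with "$nin" is inferred from the K.ID.is_in() / "$in" and not_in() / "$nin" mappings rather than shown explicitly on this page, so treat this as a sketch to verify against your Chroma version.

```python
# Dictionary-syntax version of the K-expression filter above (hand translation)
where = {
    "$and": [
        {"#id": {"$nin": ["excluded_001", "excluded_002"]}},
        {"#document": {"$contains": "artificial intelligence"}},
        {"status": "published"},
        {"quality_score": {"$gte": 0.75}},
        {"$or": [
            {"category": "research"},
            {"category": "tutorial"},
        ]},
        {"year": {"$gte": 2023}},
    ]
}
```

Each element of the $and list holds exactly one field or one logical operator, as the dictionary syntax requires.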

Tips and Best Practices

Use parentheses liberally when combining conditions with & and | to avoid precedence issues
Filter before ranking when possible to reduce the number of vectors to score
Be specific with ID filters - using K.ID.is_in() with a small list is very efficient
String matching is case-sensitive - normalize your data if case-insensitive matching is needed
Use the right operator - is_in() for multiple exact matches, contains() for substring search on K.DOCUMENT or array membership on metadata fields
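Normalizing for case-insensitive matching can be as simple as case-folding both the stored values and the value you filter on. A minimal sketch, where normalize is a hypothetical helper and the tags are example data: the normalized tags would be what you store, and the normalized query term what you pass to K("tags").contains(...).

```python
def normalize(text: str) -> str:
    """Case-fold text so stored values and query terms compare case-insensitively."""
    return text.casefold()


# Normalize array metadata before storing it...
tags = [normalize(tag) for tag in ["Action", "COMEDY", "Sci-Fi"]]
# ...and normalize the value you filter on in the same way
query_tag = normalize("Comedy")

print(tags)               # ['action', 'comedy', 'sci-fi']
print(query_tag in tags)  # True
```

For K.DOCUMENT searches the same idea means storing normalized document text, which also changes the text returned in results.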

Next Steps

Learn about ranking and scoring to order your filtered results
See practical examples of filtering in real-world scenarios
Explore batch operations for running multiple filtered searches
