Full Text Search and Regex

The where_document argument in get and query is used to filter records based on their document content.

We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.

For example, here we get all records whose document contains a search string:

Python
collection.get( where_document={"$contains": "search string"} )

Note: Full-text search is case-sensitive.

Here we get all records whose documents matches the regex pattern for an email address:

Python
collection.get( where_document={ "$regex": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" } )

Using Logical Operators#

You can also use the logical operators $and and $or to combine multiple filters.

An $and operator will return results that match all the filters in the list:

Python
collection.query( query_texts=["query1", "query2"], where_document={ "$and": [ {"$contains": "search_string_1"}, {"$regex": "[a-z]+"}, ] } )

An $or operator will return results that match any of the filters in the list:

Python
collection.query( query_texts=["query1", "query2"], where_document={ "$or": [ {"$contains": "search_string_1"}, {"$not_contains": "search_string_2"}, ] } )

Combining with Metadata Filtering#

.get and .query can handle where_document search combined with metadata filtering:

Python
collection.query( query_texts=["doc10", "thus spake zarathustra", ...], n_results=10, where={"metadata_field": "is_equal_to_this"}, where_document={"$contains":"search_string"} )