Full Text Search and Regex
The where_document argument in get and query is used to filter records based on their document content.
We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.
For example, here we get all records whose document contains a search string:
collection.get(
    where_document={"$contains": "search string"}
)
Note: Full-text search is case-sensitive.
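To make the case-sensitivity note concrete, here is a minimal pure-Python model of how $contains and $not_contains treat a single document. It is only an illustration. It assumes the operators behave like Python's substring test (`in`), and `contains` and `not_contains` are hypothetical helper names, not part of the Chroma API.

```python
def contains(document: str, needle: str) -> bool:
    """Hypothetical model of {"$contains": needle}: a case-sensitive substring test."""
    return needle in document


def not_contains(document: str, needle: str) -> bool:
    """Hypothetical model of {"$not_contains": needle}: the negation of $contains."""
    return needle not in document


doc = "Chroma supports Full-Text Search."
print(contains(doc, "Full-Text"))      # True
print(contains(doc, "full-text"))      # False: only the letter case differs
print(not_contains(doc, "full-text"))  # True
```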
Here we get all records whose document matches the regex pattern for an email address:
collection.get(
    where_document={
        # Raw string so the "\." escape reaches the regex engine unchanged
        "$regex": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    }
)
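The pattern is anchored with ^ and $, so it matches only documents that consist entirely of an email address. Documents that merely mention one do not match. The sketch below checks this locally with Python's `re` module. It assumes, without confirming it here, that Chroma's regex engine treats this simple pattern the way `re.search` does. Regex dialects can differ on more advanced syntax.

```python
import re

# The pattern from the example above, as a raw string.
EMAIL_PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

documents = [
    "alice@example.com",
    "Contact alice@example.com for details",
    "no address here",
]

# re.search may match anywhere in the string, but the ^ and $ anchors
# force the whole document to be a single address.
matches = [doc for doc in documents if re.search(EMAIL_PATTERN, doc)]
print(matches)  # ['alice@example.com']
```

To match documents that only mention an address somewhere in their text, drop the leading ^ and trailing $.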
Using Logical Operators
You can also use the logical operators $and and $or to combine multiple filters.
An $and operator will return results that match all the filters in the list:
collection.query(
    query_texts=["query1", "query2"],
    where_document={
        "$and": [
            {"$contains": "search_string_1"},
            {"$regex": "[a-z]+"},
        ]
    }
)
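One way to reason about nested filters is to evaluate them against a single document yourself. The recursive function below is a hypothetical local model, not Chroma's implementation. It assumes that $contains and $not_contains are case-sensitive substring tests and that $regex and $not_regex behave like Python's `re.search`. Under that assumption, the `[a-z]+` clause in the example accepts any document containing at least one lowercase ASCII letter.

```python
import re
from typing import Any, Dict


def matches_where_document(document: str, where_document: Dict[str, Any]) -> bool:
    """Hypothetical local model of a where_document filter for one document."""
    if len(where_document) != 1:
        raise ValueError("expected exactly one operator per filter dict")
    operator, operand = next(iter(where_document.items()))
    if operator == "$and":
        # Every sub-filter must match.
        return all(matches_where_document(document, f) for f in operand)
    if operator == "$or":
        # At least one sub-filter must match.
        return any(matches_where_document(document, f) for f in operand)
    if operator == "$contains":
        return operand in document
    if operator == "$not_contains":
        return operand not in document
    if operator == "$regex":
        return re.search(operand, document) is not None
    if operator == "$not_regex":
        return re.search(operand, document) is None
    raise ValueError(f"unsupported operator: {operator}")


and_filter = {
    "$and": [
        {"$contains": "search_string_1"},
        {"$regex": "[a-z]+"},
    ]
}
print(matches_where_document("has search_string_1 inside", and_filter))  # True
print(matches_where_document("HAS NOTHING", and_filter))                 # False
```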
An $or operator will return results that match any of the filters in the list:
collection.query(
    query_texts=["query1", "query2"],
    where_document={
        "$or": [
            {"$contains": "search_string_1"},
            {"$not_contains": "search_string_2"},
        ]
    }
)
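Read literally, the $or example keeps every document that lacks search_string_2, plus any document that contains search_string_1. When the list of alternatives comes from user input, it can help to build the filter programmatically. The sketch below uses a hypothetical helper, `any_keyword_filter`, that is not part of Chroma. It returns a plain dict, and we assume that dict can be passed as `where_document` to `get` or `query` like the literals above. A single keyword is returned as a bare $contains clause, because wrapping one clause in $or adds nothing.

```python
from typing import Any, Dict, List


def any_keyword_filter(keywords: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: a where_document filter matching any of the keywords."""
    if not keywords:
        raise ValueError("need at least one keyword")
    clauses = [{"$contains": kw} for kw in keywords]
    # A single clause needs no $or wrapper.
    if len(clauses) == 1:
        return clauses[0]
    return {"$or": clauses}


print(any_keyword_filter(["zarathustra"]))
print(any_keyword_filter(["zarathustra", "nietzsche"]))
```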
Combining with Metadata Filtering
Both get and query accept a where argument alongside where_document, so you can combine document search with metadata filtering in a single call:
collection.query(
    query_texts=["doc10", "thus spake zarathustra", ...],
    n_results=10,
    where={"metadata_field": "is_equal_to_this"},
    where_document={"$contains": "search_string"}
)
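When both filters are given, a record must satisfy the metadata filter and the document filter to be a candidate. Similarity ranking then picks up to n_results of those candidates per query text. The in-memory sketch below models only the filtering step and leaves out ranking. It assumes that a bare value in `where`, as in the example, means equality. The records and the `passes` function are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical records standing in for a collection's contents.
records: List[Dict[str, Any]] = [
    {"id": "a", "document": "thus spake zarathustra",
     "metadata": {"metadata_field": "is_equal_to_this"}},
    {"id": "b", "document": "search_string appears here",
     "metadata": {"metadata_field": "is_equal_to_this"}},
    {"id": "c", "document": "search_string appears here too",
     "metadata": {"metadata_field": "something_else"}},
]

where = {"metadata_field": "is_equal_to_this"}
needle = "search_string"  # models where_document={"$contains": "search_string"}


def passes(record: Dict[str, Any]) -> bool:
    """A record is a candidate only if both filters accept it."""
    metadata_ok = all(record["metadata"].get(k) == v for k, v in where.items())
    document_ok = needle in record["document"]
    return metadata_ok and document_ok


candidates = [r["id"] for r in records if passes(r)]
print(candidates)  # ['b']
```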