Getting Started
Chroma is an AI-native open-source vector database. It comes with everything you need to get started built in, and it runs on your machine.
For production, Chroma offers Chroma Cloud: a fast, scalable, serverless database-as-a-service. You can get started in 30 seconds, and $5 in free credits are included.
Install with AI
Give the following prompt to Claude Code, Cursor, Codex, or your favorite AI agent. It will quickly set you up with Chroma.
In this directory create a new Python project with Chroma set up. Use a virtual environment. Write a small example that adds some data to a collection and queries it. Do not delete the data from the collection when it's complete. Run the script when you are done setting up the environment and writing the script. The output should show what data was ingested, what was the query, and the results. Your own summary should include this output so the user can see it. First, install `chromadb`. The project should be set up with Chroma Cloud. When you install `chromadb`, you get access to the Chroma CLI. You can run `chroma login` to authenticate. This will open a browser for authentication and save a connection profile locally. You can also use `chroma profile show` to see if the user already has an active profile saved locally. If so, you can skip the login step. Then create a DB using the CLI with `chroma db create chroma-getting-started`. This will create a DB with this name. Then use the CLI command `chroma db connect chroma-getting-started --env-file`. This will create a .env file in the current directory with the connection variables for this DB and account, so the CloudClient can be instantiated with chromadb.CloudClient(api_key=os.getenv("CHROMA_API_KEY"), ...).
Install Manually
Install
pip install chromadb
Create a Chroma Client
import chromadb
chroma_client = chromadb.Client()
Create a collection
Collections are where you'll store your embeddings, documents, and any additional metadata. Collections index your embeddings and documents, and enable efficient retrieval and filtering. You can create a collection with a name:
collection = chroma_client.create_collection(name="my_collection")
Add some text documents to the collection
Chroma will store your text and handle embedding and indexing automatically. You can also customize the embedding model. You must provide unique string IDs for your documents.
collection.add(
    ids=["id1", "id2"],
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ]
)
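The IDs above are written by hand. If you would rather derive them, one possible approach (a convention of your own, not something Chroma requires) is to hash each document's text so the same text always maps to the same ID. This pairs well with `upsert`. The helper `content_id` is hypothetical, and truncating the SHA-256 digest to 16 hex characters is an arbitrary choice for readability.

```python
import hashlib


def content_id(text: str) -> str:
    """Hypothetical helper: derive a stable string ID from a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


documents = [
    "This is a document about pineapple",
    "This is a document about oranges",
]
ids = [content_id(doc) for doc in documents]

# IDs must be unique within a collection; identical texts would collide here,
# so check before handing the lists to collection.add(ids=ids, documents=documents).
assert len(set(ids)) == len(ids), "duplicate documents produce duplicate IDs"

print(ids)
```

Because the ID depends only on the text, re-adding the same document yields the same ID instead of a new one.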
Query the collection
You can query the collection with a list of query texts, and Chroma will return the `n_results` most similar results for each one. It's that easy!
results = collection.query(
    query_texts=["This is a query document about hawaii"], # Chroma will embed this for you
    n_results=2 # how many results to return
)
print(results)
If n_results is not provided, Chroma will return 10 results by default. Here we only added 2 documents, so we set n_results=2.
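Conceptually, a query embeds the query text, measures its distance to each stored embedding, and returns the closest `n_results` items, nearest first. The sketch below illustrates that idea with a brute-force scan over made-up 3-dimensional vectors. It assumes the squared Euclidean distance that Chroma uses for its default `l2` space; Chroma itself searches an approximate nearest-neighbor (HNSW) index rather than scanning every vector, and `query_top_n` is a hypothetical function, not part of Chroma's API.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_top_n(query_vec, stored, n_results=10):
    """Hypothetical brute-force query: up to n_results (id, distance) pairs, nearest first.

    `stored` maps IDs to embedding vectors. If fewer than n_results items
    exist, this sketch simply returns all of them.
    """
    scored = [(doc_id, squared_l2(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:n_results]


# Toy embeddings, invented for illustration; real models produce hundreds of dimensions.
stored = {
    "id1": [0.9, 0.1, 0.0],
    "id2": [0.1, 0.9, 0.0],
}
query_vec = [0.8, 0.2, 0.1]

print(query_top_n(query_vec, stored, n_results=2))
print(query_top_n(query_vec, stored))  # default of 10, but only 2 items exist
```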
Inspect Results
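In the results below, our query about Hawaii is semantically most similar to the document about pineapple: `id1` has the smaller distance (about 1.04, versus about 1.24 for the oranges document), and a smaller distance means a closer match.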
{
    'documents': [[
        'This is a document about pineapple',
        'This is a document about oranges'
    ]],
    'ids': [['id1', 'id2']],
    'distances': [[1.0404009819030762, 1.243080496788025]],
    'uris': None,
    'data': None,
    'metadatas': [[None, None]],
    'embeddings': None,
}
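Each field in the result holds one inner list per query text, which is why the values are nested lists. With a single query text, index `0` selects its results. The snippet below copies the dictionary shown above and pairs up IDs, documents, and distances; it only reads the structure as printed in this guide.

```python
# The result dictionary shown above, copied verbatim.
results = {
    'documents': [[
        'This is a document about pineapple',
        'This is a document about oranges'
    ]],
    'ids': [['id1', 'id2']],
    'distances': [[1.0404009819030762, 1.243080496788025]],
    'uris': None,
    'data': None,
    'metadatas': [[None, None]],
    'embeddings': None,
}

# One inner list per query text; we sent a single query, so take index 0.
rows = list(zip(
    results["ids"][0],
    results["documents"][0],
    results["distances"][0],
))

for doc_id, document, distance in rows:
    # Results are ordered nearest first; smaller distance means more similar.
    print(f"{doc_id}: {distance:.3f}  {document}")
```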
Try it out yourself
What if we tried querying with "This is a query document about florida"? Here is a full example.
import chromadb
chroma_client = chromadb.Client()
# switch `create_collection` to `get_or_create_collection` to avoid creating a new collection every time
collection = chroma_client.get_or_create_collection(name="my_collection")
# switch `add` to `upsert` to avoid adding the same documents every time
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)
results = collection.query(
    query_texts=["This is a query document about florida"], # Chroma will embed this for you
    n_results=2 # how many results to return
)
print(results)
Next steps
In this guide we used Chroma's ephemeral client for simplicity. It starts a Chroma server in-memory, so any data you ingest will be lost when your program terminates. You can use the persistent client or run Chroma in client-server mode if you need data persistence.
- Learn how to Deploy Chroma to a server
- Join Chroma's Discord Community to ask questions and get help
- Follow Chroma on X (@trychroma) for updates