Chroma provides a convenient wrapper around Perplexity’s embedding API. This embedding function runs remotely on Perplexity’s servers and requires an API key. You can get an API key by signing up for an account at Perplexity.
This embedding function relies on the perplexityai Python package, which you can install with pip install perplexityai.
import chromadb.utils.embedding_functions as embedding_functions

perplexity_ef = embedding_functions.PerplexityEmbeddingFunction(
    api_key="YOUR_API_KEY",
    model_name="pplx-embed-v1-4b"
)

perplexity_ef(input=["document1", "document2"])

Semantic Search with Chroma

Here’s a complete example of using Perplexity embeddings with Chroma for semantic search:
import chromadb
import chromadb.utils.embedding_functions as embedding_functions

# Initialize the embedding function
perplexity_ef = embedding_functions.PerplexityEmbeddingFunction(
    api_key="YOUR_API_KEY",
    model_name="pplx-embed-v1-4b"
)

# Create a Chroma client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="my_documents",
    embedding_function=perplexity_ef
)

# Add documents
documents = [
    "Python is a versatile programming language",
    "Machine learning automates analytical model building",
    "The Eiffel Tower is located in Paris, France"
]

collection.add(
    documents=documents,
    ids=["doc1", "doc2", "doc3"]
)

# Query for similar documents
results = collection.query(
    query_texts=["What programming languages are good for data science?"],
    n_results=2
)

print("Search results:")
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"  {distance:.4f}: {doc}")
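Query results come back as lists nested per query text, so `results["documents"][0]` holds the matches for the first (and here only) query, with lower distances meaning closer matches. One common follow-up is to drop weak matches past a distance cutoff. A minimal sketch of that, run against a mock results dict shaped like `collection.query()` output, since a live query requires an API key (the threshold value is illustrative, not a recommendation):

```python
# Filter Chroma query results by a maximum distance threshold.
# The mock dict mirrors the nested shape of collection.query() output.

def filter_by_distance(results, max_distance):
    """Return (id, document, distance) tuples at or under the threshold."""
    hits = zip(results["ids"][0], results["documents"][0], results["distances"][0])
    return [(i, doc, dist) for i, doc, dist in hits if dist <= max_distance]

mock_results = {
    "ids": [["doc1", "doc3"]],
    "documents": [["Python is a versatile programming language",
                   "The Eiffel Tower is located in Paris, France"]],
    "distances": [[0.21, 0.87]],
}

print(filter_by_distance(mock_results, 0.5))
# [('doc1', 'Python is a versatile programming language', 0.21)]
```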

Available Models

Perplexity offers two embedding models:
Model                 Dimensions   Context Window   Price
pplx-embed-v1-0.6b    1024         32K tokens       $0.004/1M tokens
pplx-embed-v1-4b      2560         32K tokens       $0.03/1M tokens
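Since pricing is per million tokens, embedding cost scales linearly with corpus size. A quick back-of-the-envelope estimator, with the table’s prices hardcoded (treat these as illustrative and check current pricing before budgeting):

```python
# Estimate embedding cost in USD from a token count,
# using the per-million-token prices from the table above.
PRICE_PER_1M_TOKENS = {
    "pplx-embed-v1-0.6b": 0.004,
    "pplx-embed-v1-4b": 0.03,
}

def embedding_cost(num_tokens, model_name):
    """USD cost to embed num_tokens with the given model."""
    return num_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model_name]

print(f"${embedding_cost(5_000_000, 'pplx-embed-v1-4b'):.2f}")  # $0.15
```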

Matryoshka Dimensions

Both models support Matryoshka Representation Learning, allowing you to reduce embedding dimensions while maintaining quality. This is useful for reducing storage costs and improving search speed.
# Reduce dimensions from 2560 to 512 for the 4b model
perplexity_ef = embedding_functions.PerplexityEmbeddingFunction(
    api_key="YOUR_API_KEY",
    model_name="pplx-embed-v1-4b",
    dimensions=512
)

embeddings = perplexity_ef(input=["document1", "document2"])
print(len(embeddings[0]))  # 512
Supported dimension ranges:
  • pplx-embed-v1-0.6b: 128 to 1024
  • pplx-embed-v1-4b: 128 to 2560
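With Matryoshka-trained models, the leading dimensions carry most of the information, so a lower-dimensional embedding is simply the vector’s prefix, re-normalized to unit length. The dimensions parameter does this for you server-side; a stdlib sketch of the same operation on a toy vector, just to illustrate the idea:

```python
import math

def truncate_embedding(vec, k):
    """Keep the first k Matryoshka dimensions and re-normalize to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

The truncated vector stays unit-length, so cosine similarities computed against other truncated embeddings remain directly comparable.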
For more details on Perplexity’s embedding models, see Perplexity’s documentation.