The instructor-embeddings library is another option, especially when running on a machine with a CUDA-capable GPU. It is a good local alternative to OpenAI (see the Massive Text Embedding Benchmark rankings). The embedding function requires the InstructorEmbedding package. To install it, run
pip install InstructorEmbedding
There are three models available. The default is
hkunlp/instructor-base; for better performance you can use
hkunlp/instructor-large or
hkunlp/instructor-xl. You can also specify whether to run on
cpu (the default) or
cuda. For example:
# uses the base model on cpu
ef = embedding_functions.InstructorEmbeddingFunction()

# uses the xl model on a GPU
ef = embedding_functions.InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl", device="cuda")
Keep in mind that the large and xl models are 1.5GB and 5GB respectively, and are best suited to running on a GPU.
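To clarify the call pattern without downloading a multi-gigabyte model, here is a minimal sketch of the interface an embedding function like InstructorEmbeddingFunction exposes: a callable that maps a list of documents to a list of fixed-length vectors. `StubEmbeddingFunction` and its 768-dimension output are illustrative stand-ins, not part of the real library.

```python
from typing import List

class StubEmbeddingFunction:
    """Illustrative stand-in mimicking InstructorEmbeddingFunction's shape."""

    def __init__(self, model_name: str = "hkunlp/instructor-base",
                 device: str = "cpu", dim: int = 768):
        # The real class accepts similar model_name/device arguments;
        # dim is a hypothetical parameter used here only for the stub.
        self.model_name = model_name
        self.device = device
        self.dim = dim

    def __call__(self, texts: List[str]) -> List[List[float]]:
        # A real embedding function returns one dim-length vector per document;
        # the stub returns zero vectors of the same shape.
        return [[0.0] * self.dim for _ in texts]

ef = StubEmbeddingFunction()
vectors = ef(["How do embeddings work?", "Chroma stores vectors."])
print(len(vectors), len(vectors[0]))  # 2 documents, 768-dim vectors each
```

The same callable can then be handed to whatever expects an embedding function, so swapping the stub for the real InstructorEmbeddingFunction requires no other code changes.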