Vector Stores

This guide shows how to store and search document embeddings using Synaptic's VectorStore trait and the built-in InMemoryVectorStore.

Overview

The VectorStore trait from synaptic_vectorstores provides methods for adding, searching, and deleting documents:

#[async_trait]
pub trait VectorStore: Send + Sync {
    async fn add_documents(
        &self, docs: Vec<Document>, embeddings: &dyn Embeddings,
    ) -> Result<Vec<String>, SynapticError>;

    async fn similarity_search(
        &self, query: &str, k: usize, embeddings: &dyn Embeddings,
    ) -> Result<Vec<Document>, SynapticError>;

    async fn similarity_search_with_score(
        &self, query: &str, k: usize, embeddings: &dyn Embeddings,
    ) -> Result<Vec<(Document, f32)>, SynapticError>;

    async fn similarity_search_by_vector(
        &self, embedding: &[f32], k: usize,
    ) -> Result<Vec<Document>, SynapticError>;

    async fn delete(&self, ids: &[&str]) -> Result<(), SynapticError>;
}

The embeddings parameter is passed to each method rather than stored inside the vector store. This design lets you swap embedding providers without rebuilding the store.
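The pattern can be sketched with a minimal stand-in trait (the names below are illustrative, not Synaptic's actual types):

```rust
// A minimal stand-in for an embeddings provider (illustrative only,
// not Synaptic's Embeddings trait).
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

struct ProviderA;
impl Embedder for ProviderA {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32, 1.0]
    }
}

struct ProviderB;
impl Embedder for ProviderB {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32, 2.0]
    }
}

// Because the embedder is a parameter rather than a stored field,
// the same indexing code works with any provider.
fn index(texts: &[&str], embedder: &dyn Embedder) -> Vec<Vec<f32>> {
    texts.iter().map(|t| embedder.embed(t)).collect()
}

fn main() {
    let texts = ["hello", "world!"];
    let a = index(&texts, &ProviderA);
    let b = index(&texts, &ProviderB);
    println!("{:?} {:?}", a, b);
}
```

Note that documents must be searched with the same provider they were embedded with; the flexibility is in the wiring, not in mixing providers within one index.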

InMemoryVectorStore

An in-memory vector store that ranks results by cosine similarity. It is backed by a RwLock&lt;HashMap&gt;, which makes it cheap to create and well suited to tests and small datasets.
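Cosine similarity measures the angle between two embedding vectors. A minimal implementation looks roughly like this (a sketch of the formula, not Synaptic's actual code):

```rust
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging over [-1.0, 1.0].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // treat similarity with a zero vector as 0
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0];
    let b = [0.0, 1.0];
    let c = [2.0, 0.0];
    println!("{}", cosine_similarity(&a, &b)); // orthogonal vectors: 0
    println!("{}", cosine_similarity(&a, &c)); // same direction: 1
}
```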

Creating a store

use synaptic::vectorstores::InMemoryVectorStore;

let store = InMemoryVectorStore::new();

Adding documents

use synaptic::vectorstores::{InMemoryVectorStore, VectorStore};
use synaptic::embeddings::FakeEmbeddings;
use synaptic::retrieval::Document;

let store = InMemoryVectorStore::new();
let embeddings = FakeEmbeddings::new(128);

let docs = vec![
    Document::new("1", "Rust is a systems programming language"),
    Document::new("2", "Python is great for data science"),
    Document::new("3", "Go is designed for concurrency"),
];

let ids = store.add_documents(docs, &embeddings).await?;
// ids == ["1", "2", "3"]

Similarity search

Find the k most similar documents to a query:

let results = store.similarity_search("fast systems language", 2, &embeddings).await?;
for doc in &results {
    println!("{}: {}", doc.id, doc.content);
}

Search with scores

Get similarity scores alongside results (higher is more similar):

let scored = store.similarity_search_with_score("concurrency", 3, &embeddings).await?;
for (doc, score) in &scored {
    println!("{} (score: {:.3}): {}", doc.id, score, doc.content);
}

Search by vector

Search using a pre-computed embedding vector instead of a text query:

use synaptic::embeddings::Embeddings;

let query_vec = embeddings.embed_query("systems programming").await?;
let results = store.similarity_search_by_vector(&query_vec, 3).await?;

Deleting documents

store.delete(&["1", "3"]).await?;

Convenience constructors

Create a store pre-populated with documents:

use synaptic::vectorstores::InMemoryVectorStore;
use synaptic::embeddings::FakeEmbeddings;

let embeddings = FakeEmbeddings::new(128);

// From (id, content) tuples
let store = InMemoryVectorStore::from_texts(
    vec![("1", "Rust is fast"), ("2", "Python is flexible")],
    &embeddings,
).await?;

// From Document values (reusing the docs vector from the earlier example)
let store = InMemoryVectorStore::from_documents(docs, &embeddings).await?;

Maximum Marginal Relevance (MMR)

MMR search balances relevance with diversity. The lambda_mult parameter controls the trade-off:

  • 1.0 -- pure relevance (equivalent to standard similarity search)
  • 0.0 -- maximum diversity
  • 0.5 -- balanced (typical default)

let results = store.max_marginal_relevance_search(
    "programming language",
    3,        // k: number of results
    10,       // fetch_k: initial candidates to consider
    0.5,      // lambda_mult: relevance vs. diversity
    &embeddings,
).await?;
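The greedy selection loop behind MMR can be sketched in plain Rust. Each round picks the candidate that maximizes lambda * relevance minus (1 - lambda) * redundancy against the already-selected set (a simplified illustration, not the library's implementation):

```rust
// Cosine similarity helper for the sketch below.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Greedy MMR: score = lambda * sim(query, cand)
//                   - (1 - lambda) * max sim(cand, already selected)
fn mmr_select(query: &[f32], candidates: &[Vec<f32>], k: usize, lambda: f32) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < k.min(candidates.len()) {
        let mut best: Option<(usize, f32)> = None;
        for (i, cand) in candidates.iter().enumerate() {
            if selected.contains(&i) {
                continue;
            }
            let relevance = cosine(query, cand);
            // Redundancy is the highest similarity to anything already picked.
            let redundancy = if selected.is_empty() {
                0.0
            } else {
                selected.iter()
                    .map(|&j| cosine(cand, &candidates[j]))
                    .fold(f32::NEG_INFINITY, f32::max)
            };
            let score = lambda * relevance - (1.0 - lambda) * redundancy;
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((i, score));
            }
        }
        selected.push(best.unwrap().0);
    }
    selected
}

fn main() {
    let query = vec![1.0, 0.0, 0.0];
    let candidates = vec![
        vec![0.9, 0.1, 0.0], // most relevant
        vec![0.8, 0.2, 0.0], // relevant but redundant with the first
        vec![0.0, 0.0, 1.0], // unrelated but diverse
    ];
    // With lambda = 0.5, the diverse third vector beats the redundant second.
    let picks = mmr_select(&query, &candidates, 2, 0.5);
    println!("{:?}", picks); // → [0, 2]
}
```

With lambda = 1.0 the redundancy term vanishes, which is why that setting reduces to plain similarity search.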

VectorStoreRetriever

VectorStoreRetriever bridges any VectorStore to the Retriever trait, making it compatible with the rest of Synaptic's retrieval infrastructure.

use std::sync::Arc;
use synaptic::vectorstores::{InMemoryVectorStore, VectorStoreRetriever};
use synaptic::embeddings::FakeEmbeddings;
use synaptic::retrieval::Retriever;

let embeddings = Arc::new(FakeEmbeddings::new(128));
let store = Arc::new(InMemoryVectorStore::new());
// ... add documents to store ...

let retriever = VectorStoreRetriever::new(store, embeddings, 5);

let results = retriever.retrieve("query", 5).await?;

MultiVectorRetriever

MultiVectorRetriever stores small child chunks in a vector store for precise retrieval, but returns the larger parent documents they came from. This gives you the best of both worlds: small chunks for accurate embedding search and full documents for LLM context.

use std::sync::Arc;
use synaptic::vectorstores::{InMemoryVectorStore, MultiVectorRetriever};
use synaptic::embeddings::FakeEmbeddings;
use synaptic::retrieval::{Document, Retriever};

let embeddings = Arc::new(FakeEmbeddings::new(128));
let store = Arc::new(InMemoryVectorStore::new());

let retriever = MultiVectorRetriever::new(store, embeddings, 3);

// Add parent documents with their child chunks
let parent = Document::new("parent-1", "Full article about Rust ownership...");
let children = vec![
    Document::new("child-1", "Ownership rules in Rust"),
    Document::new("child-2", "Borrowing and references"),
];

retriever.add_documents(parent, children).await?;

// Search finds child chunks but returns the parent
let results = retriever.retrieve("ownership", 1).await?;
assert_eq!(results[0].id, Some("parent-1".to_string()));

The id_key metadata field links children to their parent. By default it is "doc_id".

Score threshold filtering

Set a minimum similarity score. Only documents meeting the threshold are returned:

let retriever = VectorStoreRetriever::new(store, embeddings, 10)
    .with_score_threshold(0.7);

let results = retriever.retrieve("query", 10).await?;
// Only documents with cosine similarity >= 0.7 are included
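Conceptually, the filtering amounts to dropping any scored result below the threshold before returning (a sketch of the semantics, not the library code):

```rust
// Keep only results whose similarity score meets the threshold.
// (String stands in for a document here; the real retriever works on Documents.)
fn apply_threshold(results: Vec<(String, f32)>, threshold: f32) -> Vec<(String, f32)> {
    results.into_iter().filter(|(_, score)| *score >= threshold).collect()
}

fn main() {
    let results = vec![
        ("doc-a".to_string(), 0.91),
        ("doc-b".to_string(), 0.65),
        ("doc-c".to_string(), 0.72),
    ];
    let kept = apply_threshold(results, 0.7);
    println!("{:?}", kept); // doc-a and doc-c survive; doc-b is filtered out
}
```

A consequence of this design is that fewer than k results may come back when the corpus has little that matches the query.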