Retrieval

Synaptic provides a complete Retrieval-Augmented Generation (RAG) pipeline, organized into five stages:

  1. Load -- ingest raw data from files, JSON, CSV, web URLs, or entire directories.
  2. Split -- break large documents into smaller chunks that fit within context windows.
  3. Embed -- convert text chunks into numerical vectors using an embedding model.
  4. Store -- persist embeddings in a vector store for efficient similarity search.
  5. Retrieve -- find the most relevant documents for a given query.
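Stages 3 to 5 can be illustrated with a minimal, dependency-free Rust sketch. ToyEmbedder, InMemoryStore, and cosine are hypothetical names, not Synaptic's API. The "embedding" is a bag-of-words count vector standing in for a real embedding model, and the store scans every entry instead of using an index. The sketch shows the shape of the flow: embed each chunk once, store the vectors, then embed the query and rank the stored chunks by cosine similarity.

```rust
use std::collections::HashMap;

/// Hypothetical toy stand-in for an embedding model (not Synaptic's API):
/// a bag-of-words count vector over a vocabulary built from the corpus.
pub struct ToyEmbedder {
    vocab: HashMap<String, usize>,
}

impl ToyEmbedder {
    /// Give each distinct lowercase word in the corpus its own dimension.
    pub fn fit(corpus: &[&str]) -> Self {
        let mut vocab: HashMap<String, usize> = HashMap::new();
        for text in corpus {
            for word in text.split_whitespace() {
                let next_index = vocab.len();
                vocab.entry(word.to_lowercase()).or_insert(next_index);
            }
        }
        ToyEmbedder { vocab }
    }

    /// Stage 3 (Embed): count vocabulary words; unknown words are ignored.
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.vocab.len()];
        for word in text.split_whitespace() {
            if let Some(&i) = self.vocab.get(&word.to_lowercase()) {
                v[i] += 1.0;
            }
        }
        v
    }
}

/// Cosine similarity; defined as 0.0 when either vector is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Stage 4 (Store): hypothetical linear-scan in-memory store.
pub struct InMemoryStore {
    embedder: ToyEmbedder,
    entries: Vec<(String, Vec<f32>)>,
}

impl InMemoryStore {
    /// Fit the toy embedder on the texts, then embed and keep each one.
    pub fn from_texts(texts: &[&str]) -> Self {
        let embedder = ToyEmbedder::fit(texts);
        let entries = texts
            .iter()
            .map(|t| (t.to_string(), embedder.embed(t)))
            .collect();
        InMemoryStore { embedder, entries }
    }

    /// Stage 5 (Retrieve): the `k` most similar texts, best first.
    pub fn search(&self, query: &str, k: usize) -> Vec<(String, f32)> {
        let q = self.embedder.embed(query);
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|(text, emb)| (text.clone(), cosine(&q, emb)))
            .collect();
        // Descending by similarity; the sort is stable, so ties keep insertion order.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

A production vector store would typically replace the linear scan with an approximate nearest-neighbour index, while keeping the same ranking rule of highest similarity first.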

Key types

Type               | Crate                 | Purpose
-------------------|-----------------------|--------------------------------------------------------------------
Document           | synaptic_retrieval    | A unit of text with id, content, and metadata: HashMap<String, Value>
Loader trait       | synaptic_loaders      | Async trait for loading documents from various sources
TextSplitter trait | synaptic_splitters    | Splits text into chunks with optional overlap
Embeddings trait   | synaptic_embeddings   | Converts text into vector representations
VectorStore trait  | synaptic_vectorstores | Stores and searches document embeddings
Retriever trait    | synaptic_retrieval    | Retrieves relevant documents given a query string
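The Document and TextSplitter rows can be made concrete with a small std-only sketch. Everything in it is hypothetical. Document mirrors the id, content, and metadata fields listed above but stores metadata values as String instead of Value, which avoids a serde_json dependency. Splitter, CharacterSplitter, and the parent_id metadata key are illustrative names, not Synaptic's real trait or types, and splitting here is synchronous. The splitter cuts fixed-size character windows and starts each new window overlap characters before the previous one ended, so text near a boundary appears in both chunks.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Synaptic's Document (metadata values simplified to String).
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical splitter trait, loosely modelled on the TextSplitter row above.
pub trait Splitter {
    fn split_text(&self, text: &str) -> Vec<String>;

    /// Split a document into chunk documents that inherit its metadata
    /// and record the id of the document they came from.
    fn split_document(&self, doc: &Document) -> Vec<Document> {
        self.split_text(&doc.content)
            .into_iter()
            .enumerate()
            .map(|(i, chunk)| {
                let mut metadata = doc.metadata.clone();
                metadata.insert("parent_id".to_string(), doc.id.clone());
                Document { id: format!("{}-{}", doc.id, i), content: chunk, metadata }
            })
            .collect()
    }
}

/// Fixed-size character windows; consecutive chunks share `overlap` characters.
pub struct CharacterSplitter {
    pub chunk_size: usize,
    pub overlap: usize,
}

impl Splitter for CharacterSplitter {
    fn split_text(&self, text: &str) -> Vec<String> {
        // Guards against a zero step, which would loop forever.
        assert!(self.overlap < self.chunk_size, "overlap must be smaller than chunk_size");
        let chars: Vec<char> = text.chars().collect();
        let step = self.chunk_size - self.overlap;
        let mut chunks: Vec<String> = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let end = (start + self.chunk_size).min(chars.len());
            chunks.push(chars[start..end].iter().collect::<String>());
            if end == chars.len() {
                break; // the last window reached the end of the text
            }
            start += step;
        }
        chunks
    }
}
```

Copying the parent's metadata into every chunk, plus a parent_id link, is what lets a later stage map a matching chunk back to its source. This sketch measures length in characters; many real splitters count tokens instead, or prefer to break on separators such as paragraph boundaries.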

Retrievers

Synaptic ships with seven retriever implementations, each suited to a different use case:

Retriever                      | Strategy
-------------------------------|-------------------------------------------------------------------------
VectorStoreRetriever           | Wraps any VectorStore for cosine similarity search
BM25Retriever                  | Okapi BM25 keyword scoring -- no embeddings required
MultiQueryRetriever            | Uses an LLM to generate query variants, retrieves for each, deduplicates
EnsembleRetriever              | Combines multiple retrievers via Reciprocal Rank Fusion
ContextualCompressionRetriever | Post-filters retrieved documents using a DocumentCompressor
SelfQueryRetriever             | Uses an LLM to extract structured metadata filters from natural language
ParentDocumentRetriever        | Searches small child chunks but returns full parent documents
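Two of the strategies above, BM25 and Reciprocal Rank Fusion, are well-defined scoring rules that can be sketched without an LLM. The std-only sketch below implements Okapi BM25 using the non-negative IDF variant that Lucene uses, ln(1 + (N - n + 0.5) / (n + 0.5)), with the conventional parameters k1 = 1.2 and b = 0.75. It also implements RRF, where each document scores the sum over rankings of 1 / (k + rank), commonly with k = 60. The function names are hypothetical, and this page does not say which BM25 variant or RRF constant Synaptic itself uses.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercase whitespace tokenizer (a real retriever would likely do more).
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Hypothetical helper: Okapi BM25 score of every document against `query`.
pub fn bm25_scores(docs: &[&str], query: &str, k1: f64, b: f64) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let total_len: usize = tokenized.iter().map(|t| t.len()).sum();
    let avgdl = total_len as f64 / n.max(1.0);

    // Document frequency: number of documents containing each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for toks in &tokenized {
        let unique: HashSet<&str> = toks.iter().map(|s| s.as_str()).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|toks| {
            let dl = toks.len() as f64;
            query_terms
                .iter()
                .map(|q| {
                    let tf = toks.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0; // term absent from this document
                    }
                    let n_q = df[q.as_str()] as f64;
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * dl / avgdl))
                })
                .sum::<f64>()
        })
        .collect()
}

/// Hypothetical helper: document indices sorted by descending score (ties by index).
pub fn rank(scores: &[f64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]).then(a.cmp(&b)));
    idx
}

/// Hypothetical helper: Reciprocal Rank Fusion of several rankings (best first).
/// Ranks start at 1, so a top-ranked document adds 1 / (k + 1).
pub fn rrf(rankings: &[Vec<usize>], k: f64) -> Vec<usize> {
    let mut scores: HashMap<usize, f64> = HashMap::new();
    for ranking in rankings {
        for (i, &doc) in ranking.iter().enumerate() {
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(usize, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by index so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.into_iter().map(|(doc, _)| doc).collect()
}
```

In an ensemble, rank(&bm25_scores(...)) would be passed to rrf alongside a ranking from a vector-store retriever. Because RRF looks only at rank positions, the two retrievers' raw scores never need to be on a comparable scale.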

Guides