Retrieval
Synaptic provides a complete Retrieval-Augmented Generation (RAG) pipeline. The pipeline proceeds in five stages (a minimal end-to-end sketch follows this list):
- Load -- ingest raw data from files, JSON, CSV, web URLs, or entire directories.
- Split -- break large documents into smaller chunks that fit within context windows.
- Embed -- convert text chunks into numerical vectors using an embedding model.
- Store -- persist embeddings in a vector store for efficient similarity search.
- Retrieve -- find the most relevant documents for a given query.
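To make the five stages concrete, here is a minimal, dependency-free sketch of the whole flow in plain Rust. Everything in it -- the split, embed, and cosine helpers, the character-histogram "embedding", and the Vec used as a store -- is an illustrative stand-in, not Synaptic's API; the real abstractions are listed in the table below.

```rust
// Toy RAG pipeline: load -> split -> embed -> store -> retrieve.
// All helpers are stand-ins; Synaptic's real traits and stores are described below.

/// Split text into fixed-size character chunks with overlap.
fn split(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

/// Deterministic toy "embedding": a character-frequency histogram over a-z.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 26];
    for c in text.to_lowercase().chars() {
        if c.is_ascii_lowercase() {
            v[(c as u8 - b'a') as usize] += 1.0;
        }
    }
    v
}

/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Load: in a real pipeline this comes from files, URLs, or directories.
    let document = "Synaptic is a Rust framework. It ships loaders, splitters, \
                    embeddings, vector stores, and retrievers for building RAG pipelines.";

    // Split: break the document into overlapping chunks.
    let chunks = split(document, 60, 15);

    // Embed + Store: keep (chunk, vector) pairs in a simple in-memory store.
    let store: Vec<(String, Vec<f32>)> = chunks
        .into_iter()
        .map(|chunk| {
            let vector = embed(&chunk);
            (chunk, vector)
        })
        .collect();

    // Retrieve: rank stored chunks by cosine similarity to the query embedding.
    let query = embed("vector stores and retrievers");
    let mut ranked: Vec<(&String, f32)> = store
        .iter()
        .map(|(chunk, vector)| (chunk, cosine(&query, vector)))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    for (chunk, score) in ranked.iter().take(2) {
        println!("{score:.3}  {chunk}");
    }
}
```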
Key types
| Type | Crate | Purpose |
|---|---|---|
| Document | synaptic_retrieval | A unit of text with an id, content, and metadata (a HashMap<String, Value>) |
| Loader trait | synaptic_loaders | Async trait for loading documents from various sources |
| TextSplitter trait | synaptic_splitters | Splits text into chunks with optional overlap |
| Embeddings trait | synaptic_embeddings | Converts text into vector representations |
| VectorStore trait | synaptic_vectorstores | Stores and searches document embeddings |
| Retriever trait | synaptic_retrieval | Retrieves relevant documents given a query string |
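The sketch below shows one plausible shape for these types, inferred only from the table above. The method names, the error type (anyhow::Result), and the async-trait usage are assumptions for illustration, not Synaptic's actual definitions; see the individual guides for the real signatures.

```rust
// Illustrative shapes only -- inferred from the table above, not copied from Synaptic.
use std::collections::HashMap;

use async_trait::async_trait; // assumption: async methods via the async-trait crate
use serde_json::Value;

/// A unit of text with an id, content, and free-form metadata.
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, Value>,
}

/// Loads documents from some source (file, URL, directory, ...).
#[async_trait]
pub trait Loader {
    async fn load(&self) -> anyhow::Result<Vec<Document>>; // error type is an assumption
}

/// Splits text into chunks, typically with some overlap between neighbours.
pub trait TextSplitter {
    fn split_text(&self, text: &str) -> Vec<String>;
}

/// Converts text into vector representations.
#[async_trait]
pub trait Embeddings {
    async fn embed_documents(&self, texts: &[String]) -> anyhow::Result<Vec<Vec<f32>>>;
    async fn embed_query(&self, text: &str) -> anyhow::Result<Vec<f32>>;
}

/// Stores document embeddings and answers similarity-search queries.
#[async_trait]
pub trait VectorStore {
    async fn add_documents(&mut self, docs: Vec<Document>) -> anyhow::Result<()>;
    async fn similarity_search(&self, query: &str, k: usize) -> anyhow::Result<Vec<Document>>;
}

/// Retrieves relevant documents for a query string.
#[async_trait]
pub trait Retriever {
    async fn retrieve(&self, query: &str) -> anyhow::Result<Vec<Document>>;
}
```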
Retrievers
Synaptic ships with seven retriever implementations, each suited to different use cases:
| Retriever | Strategy |
|---|---|
| VectorStoreRetriever | Wraps any VectorStore for cosine similarity search |
| BM25Retriever | Okapi BM25 keyword scoring -- no embeddings required |
| MultiQueryRetriever | Uses an LLM to generate query variants, retrieves for each, deduplicates |
| EnsembleRetriever | Combines multiple retrievers via Reciprocal Rank Fusion (sketched after this table) |
| ContextualCompressionRetriever | Post-filters retrieved documents using a DocumentCompressor |
| SelfQueryRetriever | Uses an LLM to extract structured metadata filters from natural language |
| ParentDocumentRetriever | Searches small child chunks but returns full parent documents |
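Reciprocal Rank Fusion, the strategy behind EnsembleRetriever, is simple enough to show in full. The function below is a generic, self-contained sketch of the fusion step rather than Synaptic's implementation: every retriever contributes 1 / (k + rank) for each document it returns, and documents are re-ranked by their summed score (k = 60 is the conventional default from the original RRF paper).

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each input list is ordered best-first; `k` dampens the influence of top ranks.
fn reciprocal_rank_fusion(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc_id) in ranking.iter().enumerate() {
            // `rank` is 0-based, so rank + 1 is the 1-based position in the list.
            *scores.entry(doc_id.clone()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // One ranked list from a keyword retriever, one from a vector retriever.
    let bm25 = vec!["doc3".to_string(), "doc1".to_string(), "doc7".to_string()];
    let vector = vec!["doc1".to_string(), "doc5".to_string(), "doc3".to_string()];

    for (doc, score) in reciprocal_rank_fusion(&[bm25, vector], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```

Documents that sit near the top of several lists accumulate more score than one that tops only a single list, which is why fusion tends to be robust to any one retriever's weaknesses.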
Guides
- Document Loaders -- load data from text, JSON, CSV, files, directories, and the web
- Text Splitters -- break documents into chunks with character, recursive, markdown, or token-based strategies
- Embeddings -- embed text using OpenAI, Ollama, or deterministic fake embeddings
- Vector Stores -- store and search embeddings with InMemoryVectorStore
- BM25 Retriever -- keyword-based retrieval with Okapi BM25 scoring (a generic scoring sketch follows this list)
- Multi-Query Retriever -- improve recall by generating multiple query perspectives
- Ensemble Retriever -- combine retrievers with Reciprocal Rank Fusion
- Contextual Compression -- post-filter results with embedding similarity thresholds
- Self-Query Retriever -- LLM-powered metadata filtering from natural language
- Parent Document Retriever -- search small chunks, return full parent documents
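As a companion to the BM25 Retriever guide, here is a compact, generic Okapi BM25 scorer over an in-memory corpus. It is a sketch, not the BM25Retriever implementation: the alphanumeric tokenizer and the parameter defaults k1 = 1.2 and b = 0.75 are common choices rather than anything Synaptic guarantees, and the IDF is the usual smoothed form.

```rust
use std::collections::{HashMap, HashSet};

/// Compact Okapi BM25 scorer over a small in-memory corpus.
struct Bm25 {
    docs: Vec<Vec<String>>,           // tokenized documents
    doc_freq: HashMap<String, usize>, // how many documents contain each term
    avg_len: f64,
    k1: f64,
    b: f64,
}

impl Bm25 {
    fn new(texts: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = texts.iter().map(|t| tokenize(t)).collect();
        let mut doc_freq = HashMap::new();
        for doc in &docs {
            for term in doc.iter().collect::<HashSet<_>>() {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / docs.len() as f64;
        Bm25 { docs, doc_freq, avg_len, k1: 1.2, b: 0.75 }
    }

    /// BM25 score of `query` against document `idx`.
    fn score(&self, query: &str, idx: usize) -> f64 {
        let doc = &self.docs[idx];
        let n = self.docs.len() as f64;
        let mut tf: HashMap<&str, f64> = HashMap::new();
        for term in doc {
            *tf.entry(term.as_str()).or_insert(0.0) += 1.0;
        }
        tokenize(query)
            .iter()
            .map(|term| {
                let df = *self.doc_freq.get(term).unwrap_or(&0) as f64;
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln(); // smoothed IDF
                let f = *tf.get(term.as_str()).unwrap_or(&0.0);
                let denom =
                    f + self.k1 * (1.0 - self.b + self.b * doc.len() as f64 / self.avg_len);
                idf * f * (self.k1 + 1.0) / denom
            })
            .sum()
    }
}

/// Lowercase and split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    let corpus = ["rust vector stores", "okapi bm25 keyword scoring", "parent document retriever"];
    let index = Bm25::new(&corpus);
    let mut ranked: Vec<(usize, f64)> =
        (0..corpus.len()).map(|i| (i, index.score("bm25 scoring", i))).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    for (i, s) in ranked {
        println!("{s:.3}  {}", corpus[i]);
    }
}
```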