Retrieval
Synaptic provides a complete Retrieval-Augmented Generation (RAG) pipeline. The pipeline proceeds in five stages (a minimal end-to-end sketch follows this list):
- Load -- ingest raw data from files, JSON, CSV, web URLs, or entire directories.
- Split -- break large documents into smaller chunks that fit within context windows.
- Embed -- convert text chunks into numerical vectors using an embedding model.
- Store -- persist embeddings in a vector store for efficient similarity search.
- Retrieve -- find the most relevant documents for a given query.
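To make the five stages concrete, here is a minimal, dependency-free sketch of the whole flow in plain Rust. Everything in it -- the split, embed, and cosine helpers, the character-histogram "embedding", and the Vec used as a store -- is an illustrative stand-in, not Synaptic's API; the real abstractions are listed in the table below.

```rust
// Toy RAG pipeline: load -> split -> embed -> store -> retrieve.
// All helpers are stand-ins; Synaptic's real traits and stores are described below.

/// Split text into fixed-size character chunks with overlap.
fn split(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size.saturating_sub(overlap).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

/// Deterministic toy "embedding": a character-frequency histogram over a-z.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 26];
    for c in text.to_lowercase().chars() {
        if c.is_ascii_lowercase() {
            v[(c as u8 - b'a') as usize] += 1.0;
        }
    }
    v
}

/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Load: in a real pipeline this comes from files, URLs, or directories.
    let document = "Synaptic is a Rust framework. It ships loaders, splitters, \
                    embeddings, vector stores, and retrievers for building RAG pipelines.";

    // Split: break the document into overlapping chunks.
    let chunks = split(document, 60, 15);

    // Embed + Store: keep (chunk, vector) pairs in a simple in-memory store.
    let store: Vec<(String, Vec<f32>)> = chunks
        .into_iter()
        .map(|chunk| {
            let vector = embed(&chunk);
            (chunk, vector)
        })
        .collect();

    // Retrieve: rank stored chunks by cosine similarity to the query embedding.
    let query = embed("vector stores and retrievers");
    let mut ranked: Vec<(&String, f32)> = store
        .iter()
        .map(|(chunk, vector)| (chunk, cosine(&query, vector)))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    for (chunk, score) in ranked.iter().take(2) {
        println!("{score:.3}  {chunk}");
    }
}
```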
Key types
| Type | Crate | Purpose |
|---|---|---|
| Document | synaptic_retrieval | A unit of text with an id, content, and metadata (a HashMap<String, Value>) |
| Loader trait | synaptic_loaders | Async trait for loading documents from various sources |
| TextSplitter trait | synaptic_splitters | Splits text into chunks with optional overlap |
| Embeddings trait | synaptic_embeddings | Converts text into vector representations |
| VectorStore trait | synaptic_vectorstores | Stores and searches document embeddings |
| Retriever trait | synaptic_retrieval | Retrieves relevant documents given a query string |
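The sketch below shows one plausible shape for these types, inferred only from the table above. The method names, the error type (anyhow::Result), and the async-trait usage are assumptions for illustration, not Synaptic's actual definitions; see the individual guides for the real signatures.

```rust
// Illustrative shapes only -- inferred from the table above, not copied from Synaptic.
use std::collections::HashMap;

use async_trait::async_trait; // assumption: async methods via the async-trait crate
use serde_json::Value;

/// A unit of text with an id, content, and free-form metadata.
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, Value>,
}

/// Loads documents from some source (file, URL, directory, ...).
#[async_trait]
pub trait Loader {
    async fn load(&self) -> anyhow::Result<Vec<Document>>; // error type is an assumption
}

/// Splits text into chunks, typically with some overlap between neighbours.
pub trait TextSplitter {
    fn split_text(&self, text: &str) -> Vec<String>;
}

/// Converts text into vector representations.
#[async_trait]
pub trait Embeddings {
    async fn embed_documents(&self, texts: &[String]) -> anyhow::Result<Vec<Vec<f32>>>;
    async fn embed_query(&self, text: &str) -> anyhow::Result<Vec<f32>>;
}

/// Stores document embeddings and answers similarity-search queries.
#[async_trait]
pub trait VectorStore {
    async fn add_documents(&mut self, docs: Vec<Document>) -> anyhow::Result<()>;
    async fn similarity_search(&self, query: &str, k: usize) -> anyhow::Result<Vec<Document>>;
}

/// Retrieves relevant documents for a query string.
#[async_trait]
pub trait Retriever {
    async fn retrieve(&self, query: &str) -> anyhow::Result<Vec<Document>>;
}
```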
Retrievers
Synaptic ships with seven retriever implementations, each suited to different use cases:
| Retriever | Strategy |
|---|---|
| VectorStoreRetriever | Wraps any VectorStore for cosine similarity search |
| BM25Retriever | Okapi BM25 keyword scoring -- no embeddings required |
| MultiQueryRetriever | Uses an LLM to generate query variants, retrieves for each, deduplicates |
| EnsembleRetriever | Combines multiple retrievers via Reciprocal Rank Fusion (sketched after this table) |
| ContextualCompressionRetriever | Post-filters retrieved documents using a DocumentCompressor |
| SelfQueryRetriever | Uses an LLM to extract structured metadata filters from natural language |
| ParentDocumentRetriever | Searches small child chunks but returns full parent documents |
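Reciprocal Rank Fusion, the strategy behind EnsembleRetriever, is simple enough to show in full. The function below is a generic, self-contained sketch of the fusion step rather than Synaptic's implementation: every retriever contributes 1 / (k + rank) for each document it returns, and documents are re-ranked by their summed score (k = 60 is the conventional default from the original RRF paper).

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each input list is ordered best-first; `k` dampens the influence of top ranks.
fn reciprocal_rank_fusion(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc_id) in ranking.iter().enumerate() {
            // `rank` is 0-based, so rank + 1 is the 1-based position in the list.
            *scores.entry(doc_id.clone()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // One ranked list from a keyword retriever, one from a vector retriever.
    let bm25 = vec!["doc3".to_string(), "doc1".to_string(), "doc7".to_string()];
    let vector = vec!["doc1".to_string(), "doc5".to_string(), "doc3".to_string()];

    for (doc, score) in reciprocal_rank_fusion(&[bm25, vector], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```

Documents that sit near the top of several lists accumulate more score than one that tops only a single list, which is why fusion tends to be robust to any one retriever's weaknesses.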
Guides
- Document Loaders -- load data from text, JSON, CSV, files, directories, and the web
- Text Splitters -- break documents into chunks with character, recursive, markdown, or token-based strategies
- Embeddings -- embed text using OpenAI, Ollama, or deterministic fake embeddings
- Vector Stores -- store and search embeddings with InMemoryVectorStore
- BM25 Retriever -- keyword-based retrieval with Okapi BM25 scoring (a generic scoring sketch follows this list)
- Multi-Query Retriever -- improve recall by generating multiple query perspectives
- Ensemble Retriever -- combine retrievers with Reciprocal Rank Fusion
- Contextual Compression -- post-filter results with embedding similarity thresholds
- Self-Query Retriever -- LLM-powered metadata filtering from natural language
- Parent Document Retriever -- search small chunks, return full parent documents
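As a companion to the BM25 Retriever guide, here is a compact, generic Okapi BM25 scorer over an in-memory corpus. It is a sketch, not the BM25Retriever implementation: the alphanumeric tokenizer and the parameter defaults k1 = 1.2 and b = 0.75 are common choices rather than anything Synaptic guarantees, and the IDF is the usual smoothed form.

```rust
use std::collections::{HashMap, HashSet};

/// Compact Okapi BM25 scorer over a small in-memory corpus.
struct Bm25 {
    docs: Vec<Vec<String>>,           // tokenized documents
    doc_freq: HashMap<String, usize>, // how many documents contain each term
    avg_len: f64,
    k1: f64,
    b: f64,
}

impl Bm25 {
    fn new(texts: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = texts.iter().map(|t| tokenize(t)).collect();
        let mut doc_freq = HashMap::new();
        for doc in &docs {
            for term in doc.iter().collect::<HashSet<_>>() {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let avg_len = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / docs.len() as f64;
        Bm25 { docs, doc_freq, avg_len, k1: 1.2, b: 0.75 }
    }

    /// BM25 score of `query` against document `idx`.
    fn score(&self, query: &str, idx: usize) -> f64 {
        let doc = &self.docs[idx];
        let n = self.docs.len() as f64;
        let mut tf: HashMap<&str, f64> = HashMap::new();
        for term in doc {
            *tf.entry(term.as_str()).or_insert(0.0) += 1.0;
        }
        tokenize(query)
            .iter()
            .map(|term| {
                let df = *self.doc_freq.get(term).unwrap_or(&0) as f64;
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln(); // smoothed IDF
                let f = *tf.get(term.as_str()).unwrap_or(&0.0);
                let denom =
                    f + self.k1 * (1.0 - self.b + self.b * doc.len() as f64 / self.avg_len);
                idf * f * (self.k1 + 1.0) / denom
            })
            .sum()
    }
}

/// Lowercase and split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    let corpus = ["rust vector stores", "okapi bm25 keyword scoring", "parent document retriever"];
    let index = Bm25::new(&corpus);
    let mut ranked: Vec<(usize, f64)> =
        (0..corpus.len()).map(|i| (i, index.score("bm25 scoring", i))).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    for (i, s) in ranked {
        println!("{s:.3}  {}", corpus[i]);
    }
}
```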