# Summary Memory
`ConversationSummaryMemory` uses an LLM to compress older messages into a running summary. Recent messages are kept verbatim, while everything beyond a `buffer_size` threshold is summarized into a single system message.
## Usage

```rust
use std::sync::Arc;
use synaptic::memory::{ConversationSummaryMemory, InMemoryStore};
use synaptic::core::{MemoryStore, Message, ChatModel};

// You need a ChatModel to generate summaries.
let model: Arc<dyn ChatModel> = Arc::new(my_model);
let store = Arc::new(InMemoryStore::new());

// Keep the last 4 messages verbatim; summarize older ones.
let memory = ConversationSummaryMemory::new(store, model, 4);
let session = "user-1";

// As messages accumulate beyond buffer_size * 2, summarization triggers.
memory.append(session, Message::human("Tell me about Rust.")).await?;
memory.append(session, Message::ai("Rust is a systems programming language...")).await?;
memory.append(session, Message::human("What about ownership?")).await?;
memory.append(session, Message::ai("Ownership is Rust's core memory model...")).await?;
// ... more messages ...

let history = memory.load(session).await?;
// If summarization has occurred, history starts with a system message
// containing the summary, followed by the most recent messages.
```
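For instance, once enough turns have accumulated, you can inspect the loaded history to see the summary at the front (illustrative only; the exact summary text depends on the model, and this assumes `Message` implements `Debug`):

```rust
let history = memory.load(session).await?;
// After summarization, history[0] is the system summary message;
// the remaining entries are the most recent messages, verbatim.
for msg in &history {
    println!("{msg:?}");
}
```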
## How It Works

- `append()` stores the message in the underlying store, then checks the total message count.
- When the count exceeds `buffer_size * 2`, the strategy splits messages into "older" and "recent" (the last `buffer_size` messages).
- The older messages are sent to the `ChatModel` with a prompt asking for a concise summary. If a previous summary already exists, it is included as context for the new summary.
- The store is cleared and repopulated with only the recent messages.
- `load()` returns the stored messages, prepended with a system message containing the summary text (if one exists): `Summary of earlier conversation: <summary text>`
- `clear()` removes both the stored messages and the summary for the session.
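The following is a minimal, self-contained sketch of that flow. The types here are simplified stand-ins, not synaptic's internals: `Msg` and `SummarySketch` are hypothetical, the store is a plain `Vec`, and the summarizer is a closure where the real strategy makes a `ChatModel` call.

```rust
#[derive(Clone, Debug)]
enum Msg {
    System(String),
    Human(String),
    Ai(String),
}

struct SummarySketch {
    messages: Vec<Msg>,      // stands in for the backing MemoryStore
    summary: Option<String>, // the running summary, if any
    buffer_size: usize,
}

impl SummarySketch {
    fn append(&mut self, msg: Msg, summarize: impl Fn(&str) -> String) {
        self.messages.push(msg);
        // Summarization triggers once the count exceeds buffer_size * 2.
        if self.messages.len() > self.buffer_size * 2 {
            // Keep the last `buffer_size` messages; everything older is compressed.
            let split = self.messages.len() - self.buffer_size;
            let older: Vec<Msg> = self.messages.drain(..split).collect();
            // Fold a previous summary (if any) into the prompt as context.
            let mut prompt = String::new();
            if let Some(prev) = &self.summary {
                prompt.push_str("Previous summary: ");
                prompt.push_str(prev);
                prompt.push('\n');
            }
            for m in &older {
                prompt.push_str(&format!("{m:?}\n"));
            }
            // In the real strategy this is the LLM call.
            self.summary = Some(summarize(&prompt));
        }
    }

    fn load(&self) -> Vec<Msg> {
        // Prepend the summary as a system message, then the recent messages.
        let mut out = Vec::new();
        if let Some(s) = &self.summary {
            out.push(Msg::System(format!("Summary of earlier conversation: {s}")));
        }
        out.extend(self.messages.iter().cloned());
        out
    }
}
```

In the real type the store operations and the model call are async and fallible, but the shape of the logic is the same.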
## Parameters

| Parameter | Type | Description |
|---|---|---|
| `store` | `Arc<dyn MemoryStore>` | The backing store for raw messages |
| `model` | `Arc<dyn ChatModel>` | The LLM used to generate summaries |
| `buffer_size` | `usize` | Number of recent messages to keep verbatim |
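As a worked example: with `buffer_size = 4`, the trigger check is against `4 * 2 = 8` stored messages, so the ninth `append()` starts a summarization pass, after which the store holds only the last 4 messages plus the running summary that `load()` prepends.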
## When to Use
Summary memory is a good fit when:
- Conversations are very long and you need to preserve context from the entire history.
- You can afford the additional LLM call for summarization (it only triggers when the buffer overflows, not on every append).
- You want roughly constant token usage regardless of how long the conversation runs.
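For instance (illustrative numbers only): with `buffer_size = 4` and messages averaging ~50 tokens, every loaded history carries one summary message plus four verbatim messages, on the order of a few hundred tokens, whether the conversation is 10 turns or 1,000.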
## Trade-offs

- Lossy compression -- the summary is generated by an LLM, so specific details from older messages may be lost or distorted.
- Additional LLM cost -- each summarization step makes a separate `ChatModel` call. The summarization model can be a smaller, cheaper model than your primary model (see the sketch after this list).
- Latency -- the `append()` call that triggers summarization will be slower than usual due to the LLM round-trip.
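A sketch of that split, assuming `PrimaryChat` and `CheapChat` are hypothetical `ChatModel` implementations (they are not types shipped by synaptic):

```rust
use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::memory::{ConversationSummaryMemory, InMemoryStore};

// Hypothetical model types; any two ChatModel implementations work.
let primary: Arc<dyn ChatModel> = Arc::new(PrimaryChat::new());
let summarizer: Arc<dyn ChatModel> = Arc::new(CheapChat::new());

let store = Arc::new(InMemoryStore::new());
// Only `summarizer` is invoked for compression; `primary` remains the
// model you call for actual responses.
let memory = ConversationSummaryMemory::new(store, summarizer, 4);
```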
If you want exact recent messages with no LLM calls, use Window Memory or Token Buffer Memory. For a hybrid approach that balances exact recall of recent messages with summarized older history, see Summary Buffer Memory.