Summary Buffer Memory

ConversationSummaryBufferMemory is a hybrid strategy that combines the strengths of Summary Memory and Token Buffer Memory. Recent messages are kept verbatim, while older messages are compressed into a running LLM-generated summary when the total estimated token count exceeds a configurable threshold.

Usage

use std::sync::Arc;
use synaptic::memory::{ConversationSummaryBufferMemory, InMemoryStore};
use synaptic::core::{MemoryStore, Message, ChatModel};

// `my_model` is a placeholder for any value implementing `ChatModel`
let model: Arc<dyn ChatModel> = Arc::new(my_model);
let store = Arc::new(InMemoryStore::new());

// Summarize older messages when total tokens exceed 500
let memory = ConversationSummaryBufferMemory::new(store, model, 500);

let session = "user-1";

memory.append(session, Message::human("What is Rust?")).await?;
memory.append(session, Message::ai("Rust is a systems programming language...")).await?;
memory.append(session, Message::human("How does ownership work?")).await?;
memory.append(session, Message::ai("Ownership is a set of rules...")).await?;
// ... as conversation grows and exceeds 500 estimated tokens,
// older messages are summarized automatically ...

let history = memory.load(session).await?;
// history = [System("Summary of earlier conversation: ..."), recent messages...]
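These snippets assume an async context with a Result-returning caller (for example, an async fn main() -> anyhow::Result<()> under a Tokio runtime), since append and load are awaited and use the ? operator. anyhow::Result is just one convenient choice here, not something the library requires.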

How It Works

  1. append() stores the new message, then estimates the total token count across all stored messages.

  2. When the total exceeds max_token_limit and there is more than one message (see the sketch after this list):

    • A split point is calculated: recent messages that fit within half the token limit are kept verbatim.
    • All messages before the split point are summarized by the ChatModel. If a previous summary exists, it is included as context.
    • The store is cleared and repopulated with only the recent messages.
  3. load() returns the stored messages, prepended with a system message containing the summary (if one exists):

    Summary of earlier conversation: <summary text>
    
  4. clear() removes both stored messages and the summary for the session.
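The split logic in step 2 can be sketched as follows. This is an illustrative reconstruction, not synaptic's actual source: the Msg type and the helper functions below are local stand-ins for the strategy's internals.

struct Msg {
    content: String,
}

// ~4 characters per token, with a minimum of 1 (see Token Estimation below).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() / 4).max(1)
}

// Index of the first message kept verbatim: walking backwards from the
// newest message, keep as many recent messages as fit within half of
// `max_token_limit`.
fn split_point(messages: &[Msg], max_token_limit: usize) -> usize {
    let half = max_token_limit / 2;
    let mut kept_tokens = 0;
    let mut split = messages.len();
    while split > 0 {
        let next = estimate_tokens(&messages[split - 1].content);
        if kept_tokens + next > half {
            break;
        }
        kept_tokens += next;
        split -= 1;
    }
    split
}

// Everything in `messages[..split]` would then be summarized by the
// ChatModel (with any previous summary included as context), and the
// store cleared and repopulated with `messages[split..]` only.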

Parameters

Parameter        Type                  Description
store            Arc<dyn MemoryStore>  The backing store for raw messages
model            Arc<dyn ChatModel>    The LLM used to generate summaries
max_token_limit  usize                 Token threshold that triggers summarization

Token Estimation

Like ConversationTokenBufferMemory, this strategy estimates tokens at approximately 4 characters per token (with a minimum of 1). The same heuristic caveat applies: actual token counts will vary by model.
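As a worked example, reusing the estimate_tokens sketch from the section above:

fn main() {
    assert_eq!(estimate_tokens("What is Rust?"), 3); // 13 chars / 4 = 3
    assert_eq!(estimate_tokens("Hi"), 1);            // short text floors at 1 token
}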

When to Use

Summary buffer memory is the recommended strategy when:

  • Conversations are long and you need both exact recent context and compressed older context.
  • You want to stay within a token budget while preserving as much information as possible.
  • The additional cost of occasional LLM summarization calls is acceptable.

This is the closest equivalent to LangChain's ConversationSummaryBufferMemory and is generally the best default choice for production chatbots.

Trade-offs

  • LLM cost on overflow -- summarization only triggers when the token limit is exceeded, but each summarization call adds latency and cost.
  • Lossy for old messages -- details from older messages may be lost in the summary, though recent messages are always exact.
  • Heuristic token counting -- the split point is based on estimated tokens, not exact counts.

Offline Testing with ScriptedChatModel

Use ScriptedChatModel to test summarization without API keys:

use std::sync::Arc;
use synaptic::core::{ChatResponse, MemoryStore, Message};
use synaptic::models::ScriptedChatModel;
use synaptic::memory::{ConversationSummaryBufferMemory, InMemoryStore};

// Script the model to return a summary when called
let summarizer = Arc::new(ScriptedChatModel::new(vec![
    ChatResponse {
        message: Message::ai("The user asked about Rust and ownership."),
        usage: None,
    },
]));

let store = Arc::new(InMemoryStore::new());
let memory = ConversationSummaryBufferMemory::new(store, summarizer, 50);

let session = "test";

// Add enough messages to exceed the 50-token threshold
memory.append(session, Message::human("What is Rust?")).await?;
memory.append(session, Message::ai("Rust is a systems programming language focused on safety, speed, and concurrency.")).await?;
memory.append(session, Message::human("How does ownership work?")).await?;
memory.append(session, Message::ai("Ownership is a set of rules the compiler checks at compile time. Each value has a single owner.")).await?;

// Load -- older messages are now summarized
let history = memory.load(session).await?;
// history[0] is a System message with the summary
// Remaining messages are the most recent ones kept verbatim
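Because the scripted model carries a single response, this test budgets for exactly one summarization call; a test that triggers summarization more than once would need additional scripted responses.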

For simpler alternatives, see Buffer Memory (keep everything), Window Memory (fixed message count), or Token Buffer Memory (token budget without summarization).