# Summary Memory
`ConversationSummaryMemory` uses an LLM to compress older messages into a running summary. Recent messages are kept verbatim, while everything beyond a `buffer_size` threshold is summarized into a single system message.
## Usage

```rust
use std::sync::Arc;
use synaptic::memory::{ConversationSummaryMemory, InMemoryStore};
use synaptic::core::{MemoryStore, Message, ChatModel};

// You need a ChatModel to generate summaries.
let model: Arc<dyn ChatModel> = Arc::new(my_model);
let store = Arc::new(InMemoryStore::new());

// Keep the last 4 messages verbatim; summarize older ones.
let memory = ConversationSummaryMemory::new(store, model, 4);
let session = "user-1";

// As messages accumulate beyond buffer_size * 2, summarization triggers.
memory.append(session, Message::human("Tell me about Rust.")).await?;
memory.append(session, Message::ai("Rust is a systems programming language...")).await?;
memory.append(session, Message::human("What about ownership?")).await?;
memory.append(session, Message::ai("Ownership is Rust's core memory model...")).await?;
// ... more messages ...

let history = memory.load(session).await?;
// If summarization has occurred, history starts with a system message
// containing the summary, followed by the most recent messages.
```
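For instance, once enough turns have accumulated, you can inspect the loaded history to see the summary at the front (illustrative only; the exact summary text depends on the model, and this assumes `Message` implements `Debug`):

```rust
let history = memory.load(session).await?;
// After summarization, history[0] is the system summary message;
// the remaining entries are the most recent messages, verbatim.
for msg in &history {
    println!("{msg:?}");
}
```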
## How It Works

- `append()` stores the message in the underlying store, then checks the total message count.
- When the count exceeds `buffer_size * 2`, the strategy splits messages into "older" and "recent" (the last `buffer_size` messages).
- The older messages are sent to the `ChatModel` with a prompt asking for a concise summary. If a previous summary already exists, it is included as context for the new summary.
- The store is cleared and repopulated with only the recent messages.
- `load()` returns the stored messages, prepended with a system message containing the summary text (if one exists): `Summary of earlier conversation: <summary text>`
- `clear()` removes both the stored messages and the summary for the session.
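The following is a minimal, self-contained sketch of that flow. The types here are simplified stand-ins, not synaptic's internals: `Msg` and `SummarySketch` are hypothetical, the store is a plain `Vec`, and the summarizer is a closure where the real strategy makes a `ChatModel` call.

```rust
#[derive(Clone, Debug)]
enum Msg {
    System(String),
    Human(String),
    Ai(String),
}

struct SummarySketch {
    messages: Vec<Msg>,      // stands in for the backing MemoryStore
    summary: Option<String>, // the running summary, if any
    buffer_size: usize,
}

impl SummarySketch {
    fn append(&mut self, msg: Msg, summarize: impl Fn(&str) -> String) {
        self.messages.push(msg);
        // Summarization triggers once the count exceeds buffer_size * 2.
        if self.messages.len() > self.buffer_size * 2 {
            // Keep the last `buffer_size` messages; everything older is compressed.
            let split = self.messages.len() - self.buffer_size;
            let older: Vec<Msg> = self.messages.drain(..split).collect();
            // Fold a previous summary (if any) into the prompt as context.
            let mut prompt = String::new();
            if let Some(prev) = &self.summary {
                prompt.push_str("Previous summary: ");
                prompt.push_str(prev);
                prompt.push('\n');
            }
            for m in &older {
                prompt.push_str(&format!("{m:?}\n"));
            }
            // In the real strategy this is the LLM call.
            self.summary = Some(summarize(&prompt));
        }
    }

    fn load(&self) -> Vec<Msg> {
        // Prepend the summary as a system message, then the recent messages.
        let mut out = Vec::new();
        if let Some(s) = &self.summary {
            out.push(Msg::System(format!("Summary of earlier conversation: {s}")));
        }
        out.extend(self.messages.iter().cloned());
        out
    }
}
```

In the real type the store operations and the model call are async and fallible, but the shape of the logic is the same.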
## Parameters

| Parameter | Type | Description |
|---|---|---|
| `store` | `Arc<dyn MemoryStore>` | The backing store for raw messages |
| `model` | `Arc<dyn ChatModel>` | The LLM used to generate summaries |
| `buffer_size` | `usize` | Number of recent messages to keep verbatim |
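As a worked example: with `buffer_size = 4`, the trigger check is against `4 * 2 = 8` stored messages, so the ninth `append()` starts a summarization pass, after which the store holds only the last 4 messages plus the running summary that `load()` prepends.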
## When to Use
Summary memory is a good fit when:
- Conversations are very long and you need to preserve context from the entire history.
- You can afford the additional LLM call for summarization (it only triggers when the buffer overflows, not on every append).
- You want roughly constant token usage regardless of how long the conversation runs.
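For instance (illustrative numbers only): with `buffer_size = 4` and messages averaging ~50 tokens, every loaded history carries one summary message plus four verbatim messages, on the order of a few hundred tokens, whether the conversation is 10 turns or 1,000.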
## Trade-offs

- Lossy compression -- the summary is generated by an LLM, so specific details from older messages may be lost or distorted.
- Additional LLM cost -- each summarization step makes a separate `ChatModel` call. The summarization model can be a smaller, cheaper model than your primary model (see the sketch after this list).
- Latency -- the `append()` call that triggers summarization will be slower than usual due to the LLM round-trip.
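A sketch of that split, assuming `PrimaryChat` and `CheapChat` are hypothetical `ChatModel` implementations (they are not types shipped by synaptic):

```rust
use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::memory::{ConversationSummaryMemory, InMemoryStore};

// Hypothetical model types; any two ChatModel implementations work.
let primary: Arc<dyn ChatModel> = Arc::new(PrimaryChat::new());
let summarizer: Arc<dyn ChatModel> = Arc::new(CheapChat::new());

let store = Arc::new(InMemoryStore::new());
// Only `summarizer` is invoked for compression; `primary` remains the
// model you call for actual responses.
let memory = ConversationSummaryMemory::new(store, summarizer, 4);
```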
If you want exact recent messages with no LLM calls, use Window Memory or Token Buffer Memory. For a hybrid approach that balances exact recall of recent messages with summarized older history, see Summary Buffer Memory.