Token Buffer Memory

ConversationTokenBufferMemory keeps the most recent messages that fit within a token budget. On load(), the oldest messages are dropped until the total estimated token count is at or below max_tokens.

Usage

use std::sync::Arc;
use synaptic::memory::{ConversationTokenBufferMemory, InMemoryStore};
use synaptic::core::{MemoryStore, Message};

// The calls below use `.await?`, so this snippet runs inside an async fn
// that returns a Result.
let store = Arc::new(InMemoryStore::new());

// Keep messages within a 200-token budget
let memory = ConversationTokenBufferMemory::new(store, 200);

let session = "user-1";

memory.append(session, Message::human("Hello!")).await?;
memory.append(session, Message::ai("Hi! How can I help?")).await?;
memory.append(session, Message::human("Tell me a long story about Rust.")).await?;
memory.append(session, Message::ai("Rust began as a personal project...")).await?;

let history = memory.load(session).await?;
// Only messages that fit within 200 estimated tokens are returned.
// Oldest messages are dropped first.

How It Works

  • append() stores every message in the underlying MemoryStore without modification.
  • load() retrieves all messages, estimates their total token count, and removes the oldest messages one by one until the total fits within max_tokens (see the sketch after this list).
  • clear() removes all messages from the underlying store for the session.
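
The trimming step in load() is essentially one pass over the stored history. Below is a minimal sketch of the idea, assuming messages are held oldest-first in a plain Vec<String> and using the character-based estimate described under Token Estimation; the actual Message type and method signatures in Synaptic may differ.

// Same heuristic shown under Token Estimation below.
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4 + 1
}

// Keep only the newest messages whose combined estimate fits the budget.
// `messages` is ordered oldest -> newest.
fn trim_to_budget(mut messages: Vec<String>, max_tokens: usize) -> Vec<String> {
    let mut total: usize = messages.iter().map(|m| estimate_tokens(m)).sum();
    // Drop from the front (the oldest messages) until the estimate fits.
    while total > max_tokens && !messages.is_empty() {
        let dropped = messages.remove(0);
        total -= estimate_tokens(&dropped);
    }
    messages
}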

Token Estimation

Synaptic uses a simple heuristic of approximately 4 characters per token, with a minimum of 1 token per message:

fn estimate_tokens(text: &str) -> usize {
    text.len() / 4 + 1
}

This is a rough approximation. Actual token counts vary by model and tokenizer. The heuristic is intentionally conservative (slightly overestimates) to avoid exceeding real token limits.
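
As a quick check, applying the heuristic to the messages from the usage example above:

assert_eq!(estimate_tokens("Hello!"), 2);                            // 6 chars  -> 6 / 4 + 1
assert_eq!(estimate_tokens("Tell me a long story about Rust."), 9);  // 32 chars -> 32 / 4 + 1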

Parameters

Parameter    | Type                  | Description
------------ | --------------------- | -----------------------------------------------
store        | Arc<dyn MemoryStore>  | The backing store for raw messages
max_tokens   | usize                 | Maximum estimated tokens to return from load()

When to Use

Token buffer memory is a good fit when:

  • You need to control prompt size in token terms rather than message count.
  • You want to stay within a model's context window without manually counting messages.
  • You prefer a simple, no-LLM-call strategy for managing memory size.

Trade-offs

  • Approximate -- the token estimate is a heuristic, not an exact count. For precise token budgeting, you would need a model-specific tokenizer (see the sketch after this list).
  • Hard cutoff -- dropped messages are lost entirely. There is no summary or compressed representation of older history.
  • Drops whole messages -- if a single message is very long, it may consume most of the budget by itself.
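
If you do need exact counts, one option is to replace the heuristic with a real tokenizer. A minimal sketch, assuming the third-party tiktoken-rs crate (not part of Synaptic) and its cl100k_base encoding; the encoding must match the model you are targeting.

use tiktoken_rs::cl100k_base;

// Exact token count for one piece of text via a real BPE tokenizer.
// In practice you would construct the encoder once and reuse it,
// since loading it is relatively expensive.
fn count_tokens(text: &str) -> usize {
    let bpe = cl100k_base().expect("failed to load cl100k_base encoding");
    bpe.encode_with_special_tokens(text).len()
}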

For a fixed message count instead of a token budget, see Window Memory. For a strategy that preserves older context through summarization, see Summary Memory or Summary Buffer Memory.