Retry & Rate Limiting

This guide shows how to add automatic retry logic and rate limiting to any ChatModel.

Retry with RetryChatModel

RetryChatModel wraps a model and automatically retries on transient failures (rate limit errors and timeouts). It uses exponential backoff between attempts.

use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::models::{RetryChatModel, RetryPolicy};

// `model` is any existing ChatModel implementation.
let base_model: Arc<dyn ChatModel> = Arc::new(model);

// Use default policy: 3 attempts, 500ms base delay
let retry_model = RetryChatModel::new(base_model, RetryPolicy::default());
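
Because every wrapper in this guide implements ChatModel, the retrying model is a drop-in replacement wherever the base model was used:

let reliable: Arc<dyn ChatModel> = Arc::new(retry_model);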

Custom retry policy

Configure the maximum number of attempts and the base delay for exponential backoff:

use std::time::Duration;
use synaptic::models::RetryPolicy;

let policy = RetryPolicy {
    max_attempts: 5,                         // Try up to 5 times
    base_delay: Duration::from_millis(200),  // Start with 200ms delay
};

let retry_model = RetryChatModel::new(base_model, policy);

The delay between retries follows exponential backoff: base_delay * 2^attempt, where attempt is counted from zero. With a 200ms base delay:

Attempt      Delay before retry
1st retry    200ms
2nd retry    400ms
3rd retry    800ms
4th retry    1600ms
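
As a quick illustration of the formula (not library code), the delay can be computed like this, with attempt counted from zero so the first retry waits base_delay itself:

use std::time::Duration;

// Illustrative sketch of the backoff formula above.
fn backoff_delay(base_delay: Duration, attempt: u32) -> Duration {
    base_delay * 2u32.pow(attempt)
}

assert_eq!(
    backoff_delay(Duration::from_millis(200), 1),
    Duration::from_millis(400), // matches the "2nd retry" row
);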

Only retryable errors trigger retries:

  • SynapticError::RateLimit -- the provider returned a rate limit response.
  • SynapticError::Timeout -- the request timed out.

All other errors are returned immediately without retrying.
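
To make that decision concrete, here is a minimal sketch of the classification logic. The simplified error enum below is hypothetical; the real SynapticError may carry payloads and more variants:

// Hypothetical, simplified mirror of the error type for illustration.
enum SynapticError {
    RateLimit,
    Timeout,
    Other(String),
}

// Only rate limits and timeouts are worth retrying; everything
// else is surfaced to the caller immediately.
fn is_retryable(err: &SynapticError) -> bool {
    matches!(err, SynapticError::RateLimit | SynapticError::Timeout)
}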

Streaming with retry

RetryChatModel also retries stream_chat() calls. If a retryable error occurs during streaming, the entire stream is retried from the beginning.

Concurrency limiting with RateLimitedChatModel

RateLimitedChatModel uses a semaphore to limit the number of concurrent requests to the underlying model:

use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::models::RateLimitedChatModel;

let base_model: Arc<dyn ChatModel> = Arc::new(model);

// Allow at most 5 concurrent requests
let limited = RateLimitedChatModel::new(base_model, 5);

When the concurrency limit is reached, additional callers wait until a slot becomes available. This is useful for:

  • Respecting provider concurrency limits.
  • Preventing resource exhaustion in high-throughput applications.
  • Controlling costs by limiting parallel API calls.
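
For intuition, the waiting behavior can be sketched with tokio::sync::Semaphore. This is an illustration of the mechanism, not the library's actual internals:

use std::sync::Arc;
use tokio::sync::Semaphore;

// Sketch: each request must hold a permit while it runs.
async fn limited_call(semaphore: Arc<Semaphore>) {
    // Waits here (asynchronously) when all permits are taken,
    // i.e. the concurrency limit is reached.
    let _permit = semaphore.acquire().await.expect("semaphore closed");
    // ... forward the request to the underlying model ...
    // The permit is released when `_permit` is dropped, freeing a slot.
}

Here, Semaphore::new(5) would correspond to the concurrency limit of 5 in the example above.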

Token bucket rate limiting with TokenBucketChatModel

TokenBucketChatModel uses a token bucket algorithm for smoother rate limiting. The bucket starts full and refills at a steady rate:

use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::models::TokenBucketChatModel;

let base_model: Arc<dyn ChatModel> = Arc::new(model);

// Bucket capacity: 100 tokens, refill rate: 10 tokens/second
let throttled = TokenBucketChatModel::new(base_model, 100.0, 10.0);

Each chat() or stream_chat() call consumes one token from the bucket. When the bucket is empty, callers wait until a token is refilled.

Parameters:

  • capacity -- the maximum burst size. A capacity of 100 allows 100 rapid-fire requests before throttling kicks in.
  • refill_rate -- tokens added per second. A rate of 10.0 means the bucket refills at 10 tokens per second.
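
The refill logic is straightforward; the following single-threaded sketch shows the algorithm (illustrative only, not the library's implementation):

use std::time::Instant;

// Minimal token bucket sketch.
struct TokenBucket {
    capacity: f64,    // maximum burst size
    refill_rate: f64, // tokens added per second
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn try_acquire(&mut self) -> bool {
        // Top up based on elapsed time, capped at capacity.
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = now;
        // Each call consumes one token; an empty bucket means waiting
        // (here: returning false) until the next refill.
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}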

Token bucket vs concurrency limiting

Feature          RateLimitedChatModel                   TokenBucketChatModel
Controls         Concurrent requests                    Request rate over time
Mechanism        Semaphore                              Token bucket
Burst handling   Blocks when N requests are in-flight   Allows bursts up to capacity
Best for         Concurrency limits                     Rate limits (requests/second)

Stacking wrappers

All wrappers implement ChatModel, so they compose naturally. A common pattern is retry on the outside, rate limiting on the inside:

use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::models::{RetryChatModel, RetryPolicy, TokenBucketChatModel};

let base_model: Arc<dyn ChatModel> = Arc::new(model);

// First, apply rate limiting
let throttled: Arc<dyn ChatModel> = Arc::new(
    TokenBucketChatModel::new(base_model, 50.0, 5.0)
);

// Then, add retry on top
let reliable = RetryChatModel::new(throttled, RetryPolicy::default());

This ensures that retried requests also go through the rate limiter, preventing retry storms from overwhelming the provider.
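
The same composition extends to all three wrappers. For example, concurrency limiting innermost, token-bucket throttling in the middle, and retry outermost:

use std::sync::Arc;
use synaptic::core::ChatModel;
use synaptic::models::{
    RateLimitedChatModel, RetryChatModel, RetryPolicy, TokenBucketChatModel,
};

let base_model: Arc<dyn ChatModel> = Arc::new(model);

// Innermost: at most 5 requests in flight at once.
let limited: Arc<dyn ChatModel> =
    Arc::new(RateLimitedChatModel::new(base_model, 5));

// Middle: 5 requests/second on average, bursts up to 50.
let throttled: Arc<dyn ChatModel> =
    Arc::new(TokenBucketChatModel::new(limited, 50.0, 5.0));

// Outermost: retries pass back through both limiters.
let reliable = RetryChatModel::new(throttled, RetryPolicy::default());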