Streaming Responses
This guide shows how to consume LLM responses as a stream of tokens, rather than waiting for the entire response to complete.
Overview
Every ChatModel in Synaptic provides two methods:
- chat() -- returns a complete ChatResponse once the model finishes generating.
- stream_chat() -- returns a ChatStream, which yields AIMessageChunk items as the model produces them.
Streaming is useful for displaying partial results to users in real time.
Basic streaming
Use stream_chat() and iterate over chunks with StreamExt::next():
use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message, AIMessageChunk};

async fn stream_example(model: &dyn ChatModel) -> Result<(), Box<dyn std::error::Error>> {
    let request = ChatRequest::new(vec![
        Message::human("Tell me a story about a brave robot"),
    ]);

    let mut stream = model.stream_chat(request);

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        print!("{}", chunk.content); // Print each token as it arrives
    }

    println!(); // Final newline
    Ok(())
}
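The loop above writes tokens with print!. Rust's standard output is frequently line-buffered, so text written without a trailing newline may not appear until a newline is written. A minimal sketch of one way to make each token visible immediately, using only the standard library; write_token is a hypothetical helper name, not part of Synaptic, and the token array stands in for chunk contents arriving from a stream:

```rust
use std::io::{self, Write};

/// Writes one token to `out` and flushes so it becomes visible right away.
/// `write_token` is a hypothetical helper, not part of Synaptic.
fn write_token<W: Write>(out: &mut W, token: &str) -> io::Result<()> {
    out.write_all(token.as_bytes())?;
    out.flush()
}

fn main() -> io::Result<()> {
    // Stand-in for chunk contents arriving from a stream.
    let tokens = ["Once", " upon", " a", " time"];

    // Lock stdout once instead of re-locking for every token.
    let stdout = io::stdout();
    let mut out = stdout.lock();

    for token in tokens.iter() {
        write_token(&mut out, token)?;
    }
    writeln!(out)?; // Final newline
    Ok(())
}
```

Because the helper is generic over Write, the same function can target a file, a socket, or an in-memory buffer in tests.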
The ChatStream type is defined as:
type ChatStream<'a> = Pin<Box<dyn Stream<Item = Result<AIMessageChunk, SynapticError>> + Send + 'a>>;
Accumulating chunks into a message
AIMessageChunk supports the + and += operators for merging chunks together. After streaming completes, convert the accumulated result into a full Message:
use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message, AIMessageChunk};

async fn accumulate_stream(model: &dyn ChatModel) -> Result<Message, Box<dyn std::error::Error>> {
    let request = ChatRequest::new(vec![
        Message::human("Summarize Rust's ownership model"),
    ]);

    let mut stream = model.stream_chat(request);
    let mut full = AIMessageChunk::default();

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        full += chunk; // Merge content, tool_calls, usage, etc.
    }

    let final_message = full.into_message();
    println!("Complete response: {}", final_message.content());
    Ok(final_message)
}
When merging chunks:
- content strings are concatenated.
- tool_calls are appended to the accumulated list.
- usage token counts are summed.
- The first non-None id is preserved.
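To make these merge rules concrete, here is a minimal sketch of how they could be expressed with std::ops::AddAssign and std::ops::Add. The Chunk and Usage types below are simplified stand-ins written for this illustration; their field names and types are assumptions, and Synaptic's actual AIMessageChunk may be implemented differently:

```rust
use std::ops::{Add, AddAssign};

/// Hypothetical stand-in for token usage; field names are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
struct Usage {
    input_tokens: u32,
    output_tokens: u32,
}

/// Simplified stand-in for a streamed chunk (not Synaptic's real type).
#[derive(Debug, Default, Clone, PartialEq)]
struct Chunk {
    id: Option<String>,
    content: String,
    tool_calls: Vec<String>,
    usage: Option<Usage>,
}

impl AddAssign for Chunk {
    fn add_assign(&mut self, rhs: Chunk) {
        // Content strings are concatenated.
        self.content.push_str(&rhs.content);
        // Tool calls are appended to the accumulated list.
        self.tool_calls.extend(rhs.tool_calls);
        // Usage counts are summed; if only one side reports usage, keep it.
        self.usage = match (self.usage.take(), rhs.usage) {
            (Some(a), Some(b)) => Some(Usage {
                input_tokens: a.input_tokens + b.input_tokens,
                output_tokens: a.output_tokens + b.output_tokens,
            }),
            (a, b) => a.or(b),
        };
        // The first non-None id is preserved.
        if self.id.is_none() {
            self.id = rhs.id;
        }
    }
}

impl Add for Chunk {
    type Output = Chunk;

    // `+` reuses the `+=` logic and returns the merged value.
    fn add(mut self, rhs: Chunk) -> Chunk {
        self += rhs;
        self
    }
}

fn main() {
    let a = Chunk { content: "Hello".to_string(), ..Chunk::default() };
    let b = Chunk {
        id: Some("msg-1".to_string()),
        content: ", world".to_string(),
        tool_calls: vec!["get_weather".to_string()],
        usage: Some(Usage { input_tokens: 3, output_tokens: 2 }),
    };
    let merged = a + b;
    println!("{:?}", merged);
}
```

Starting from a default (empty) chunk works as an accumulator under these rules because every rule treats an empty field as neutral: an empty string, an empty list, a missing usage record, and a None id all leave the other side unchanged.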
Using the + operator
You can also combine two chunks with + without mutation:
let combined = chunk_a + chunk_b;
This produces a new AIMessageChunk with the merged fields from both.
Streaming with tool calls
When the model streams a response that includes tool calls, tool call data arrives across multiple chunks. After accumulation, the full tool call information is available on the resulting message:
use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message, AIMessageChunk, ToolDefinition};
use serde_json::json;

async fn stream_with_tools(model: &dyn ChatModel) -> Result<(), Box<dyn std::error::Error>> {
    let tool = ToolDefinition {
        name: "get_weather".to_string(),
        description: "Get current weather".to_string(),
        parameters: json!({"type": "object", "properties": {"city": {"type": "string"}}}),
    };

    let request = ChatRequest::new(vec![
        Message::human("What's the weather in Paris?"),
    ]).with_tools(vec![tool]);

    let mut stream = model.stream_chat(request);
    let mut full = AIMessageChunk::default();

    while let Some(chunk) = stream.next().await {
        full += chunk?;
    }

    let message = full.into_message();
    for tc in message.tool_calls() {
        println!("Call tool '{}' with: {}", tc.name, tc.arguments);
    }

    Ok(())
}
Default streaming behavior
If a provider adapter does not implement native streaming, the default stream_chat() implementation wraps the chat() result as a single-chunk stream. This means you can always use stream_chat() regardless of provider -- you just may not get incremental token delivery from providers that do not support it natively.
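The fallback pattern described here can be sketched with a trait that has a default method. This is an illustration only: the Model trait below is a simplified, synchronous stand-in that uses an Iterator in place of an async stream so that it needs nothing beyond the standard library. The type names BlockingModel and WordStreamingModel and the helper collect_chunks are hypothetical, and Synaptic's real ChatModel trait is async and returns a ChatStream.

```rust
/// Simplified, synchronous stand-in for a chat model trait.
/// An `Iterator` plays the role of the stream in this sketch.
trait Model {
    /// Returns the complete response text.
    fn chat(&self, prompt: &str) -> Result<String, String>;

    /// Default: wrap the `chat()` result as a single-item stream.
    /// Note that in this sketch `chat()` runs as soon as this is called.
    fn stream_chat<'a>(
        &'a self,
        prompt: &str,
    ) -> Box<dyn Iterator<Item = Result<String, String>> + 'a> {
        Box::new(std::iter::once(self.chat(prompt)))
    }
}

/// Provider without native streaming: relies on the default method.
struct BlockingModel;

impl Model for BlockingModel {
    fn chat(&self, _prompt: &str) -> Result<String, String> {
        Ok("Hello from a non-streaming provider".to_string())
    }
}

/// Provider with native streaming: overrides `stream_chat()`.
struct WordStreamingModel;

impl Model for WordStreamingModel {
    fn chat(&self, _prompt: &str) -> Result<String, String> {
        Ok("one two three".to_string())
    }

    fn stream_chat<'a>(
        &'a self,
        _prompt: &str,
    ) -> Box<dyn Iterator<Item = Result<String, String>> + 'a> {
        let words = vec!["one", " two", " three"];
        Box::new(words.into_iter().map(|w| Ok::<String, String>(w.to_string())))
    }
}

/// Caller code is identical for both kinds of provider.
fn collect_chunks(model: &dyn Model, prompt: &str) -> Result<Vec<String>, String> {
    model.stream_chat(prompt).collect()
}

fn main() {
    println!("{:?}", collect_chunks(&BlockingModel, "hi"));
    println!("{:?}", collect_chunks(&WordStreamingModel, "hi"));
}
```

The caller sees one chunk from the first provider and three from the second, but the consuming loop does not change, which is the property the default implementation is meant to provide.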
Reasoning / Extended Thinking
Many modern LLMs support a "thinking" or "reasoning" mode where the model performs chain-of-thought before producing its final answer. Synaptic exposes this through the ThinkingLevel enum and the .with_thinking() builder on ChatRequest.
ThinkingLevel
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum ThinkingLevel {
    Off,
    Low,
    Medium,
    High,
    Budget(u32), // Token budget for thinking
}
Enabling reasoning on a request
use synaptic::core::{ChatRequest, Message, ThinkingLevel};

let request = ChatRequest::new(vec![Message::human("Solve this step by step")])
    .with_thinking(ThinkingLevel::High);

// Or with a specific token budget:
let request = ChatRequest::new(vec![Message::human("Complex problem")])
    .with_thinking(ThinkingLevel::Budget(4096));
Provider-specific mapping
Each provider translates ThinkingLevel into its native API parameter:
| Provider | ThinkingLevel mapping |
|---|---|
| OpenAI | reasoning_effort: low/medium/high |
| Anthropic | thinking.budget_tokens |
| Gemini | thinking_config.thinking_budget |
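The table names the target parameter for each provider but not the exact conversion. As a rough sketch of what such a translation could look like, the code below maps a local copy of ThinkingLevel to an effort string (the OpenAI style) and to a token budget (the Anthropic and Gemini style). The function names, the budget thresholds, and the token counts assigned to Low, Medium, and High are all assumptions made for illustration; they are not taken from Synaptic or from any provider's documentation.

```rust
/// Local copy of the enum so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ThinkingLevel {
    Off,
    Low,
    Medium,
    High,
    Budget(u32),
}

/// Effort-style mapping. The thresholds used to turn a token budget into
/// an effort level are assumed values, chosen only for illustration.
fn to_reasoning_effort(level: &ThinkingLevel) -> Option<&'static str> {
    match *level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Low => Some("low"),
        ThinkingLevel::Medium => Some("medium"),
        ThinkingLevel::High => Some("high"),
        ThinkingLevel::Budget(n) if n <= 1024 => Some("low"),
        ThinkingLevel::Budget(n) if n <= 8192 => Some("medium"),
        ThinkingLevel::Budget(_) => Some("high"),
    }
}

/// Budget-style mapping. The token counts for the named levels are
/// assumed values; an explicit budget is passed through unchanged.
fn to_budget_tokens(level: &ThinkingLevel) -> Option<u32> {
    match *level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Low => Some(1024),
        ThinkingLevel::Medium => Some(4096),
        ThinkingLevel::High => Some(16384),
        ThinkingLevel::Budget(n) => Some(n),
    }
}

fn main() {
    let levels = [
        ThinkingLevel::Off,
        ThinkingLevel::High,
        ThinkingLevel::Budget(4096),
    ];
    for level in levels.iter() {
        println!(
            "{:?}: effort={:?}, budget={:?}",
            level,
            to_reasoning_effort(level),
            to_budget_tokens(level)
        );
    }
}
```

Returning None for Off lets an adapter omit the parameter from the request entirely rather than send a zero or empty value.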
Streaming reasoning content
During streaming, reasoning tokens arrive via the AIMessageChunk.reasoning field. You can also use the StreamingOutput::on_reasoning() callback to handle reasoning tokens as they arrive:
use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message, ThinkingLevel};

async fn stream_with_reasoning(model: &dyn ChatModel) -> Result<(), Box<dyn std::error::Error>> {
    let request = ChatRequest::new(vec![
        Message::human("Explain why the sky is blue, step by step"),
    ])
    .with_thinking(ThinkingLevel::High);

    let mut stream = model.stream_chat(request);

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        if !chunk.reasoning.is_empty() {
            eprint!("[thinking] {}", chunk.reasoning);
        }
        if !chunk.content.is_empty() {
            print!("{}", chunk.content);
        }
    }

    println!();
    Ok(())
}
When using ThinkingLevel::Off (the default), no reasoning tokens are produced and the reasoning field remains empty.
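If you want to keep the reasoning text rather than print it, one option is to accumulate reasoning and answer text into separate buffers as chunks arrive. The sketch below shows the idea with standard-library types only. Chunk and Collected are simplified stand-ins and collect_separately is a hypothetical helper, none of them part of Synaptic; an ordinary iterator of results stands in for the async stream.

```rust
/// Simplified stand-in for a streamed chunk carrying reasoning text.
#[derive(Debug, Default, Clone)]
struct Chunk {
    reasoning: String,
    content: String,
}

/// Accumulated output with reasoning kept apart from the final answer.
#[derive(Debug, Default, PartialEq)]
struct Collected {
    reasoning: String,
    answer: String,
}

/// Drains the chunks, routing each field into its own buffer.
/// Stops at the first error and returns it, like `chunk?` in a stream loop.
fn collect_separately<I>(chunks: I) -> Result<Collected, String>
where
    I: IntoIterator<Item = Result<Chunk, String>>,
{
    let mut out = Collected::default();
    for chunk in chunks {
        let chunk = chunk?;
        out.reasoning.push_str(&chunk.reasoning);
        out.answer.push_str(&chunk.content);
    }
    Ok(out)
}

fn main() {
    let chunks = vec![
        Ok(Chunk { reasoning: "Sunlight is scattered by air. ".to_string(), ..Chunk::default() }),
        Ok(Chunk { reasoning: "Blue light scatters most.".to_string(), ..Chunk::default() }),
        Ok(Chunk { content: "Because of Rayleigh scattering.".to_string(), ..Chunk::default() }),
    ];

    match collect_separately(chunks) {
        Ok(collected) => {
            eprintln!("[thinking] {}", collected.reasoning);
            println!("{}", collected.answer);
        }
        Err(e) => eprintln!("stream failed: {}", e),
    }
}
```

With thinking turned off, every chunk's reasoning field is empty, so the reasoning buffer simply stays empty and no special case is needed.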