Ollama

This guide shows how to use Ollama as a local chat model and embeddings provider in Synaptic. OllamaChatModel wraps the Ollama REST API and supports streaming, tool calling, and all standard ChatModel operations. Because Ollama runs locally, no API key is needed.

Setup

Add the ollama feature to your Cargo.toml:

[dependencies]
synaptic = { version = "0.2", features = ["ollama"] }

Installing Ollama

Install Ollama from ollama.com and pull a model before using the provider:

# Install Ollama (macOS)
brew install ollama

# Start the Ollama server
ollama serve

# Pull a model
ollama pull llama3.1

The default endpoint is http://localhost:11434. Make sure the Ollama server is running before sending requests.
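
Before sending requests, you can confirm the server is reachable by querying Ollama's /api/tags endpoint, which lists the models you have pulled (a quick check with curl; adjust the host if you use a custom base URL):

# Verify the server is up and list locally available models
curl http://localhost:11434/api/tags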

Configuration

Create an OllamaConfig with a model name. No API key is required:

use synaptic::ollama::{OllamaConfig, OllamaChatModel};
use synaptic::models::HttpBackend;
use std::sync::Arc;

let config = OllamaConfig::new("llama3.1");
let model = OllamaChatModel::new(config, Arc::new(HttpBackend::new()));

Custom base URL

To connect to a remote Ollama instance or a non-default port:

let config = OllamaConfig::new("llama3.1")
    .with_base_url("http://192.168.1.100:11434");

Model parameters

let config = OllamaConfig::new("llama3.1")
    .with_top_p(0.9)
    .with_stop(vec!["END".to_string()])
    .with_seed(42);

Usage

OllamaChatModel implements the ChatModel trait:

use synaptic::ollama::{OllamaConfig, OllamaChatModel};
use synaptic::core::{ChatModel, ChatRequest, Message};
use synaptic::models::HttpBackend;
use std::sync::Arc;

let config = OllamaConfig::new("llama3.1");
let model = OllamaChatModel::new(config, Arc::new(HttpBackend::new()));

let request = ChatRequest::new(vec![
    Message::system("You are a helpful assistant."),
    Message::human("Explain Rust's ownership model in one sentence."),
]);

let response = model.chat(request).await?;
println!("{}", response.message.content().unwrap_or_default());

Streaming

OllamaChatModel supports native streaming via the stream_chat method. Unlike cloud providers that stream over SSE, Ollama streams NDJSON (newline-delimited JSON), where each line is a complete JSON object:

use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message};

let request = ChatRequest::new(vec![
    Message::human("Write a short poem about Rust."),
]);

let mut stream = model.stream_chat(request);
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if !chunk.content.is_empty() {
        print!("{}", chunk.content);
    }
}
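
If you also want the complete text once streaming finishes, accumulate the chunks as they arrive. This is a small sketch that reuses only the stream_chat and chunk.content APIs shown above:

use futures::StreamExt;
use synaptic::core::{ChatModel, ChatRequest, Message};

let request = ChatRequest::new(vec![
    Message::human("Write a short poem about Rust."),
]);

let mut stream = model.stream_chat(request);
let mut full_text = String::new();
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    // Print each delta as it arrives and keep a running copy.
    print!("{}", chunk.content);
    full_text.push_str(&chunk.content);
}
println!();
println!("Streamed {} characters in total.", full_text.len());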

Tool calling

Ollama models that support function calling (such as llama3.1) can use tool calling through the tool_calls array format. Synaptic maps ToolDefinition and ToolChoice to the Ollama format automatically.

use synaptic::core::{ChatModel, ChatRequest, Message, ToolDefinition, ToolChoice};

let tools = vec![ToolDefinition {
    name: "get_weather".into(),
    description: "Get the current weather for a city".into(),
    parameters: serde_json::json!({
        "type": "object",
        "properties": {
            "city": { "type": "string", "description": "City name" }
        },
        "required": ["city"]
    }),
}];

let request = ChatRequest::new(vec![
    Message::human("What is the weather in Tokyo?"),
])
.with_tools(tools)
.with_tool_choice(ToolChoice::Auto);

let response = model.chat(request).await?;

// Check if the model requested a tool call
for tc in response.message.tool_calls() {
    println!("Tool: {}, Args: {}", tc.name, tc.arguments);
}

ToolChoice variants map to Ollama's tool_choice as follows:

Synaptic          Ollama
Auto              "auto"
Required          "required"
None              "none"
Specific(name)    {"type": "function", "function": {"name": "..."}}
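
For example, to force the model to call a particular tool instead of letting it decide, pass the tool's name to Specific. This sketch reuses the tools vector defined above and assumes Specific takes the tool name as a String:

let request = ChatRequest::new(vec![
    Message::human("What is the weather in Tokyo?"),
])
.with_tools(tools)
// Force a call to get_weather rather than a free-form answer.
.with_tool_choice(ToolChoice::Specific("get_weather".to_string()));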

Reproducibility with seed

Ollama supports a seed parameter for reproducible generation. When set, the model will produce deterministic output for the same input:

let config = OllamaConfig::new("llama3.1")
    .with_seed(42);
let model = OllamaChatModel::new(config, Arc::new(HttpBackend::new()));

let request = ChatRequest::new(vec![
    Message::human("Pick a random number between 1 and 100."),
]);

// Same seed + same input = same output
let response = model.chat(request).await?;
println!("{}", response.message.content().unwrap_or_default());

Embeddings

OllamaEmbeddings provides local embedding generation through Ollama's /api/embed endpoint. Pull an embedding model first:

ollama pull nomic-embed-text

Configuration

use synaptic::ollama::{OllamaEmbeddingsConfig, OllamaEmbeddings};
use synaptic::models::HttpBackend;
use std::sync::Arc;

let config = OllamaEmbeddingsConfig::new("nomic-embed-text");
let embeddings = OllamaEmbeddings::new(config, Arc::new(HttpBackend::new()));

To connect to a remote instance:

let config = OllamaEmbeddingsConfig::new("nomic-embed-text")
    .with_base_url("http://192.168.1.100:11434");

Usage

OllamaEmbeddings implements the Embeddings trait:

use synaptic::core::Embeddings;

// Embed a single query
let vector = embeddings.embed_query("What is Rust?").await?;
println!("Dimension: {}", vector.len());

// Embed multiple documents
let vectors = embeddings.embed_documents(&["First doc", "Second doc"]).await?;
println!("Embedded {} documents", vectors.len());

Configuration reference

OllamaConfig

Field      Type                  Default                     Description
model      String                required                    Model name (e.g. llama3.1)
base_url   String                "http://localhost:11434"    Ollama server URL
top_p      Option<f64>           None                        Nucleus sampling parameter
stop       Option<Vec<String>>   None                        Stop sequences
seed       Option<u64>           None                        Seed for reproducible generation

OllamaEmbeddingsConfig

Field      Type      Default                     Description
model      String    required                    Embedding model name (e.g. nomic-embed-text)
base_url   String    "http://localhost:11434"    Ollama server URL