Fireworks AI

Fireworks AI is an inference platform focused on low-latency serving of open models, with sub-100 ms time-to-first-token for popular models. It exposes an OpenAI-compatible API and supports Llama, DeepSeek, Qwen, and other leading open models.

Fireworks AI is available as a compatibility submodule inside synaptic-models. No separate crate is needed.

Setup

[dependencies]
synaptic = { version = "0.4", features = ["openai"] }

Sign up at fireworks.ai to obtain an API key (prefixed with fw-).
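Avoid hardcoding the key in source. A minimal sketch of loading it from an environment variable instead — the `FIREWORKS_API_KEY` name and the helper function are assumptions for illustration, not part of synaptic:

```rust
use std::env;

/// Resolve the Fireworks API key from the environment, falling back to a
/// placeholder so the snippet runs without a real key.
/// (The FIREWORKS_API_KEY variable name is an assumption, not mandated by synaptic.)
fn fireworks_api_key() -> String {
    env::var("FIREWORKS_API_KEY").unwrap_or_else(|_| "fw-demo-key".to_string())
}

fn main() {
    let key = fireworks_api_key();
    // Fireworks keys are prefixed with "fw-".
    println!("loaded a {}-character key", key.len());
}
```

The resulting `String` can be passed wherever the examples below use `"fw-your-api-key"`.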

Configuration

use synaptic::openai::compat::fireworks::{self, FireworksModel};
use synaptic::models::HttpBackend;
use std::sync::Arc;

let model = fireworks::chat_model(
    "fw-your-api-key",
    FireworksModel::Llama3_1_70bInstruct.to_string(),
    Arc::new(HttpBackend::new()),
);

Builder methods

Use OpenAiConfig builder methods for customization:

use synaptic::openai::compat::fireworks::{self, FireworksModel};
use synaptic::openai::OpenAiChatModel;
use synaptic::models::HttpBackend;
use std::sync::Arc;

let config = fireworks::config("fw-your-api-key", FireworksModel::Llama3_1_70bInstruct.to_string())
    .with_temperature(0.7)
    .with_max_tokens(4096)
    .with_top_p(0.95);

let model = OpenAiChatModel::new(config, Arc::new(HttpBackend::new()));

Available Models

| Enum Variant | API Model ID | Best For |
|---|---|---|
| `Llama3_1_70bInstruct` | `accounts/fireworks/models/llama-v3p1-70b-instruct` | General purpose (recommended) |
| `Llama3_1_8bInstruct` | `accounts/fireworks/models/llama-v3p1-8b-instruct` | Fastest, most cost-effective |
| `DeepSeekR1` | `accounts/fireworks/models/deepseek-r1` | Reasoning tasks |
| `Qwen2_5_72bInstruct` | `accounts/fireworks/models/qwen2p5-72b-instruct` | Multilingual |
| `Custom(String)` | (any) | Unlisted / preview models |
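The `Custom(String)` variant wraps a full model path. Fireworks model IDs follow an `accounts/<account>/models/<model>` pattern, as the table above shows; a sketch of assembling one — the helper function and the example slug are hypothetical, only the path shape comes from the table:

```rust
/// Assemble a full Fireworks model ID from an account and model slug.
/// (Hypothetical helper; the accounts/<account>/models/<model> shape matches
/// the IDs in the table above.)
fn fireworks_model_id(account: &str, model: &str) -> String {
    format!("accounts/{}/models/{}", account, model)
}

fn main() {
    // The resulting string is what FireworksModel::Custom(...) would wrap.
    let id = fireworks_model_id("fireworks", "some-preview-model");
    println!("{}", id);
}
```

Passing `FireworksModel::Custom(id).to_string()` in place of a listed variant should then address the unlisted model.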

Usage

use synaptic::openai::compat::fireworks::{self, FireworksModel};
use synaptic::core::{ChatModel, ChatRequest, Message};
use synaptic::models::HttpBackend;
use std::sync::Arc;

let model = fireworks::chat_model(
    "fw-your-api-key",
    FireworksModel::Llama3_1_70bInstruct.to_string(),
    Arc::new(HttpBackend::new()),
);

let request = ChatRequest::new(vec![
    Message::system("You are a helpful assistant."),
    Message::human("Explain the difference between async and threading in Rust."),
]);

let response = model.chat(request).await?;
println!("{}", response.message.content());

Streaming

use futures::StreamExt;

let request = ChatRequest::new(vec![
    Message::human("Write a haiku about Rust programming."),
]);

let mut stream = model.stream_chat(request);
while let Some(chunk) = stream.next().await {
    print!("{}", chunk?.content);
}
println!();

Configuration Reference

All configuration is done through OpenAiConfig builder methods. See the OpenAI-Compatible Providers page for the full reference.

| Method | Description |
|---|---|
| `.with_temperature(f64)` | Sampling temperature (0.0–2.0) |
| `.with_max_tokens(u32)` | Maximum tokens to generate |
| `.with_top_p(f64)` | Nucleus sampling threshold |
| `.with_stop(Vec<String>)` | Stop sequences |