Model Backends

NOTE: We provide a single function to interface with each of these providers even though some are agents, some are LLM providers, and some are on-premises LLM servers. This interface is an agentic one, and for the non-agents, we turn the model into a (very primitive) agent, by placing it into a query->tool call->query loop.

We plan to in the near future expose an interface to the non-agent models (providers and servers), which is, instead of a primitive agent, the interface you would use to build your own agents.

For most serious tasks that don’t require on-premises LLM serving, we expect you will get better results using the agents (Claude Code, Codex, Antigravity, or OpenCode), with your preferred provider for credentials for the agent, as opposed to using the specific node for your provider (e.g. Claude Code with Bedrock credentials instead of the bedrock node).

API reference for chia.models. These pages are generated from the docstrings in the source, so they stay in sync with the code.

Claude

class chia.models.claude.ClaudeCodeQueryResult(result: str, returncode: int, stderr: str, stream_result: str, success: bool = False, session_transcript: bytes | None = None, session_transcript_path: str | None = None)[source]

Bases: QueryResult

Derived QueryResult specialized for the Claude Code CLI backend.

Carries the on-disk <session_id>.jsonl transcript bytes back to the caller so a resume_session=True LLM can continue the same conversation on a different worker — written by ClaudeCodeLLM._capture_transcript() after a CLI run and consumed by ClaudeCodeLLM._restore_transcript() before the next --resume invocation.

exception chia.models.claude.ClaudeCodeError(node_id: str, error_type: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: Exception

Base for all Claude Code CLI errors.

Every subclass must implement __reduce__ for Ray serialization.

exception chia.models.claude.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', exit_code: int = -1)[source]

Bases: ClaudeCodeError

Raised when the Claude CLI response indicates a usage-limit hit.

exception chia.models.claude.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: ClaudeCodeError

Raised when the CLI’s auth token/API key is invalid or expired.

exception chia.models.claude.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: ClaudeCodeError

Raised when the billing account has payment issues.

exception chia.models.claude.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: ClaudeCodeError

Raised when the request is malformed (bad prompt, unsupported params, etc.).

exception chia.models.claude.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '', retry_after: int | None = None)[source]

Bases: ClaudeCodeError

Raised when Anthropic’s API returns a server-side error (500/503).

exception chia.models.claude.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '', partial_text: str = '')[source]

Bases: ClaudeCodeError

Raised when the LLM’s response was truncated by the output token limit.

exception chia.models.claude.UnknownClaudeError(node_id: str, exit_code: int = -1, raw_message: str = '', stderr: str = '')[source]

Bases: ClaudeCodeError

Raised for unclassified CLI errors.

chia.models.claude.parse_rate_limit_reset(text: str) → datetime | None[source]

Parse a Claude rate-limit message and return the UTC reset time.

Expected format: "You've hit your limit · resets 4pm (America/Los_Angeles)"

Returns None when no rate-limit message is found.

chia.models.claude.parse_rate_limit_event(event: dict) → datetime | None[source]

Parse a rate_limit_event JSON object and return the UTC reset time.

Only triggers when rate_limit_info.status is "rejected" — the event is also emitted with other statuses as an informational notice, which should NOT be treated as a rate limit.

class chia.models.claude.ClaudeCodeLLM(model: str = 'claude-sonnet-4-6', system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'claude_code', logging_level: int = 10, log_dir: str | None = None, resume_session: bool = False, projects_cwd: str | None = '/home/ray/.claude/projects/-home-ray-llm-env', extra_cli_args: List[str] | None = None, log_stream: bool = True, log_all: bool = False, backend: str = 'cli', api_key: str | None = None, max_tokens: int = 16000, thinking: str | None = 'adaptive', max_tool_iterations: int = 100)[source]

Bases: LLMCallBase

Wraps the Claude Code CLI (claude --print) as an LLM backend.

Each call to prompt() spawns a claude subprocess that can optionally connect to MCP tool servers (e.g. BashTool).

prompt(user_message: str, tools: List[ChiaTool] | None = []) → ClaudeCodeQueryResult[source]

Send user_message to Claude Code CLI and return the response.

Returns:

ClaudeCodeQueryResult with success=True when the CLI ran cleanly, or success=False when every retry attempt failed (in which case result is empty and returncode is -1).

Raises:

RateLimitError – Usage limit hit — propagates immediately.
AuthenticationError – Auth failure — propagates immediately.
BillingError – Billing/payment issue — propagates immediately.
InvalidRequestError – Malformed request — propagates immediately.
ServerError – After all retries with exponential backoff.
MaxOutputTokensError – After one retry attempt.

Bedrock

Amazon Bedrock LLM backend built on the boto3 Converse API.

BedrockLLM talks to any tool-capable chat model on Amazon Bedrock (Claude, Amazon Nova, Llama, Mistral, Command R, …) It uses the Bedrock Runtime converse API, which normalises messages, system prompts, and tool use across model families, and runs the agentic tool loop client-side — executing each ChiaTool’s MCP server over HTTP exactly like chia.models.claude.ClaudeCodeLLM’s API backend.

WARNING: experimental. Only exercised by the tests in chia/models/tests/test_bedrock.py (mocked unit tests, plus opt-in live tests). Not validated in production.

Auth/config come from the standard AWS chain (env vars, shared profile, or IAM role); pass region or rely on AWS_REGION / AWS_DEFAULT_REGION. boto3 is imported lazily, so importing this module does not require it.

exception chia.models.bedrock.BedrockError(node_id: str, error_type: str, status_code: str = '', raw_message: str = '')[source]

Bases: Exception

Base for all Bedrock backend errors.

exception chia.models.bedrock.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: str = '')[source]

Bases: BedrockError

Throttling / quota exhaustion (ThrottlingException etc.).

exception chia.models.bedrock.AuthenticationError(node_id: str, status_code: str = '', raw_message: str = '')[source]

Bases: BedrockError

Invalid / expired / unauthorized AWS credentials.

exception chia.models.bedrock.InvalidRequestError(node_id: str, status_code: str = '', raw_message: str = '')[source]

Bases: BedrockError

Malformed request or unknown model (ValidationException etc.).

exception chia.models.bedrock.ServerError(node_id: str, status_code: str = '', raw_message: str = '')[source]

Bases: BedrockError

Transient service-side failure (5xx, model timeout, connection).

exception chia.models.bedrock.MaxOutputTokensError(node_id: str, status_code: str = '', raw_message: str = '', partial_text: str = '')[source]

Bases: BedrockError

The response was truncated at maxTokens.

exception chia.models.bedrock.UnknownBedrockError(node_id: str, status_code: str = '', raw_message: str = '')[source]

Bases: BedrockError

Unclassified Bedrock error.

class chia.models.bedrock.BedrockLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'bedrock_llm', logging_level: int = 10, log_dir: str | None = None, region: str | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]

Bases: LLMCallBase

Bedrock Converse-API LLM backend with client-side MCP tool execution.

Returns the same QueryResult shape as the other backends so callers are interchangeable; returncode is synthesised (0 on success, -1 when every retry fails) and stderr is unused.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via the Bedrock Converse API and return the response, retrying transient failures with the same policy the other backends use.

Vertex

Google Vertex AI LLM backend built on the google-genai SDK.

VertexGeminiLLM runs Google’s Gemini models on Vertex AI and drives the agentic tool loop client-side — executing each ChiaTool’s MCP server over HTTP, exactly like the Bedrock and Claude API backends.

Vertex has no single unified API across model families — Gemini goes through google-genai, Claude-on-Vertex through AnthropicVertex, and Llama/Mistral through OpenAI-compatible MaaS endpoints. Two separate classes are provided here, one for Gemini and one for MaaS.

WARNING: experimental. Only exercised by the tests in chia/models/tests/test_vertex.py (mocked unit tests, plus opt-in live tests). Not validated in production.

Auth/config: Vertex needs a GCP project + location and Application Default Credentials (gcloud auth application-default login, a service-account key via GOOGLE_APPLICATION_CREDENTIALS, or a workload identity). Pass project/location or rely on GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_LOCATION. google-genai is imported lazily.

exception chia.models.vertex.VertexError(node_id: str, error_type: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: Exception

Base for all Vertex backend errors.

exception chia.models.vertex.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: int | None = None)[source]

Bases: VertexError

Quota / rate exhaustion (HTTP 429, RESOURCE_EXHAUSTED).

exception chia.models.vertex.AuthenticationError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: VertexError

Invalid / missing credentials or permission (HTTP 401 / 403).

exception chia.models.vertex.InvalidRequestError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: VertexError

Malformed request or unknown model (HTTP 400 / 404).

exception chia.models.vertex.ServerError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: VertexError

Transient service-side failure (HTTP 5xx).

exception chia.models.vertex.MaxOutputTokensError(node_id: str, status_code: int | None = None, raw_message: str = '', partial_text: str = '')[source]

Bases: VertexError

The response was truncated at max_output_tokens.

exception chia.models.vertex.UnknownVertexError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: VertexError

Unclassified Vertex error.

exception chia.models.vertex.ContentBlockedError(node_id: str, block_reason: str = '', raw_message: str = '')[source]

Bases: VertexError

The model returned no usable content because the prompt or the response was blocked (safety, recitation, blocklist, …).

This is NOT an HTTP/API error — Gemini reports it as a 200-OK response whose candidate carries a blocking finish_reason (or whose prompt_feedback carries a block_reason with no candidates). Surfacing it as a typed error keeps a block from masquerading as a successful empty answer. Never automatically retried: re-sending the same prompt will be blocked again.

class chia.models.vertex.VertexGeminiLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'vertex_gemini', logging_level: int = 10, log_dir: str | None = None, project: str | None = None, location: str | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]

Bases: LLMCallBase

Gemini-on-Vertex LLM backend with client-side MCP tool execution.

Returns the same QueryResult shape as the other backends so callers are interchangeable; returncode is synthesised (0 on success, -1 when every retry fails) and stderr is unused.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Gemini on Vertex and return the response, retrying transient failures with the same policy the other backends use.

class chia.models.vertex.VertexGenericLLM(model: str, project: str | None = None, location: str | None = None, **kwargs)[source]

Bases: OpenAICompatLLM

Non-Gemini Vertex models (Llama, Mistral, …) via the Vertex Model-as-a- Service OpenAI-compatible endpoint.

Vertex has no single unified API, so Gemini uses google-genai (VertexGeminiLLM) while the open/partner families are reached through the OpenAI-compatible MaaS endpoint — which is exactly what OpenAICompatLLM already speaks. The only Vertex-specifics are the endpoint URL (built from project/location) and auth: a GCP ADC bearer token that rotates, supplied via token_provider. Everything else — the agent loop, tool handling, error translation — is inherited unchanged.

model is the MaaS model id, e.g. meta/llama-3.1-8b-instruct-maas.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

Antigravity

Google Antigravity CLI LLM backend.

AntigravityLLM wraps Google’s Antigravity CLI (the agy binary, installed via curl -fsSL https://antigravity.google/cli/install.sh | bash) behind the same synchronous prompt shape as the other Chia LLM backends. It runs agy --print (non-interactive “print mode”), which emits the model’s final answer as plain text on stdout.

Auth is OAuth-only. agy signs in with a Google account (“Antigravity”: / Gemini Code Assist) and stores a refresh token on disk; in a container it uses file-based token storage under the Gemini config dir. There is no API-key path.

Output is plain text, not JSON. Print mode has no structured/streaming output format, so there is no per-turn tool/usage trace to parse

The system prompt is folded into the user message (print mode has no --system-prompt flag), mirroring CodexLLM.

exception chia.models.antigravity.AntigravityError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: Exception

Base for Antigravity CLI errors. Subclasses are Ray-serializable.

exception chia.models.antigravity.RateLimitError(node_id: str, reset_time: datetime | None = None, raw_message: str = '', exit_code: int = -1)[source]: Bases: AntigravityError

exception chia.models.antigravity.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

exception chia.models.antigravity.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

exception chia.models.antigravity.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

exception chia.models.antigravity.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

exception chia.models.antigravity.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

exception chia.models.antigravity.UnknownAntigravityError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: AntigravityError

chia.models.antigravity.parse_rate_limit_reset(text: str) → datetime | None[source]: Parse a human reset time such as resets 4pm (America/Los_Angeles).

class chia.models.antigravity.AntigravityLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'antigravity', logging_level: int = 10, log_dir: str | None = None, agy_bin: str = 'agy', work_dir: str | None = None, add_dirs: list[str] | None = None, gemini_dir: str | None = None, dangerously_skip_permissions: bool = True, sandbox: bool = False, extra_cli_args: list[str] | None = None)[source]

Bases: LLMCallBase

Wrap the Google Antigravity CLI (agy --print) as a Chia LLM backend.

prompt(user_message: str, tools: list[ChiaTool] | None = None) → QueryResult[source]: Send user_message to agy --print and return the response.

Codex

Codex CLI LLM backend.

CodexLLM wraps codex exec behind the same synchronous prompt shape as the other Chia LLM backends. Chia MCP tools are passed as per-run Codex config overrides, so this backend does not mutate the user’s persistent Codex configuration.

exception chia.models.codex.CodexError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: Exception

Base for Codex CLI errors. Subclasses are Ray-serializable.

exception chia.models.codex.RateLimitError(node_id: str, reset_time: datetime | None = None, raw_message: str = '', exit_code: int = -1)[source]: Bases: CodexError

exception chia.models.codex.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

exception chia.models.codex.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

exception chia.models.codex.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

exception chia.models.codex.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

exception chia.models.codex.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

exception chia.models.codex.UnknownCodexError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]: Bases: CodexError

class chia.models.codex.CodexQueryResult(result: str, returncode: int, stderr: str, stream_result: str, success: bool = False, session_id: str | None = None, session_state: dict[str, bytes] | None = None, session_state_paths: tuple[str, ...] = ())[source]

Bases: QueryResult

QueryResult specialized for the Codex CLI backend.

Codex stores resumable conversations under CODEX_HOME rather than in a portable JSONL transcript. The state fields carry the needed opaque files back to the caller so a later codex exec resume <session_id> - can run on a different Chia worker.

chia.models.codex.parse_session_id(stdout: str) → str | None[source]: Extract a Codex session id from JSONL stdout.

chia.models.codex.parse_rate_limit_reset(text: str) → datetime | None[source]: Parse a human reset time such as resets 4pm (America/Los_Angeles).

class chia.models.codex.CodexLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'codex', logging_level: int = 10, log_dir: str | None = None, codex_bin: str = 'codex', work_dir: str | None = None, extra_cli_args: list[str] | None = None, sandbox: str = 'workspace-write', approval_policy: str = 'never', dangerously_bypass_approvals_and_sandbox: bool = True, skip_git_repo_check: bool = True, ephemeral: bool = False, ignore_rules: bool = False, profile: str | None = None, reasoning_effort: str | None = None, resume_session: bool = False, auto_compact_token_limit: int | None = 200000)[source]

Bases: LLMCallBase

Wrap codex exec as a Chia LLM backend.

prompt(user_message: str, tools: list[ChiaTool] | None = None) → CodexQueryResult[source]: Send user_message to codex exec.

Opencode

opencode CLI LLM backend.

OpenCodeLLM wraps the opencode CLI (https://opencode.ai) as an LLM backend, opencode is provider-agnostic: the model is given as provider/model (e.g. anthropic/claude-sonnet-4-6) and opencode runs its own server-side agentic tool loop, so there is no client-side MCP loop here.

WARNING: experimental. Only exercised by the tests in chia/models/tests/test_opencode.py (mocked unit tests, plus opt-in live tests). Not validated in production. Auth is environment-driven: opencode uses its own stored credentials (opencode auth login) or provider env vars.

exception chia.models.opencode.OpenCodeError(node_id: str, error_type: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: Exception

Base for all opencode CLI errors.

exception chia.models.opencode.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', exit_code: int = -1)[source]

Bases: OpenCodeError

The provider behind opencode reported a usage/rate limit.

exception chia.models.opencode.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: OpenCodeError

opencode has no/invalid credentials for the selected provider.

exception chia.models.opencode.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: OpenCodeError

The provider account has a billing/payment problem.

exception chia.models.opencode.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]

Bases: OpenCodeError

Malformed request — bad model string, invalid config, unknown agent, etc.

exception chia.models.opencode.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '', retry_after: int | None = None)[source]

Bases: OpenCodeError

Transient provider/server-side failure (5xx, overloaded, connection).

exception chia.models.opencode.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '', partial_text: str = '')[source]

Bases: OpenCodeError

The response was truncated at the output token limit.

exception chia.models.opencode.UnknownOpenCodeError(node_id: str, exit_code: int = -1, raw_message: str = '', stderr: str = '')[source]

Bases: OpenCodeError

Unclassified opencode CLI error.

chia.models.opencode.parse_session_id(stdout: str) → str | None[source]

Pull the opencode session id out of run --format json stdout.

Each event is {type, sessionID, part:{sessionID, ...}}; the opening step_start line reliably carries it. Falls back to a regex scan if the JSON shape changes.

chia.models.opencode.parse_run_error(stdout: str) → dict | None[source]

Pull the first structured error out of run --format json stdout.

opencode emits {"type":"error", "sessionID":..., "error":{name, data}} events for failures that happen before/around the model request — notably an unknown model id, which surfaces as {"name":"UnknownError","data":{"message": "Model not found: ..."}}. These never reach the session export’s messages[].info.error because no assistant message is ever created, so the run stream is the only place they appear (confirmed against opencode 1.15.13). Returns the first such error dict ({name, data}), or None. Genuine provider errors (e.g. APIError 401) appear in both the run stream and the export; the export copy is richer (full responseHeaders), so callers prefer it and use this only as a fallback.

class chia.models.opencode.OpenCodeLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'opencode', logging_level: int = 10, log_dir: str | None = None, opencode_bin: str = 'opencode', agent_name: str = 'chia', work_dir: str | None = None, extra_cli_args: List[str] | None = None)[source]

Bases: LLMCallBase

Wraps the opencode CLI as an LLM backend.

Each prompt() call runs opencode run (to create a session and get its id) then opencode export (to read the assistant response + usage from opencode’s local DB). Returns the same QueryResult shape as the other backends; returncode is the run exit code.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]

Send user_message to opencode and return the response.

Returns:

QueryResult with success=True when opencode ran cleanly, or success=False when every retry attempt failed.

Raises:

RateLimitError / AuthenticationError / BillingError / –
InvalidRequestError – propagate immediately.
ServerError – after all retries with exponential backoff.
MaxOutputTokensError – after one retry attempt.

Openai Compat

OpenAI-compatible LLM backend built on the openai SDK Chat Completions API.

OpenAICompatLLM talks to any provider that implements the OpenAI Chat Completions wire format and drives the agentic tool loop client-side — executing each ChiaTool’s MCP server over HTTP, like the other backends.

It is deliberately not provider-specific. The OpenAI Chat Completions format is a de-facto multi-vendor standard, so the only provider-specific inputs are:

base_url — selects the provider’s endpoint (default: OpenAI itself).
auth — the credential for that endpoint.

The same loop/tool/parse/error code therefore covers OpenAI, Fireworks, Groq, OpenRouter, self-hosted vLLM/TGI, Vertex MaaS, etc. Each provider is configuration, not a new module.

Auth, by default, lives entirely in the environment (OPENAI_API_KEY / OPENAI_BASE_URL) — construct with no key and the SDK reads them, matching the other backends. For providers whose credential is a rotating token (Vertex MaaS GCP token, Azure AD), pass token_provider — a zero-arg callable that returns a fresh token; it’s invoked when the client is built.

WARNING: experimental. Only exercised by the tests in chia/models/tests/test_openai_compat.py. Not validated in production. openai is imported lazily, so importing this module does not require it.

exception chia.models.openai_compat.OpenAICompatError(node_id: str, error_type: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: Exception

Base for all OpenAI-compatible backend errors.

exception chia.models.openai_compat.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: int | None = None)[source]

Bases: OpenAICompatError

HTTP 429.

exception chia.models.openai_compat.AuthenticationError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: OpenAICompatError

HTTP 401 / 403.

exception chia.models.openai_compat.InvalidRequestError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: OpenAICompatError

HTTP 400 / 404.

exception chia.models.openai_compat.ServerError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: OpenAICompatError

HTTP 5xx, connection, or timeout.

exception chia.models.openai_compat.MaxOutputTokensError(node_id: str, status_code: int | None = None, raw_message: str = '', partial_text: str = '')[source]

Bases: OpenAICompatError

The response was truncated at max_tokens (finish_reason=’length’).

exception chia.models.openai_compat.ContextLengthExceededError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: InvalidRequestError

HTTP 400 (or 413) whose body shows the prompt exceeded the context window.

Subclasses InvalidRequestError so it inherits never-retry semantics (re-sending the same oversized prompt cannot help) while letting callers detect the context-overflow case specifically. Distinct from MaxOutputTokensError, which is about output truncation.

exception chia.models.openai_compat.BillingError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: OpenAICompatError

Payment / quota problem (HTTP 402, or a 429 carrying insufficient_quota).

Never retried: spending more requests cannot restore quota or credit.

exception chia.models.openai_compat.UnknownOpenAIError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]

Bases: OpenAICompatError

Unclassified OpenAI-compatible error.

class chia.models.openai_compat.OpenAICompatLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'openai_compat_llm', logging_level: int = 10, log_dir: str | None = None, base_url: str | None = None, api_key: str | None = None, token_provider: Callable[[], str] | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]

Bases: LLMCallBase

OpenAI-compatible Chat Completions backend with client-side MCP tools.

Returns the same QueryResult shape as the other backends so callers are interchangeable; returncode is synthesised (0 on success, -1 when every retry fails) and stderr is unused.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

Openai Providers

Lightweight per-provider presets over OpenAICompatLLM.

Each big OpenAI-compatible provider differs only in its endpoint (base_url), a default logging name, and the Ray resource its calls require. The providers are thin subclasses that set the first two as class defaults and re-decorate ``prompt`` with their own @ChiaFunction(resources=...) so that, when dispatched via prompt.chia_remote(self, ...), each lands only on workers advertising that provider’s credential resource. The decorated body just defers to OpenAICompatLLM.prompt().

Auth is environment-driven (see OpenAICompatLLM). The SDK only reads OPENAI_API_KEY automatically — for non-OpenAI providers export the provider key as OPENAI_API_KEY, or pass api_key= / a token_provider.

class chia.models.openai_providers.OpenAILLM(model: str, **kwargs)[source]

Bases: _OpenAICompatProvider

OpenAI itself (default endpoint).

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

class chia.models.openai_providers.FireworksLLM(model: str, **kwargs)[source]

Bases: _OpenAICompatProvider

Fireworks AI.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

class chia.models.openai_providers.GroqLLM(model: str, **kwargs)[source]

Bases: _OpenAICompatProvider

Groq.

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

class chia.models.openai_providers.OpenRouterLLM(model: str, **kwargs)[source]

Bases: _OpenAICompatProvider

OpenRouter (itself a multi-provider router).

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

Ollama

Ollama self-hosted LLM preset over OpenAICompatLLM.

Ollama serves open-weight models locally behind an OpenAI-compatible Chat Completions endpoint at /v1. From chia’s point of view a self-hosted Ollama is therefore just another OpenAI-compatible provider: point base_url at the Ollama server and the entire OpenAICompatLLM loop/tool/error stack applies unchanged. This is the same pattern as the cloud presets in chia.models.openai_providers.

Auth

Ollama requires no credentials, but the openai SDK refuses to build a client with an empty api_key. We therefore default it to the conventional dummy value "ollama" (overridable via api_key / token_provider / OPENAI_API_KEY).

class chia.models.ollama.OllamaLLM(model: str, **kwargs)[source]

Bases: OpenAICompatLLM

Self-hosted Ollama via its OpenAI-compatible /v1 endpoint.

model is the Ollama model tag to serve, e.g. "llama3.1:8b" or "qwen2.5:7b" (it must already be pulled on the server — see the OLLAMA_PULL build/run knob in dockerfiles/OllamaDockerfile).

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.

vLLM

vLLM self-hosted LLM preset over OpenAICompatLLM.

vLLM serves open-weight models on GPUs behind an OpenAI-compatible Chat Completions endpoint (vllm serve <model> → /v1 on port 8000). From chia’s point of view a self-hosted vLLM is just another OpenAI-compatible provider: point base_url at the vLLM server and the whole OpenAICompatLLM loop/tool/error stack applies unchanged — the same pattern as OllamaLLM.

Default hosting to port 8200 (not vLLM’s usual 8000): chia uses the low 8000s heavily — on tunneled workers (e.g. AWS) its SSH tunnels reserve 8000-8010 (head_tool_port), and ChiaTool MCP servers probe ports 8000-8099 (start_router base 8000 + 100 tries; Ray Serve’s proxy also defaults to 8000). Since chia runs containers --net=host, vLLM on 8000 would collide. 8200 clears the tool range with margin and stays below the Ray dashboard (8265) and all tunnel/ray ranges.

One model per server

Unlike Ollama (which serves many models and switches per request), a vLLM server serves exactly one model, fixed when vllm serve is launched. So model here must match the model the target server was started with (its HF id, or the --served-model-name if one was set). Serving multiple models means running multiple vLLM servers/workers.

Auth

vLLM started without --api-key ignores the credential, but the openai SDK refuses to build a client with an empty api_key, so we default it to the dummy "vllm" (overridable via api_key / token_provider / OPENAI_API_KEY — set a real one if the server was started with --api-key).

class chia.models.vllm.VLLMLLM(model: str, **kwargs)[source]

Bases: OpenAICompatLLM

Self-hosted vLLM via its OpenAI-compatible /v1 endpoint.

model is the model the target vLLM server is serving, e.g. "Qwen/Qwen2.5-3B-Instruct" (must match the server’s launch model / --served-model-name — see the VLLM_MODEL knob in dockerfiles/VLLMDockerfile).

prompt(user_message: str, tools: List[ChiaTool] | None = []) → QueryResult[source]: Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.