Model Backends
NOTE: We provide a single function to interface with each of these providers even though some are agents, some are LLM providers, and some are on-premises LLM servers. This interface is an agentic one, and for the non-agents, we turn the model into a (very primitive) agent, by placing it into a query->tool call->query loop.
We plan to in the near future expose an interface to the non-agent models (providers and servers), which is, instead of a primitive agent, the interface you would use to build your own agents.
For most serious tasks that don’t require on-premises LLM serving, we expect you will get better results using the agents (Claude Code, Codex, Antigravity, or OpenCode), with your preferred provider for credentials for the agent, as opposed to using the specific node for your provider (e.g. Claude Code with Bedrock credentials instead of the bedrock node).
API reference for chia.models. These pages are generated from the docstrings in the source, so they stay in sync with the code.
Claude
- class chia.models.claude.ClaudeCodeQueryResult(result: str, returncode: int, stderr: str, stream_result: str, success: bool = False, session_transcript: bytes | None = None, session_transcript_path: str | None = None)[source]
Bases:
QueryResultDerived QueryResult specialized for the Claude Code CLI backend.
Carries the on-disk
<session_id>.jsonltranscript bytes back to the caller so aresume_session=TrueLLM can continue the same conversation on a different worker — written byClaudeCodeLLM._capture_transcript()after a CLI run and consumed byClaudeCodeLLM._restore_transcript()before the next--resumeinvocation.
- exception chia.models.claude.ClaudeCodeError(node_id: str, error_type: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ExceptionBase for all Claude Code CLI errors.
Every subclass must implement
__reduce__for Ray serialization.
- exception chia.models.claude.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', exit_code: int = -1)[source]
Bases:
ClaudeCodeErrorRaised when the Claude CLI response indicates a usage-limit hit.
- exception chia.models.claude.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ClaudeCodeErrorRaised when the CLI’s auth token/API key is invalid or expired.
- exception chia.models.claude.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ClaudeCodeErrorRaised when the billing account has payment issues.
- exception chia.models.claude.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ClaudeCodeErrorRaised when the request is malformed (bad prompt, unsupported params, etc.).
- exception chia.models.claude.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '', retry_after: int | None = None)[source]
Bases:
ClaudeCodeErrorRaised when Anthropic’s API returns a server-side error (500/503).
- exception chia.models.claude.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '', partial_text: str = '')[source]
Bases:
ClaudeCodeErrorRaised when the LLM’s response was truncated by the output token limit.
- exception chia.models.claude.UnknownClaudeError(node_id: str, exit_code: int = -1, raw_message: str = '', stderr: str = '')[source]
Bases:
ClaudeCodeErrorRaised for unclassified CLI errors.
- chia.models.claude.parse_rate_limit_reset(text: str) datetime | None[source]
Parse a Claude rate-limit message and return the UTC reset time.
Expected format:
"You've hit your limit · resets 4pm (America/Los_Angeles)"Returns
Nonewhen no rate-limit message is found.
- chia.models.claude.parse_rate_limit_event(event: dict) datetime | None[source]
Parse a
rate_limit_eventJSON object and return the UTC reset time.Only triggers when
rate_limit_info.statusis"rejected"— the event is also emitted with other statuses as an informational notice, which should NOT be treated as a rate limit.
- class chia.models.claude.ClaudeCodeLLM(model: str = 'claude-sonnet-4-6', system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'claude_code', logging_level: int = 10, log_dir: str | None = None, resume_session: bool = False, projects_cwd: str | None = '/home/ray/.claude/projects/-home-ray-llm-env', extra_cli_args: List[str] | None = None, log_stream: bool = True, log_all: bool = False, backend: str = 'cli', api_key: str | None = None, max_tokens: int = 16000, thinking: str | None = 'adaptive', max_tool_iterations: int = 100)[source]
Bases:
LLMCallBaseWraps the Claude Code CLI (
claude --print) as an LLM backend.Each call to
prompt()spawns aclaudesubprocess that can optionally connect to MCP tool servers (e.g.BashTool).- prompt(user_message: str, tools: List[ChiaTool] | None = []) ClaudeCodeQueryResult[source]
Send user_message to Claude Code CLI and return the response.
- Returns:
ClaudeCodeQueryResultwithsuccess=Truewhen the CLI ran cleanly, orsuccess=Falsewhen every retry attempt failed (in which caseresultis empty andreturncodeis-1).- Raises:
RateLimitError – Usage limit hit — propagates immediately.
AuthenticationError – Auth failure — propagates immediately.
BillingError – Billing/payment issue — propagates immediately.
InvalidRequestError – Malformed request — propagates immediately.
ServerError – After all retries with exponential backoff.
MaxOutputTokensError – After one retry attempt.
Bedrock
Amazon Bedrock LLM backend built on the boto3 Converse API.
BedrockLLM talks to any tool-capable chat model on Amazon
Bedrock (Claude, Amazon Nova, Llama, Mistral, Command R, …) It uses
the Bedrock Runtime converse API, which normalises
messages, system prompts, and tool use across model families, and runs the
agentic tool loop client-side — executing each ChiaTool’s MCP server over
HTTP exactly like chia.models.claude.ClaudeCodeLLM’s API backend.
WARNING: experimental. Only exercised by the tests in chia/models/tests/test_bedrock.py (mocked unit tests, plus opt-in live tests). Not validated in production.
Auth/config come from the standard AWS chain (env vars, shared profile, or
IAM role); pass region or rely on AWS_REGION / AWS_DEFAULT_REGION.
boto3 is imported lazily, so importing this module does not require it.
- exception chia.models.bedrock.BedrockError(node_id: str, error_type: str, status_code: str = '', raw_message: str = '')[source]
Bases:
ExceptionBase for all Bedrock backend errors.
- exception chia.models.bedrock.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: str = '')[source]
Bases:
BedrockErrorThrottling / quota exhaustion (
ThrottlingExceptionetc.).
- exception chia.models.bedrock.AuthenticationError(node_id: str, status_code: str = '', raw_message: str = '')[source]
Bases:
BedrockErrorInvalid / expired / unauthorized AWS credentials.
- exception chia.models.bedrock.InvalidRequestError(node_id: str, status_code: str = '', raw_message: str = '')[source]
Bases:
BedrockErrorMalformed request or unknown model (
ValidationExceptionetc.).
- exception chia.models.bedrock.ServerError(node_id: str, status_code: str = '', raw_message: str = '')[source]
Bases:
BedrockErrorTransient service-side failure (5xx, model timeout, connection).
- exception chia.models.bedrock.MaxOutputTokensError(node_id: str, status_code: str = '', raw_message: str = '', partial_text: str = '')[source]
Bases:
BedrockErrorThe response was truncated at
maxTokens.
- exception chia.models.bedrock.UnknownBedrockError(node_id: str, status_code: str = '', raw_message: str = '')[source]
Bases:
BedrockErrorUnclassified Bedrock error.
- class chia.models.bedrock.BedrockLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'bedrock_llm', logging_level: int = 10, log_dir: str | None = None, region: str | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]
Bases:
LLMCallBaseBedrock Converse-API LLM backend with client-side MCP tool execution.
Returns the same
QueryResultshape as the other backends so callers are interchangeable;returncodeis synthesised (0 on success, -1 when every retry fails) andstderris unused.- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via the Bedrock Converse API and return the response, retrying transient failures with the same policy the other backends use.
Vertex
Google Vertex AI LLM backend built on the google-genai SDK.
VertexGeminiLLM runs Google’s Gemini models on Vertex AI and drives the
agentic tool loop client-side — executing each ChiaTool’s MCP server over HTTP,
exactly like the Bedrock and Claude API backends.
Vertex has no single unified API across model families —
Gemini goes through google-genai, Claude-on-Vertex through AnthropicVertex,
and Llama/Mistral through OpenAI-compatible MaaS endpoints. Two separate
classes are provided here, one for Gemini and one for MaaS.
WARNING: experimental. Only exercised by the tests in chia/models/tests/test_vertex.py (mocked unit tests, plus opt-in live tests). Not validated in production.
Auth/config: Vertex needs a GCP project + location and Application Default
Credentials (gcloud auth application-default login, a service-account key
via GOOGLE_APPLICATION_CREDENTIALS, or a workload identity). Pass
project/location or rely on GOOGLE_CLOUD_PROJECT /
GOOGLE_CLOUD_LOCATION. google-genai is imported lazily.
- exception chia.models.vertex.VertexError(node_id: str, error_type: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
ExceptionBase for all Vertex backend errors.
- exception chia.models.vertex.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: int | None = None)[source]
Bases:
VertexErrorQuota / rate exhaustion (HTTP 429,
RESOURCE_EXHAUSTED).
- exception chia.models.vertex.AuthenticationError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
VertexErrorInvalid / missing credentials or permission (HTTP 401 / 403).
- exception chia.models.vertex.InvalidRequestError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
VertexErrorMalformed request or unknown model (HTTP 400 / 404).
- exception chia.models.vertex.ServerError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
VertexErrorTransient service-side failure (HTTP 5xx).
- exception chia.models.vertex.MaxOutputTokensError(node_id: str, status_code: int | None = None, raw_message: str = '', partial_text: str = '')[source]
Bases:
VertexErrorThe response was truncated at
max_output_tokens.
- exception chia.models.vertex.UnknownVertexError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
VertexErrorUnclassified Vertex error.
- exception chia.models.vertex.ContentBlockedError(node_id: str, block_reason: str = '', raw_message: str = '')[source]
Bases:
VertexErrorThe model returned no usable content because the prompt or the response was blocked (safety, recitation, blocklist, …).
This is NOT an HTTP/API error — Gemini reports it as a 200-OK response whose candidate carries a blocking
finish_reason(or whoseprompt_feedbackcarries ablock_reasonwith no candidates). Surfacing it as a typed error keeps a block from masquerading as a successful empty answer. Never automatically retried: re-sending the same prompt will be blocked again.
- class chia.models.vertex.VertexGeminiLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'vertex_gemini', logging_level: int = 10, log_dir: str | None = None, project: str | None = None, location: str | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]
Bases:
LLMCallBaseGemini-on-Vertex LLM backend with client-side MCP tool execution.
Returns the same
QueryResultshape as the other backends so callers are interchangeable;returncodeis synthesised (0 on success, -1 when every retry fails) andstderris unused.- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Gemini on Vertex and return the response, retrying transient failures with the same policy the other backends use.
- class chia.models.vertex.VertexGenericLLM(model: str, project: str | None = None, location: str | None = None, **kwargs)[source]
Bases:
OpenAICompatLLMNon-Gemini Vertex models (Llama, Mistral, …) via the Vertex Model-as-a- Service OpenAI-compatible endpoint.
Vertex has no single unified API, so Gemini uses google-genai (
VertexGeminiLLM) while the open/partner families are reached through the OpenAI-compatible MaaS endpoint — which is exactly whatOpenAICompatLLMalready speaks. The only Vertex-specifics are the endpoint URL (built from project/location) and auth: a GCP ADC bearer token that rotates, supplied viatoken_provider. Everything else — the agent loop, tool handling, error translation — is inherited unchanged.modelis the MaaS model id, e.g.meta/llama-3.1-8b-instruct-maas.- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
Antigravity
Google Antigravity CLI LLM backend.
AntigravityLLM wraps Google’s Antigravity CLI (the agy binary, installed
via curl -fsSL https://antigravity.google/cli/install.sh | bash) behind the
same synchronous prompt shape as the other Chia LLM backends. It runs
agy --print (non-interactive “print mode”), which emits the model’s final
answer as plain text on stdout.
- Auth is OAuth-only.
agysigns in with a Google account (“Antigravity” / Gemini Code Assist) and stores a refresh token on disk; in a container it uses file-based token storage under the Gemini config dir. There is no API-key path.
Output is plain text, not JSON. Print mode has no structured/streaming output format, so there is no per-turn tool/usage trace to parse
The system prompt is folded into the user message (print mode has no
--system-prompt flag), mirroring CodexLLM.
- exception chia.models.antigravity.AntigravityError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ExceptionBase for Antigravity CLI errors. Subclasses are Ray-serializable.
- exception chia.models.antigravity.RateLimitError(node_id: str, reset_time: datetime | None = None, raw_message: str = '', exit_code: int = -1)[source]
Bases:
AntigravityError
- exception chia.models.antigravity.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- exception chia.models.antigravity.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- exception chia.models.antigravity.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- exception chia.models.antigravity.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- exception chia.models.antigravity.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- exception chia.models.antigravity.UnknownAntigravityError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
AntigravityError
- chia.models.antigravity.parse_rate_limit_reset(text: str) datetime | None[source]
Parse a human reset time such as
resets 4pm (America/Los_Angeles).
- class chia.models.antigravity.AntigravityLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'antigravity', logging_level: int = 10, log_dir: str | None = None, agy_bin: str = 'agy', work_dir: str | None = None, add_dirs: list[str] | None = None, gemini_dir: str | None = None, dangerously_skip_permissions: bool = True, sandbox: bool = False, extra_cli_args: list[str] | None = None)[source]
Bases:
LLMCallBaseWrap the Google Antigravity CLI (
agy --print) as a Chia LLM backend.- prompt(user_message: str, tools: list[ChiaTool] | None = None) QueryResult[source]
Send user_message to
agy --printand return the response.
Codex
Codex CLI LLM backend.
CodexLLM wraps codex exec behind the same synchronous prompt shape
as the other Chia LLM backends. Chia MCP tools are passed as per-run Codex
config overrides, so this backend does not mutate the user’s persistent Codex
configuration.
- exception chia.models.codex.CodexError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ExceptionBase for Codex CLI errors. Subclasses are Ray-serializable.
- exception chia.models.codex.RateLimitError(node_id: str, reset_time: datetime | None = None, raw_message: str = '', exit_code: int = -1)[source]
Bases:
CodexError
- exception chia.models.codex.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- exception chia.models.codex.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- exception chia.models.codex.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- exception chia.models.codex.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- exception chia.models.codex.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- exception chia.models.codex.UnknownCodexError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
CodexError
- class chia.models.codex.CodexQueryResult(result: str, returncode: int, stderr: str, stream_result: str, success: bool = False, session_id: str | None = None, session_state: dict[str, bytes] | None = None, session_state_paths: tuple[str, ...] = ())[source]
Bases:
QueryResultQueryResult specialized for the Codex CLI backend.
Codex stores resumable conversations under
CODEX_HOMErather than in a portable JSONL transcript. The state fields carry the needed opaque files back to the caller so a latercodex exec resume <session_id> -can run on a different Chia worker.
- chia.models.codex.parse_session_id(stdout: str) str | None[source]
Extract a Codex session id from JSONL stdout.
- chia.models.codex.parse_rate_limit_reset(text: str) datetime | None[source]
Parse a human reset time such as
resets 4pm (America/Los_Angeles).
- class chia.models.codex.CodexLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'codex', logging_level: int = 10, log_dir: str | None = None, codex_bin: str = 'codex', work_dir: str | None = None, extra_cli_args: list[str] | None = None, sandbox: str = 'workspace-write', approval_policy: str = 'never', dangerously_bypass_approvals_and_sandbox: bool = True, skip_git_repo_check: bool = True, ephemeral: bool = False, ignore_rules: bool = False, profile: str | None = None, reasoning_effort: str | None = None, resume_session: bool = False, auto_compact_token_limit: int | None = 200000)[source]
Bases:
LLMCallBaseWrap
codex execas a Chia LLM backend.- prompt(user_message: str, tools: list[ChiaTool] | None = None) CodexQueryResult[source]
Send user_message to
codex exec.
Opencode
opencode CLI LLM backend.
OpenCodeLLM wraps the opencode CLI (https://opencode.ai) as an LLM
backend, opencode is provider-agnostic: the model is given as provider/model (e.g.
anthropic/claude-sonnet-4-6) and opencode runs its own server-side agentic
tool loop, so there is no client-side MCP loop here.
WARNING: experimental. Only exercised by the tests in
chia/models/tests/test_opencode.py (mocked unit tests, plus opt-in live tests).
Not validated in production. Auth is environment-driven: opencode uses its own
stored credentials (opencode auth login) or provider env vars.
- exception chia.models.opencode.OpenCodeError(node_id: str, error_type: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
ExceptionBase for all opencode CLI errors.
- exception chia.models.opencode.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', exit_code: int = -1)[source]
Bases:
OpenCodeErrorThe provider behind opencode reported a usage/rate limit.
- exception chia.models.opencode.AuthenticationError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
OpenCodeErroropencode has no/invalid credentials for the selected provider.
- exception chia.models.opencode.BillingError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
OpenCodeErrorThe provider account has a billing/payment problem.
- exception chia.models.opencode.InvalidRequestError(node_id: str, exit_code: int = -1, raw_message: str = '')[source]
Bases:
OpenCodeErrorMalformed request — bad model string, invalid config, unknown agent, etc.
- exception chia.models.opencode.ServerError(node_id: str, exit_code: int = -1, raw_message: str = '', retry_after: int | None = None)[source]
Bases:
OpenCodeErrorTransient provider/server-side failure (5xx, overloaded, connection).
- exception chia.models.opencode.MaxOutputTokensError(node_id: str, exit_code: int = -1, raw_message: str = '', partial_text: str = '')[source]
Bases:
OpenCodeErrorThe response was truncated at the output token limit.
- exception chia.models.opencode.UnknownOpenCodeError(node_id: str, exit_code: int = -1, raw_message: str = '', stderr: str = '')[source]
Bases:
OpenCodeErrorUnclassified opencode CLI error.
- chia.models.opencode.parse_session_id(stdout: str) str | None[source]
Pull the opencode session id out of
run --format jsonstdout.Each event is
{type, sessionID, part:{sessionID, ...}}; the openingstep_startline reliably carries it. Falls back to a regex scan if the JSON shape changes.
- chia.models.opencode.parse_run_error(stdout: str) dict | None[source]
Pull the first structured error out of
run --format jsonstdout.opencode emits
{"type":"error", "sessionID":..., "error":{name, data}}events for failures that happen before/around the model request — notably an unknown model id, which surfaces as{"name":"UnknownError","data":{"message": "Model not found: ..."}}. These never reach the session export’smessages[].info.errorbecause no assistant message is ever created, so the run stream is the only place they appear (confirmed against opencode 1.15.13). Returns the first sucherrordict ({name, data}), orNone. Genuine provider errors (e.g. APIError 401) appear in both the run stream and the export; the export copy is richer (fullresponseHeaders), so callers prefer it and use this only as a fallback.
- class chia.models.opencode.OpenCodeLLM(model: str | None = None, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'opencode', logging_level: int = 10, log_dir: str | None = None, opencode_bin: str = 'opencode', agent_name: str = 'chia', work_dir: str | None = None, extra_cli_args: List[str] | None = None)[source]
Bases:
LLMCallBaseWraps the
opencodeCLI as an LLM backend.Each
prompt()call runsopencode run(to create a session and get its id) thenopencode export(to read the assistant response + usage from opencode’s local DB). Returns the sameQueryResultshape as the other backends;returncodeis therunexit code.- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message to opencode and return the response.
- Returns:
QueryResultwithsuccess=Truewhen opencode ran cleanly, orsuccess=Falsewhen every retry attempt failed.- Raises:
RateLimitError / AuthenticationError / BillingError / –
InvalidRequestError – propagate immediately.
ServerError – after all retries with exponential backoff.
MaxOutputTokensError – after one retry attempt.
Openai Compat
OpenAI-compatible LLM backend built on the openai SDK Chat Completions API.
OpenAICompatLLM talks to any provider that implements the OpenAI
Chat Completions wire format and drives the agentic tool loop client-side —
executing each ChiaTool’s MCP server over HTTP, like the other backends.
It is deliberately not provider-specific. The OpenAI Chat Completions format is a de-facto multi-vendor standard, so the only provider-specific inputs are:
base_url— selects the provider’s endpoint (default: OpenAI itself).auth — the credential for that endpoint.
The same loop/tool/parse/error code therefore covers OpenAI, Fireworks, Groq, OpenRouter, self-hosted vLLM/TGI, Vertex MaaS, etc. Each provider is configuration, not a new module.
Auth, by default, lives entirely in the environment (OPENAI_API_KEY /
OPENAI_BASE_URL) — construct with no key and the SDK reads them, matching
the other backends. For providers whose credential is a rotating token
(Vertex MaaS GCP token, Azure AD), pass token_provider — a zero-arg
callable that returns a fresh token; it’s invoked when the client is built.
WARNING: experimental. Only exercised by the tests in
chia/models/tests/test_openai_compat.py. Not validated in production.
openai is imported lazily, so importing this module does not require it.
- exception chia.models.openai_compat.OpenAICompatError(node_id: str, error_type: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
ExceptionBase for all OpenAI-compatible backend errors.
- exception chia.models.openai_compat.RateLimitError(node_id: str, reset_time: datetime, raw_message: str = '', status_code: int | None = None)[source]
Bases:
OpenAICompatErrorHTTP 429.
- exception chia.models.openai_compat.AuthenticationError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
OpenAICompatErrorHTTP 401 / 403.
- exception chia.models.openai_compat.InvalidRequestError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
OpenAICompatErrorHTTP 400 / 404.
- exception chia.models.openai_compat.ServerError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
OpenAICompatErrorHTTP 5xx, connection, or timeout.
- exception chia.models.openai_compat.MaxOutputTokensError(node_id: str, status_code: int | None = None, raw_message: str = '', partial_text: str = '')[source]
Bases:
OpenAICompatErrorThe response was truncated at
max_tokens(finish_reason=’length’).
- exception chia.models.openai_compat.ContextLengthExceededError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
InvalidRequestErrorHTTP 400 (or 413) whose body shows the prompt exceeded the context window.
Subclasses
InvalidRequestErrorso it inherits never-retry semantics (re-sending the same oversized prompt cannot help) while letting callers detect the context-overflow case specifically. Distinct fromMaxOutputTokensError, which is about output truncation.
- exception chia.models.openai_compat.BillingError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
OpenAICompatErrorPayment / quota problem (HTTP 402, or a 429 carrying
insufficient_quota).Never retried: spending more requests cannot restore quota or credit.
- exception chia.models.openai_compat.UnknownOpenAIError(node_id: str, status_code: int | None = None, raw_message: str = '')[source]
Bases:
OpenAICompatErrorUnclassified OpenAI-compatible error.
- class chia.models.openai_compat.OpenAICompatLLM(model: str, system_message: str = '', timeout_seconds: int = 600, retries: int = 3, logging_name: str = 'openai_compat_llm', logging_level: int = 10, log_dir: str | None = None, base_url: str | None = None, api_key: str | None = None, token_provider: Callable[[], str] | None = None, max_tokens: int = 16000, max_tool_iterations: int = 100, client_kwargs: dict | None = None)[source]
Bases:
LLMCallBaseOpenAI-compatible Chat Completions backend with client-side MCP tools.
Returns the same
QueryResultshape as the other backends so callers are interchangeable;returncodeis synthesised (0 on success, -1 when every retry fails) andstderris unused.- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
Openai Providers
Lightweight per-provider presets over OpenAICompatLLM.
Each big OpenAI-compatible provider differs only in its endpoint (base_url),
a default logging name, and the Ray resource its calls require. The providers
are thin subclasses that set the first two as class defaults and re-decorate
``prompt`` with their own @ChiaFunction(resources=...) so that, when
dispatched via prompt.chia_remote(self, ...), each lands only on workers
advertising that provider’s credential resource. The decorated body just defers
to OpenAICompatLLM.prompt().
Auth is environment-driven (see OpenAICompatLLM). The SDK only reads
OPENAI_API_KEY automatically — for non-OpenAI providers export the provider
key as OPENAI_API_KEY, or pass api_key= / a token_provider.
- class chia.models.openai_providers.OpenAILLM(model: str, **kwargs)[source]
Bases:
_OpenAICompatProviderOpenAI itself (default endpoint).
- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
- class chia.models.openai_providers.FireworksLLM(model: str, **kwargs)[source]
Bases:
_OpenAICompatProviderFireworks AI.
- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
- class chia.models.openai_providers.GroqLLM(model: str, **kwargs)[source]
Bases:
_OpenAICompatProviderGroq.
- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
- class chia.models.openai_providers.OpenRouterLLM(model: str, **kwargs)[source]
Bases:
_OpenAICompatProviderOpenRouter (itself a multi-provider router).
- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
Ollama
Ollama self-hosted LLM preset over OpenAICompatLLM.
Ollama serves open-weight models locally behind an
OpenAI-compatible Chat Completions endpoint at /v1. From chia’s point of
view a self-hosted Ollama is therefore just another OpenAI-compatible provider:
point base_url at the Ollama server and the entire
OpenAICompatLLM loop/tool/error stack
applies unchanged. This is the same pattern as the cloud presets in
chia.models.openai_providers.
Auth
Ollama requires no credentials, but the openai SDK refuses to build a client
with an empty api_key. We therefore default it to the conventional dummy
value "ollama" (overridable via api_key / token_provider /
OPENAI_API_KEY).
- class chia.models.ollama.OllamaLLM(model: str, **kwargs)[source]
Bases:
OpenAICompatLLMSelf-hosted Ollama via its OpenAI-compatible
/v1endpoint.modelis the Ollama model tag to serve, e.g."llama3.1:8b"or"qwen2.5:7b"(it must already be pulled on the server — see theOLLAMA_PULLbuild/run knob indockerfiles/OllamaDockerfile).- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.
vLLM
vLLM self-hosted LLM preset over OpenAICompatLLM.
vLLM serves open-weight models on GPUs behind an
OpenAI-compatible Chat Completions endpoint (vllm serve <model> →
/v1 on port 8000). From chia’s point of view a self-hosted vLLM is just
another OpenAI-compatible provider: point base_url at the vLLM server and the
whole OpenAICompatLLM loop/tool/error stack
applies unchanged — the same pattern as OllamaLLM.
Default hosting to port 8200 (not vLLM’s usual 8000): chia uses the low 8000s heavily — on tunneled
workers (e.g. AWS) its SSH tunnels reserve 8000-8010 (head_tool_port), and
ChiaTool MCP servers probe ports 8000-8099 (start_router base 8000 + 100
tries; Ray Serve’s proxy also defaults to 8000). Since chia runs containers
--net=host, vLLM on 8000 would collide. 8200 clears the tool range with
margin and stays below the Ray dashboard (8265) and all tunnel/ray ranges.
One model per server
Unlike Ollama (which serves many models and switches per request), a vLLM server
serves exactly one model, fixed when vllm serve is launched. So model
here must match the model the target server was started with (its HF id, or the
--served-model-name if one was set). Serving multiple models means running
multiple vLLM servers/workers.
Auth
vLLM started without --api-key ignores the credential, but the openai SDK
refuses to build a client with an empty api_key, so we default it to the
dummy "vllm" (overridable via api_key / token_provider /
OPENAI_API_KEY — set a real one if the server was started with --api-key).
- class chia.models.vllm.VLLMLLM(model: str, **kwargs)[source]
Bases:
OpenAICompatLLMSelf-hosted vLLM via its OpenAI-compatible
/v1endpoint.modelis the model the target vLLM server is serving, e.g."Qwen/Qwen2.5-3B-Instruct"(must match the server’s launch model /--served-model-name— see theVLLM_MODELknob indockerfiles/VLLMDockerfile).- prompt(user_message: str, tools: List[ChiaTool] | None = []) QueryResult[source]
Send user_message via Chat Completions and return the response, retrying transient failures with the same policy the other backends use.