ChiaFunction
ChiaFunction is the core primitive of a CHIA flow. The @ChiaFunction
decorator turns any Python function into a node, a unit of work that can
be scheduled onto a worker in the cluster, chained into a task graph, profiled,
cached, and bypassed. This page explains the Ray concepts the rest of the docs
lean on, then walks through every mode of execution a ChiaFunction supports.
A few Ray concepts
CHIA is built on the Ray distributed-computing platform, and the docs use Ray’s vocabulary throughout. You do not need to know Ray to use CHIA, but these terms recur:
Driver: the process running your flow script (the main() you submit with python ... or chia job submit ...). It runs on the cluster’s head node, dispatches work, and collects results.
Worker: a process that executes dispatched work. In CHIA a logical worker also advertises resources (CPU/GPU/FPGA counts, software capabilities) and only maps onto physical machines that can satisfy them. See Architecture Overview for how logical workers map onto machines.
Task: a single asynchronous function invocation sent to a worker. Dispatching a @ChiaFunction with fn.chia_remote(...) creates a task. Tasks are stateless: the worker runs the function and returns the result.
Actor: a stateful worker: a remote Python object whose methods run on the worker that holds it. CHIA uses actors for things that must persist across many calls, like the profile collector, the cache, and MCP tool servers.
Object reference (ObjectRef): a future: a handle to a result that may not exist yet. chia_remote(...) returns immediately with an ObjectRef[R] instead of blocking for the value. You resolve it later with get(), or pass it straight into another call as an argument.
Resources: labels with quantities that a task requires ({"chipyard": 1}) and a worker advertises. Ray holds a task until a worker with enough free resources is available, then leases that worker for the task’s duration. This is how CHIA places work on the right machine.
Executing a CHIA node
CHIA provides a library of nodes decorated with @ChiaFunction. Most of them
follow the same shape: a plain Python class that bundles the node’s
construction parameters, instance state, and private helpers, with one or more
methods decorated with @ChiaFunction(...) that are the actual dispatchable
nodes. Each decorated method comes pre-assigned with the resources it needs.
chia.chipyard.chisel_build_node is a representative example. The
ChiselBuildNode class holds the node’s build configuration (chipyard path,
config, target, make flags) and a handful of private helpers. Its build
method is the node, pre-assigned the chipyard resource:
class ChiselBuildNode:
    def __init__(self, chipyard_path, config,
                 target=BuildTarget.VERILATOR, ...):
        ...  # instance state

    @ChiaFunction(resources={"chipyard": 1})
    def build(self) -> BuildArtifact:
        ...  # runs `make` in sims/verilator
        return BuildArtifact(...)
When you call build you can choose how to execute it: locally in the driver,
remotely on a worker, and with or without blocking for the result. The
pre-assigned resource options can also be overridden per-call. The rest of this
section walks through each mode using this node.
Local call
Calling the bound method directly runs it in the same process that made the call, exactly like an ordinary Python call:
cb_node = ChiselBuildNode("/home/ray/chipyard", "RocketConfig",
                          target=BuildTarget.VERILATOR)
artifact = cb_node.build() # runs here, in the driver
This is handy for using ChiaFunction features like profiling, bypassing, and caching without dispatching the function to a worker.
Remote, asynchronous (chia_remote)
fn.chia_remote(...) dispatches the function as a task onto a worker that
satisfies its resources. It is non-blocking and returns an ObjectRef
immediately. Resolve and block on the ref with get() when you need the value:
ref = cb_node.build.chia_remote(cb_node) # note: pass `cb_node` explicitly
# ... dispatch other work here; it runs concurrently ...
artifact: BuildArtifact = get(ref) # blocks until the build finishes
Remote, blocking (chia_remote_blocking)
When you want the value synchronously and have no other work to overlap,
chia_remote_blocking dispatches remotely and returns the unwrapped value:
artifact: BuildArtifact = cb_node.build.chia_remote_blocking(cb_node)
Chaining refs into a task graph
Any argument to a chia_remote call may be either a plain value of type T
or an ObjectRef[T]. Passing a ref directly without get() tells Ray
that this task depends on that one, forming an explicit edge in the task graph.
Ray resolves the dependency for you and only starts the downstream task once its
inputs are ready, so independent tasks run concurrently. Dispatch everything, wire refs
together, and get() only the final result:
# No get() between these — refs flow straight in as arguments.
bin_ref = compile_program.chia_remote(c_src)
build_ref = cb_node.build.chia_remote(cb_node)
# run() depends on BOTH upstream tasks; Ray waits for them automatically.
result = get(
    verilator_node.run.chia_remote(
        verilator_node, build_ref, bin_ref, "helloworld.riscv", "/home/ray"
    )
)
The Quickstart: Say Hello (World) with CHIA builds up exactly this pattern step by step.
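The dependency resolution described above can be illustrated without a cluster. The following is a minimal sketch using only Python’s standard library, not CHIA or Ray code: the submit helper and the three toy functions are hypothetical stand-ins, and a thread-pool Future plays the role of an ObjectRef. Any argument that is a future is resolved before the downstream function runs, which is the behaviour the section attributes to Ray.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# A thread pool stands in for the cluster; Future stands in for ObjectRef.
pool = ThreadPoolExecutor(max_workers=4)


def submit(fn, *args):
    """Dispatch fn without blocking; Future arguments are resolved first."""
    def run():
        # Waiting here models the edge in the task graph: the downstream
        # task does not start its body until every upstream result exists.
        resolved = [a.result() if isinstance(a, Future) else a for a in args]
        return fn(*resolved)
    return pool.submit(run)


# Hypothetical stand-ins for compile_program, build, and run.
def compile_src(src):
    return src.upper()


def build():
    return "simulator"


def run(sim, binary):
    return f"{sim} ran {binary}"


# The two upstream tasks are independent, so they can run concurrently.
bin_ref = submit(compile_src, "hello")
build_ref = submit(build)

# Only the final result is resolved explicitly.
result = submit(run, build_ref, bin_ref).result()
pool.shutdown()
print(result)  # simulator ran HELLO
```

Unlike a real scheduler, this toy version occupies a pool thread while a downstream task waits on its inputs, so it is only safe when the pool has more threads than there are waiting tasks.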
Per-call option overrides (.options)
To override the decorator-level options for a single dispatch, use
.options(...):
ref = cb_node.build.options(
    num_cpus=4, scheduling_strategy="SPREAD"
).chia_remote(cb_node)
Any keyword that Ray’s .options() accepts is supported. The
ones CHIA flows reach for in practice are:
resources: the resource labels (and quantities) a worker must offer to run the node. E.g. a function tagged with @ChiaFunction(resources={"chipyard": 1}) needs a whole chipyard slot.
num_cpus: how many CPUs the task reserves (default 1). The number of CPUs per machine is discovered automatically during cluster setup and advertised to the cluster. E.g. a chipyard node that needs 2 threads: @ChiaFunction(num_cpus=2, resources={"chipyard": 1}).
max_retries: how many times a task is rerun if the worker it ran on dies (default 3). 0 disables automatic retry. E.g. @ChiaFunction(num_cpus=0.1, max_retries=0).
scheduling_strategy: how tasks are scheduled across nodes. Common values are "DEFAULT", which prioritizes locality then load balancing, and "SPREAD", which fans tasks out evenly. E.g. run_synthesis.options(scheduling_strategy="SPREAD").chia_remote(design). This option also accepts a Ray scheduling-strategy object, most often a PlacementGroupSchedulingStrategy, which binds the task to a placement group so a set of related @ChiaFunction calls gang-schedule onto the same (or co-located) logical worker(s) instead of being placed independently.
A bare .options() (no resources) can run on any worker.
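To make the resource options concrete, here is a toy model of the matching rule: a task is placed only on a worker whose free resources cover every requested quantity, and the worker’s free pool shrinks for the duration of the lease. This is a simplified sketch for intuition, not Ray’s scheduler; the fits and lease helpers and the worker names are hypothetical.

```python
from typing import Optional

Resources = dict[str, float]


def fits(required: Resources, free: Resources) -> bool:
    """True if the worker has at least the requested amount of each resource."""
    return all(free.get(name, 0) >= qty for name, qty in required.items())


def lease(required: Resources, workers: dict[str, Resources]) -> Optional[str]:
    """Reserve resources on the first worker that fits; None means 'keep waiting'."""
    for worker, free in workers.items():
        if fits(required, free):
            for name, qty in required.items():
                free[name] -= qty
            return worker
    return None


# Hypothetical cluster: only worker-b advertises a chipyard slot.
workers = {
    "worker-a": {"CPU": 8},
    "worker-b": {"CPU": 4, "chipyard": 1},
}

first = lease({"CPU": 2, "chipyard": 1}, workers)
second = lease({"CPU": 2, "chipyard": 1}, workers)
print(first)   # worker-b
print(second)  # None: the single chipyard slot is already leased
```

In this model the second request would stay queued until the first lease is released, which mirrors how a task waits for a worker with enough free resources.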
Functional form (ChiaCallRemote)
ChiaCallRemote(fn, *args, **kwargs) is equivalent to
fn.chia_remote(*args, **kwargs) but raises a clear TypeError if fn
is not a @ChiaFunction. Use whichever reads better at the call site:
from chia.base.ChiaFunction import ChiaCallRemote
ref = ChiaCallRemote(compile_program, src_contents)
Collecting results
get() is the basic collector and is usually all you need. For flows that
dispatch many tasks and want to react as each finishes, or that run long enough
to hit a wedged worker, CHIA provides chia_wait, a drop-in replacement for
ray.wait that operates on TrackedRef objects (an ObjectRef
paired with a closure that can re-dispatch it). It returns the usual
(ready, pending) split and can additionally detect tasks stuck in
PENDING_NODE_ASSIGNMENT while the cluster has free resources, then cancel and
resubmit them:
from chia.base.ChiaFunction import chia_wait, TrackedRef
tracked = [
    TrackedRef(compile_program.chia_remote(s),
               submit_fn=lambda s=s: compile_program.chia_remote(s),
               label=f"compile_{i}")
    for i, s in enumerate(sources)
]
# Return when at least num_returns tasks are ready, or after pending_timeout seconds of no progress.
ready, pending = chia_wait(tracked, num_returns=1,
                           pending_timeout=120, retry=True)
for tr in ready:
    result = get(tr.ref)
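The (ready, pending) split is the same shape that Python’s standard concurrent.futures.wait returns, so the react-as-each-finishes loop can be tried locally. The sketch below is an analogy only: it uses threads instead of CHIA workers, compile_one is a hypothetical stand-in, and it does not model the stuck-task detection or resubmission that chia_wait adds.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def compile_one(delay: float) -> float:
    """Hypothetical task: sleeps to simulate work, then returns its input."""
    time.sleep(delay)
    return delay


finished = []
with ThreadPoolExecutor(max_workers=3) as pool:
    pending = {pool.submit(compile_one, d) for d in (0.3, 0.1, 0.2)}
    while pending:
        # Returns as soon as at least one future is done; the rest stay pending.
        ready, pending = wait(pending, return_when=FIRST_COMPLETED)
        for fut in ready:
            finished.append(fut.result())

print(sorted(finished))  # [0.1, 0.2, 0.3]
```

Each pass through the loop handles whichever tasks completed since the last call, so fast tasks are processed without waiting for slow ones.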
Cancellation
chia_cancel(ref) cancels a running task. Unlike a bare ray.cancel, it
first looks up any subprocesses the task spawned and kills them on the correct
remote nodes (process-group kill for start_new_session children) before
cancelling the Ray task. This is part of CHIA’s process-leak prevention:
from chia.base.ChiaFunction import chia_cancel
ref = build_megaboom.chia_remote(config)
chia_cancel(ref, force=True)
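The process-group kill mentioned above relies on a standard POSIX mechanism, which can be shown in isolation. The sketch below is not chia_cancel’s implementation; it only demonstrates, on a single POSIX machine, how a child started with start_new_session=True gets its own process group and how that whole group can be signalled at once.

```python
import os
import signal
import subprocess
import sys

# start_new_session=True makes the child the leader of a new session and
# process group, so anything it spawns later shares that group.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,
)

# Signal the entire group rather than just the one PID (POSIX only).
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
proc.wait(timeout=10)

# A negative return code means the child was terminated by that signal.
print(proc.returncode == -signal.SIGTERM)  # True
```

Signalling the group instead of a single PID is what prevents grandchildren, such as a make invocation and the tools it launches, from outliving the cancelled task.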
Worker-side setup and cleanup hooks
A chia_remote call may carry reserved kwargs that run side-effecting
callables on the worker around the function: _chia_setup (runs before the
function) and _chia_cleanup (runs after, in a finally, so it fires even
if the function raises). Each takes an optional _chia_setup_args /
_chia_cleanup_args tuple. Neither can see or replace the function’s return
value:
ref = run_sim.chia_remote(
    design,
    _chia_setup=mount_scratch, _chia_setup_args=(scratch_dir,),
    _chia_cleanup=unmount_scratch, _chia_cleanup_args=(scratch_dir,),
)
If setup raises, the function and cleanup are skipped and the task fails. If cleanup raises, the error is logged but never re-raised, so a failing teardown cannot mask the function’s result or the function’s own exception.
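The ordering rules above map directly onto a try/finally. The following is a minimal local sketch of those semantics, assuming nothing about how CHIA implements them: run_with_hooks and the mount, unmount, and simulate functions are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger("hooks_sketch")


def run_with_hooks(fn, args=(), setup=None, setup_args=(),
                   cleanup=None, cleanup_args=()):
    """Run fn between optional setup and cleanup callables."""
    if setup is not None:
        # Outside the try block: if setup raises, fn and cleanup never run.
        setup(*setup_args)
    try:
        return fn(*args)
    finally:
        if cleanup is not None:
            try:
                cleanup(*cleanup_args)
            except Exception:
                # Logged and swallowed, so fn's result or exception survives.
                logger.exception("cleanup failed; ignoring")


events = []


def mount(path):
    events.append(f"mount {path}")


def unmount(path):
    events.append(f"unmount {path}")
    raise RuntimeError("device busy")  # deliberately failing teardown


def simulate(design):
    events.append(f"run {design}")
    return 42


value = run_with_hooks(simulate, ("rocket",),
                       setup=mount, setup_args=("/scratch",),
                       cleanup=unmount, cleanup_args=("/scratch",))
print(value)   # 42
print(events)  # ['mount /scratch', 'run rocket', 'unmount /scratch']
```

Even though the teardown raises, the caller still receives the function’s return value, and neither hook ever sees or replaces it.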
Defining your own node
Decorate any function with @ChiaFunction. The resources argument (and any
other keyword) names what a worker must offer to run it:
from chia.base.ChiaFunction import ChiaFunction, get

@ChiaFunction(resources={"chipyard": 1})
def compile_program(src_contents: str) -> bytes:
    ...
    return elf_data
Wrapping actors (chia_actor)
A plain Ray actor handle can be given the same call surface as a
ChiaFunction with chia_actor, so actor calls read like node dispatch:
from chia.base.ChiaFunction import chia_actor, get
store = chia_actor(some_actor)
n = get(store.size_bytes.chia_remote())
get(store.write.chia_remote(key, value))
We currently do not support profiling,
bypass, or cache machinery on actors. Recover
the raw Ray handle with store.actor (e.g. for ray.kill).
Cross-cutting features
The same chia_remote dispatch path also drives three of CHIA’s framework
features, all of which are transparent to your function’s body:
Profiling. Once a collector is running, every ChiaFunction function call is instrumented automatically. Execution time, the worker it ran on, and the dependency edges between calls are recorded. Call get_profiler().add_info(...) from inside a node body to attach domain metrics. See Profiling.
Caching and bypass. Pass a per-call _chia_tag to name a dispatch. A function marked cache: true then has its result persisted to disk under that tag, and a function marked bypass: true can replay a pre-recorded value (or a registered provider’s output) instead of running, while still dispatching through Ray so scheduling is exercised. See Caching and Bypass.
# The _chia_tag names this call for both caching and bypass.
ref = run_verilator_test.chia_remote(design, _chia_tag=f"iter{i}_opt{j}")
These features compose: a single dispatch can be profiled, served from cache, and relayed through the head’s dispatch proxy (on reverse-tunneled workers) all at once, with no change to the decorated function.
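The idea of persisting a result to disk under a tag can be sketched in a few lines. This is an illustration of tag-keyed caching in general, not CHIA’s cache: the cached_call helper, the pickle file layout, and the run_test function are all assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path


def cached_call(cache_dir: Path, tag: str, fn, *args):
    """Return the result stored under tag, running fn only on a cache miss."""
    path = cache_dir / f"{tag}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    result = fn(*args)
    path.write_bytes(pickle.dumps(result))
    return result


calls = []


def run_test(design):
    """Hypothetical expensive function; records each real execution."""
    calls.append(design)
    return {"design": design, "passed": True}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_call(Path(tmp), "iter0_opt1", run_test, "rocket")
    second = cached_call(Path(tmp), "iter0_opt1", run_test, "rocket")

print(calls)            # ['rocket']: the function body ran only once
print(first == second)  # True
```

Because the lookup is keyed on the tag rather than on the arguments, choosing a distinct tag per logical call, as in the iter{i}_opt{j} pattern, is what keeps separate dispatches from sharing a cached value.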
See also
Quickstart: Say Hello (World) with CHIA: a hands-on flow that uses every mode above.
Architecture Overview: how nodes, edges, workers, and clusters fit together.
Caching and Bypass: the _chia_tag cache/replay workflow.
Profiling: recording and visualizing a flow’s execution.
chia viz-profile <log...>: rendering a recorded profile from the CLI.