Caching and Bypass

CHIA’s cache persists a function’s real output to disk; bypass replaces a function’s computation with pre-recorded data but still dispatches the call through Ray. Together they let reruns skip redundant, costly work, make cluster orchestration easy to test, and improve fault tolerance by storing in-progress results. Both are built around ChiaFunction, keyed by a per-call tag (_chia_tag), and configured in the same YAML file that is passed to your loop.

At a high level, the cache is the write path (populate from a real run) and bypass is the read path (replay it). Both use the same tag as the key.

Tags 

Every dispatch can carry an optional _chia_tag that names that specific call, e.g. f"iter{i}_opt{j}". Both caching and bypass use it as the key, and both can be restricted to tags matching a regex:

# The tag identifies this call for both the cache and bypass.
for i in range(N_ITERS):
    for j in range(N_PARALLEL):
        ref = run_verilator_test.chia_remote(design, _chia_tag=f"iter{i}_opt{j}")
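
To make the regex restriction concrete, here is a minimal sketch of how a tag could be checked against a list of patterns. The helper name tag_matches is hypothetical, and requiring the whole tag to match (re.fullmatch) is an assumption; CHIA may anchor its patterns differently.

```python
import re

def tag_matches(tag, patterns):
    """Return True if `tag` fully matches at least one regex in `patterns`.

    Hypothetical helper: whole-tag matching via re.fullmatch is an assumption.
    """
    return any(re.fullmatch(p, tag) for p in patterns)

# Tags generated the same way as in the loop above.
tags = [f"iter{i}_opt{j}" for i in range(2) for j in range(2)]

# Keep only the calls from iteration 0.
selected = [t for t in tags if tag_matches(t, ["iter0_.*"])]
print(selected)
```

Under this assumption, a pattern such as "iter0_.*" selects every parallel option of iteration 0 and nothing from later iterations.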

Bypass 

When a function is bypassed it is still dispatched according to its resource requirements, but the real computation is replaced by a provider that returns pre-recorded data. This lets you test cluster orchestration without paying for the underlying work (builds, simulations, LLM calls, …).

Configuration

List functions under a bypass: section. Setting bypass: true opts a function in; an optional tags: list (regex patterns) restricts the bypass to matching calls; an optional data: path supplies a file to serve.

bypass:
  # Bypass every call to this function.
  simple_add:
    bypass: true

  # Bypass only calls whose _chia_tag matches one of these patterns.
  run_verilator_test:
    bypass: true
    tags: ["iter0_.*"]

  # Shorthand: bypass: true with no extra options.
  simple_multiply: true

  summarize_perf:
    bypass: true
    data: /path/to/recorded_perf.md   # served to the worker as a string

A function not listed (or bypass: false) runs normally. If no YAML is loaded at all, bypass is a complete no-op.
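
The full and shorthand forms describe the same settings. The sketch below shows one way the entries could be normalized after the YAML is parsed; normalize_entry and the dict layout are illustrative assumptions, not CHIA's internal representation.

```python
def normalize_entry(entry):
    """Expand a bypass entry to a dict with "bypass", "tags" and "data" keys.

    Hypothetical helper used only to illustrate the shorthand form.
    """
    if isinstance(entry, bool):          # shorthand: `simple_multiply: true`
        entry = {"bypass": entry}
    return {
        "bypass": bool(entry.get("bypass", False)),
        "tags": list(entry.get("tags", [])),
        "data": entry.get("data"),
    }

# What a YAML parser would produce for the bypass: section shown earlier.
parsed = {
    "simple_add": {"bypass": True},
    "run_verilator_test": {"bypass": True, "tags": ["iter0_.*"]},
    "simple_multiply": True,
    "summarize_perf": {"bypass": True, "data": "/path/to/recorded_perf.md"},
}

normalized = {name: normalize_entry(e) for name, e in parsed.items()}
print(normalized["simple_multiply"])
```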

Setup and providers

Create one Bypass instance as part of loop setup and register a provider for each bypassed function. The provider runs on the worker (with the same scheduling as the real function) and returns the replacement value:

from chia.base.bypass import Bypass
from chia.base.ChiaFunction import ChiaFunction, get

# If yaml_path is None, bypass does nothing.
bypass = Bypass(yaml_path=args.bypass_config)

# Required function signature:
# tag: the function's _chia_tag
# data_path: the YAML "data:" path or None
# *args, **kwargs: the original call's arguments
def provider_42(tag, data_path, *args, **kwargs):
    return 42

# Remote ChiaFunction
@ChiaFunction(resources={"adder": 1})
def simple_add(x, y):
    return x + y

bypass.set_provider("simple_add", provider_42)

# Dispatched to a node with "adder" resource,
# but returns 42 from the provider instead of running.
result = get(simple_add.chia_remote(1, 2))   # -> 42

A function is bypassed only when it is marked bypass: true and has either a registered provider or a data: path.

Serving a file instead of a provider

If you give a function a data: path but no provider, CHIA serves that file’s contents as the result. The read is routed through a Ray actor pinned to the node that constructed the Bypass (the head), so workers on other nodes can read it without a shared filesystem:

bypass:
  summarize_perf:
    bypass: true
    data: /path/to/recorded_perf.md   # served to the worker as a string
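
For comparison, the effect of the data: path is roughly what an explicit provider like the one below would return. This is a local sketch only: file_provider is a hypothetical name, it reads the file directly, and it does not model the Ray actor on the head node that CHIA uses to make the file reachable from other nodes.

```python
from pathlib import Path
import tempfile

def file_provider(tag, data_path, *args, **kwargs):
    """Return the contents of the data: file as a string (local read only)."""
    return Path(data_path).read_text(encoding="utf-8")

# Demonstration with a temporary file standing in for recorded_perf.md.
with tempfile.TemporaryDirectory() as tmp:
    recorded = Path(tmp) / "recorded_perf.md"
    recorded.write_text("placeholder report\n", encoding="utf-8")
    served = file_provider("iter0_opt0", str(recorded))

print(served)
```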

Gating with a condition

Besides a provider, a function may register a condition — an extra gate that decides, at dispatch time, whether the bypass actually happens. It runs on the caller as the last step of the bypass decision (after the bypass: true flag, the provider/data check, and the tag patterns) and returns a bool:

from chia.base.cache import get_active_cache
from chia.base.ChiaFunction import get

# cond(tag, data_path, *args, **kwargs) -> bool  (same args as a provider)
def cache_hit(tag, data_path, *args, **kwargs):
    return get(get_active_cache().has.chia_remote(tag))

bypass.set_cond("run_verilator_test", cache_hit)

A falsy return means the call is not bypassed and runs for real; a truthy return lets the bypass proceed. When no condition is registered the default is True (no extra gate). Use it to make the decision depend on runtime state, most usefully to only replay from the cache when the value is actually present (see Putting it together: populate, then replay below).
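
The order of checks described in this section can be summarized as a single function. The sketch below is illustrative rather than CHIA's code: should_bypass and the entry dict are hypothetical, whole-tag matching with re.fullmatch is an assumption, and treating an untagged call as a non-match when tags: is set is also an assumption.

```python
import re

def should_bypass(entry, provider, cond, tag, *args, **kwargs):
    """Apply the documented checks in order; True means the call is bypassed.

    `entry` is a dict with optional "bypass", "tags" and "data" keys.
    """
    if not entry or not entry.get("bypass", False):
        return False                      # 1. the bypass: true flag
    data_path = entry.get("data")
    if provider is None and data_path is None:
        return False                      # 2. a provider or a data: path
    patterns = entry.get("tags")
    if patterns:                          # 3. tag patterns, if any are given
        if tag is None or not any(re.fullmatch(p, tag) for p in patterns):
            return False
    if cond is None:
        return True                       # 4. no condition: default is True
    return bool(cond(tag, data_path, *args, **kwargs))

entry = {"bypass": True, "tags": ["iter0_.*"]}
provider = lambda tag, data_path, *a, **k: 42
cached_tags = {"iter0_opt0"}              # stand-in for the cache contents
cond = lambda tag, data_path, *a, **k: tag in cached_tags

hit = should_bypass(entry, provider, cond, "iter0_opt0")       # all checks pass
uncached = should_bypass(entry, provider, cond, "iter0_opt1")  # condition is falsy
filtered = should_bypass(entry, provider, cond, "iter1_opt0")  # tag does not match
print(hit, uncached, filtered)
```

Placing the condition last means it is only evaluated for calls that would otherwise be bypassed, which keeps a potentially remote lookup off the path of calls that run for real anyway.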

Cache 

The cache is an LRU key/value store of pickled (tag, data) files on disk. It warm-starts by scanning the cache directory, so cached values survive across loop runs. Because the cache is implemented as a remote Ray actor on the head node, its methods must be called through chia_remote. Start it once on the driver after ray.init():

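from chia.base.cache import get_active_cache, start_cache
from chia.base.ChiaFunction import get

start_cache(size=4, units="GB",
            cache_dir_path="/data/chia_cache",
            yaml_path="path/to/yaml.yaml")

# The cache is a Ray actor: call its methods with chia_remote and resolve with get.
tag = "iter0_opt0"  # the _chia_tag of an earlier call
is_cached = get(get_active_cache().has.chia_remote(tag))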
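
To illustrate the storage model (pickled (tag, data) files, LRU eviction, warm start from a directory scan), here is a small single-process sketch. It is not CHIA's implementation: the class name DiskLRUCache, the write method, the hashed file names, and the use of file modification time as the recency order on warm start are all assumptions, and the Ray actor layer is left out. Only has and read mirror method names used elsewhere in this document.

```python
import hashlib
import os
import pickle
import tempfile
from collections import OrderedDict

class DiskLRUCache:
    """Toy LRU store of pickled (tag, data) files; not CHIA's implementation."""

    def __init__(self, cache_dir, max_bytes):
        self.cache_dir = cache_dir
        self.max_bytes = max_bytes
        self.index = OrderedDict()        # tag -> (path, size), oldest first
        os.makedirs(cache_dir, exist_ok=True)
        # Warm start: rebuild the index from files left by earlier runs,
        # ordered by modification time as a stand-in for recency.
        paths = [os.path.join(cache_dir, name)
                 for name in os.listdir(cache_dir) if name.endswith(".pkl")]
        for path in sorted(paths, key=os.path.getmtime):
            with open(path, "rb") as f:
                tag, _data = pickle.load(f)
            self.index[tag] = (path, os.path.getsize(path))

    def _path(self, tag):
        digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".pkl")

    def has(self, tag):
        return tag in self.index

    def write(self, tag, data):
        path = self._path(tag)
        with open(path, "wb") as f:
            pickle.dump((tag, data), f)
        self.index[tag] = (path, os.path.getsize(path))
        self.index.move_to_end(tag)       # mark as most recently used
        self._evict()

    def read(self, tag):
        if tag not in self.index:
            return False, None
        self.index.move_to_end(tag)
        with open(self.index[tag][0], "rb") as f:
            _tag, data = pickle.load(f)
        return True, data

    def _evict(self):
        # Drop least recently used entries until under budget (keep the newest).
        while (len(self.index) > 1
               and sum(size for _p, size in self.index.values()) > self.max_bytes):
            _tag, (path, _size) = self.index.popitem(last=False)
            os.remove(path)

cache_dir = tempfile.mkdtemp()
cache = DiskLRUCache(cache_dir, max_bytes=10**6)
cache.write("iter0", {"passed": True})
hit, value = cache.read("iter0")

# A new instance over the same directory sees the entry from the earlier one.
reopened = DiskLRUCache(cache_dir, max_bytes=10**6)
warm_hit = reopened.has("iter0")
miss = reopened.read("iter1")
print(hit, value, warm_hit, miss)
```

Storing the tag inside each pickle is what lets the warm start rebuild the index from the directory alone, without a separate manifest file.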

Writing is automatic

In the YAML file, mark a function cache: true under the cache: section (the same file that holds bypass:). Its output is then written to the cache automatically, keyed by the call’s _chia_tag. An optional tags: list gates which calls are written.

cache:
  run_verilator_test:
    cache: true
    tags: ["iter.*"]      # only cache calls whose tag matches

  # shorthand (cache, no tag filter):
  build_megaboom: true

# With run_verilator_test cache:true, the real result is written under "iter0".
result = get(run_verilator_test.chia_remote(design, _chia_tag="iter0"))

Reading is manual (via bypass)

To replay a cached value, register a bypass provider that reads it back:

from chia.base.cache import get_active_cache
from chia.base.ChiaFunction import get

def cache_provider(tag, data_path, *args, **kwargs):
    hit, value = get(get_active_cache().read.chia_remote(tag))
    if not hit:
        raise KeyError(f"cache miss for tag {tag!r}")
    return value

bypass.set_provider("run_verilator_test", cache_provider)

# "iter0" was written on the real run above; this serves it without rerunning.
served = get(run_verilator_test.chia_remote(design, _chia_tag="iter0"))

Putting it together: populate, then replay 

The common workflow is one real run that populates the cache, followed by later runs that replay it, so you can iterate on the loop and orchestration without recomputing the expensive steps.

See examples/bypass_cache/bypass_cache_loop.py for a runnable version (run it twice: cold then warm).
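
In outline, the populate-then-replay pattern behaves like the sketch below. It uses plain Python stand-ins (a dict for the cache, an ordinary function for the expensive step) rather than CHIA's API, and every name in it is hypothetical; it only shows the control flow of a cold run followed by a warm run.

```python
cache = {}          # stands in for the on-disk cache, keyed by tag
real_runs = []      # records which tags actually executed

def expensive_step(design, tag):
    """Placeholder for a costly function such as a build or simulation."""
    real_runs.append(tag)
    return f"result for {design} ({tag})"

def dispatch(design, tag):
    # Condition: bypass only when a value is actually cached for this tag.
    if tag in cache:
        return cache[tag]                 # provider: replay the cached value
    result = expensive_step(design, tag)  # real run
    cache[tag] = result                   # cache: true writes it under the tag
    return result

def run_loop():
    return [dispatch("design_a", f"iter{i}") for i in range(3)]

cold = run_loop()   # first run: every step executes and populates the cache
warm = run_loop()   # second run: every step is replayed from the cache
print(real_runs)
```

Because the condition checks for a cached value, a tag that was never populated simply runs for real instead of failing on a cache miss.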