Example: Agentic RoCC Accelerator (MemCpy)

A worked, end-to-end example of an agentic hardware-design loop: an LLM (Claude Code) designs a RISC-V RoCC accelerator in Chisel, CHIA builds it into a MegaBoom SoC, runs it against a bare-metal test, and, on any failure, feeds the error back to the LLM to debug and retry until the test passes.

The example lives at examples/memcpy/ and is essentially self-contained: it depends only on the installed chia package and the shared DRAMSim2 ini files in examples/common/dramsim_ini. It is a good starting point for building your own generate → build → simulate → debug loops on top of CHIA’s chipyard nodes.

What it does 

The accelerator is a hardware memcpy: it copies an array of 64-bit elements from a source region to a destination region, driven by two custom instructions on opcode custom1. The target design is MegaBoomV3HumanCommitLogConfig (MegaBoom V3 with the human-readable commit-log harness, so a failing run yields a readable instruction trace for the debugger).

The loop runs two nodes in parallel, then builds, runs, and debugs:

test build  (parallel)        implement  (parallel)
  copy memcpy.c into            Claude writes memcpy.scala
  $chipyard/tests, run cmake    (the RoCC accelerator) and
  -> build/memcpy.riscv         wires it into the target
     build/memcpy.dump          config, via chipyard_bash
     └──────────────┬──────────────┘
                    ▼
      chisel build  (ChiselBuildNode)     target: MegaBoomV3HumanCommitLogConfig
                    ▼
      verilator run (VerilatorRunNode)    memcpy.riscv, +loadmem +verbose
                    ▼
     build failed / sim failed / incorrect?
        │ yes (≤ NUM_DEBUG_ATTEMPTS)        │ no
        ▼                                   ▼
     debug (Claude) ── rebuild + rerun     DONE (passed)
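
The control flow in the diagram can be sketched in plain Python. This is a minimal illustration, not the code in memcpy_loop.py: every callable passed to `run_loop` is a hypothetical stand-in for the corresponding CHIA node, and a thread pool stands in for CHIA's own parallel dispatch.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DEBUG_ATTEMPTS = 3  # mirrors the documented default


def run_loop(build_test, implement, chisel_build, verilator_run, debug,
             max_debug=NUM_DEBUG_ATTEMPTS):
    """Run test-build and implement in parallel, then build/run/debug.

    All five callables are hypothetical stand-ins for the real nodes:
      build_test()      -> path of the test binary
      implement()       -> None (writes the accelerator source)
      chisel_build()    -> (ok, log)
      verilator_run(b)  -> (passed, log)
      debug(feedback)   -> None (edits the design)
    """
    # Phase 1: the two independent nodes run side by side.
    with ThreadPoolExecutor(max_workers=2) as pool:
        test_future = pool.submit(build_test)
        impl_future = pool.submit(implement)
        binary = test_future.result()
        impl_future.result()

    # Phase 2: one initial attempt plus up to max_debug retries.
    for attempt in range(max_debug + 1):
        build_ok, build_log = chisel_build()
        if build_ok:
            passed, run_log = verilator_run(binary)
            if passed:
                return {"passed": True, "attempts": attempt + 1}
            feedback = run_log
        else:
            feedback = build_log
        # Only ask for a fix if another attempt is still allowed.
        if attempt < max_debug:
            debug(feedback)
    return {"passed": False, "attempts": max_debug + 1}
```

With the default of three debug rounds, a design that never passes is built and simulated at most four times.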

Components 

| File | Role |
| --- | --- |
| `memcpy_loop.py` | Main orchestration: parallel test-build + implement, then the build → run → debug loop. Dumps all collateral to `out/`. |
| `test_build.py` | `build_test` `ChiaFunction`: copies `memcpy.c` into `$chipyard/tests`, registers a CMake target, builds `build/memcpy.riscv` + `build/memcpy.dump`, and reads them back. Runs on the chipyard container. |
| `claude.py` | The implement + debug LLM nodes (`chia.models.claude.ClaudeCodeLLM`) and the failure-feedback formatters. LLM calls are dispatched onto the dedicated `llm` (claude) node, sharing one session. |
| `helpers.py` | Run-outcome classification (`classify_run`), the `out/` dumper, the chipyard git-diff node (`collect_diff`), and dramsim-ini loading. |
| `prompts/` | The implement (`implement.md`) and debug (`debug.md`) prompt text, with `${VAR}` placeholders filled from `constants` at load time. |
| `constants.py` | Every tunable knob (loop counts, configs, paths, timeouts, resources). |
| `cluster.yaml` | Minimal cluster: one chisel-build, one verilator, one claude (`llm`) node. |
| `memcpy.c` | The bare-metal test: issues the two RoCC instructions and checks the copy. |
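
The `${VAR}` placeholder style used by the prompt files matches the syntax of Python's standard `string.Template`, so prompt filling can be sketched as below. This is an assumption about the mechanism, not a copy of the example's loader; `render_prompt` is a hypothetical helper name.

```python
from string import Template


def render_prompt(template_text, values):
    """Fill ${VAR} placeholders from a mapping of constant names to values.

    Template.substitute raises KeyError if the prompt references a
    placeholder that is missing from `values`, so a typo in a prompt
    file fails loudly instead of reaching the LLM unfilled.
    """
    return Template(template_text).substitute(values)


# Example values; BUILD_CONFIG is the target named in this document.
example = render_prompt(
    "Wire the accelerator into ${BUILD_CONFIG}.",
    {"BUILD_CONFIG": "MegaBoomV3HumanCommitLogConfig"},
)
```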

The accelerator contract 

memcpy.c drives the accelerator at opcode custom1 with two instructions; the implement prompt specifies exactly this so the generated design matches the test:

funct == 0 — rs1 = source base address, rs2 = destination base address. Latch both; write rd = 1.
funct == 1 — rs1 = array length (number of 64-bit elements). Copy the whole array source → destination via the RoCC memory port; write rd = 1 when done.
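
The two-instruction contract can be captured in a small behavioral model, which is handy for reasoning about what the hardware must do. This is an illustrative sketch only: `MemcpyModel` is a hypothetical name, and memory is modeled as a dict from byte address to 64-bit word rather than through the RoCC memory port.

```python
class MemcpyModel:
    """Software reference model of the memcpy accelerator contract."""

    WORD_BYTES = 8  # elements are 64-bit

    def __init__(self, memory):
        self.memory = memory  # dict: byte address -> 64-bit value
        self.src = None
        self.dst = None

    def execute(self, funct, rs1, rs2=0):
        """Execute one custom instruction and return the value written to rd."""
        if funct == 0:
            # Latch source and destination base addresses.
            self.src, self.dst = rs1, rs2
            return 1
        if funct == 1:
            if self.src is None or self.dst is None:
                raise RuntimeError("funct == 1 issued before funct == 0")
            # rs1 is the element count; copy element by element.
            for i in range(rs1):
                offset = i * self.WORD_BYTES
                self.memory[self.dst + offset] = self.memory.get(self.src + offset, 0)
            return 1
        raise ValueError(f"unsupported funct: {funct}")
```

Issuing `execute(0, src, dst)` followed by `execute(1, length)` mirrors the sequence the bare-metal test performs.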

Correctness is judged from the MEMCPY Num Correct: N line the program prints: the run passes iff N == DATA_SIZE (constants.DATA_SIZE, kept in sync with memcpy.c).
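
The pass criterion above can be sketched as a small parser over the simulator output. The function names here are hypothetical and this is not the `classify_run` implementation; it assumes only the line format quoted above.

```python
import re

# Matches the line the test program prints, e.g. "MEMCPY Num Correct: 16".
_NUM_CORRECT = re.compile(r"MEMCPY Num Correct:\s*(\d+)")


def num_correct(stdout):
    """Return N from the 'MEMCPY Num Correct: N' line, or None if absent."""
    match = _NUM_CORRECT.search(stdout)
    return int(match.group(1)) if match else None


def run_passed(stdout, data_size):
    """A run passes only if the line is present and N equals data_size."""
    return num_correct(stdout) == data_size
```

A run that hangs or crashes before printing the line yields `None`, which never equals `data_size`, so it is treated as a failure.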

Running it 

The bundled cluster.yaml brings up three node types — one chisel-build (chipyard), one verilator (verilator_run), and one claude (llm) node. The llm node mounts your Claude Code credentials; see the note at the top of cluster.yaml.

export THIS_MACHINE="your_machine_ip"
chia up   examples/memcpy/cluster.yaml
chia job submit -- python $PWD/examples/memcpy/memcpy_loop.py   # run from the repo root
chia down examples/memcpy/cluster.yaml

Pass the absolute path to memcpy_loop.py. The driver runs on the cluster head, where the repo lives, so out/ is written into the real examples/memcpy/out.

The LLM calls run on the dedicated llm node. chia.models.claude.ClaudeCodeLLM.prompt() is itself a ChiaFunction, so the loop dispatches it with llm.prompt.options(resources={"llm": 1.0}). It threads the session transcript from each call into the next, which lets the debugger resume the implement conversation. (Session persistence for other backends is in development.)

Tunable parameters 

All knobs live in constants.py. Container paths and cluster knobs can be overridden through MEMCPY_* environment variables, so nothing is hardcoded into the loop.

| Constant | Meaning |
| --- | --- |
| `NUM_DEBUG_ATTEMPTS` | Max debug-and-retry rounds after the first failure (default 3). |
| `NUM_PERF_OPT_ITERS` | Reserved for a future post-correctness performance-optimization phase (defined, not yet used). |
| `BUILD_CONFIG` | Chisel config to build (`MegaBoomV3HumanCommitLogConfig`). |
| `DATA_SIZE` | Element count; must match `memcpy.c`. |
| `VERILATOR_TIMEOUT_CYCLES` / `_SECONDS` | Simulation caps so a hung design fails fast. |
| `*_RESOURCE` | Ray scheduling tokens, sized so the persistent `chipyard_bash` actor coexists with the builds. |
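
An environment-overridable knob of the kind described above can be sketched as follows. The helper name `env_override` and the variable name `MEMCPY_CHIPYARD_DIR` are hypothetical; check constants.py for the names the example actually reads.

```python
import os


def env_override(name, default, cast=str, environ=None):
    """Return MEMCPY_<name> from the environment, else the default.

    `cast` converts the raw string (for example int for a count).
    `environ` can be injected for testing; it defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(f"MEMCPY_{name}")
    return default if raw is None else cast(raw)


# Hypothetical knob: a container path with a built-in default.
CHIPYARD_DIR = env_override("CHIPYARD_DIR", "/chipyard")
```

Keeping the lookup in one helper means each constant declares its default and type in a single line.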

Debug feedback 

On a failure the debug node (the same Claude session, resumed) receives:

Build failure — build stderr tail plus stdout windowed on the first error.
Simulation failure (runtime / timeout / incorrect) — the simulator stdout (the commit log) and spike-dasm output tails, plus the last COMMIT_LOG_TAIL_LINES lines of the commit log and the last DUMP_TAIL_LINES lines of memcpy.dump (the test disassembly).
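
Both kinds of feedback rely on trimming large logs to the part the debugger needs. A minimal sketch of the two trimming strategies, tailing and windowing on the first error, is shown below. The function names, the `"error"` marker, and the window sizes are assumptions for illustration, not the formatters in claude.py.

```python
def tail_lines(text, n):
    """Return the last n lines of text (empty string if n <= 0)."""
    if n <= 0:
        return ""
    return "\n".join(text.splitlines()[-n:])


def window_on_first_error(text, marker="error", before=5, after=20):
    """Return the lines around the first line containing `marker`.

    The match is case-insensitive. If no line matches, fall back to
    the tail of the log so the caller still gets some context.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if marker in line.lower():
            start = max(0, index - before)
            return "\n".join(lines[start:index + after + 1])
    return tail_lines(text, before + after + 1)
```

Windowing on the first error matters for build logs because later errors are often cascades of the first one.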

Output and per-iteration diffs 

Every node result and piece of collateral is written to out/, each filename prefixed with the timestamp at the moment it is written (so files sort by when they were produced):

20260626_144501_implement.md
20260626_144502_test_build_memcpy.riscv
20260626_144502_test_build_memcpy.dump
20260626_144503_chisel_diff_attempt0.diff       # chipyard diff for this iteration
20260626_144503_chisel_diff_attempt0.json       # per-repo diff dict
20260626_145012_chisel_build_attempt0.stdout.txt
20260626_145230_verilator_run_attempt0.log
20260626_145231_feedback_attempt1.md
20260626_145950_debug_attempt1.md
20260626_153044_summary.json
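
The naming scheme in the listing can be reproduced with a short helper. This is a sketch that assumes the prefix is a local-time `YYYYMMDD_HHMMSS` stamp, as the filenames suggest; `dump` is a hypothetical name, not the dumper in helpers.py.

```python
from datetime import datetime
from pathlib import Path


def dump(out_dir, name, content, now=None):
    """Write content to out_dir/<timestamp>_<name> and return the path.

    The timestamp is taken at write time, so a plain lexical sort of
    the directory lists files in the order they were produced.
    `now` can be injected for testing; it defaults to datetime.now().
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{stamp}_{name}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return path
```

Because the stamp has one-second resolution, two files with the same name written within the same second would collide, which is why each name also carries the node and attempt number.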

Each iteration’s Chisel diff is captured with collect_diff (in helpers.py) just before that attempt’s build, so it reflects exactly the source that is built: the implement node’s work on attempt 0, and the cumulative implement + debug edits on later attempts.
