"Can we trust the LLM with the keys?" is one of the dumber questions in crypto right now. The interesting question is what would have to be true for the answer to be yes. We spent a weekend at ETHGlobal Cannes building one possible answer.
The premise
A DAO treasury sits in a smart-contract vault. The treasury would like to delegate small rebalancing decisions to an AI agent — say, between USDC, ETH, and a staking position. The agent reads market data, runs an LLM forecast, and proposes a trade. The smart contract decides whether to execute.
The problem: the smart contract has no way to know whether the agent actually ran the LLM, which model it ran, or on what inputs. The agent could be making up the forecast. The agent could be running a different model than it claims. The agent could be running the right model on different inputs. Reputation systems only tell you that someone vouched for the agent, not that the inference is correct.
What you want is a cryptographic receipt. The agent doesn't just say "the model recommends 60/30/10." The agent says "the model recommends 60/30/10, and here is a proof that this specific model on this specific input produced exactly that output."
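To make the receipt concrete, here's a minimal sketch of the tuple the agent ends up handing to the chain. The field names are ours for illustration, not the actual Obelyzk payload format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceReceipt:
    """Illustrative shape of the receipt the agent submits (field names are placeholders)."""
    model_root: bytes   # Merkle root of the model weights the vault is pinned to
    input_hash: bytes   # commitment to the exact market data / prompt the model saw
    output: dict        # the recommendation, e.g. {"USDC": 0.6, "ETH": 0.3, "staked": 0.1}
    proof: bytes        # STARK proof that the committed model, on that input, produced that output
```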
The 36-hour build
We built three components, in this order, with one engineer on each:
1. The treasury vault contract (Cairo)
A Starknet smart contract that holds the treasury, accepts trade proposals, and executes them only if a valid Obelyzk proof is attached. The contract is parameterized by a Merkle root of the agent's model weights, so the vault is pinned to a specific model.
The verifier itself is the recursive STARK verifier from Obelyzk, deployed as a class hash and called from the vault. The vault checks: (a) the proof verifies, (b) the model root matches the one the vault was instantiated with, (c) the input to the model matches the market data the agent claimed to be reacting to, (d) the output matches the proposed trade.
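In Python-flavored pseudocode (the real contract is Cairo; every name here, including hash_market_data and verifier.verify, is a placeholder for the corresponding on-chain call), the gate looks roughly like this:

```python
def hash_market_data(market_data: dict) -> bytes:
    """Placeholder for whatever commitment the proof's public inputs use over the market data."""
    raise NotImplementedError

def accept_proposal(vault, proposal, receipt, verifier) -> None:
    """Sketch of the vault's gate; names are illustrative, not the Cairo interface."""
    # (a) the recursive STARK proof verifies against the Obelyzk verifier class
    assert verifier.verify(receipt.proof), "invalid proof"
    # (b) the proof commits to the same model root the vault was deployed with
    assert receipt.model_root == vault.model_root, "wrong model"
    # (c) the proof's public inputs match the market data the agent claims it reacted to
    assert receipt.input_hash == hash_market_data(proposal.market_data), "input mismatch"
    # (d) the proof's public outputs match the trade being proposed
    assert receipt.output == proposal.trade, "output mismatch"
    vault.execute(proposal.trade)
```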
2. The agent (Python + LangChain)
A LangChain agent with three tools: get_market_data, run_model, and propose_trade. The model in question is a fine-tuned SmolLM2-135M — small enough that we could actually prove a forward pass inside the hackathon's time budget, large enough to produce believable trading recommendations.
The agent loop is mundane: pull market data, format a prompt, run the model, parse the output as a JSON object, format the trade payload. The interesting bit is the run_model tool — it doesn't just return the model output, it returns a tuple of (output, proof) by calling into Obelyzk under the hood.
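A condensed sketch of the tools, assuming a hypothetical obelyzk_client wrapper around the prover (prove_forward and MODEL_ROOT are stand-in names, not a confirmed API):

```python
from langchain_core.tools import tool

# Hypothetical wrapper around the Obelyzk prover; prove_forward is a stand-in name.
from obelyzk_client import prove_forward

MODEL_ROOT = "0x..."  # Merkle root of the fine-tuned SmolLM2-135M weights (placeholder)

@tool
def get_market_data() -> dict:
    """Fetch current prices for USDC, ETH, and the staking position."""
    # In the real agent this hits a price feed; stubbed here.
    return {"USDC": 1.0, "ETH": 3000.0, "staked_apr": 0.04}

@tool
def run_model(prompt: str) -> dict:
    """Run the pinned model on the prompt and return the output together with its proof."""
    output, proof = prove_forward(model_root=MODEL_ROOT, prompt=prompt)
    return {"output": output, "proof": proof}

@tool
def propose_trade(output: dict, proof: bytes) -> dict:
    """Package the model's recommendation and its proof for the transport layer."""
    return {"trade": output, "proof": proof}
```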
3. The proof transport (TypeScript)
The middleware that takes the (input, output, proof) tuple from the agent and submits it to the vault contract on Starknet. About 200 lines of TypeScript using starknet.js, plus a small queue for handling fee estimation and retries when Sepolia is congested.
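The queue itself is ordinary retry-with-backoff. A language-agnostic sketch in Python (the real middleware is the TypeScript described above; submit_to_vault and estimate_fee stand in for the starknet.js calls):

```python
import time

def submit_with_retries(payload, submit_to_vault, estimate_fee,
                        max_attempts=5, base_delay_s=10.0):
    """Sketch of the transport queue: estimate fees, submit, back off and retry on congestion."""
    for attempt in range(max_attempts):
        try:
            fee = estimate_fee(payload)                        # re-estimate each attempt; fees move
            return submit_to_vault(payload, max_fee=fee * 2)   # headroom for fee spikes
        except Exception:
            time.sleep(base_delay_s * 2 ** attempt)            # exponential backoff
    raise RuntimeError("vault submission failed after retries")
```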
What it actually proves
Three things are cryptographically guaranteed by the time a trade settles:
- The exact model that ran (committed at vault deploy).
- The exact input the model saw (committed in the proof's public inputs).
- The exact output the model produced (committed in the proof's public outputs).
What is not guaranteed:
- The input is accurate. If the agent claims the market data was X when it was actually Y, the proof is still valid; it just proves the model ran on X. You need an oracle layer to attest to the truth of the input.
- The trade is profitable. The proof says nothing about whether the model is any good. It just says the model said what it said.
- The model is honest. If the model itself is adversarial, the proof faithfully attests to its dishonest output. The proof is about computation, not about values.
Limitations we hit
Three real ones.
Proving time vs. trading time. SmolLM2-135M proves a forward pass in roughly 90 seconds on a single H100. That's fine for treasury rebalancing on a weekly cadence. It's not fine for HFT. We'd need roughly another 10× speedup to be in the conversation for tighter loops.
Fee budgets. Each verification costs roughly 0.001 STRK on Sepolia. On mainnet that becomes meaningful for small trades. We sketched a "fee envelope" pattern where the vault only verifies if the trade size exceeds the verification cost by some configurable multiple.
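The envelope reduces to a one-line guard. A sketch, with min_multiple as the configurable parameter; the actual threshold would be a governance choice, not something we settled on:

```python
def should_verify(trade_value_strk: float, verification_cost_strk: float,
                  min_multiple: float = 100.0) -> bool:
    """Only pay for on-chain verification when the trade is large enough to justify it."""
    return trade_value_strk >= min_multiple * verification_cost_strk
```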
Model swaps. The vault is pinned to a single model root. Upgrading the model means re-deploying the vault, or threading a governance mechanism through. Not a hard problem, but a real one.
What it's good for
This is not the architecture for high-frequency trading. It's the architecture for the long tail of "I want to delegate medium-stakes financial decisions to an AI agent and have a real receipt." Treasury rebalancing. Prediction markets. DAO governance with AI-assisted proposals. Any system where the cost of a wrong delegation is high enough to justify a verification step.
The point isn't that we trust the AI. The point is that we don't have to — the smart contract checks.
Demo deck, vault address, and a Sepolia transaction trail are linked from the Obelyzk repo. Reach out if you want to deploy a treasury vault of your own — we'd love to see what it does in the wild.



