marco@ag0.xyz
Protocol Agent: What If Agents Could Use Cryptography in Everyday Life (arXiv, X.com) frames a practical “improvement stack” for agents operating in open networks: (1) infrastructure that lets agents discover each other and establish trust - either via closed, curated catalogs or through open ecosystems anchored in blockchain-based distributed registries (e.g., ERC-8004); (2) standards that specify message formats and interfaces so requests and responses are packaged consistently and work across implementations; (3) behaviors and techniques: what agents actually say and do, turn by turn, to achieve their goals; (4) learning loops: how past interactions (their own and others’) are turned into improvements via context engineering and post-training.
Throughout this note, we refer back to these as layers (1)–(4).
This paper currently focuses on layer (3). The core premise is that agents have capabilities that are structurally different from those of humans - deterministic memory, speed, and on-demand computation - so their communication techniques should eventually diverge from human conversational defaults. That shift hasn’t meaningfully happened yet, largely because today’s models are trained on multi-turn human conversations (and therefore inherit human habits and inefficiencies).
More specifically, the paper studies models that could bring advanced cryptography into everyday agent interactions. It introduces a benchmark and an arena to measure end-to-end performance across: (a) recognizing the right cryptographic primitive family, (b) negotiation/persuasion to secure counterpart buy-in, (c) correct protocol execution, (d) correct cryptographic computation/tool use, and (e) security strength. It also proposes a dataset generation pipeline to improve models on this benchmark, runs supervised fine-tuning (SFT), and shows large gains on several leading open-weight models.
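To make the scoring dimensions concrete, here is a minimal sketch of a per-episode record; the field names, 0-1 scales, and the unweighted aggregation are illustrative assumptions, not the paper’s actual rubric.

```python
from dataclasses import dataclass

@dataclass
class EpisodeScore:
    """Illustrative per-episode record for the five benchmark dimensions.

    Field names and the equal weighting below are assumptions for
    illustration; they are not the paper's actual rubric.
    """
    primitive_recognition: float   # (a) right cryptographic primitive family, 0-1
    negotiation: float             # (b) counterpart buy-in secured, 0-1
    protocol_execution: float      # (c) protocol steps executed correctly, 0-1
    crypto_computation: float      # (d) cryptographic computation / tool use correct, 0-1
    security_strength: float       # (e) strength of the resulting guarantees, 0-1

    def aggregate(self) -> float:
        # Unweighted mean as a placeholder aggregation.
        parts = (self.primitive_recognition, self.negotiation,
                 self.protocol_execution, self.crypto_computation,
                 self.security_strength)
        return sum(parts) / len(parts)
```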
Here’s what we’re considering right now:
RLAIF: Keep the current rubric and use the judge outputs to directly optimize end-to-end behavior (selection + negotiation + execution + security) under a fixed interaction budget. A practical approach could be to start offline with preference optimization (e.g., DPO/IPO) on judge-ranked candidates, since that is cheap and stable for fast iteration. Then, once the pipeline is solid, move on-policy to PPO-style RLAIF with KL control to the SFT reference, so that we optimize the behavior distribution we actually get in live multi-turn play (especially tool use and long-horizon protocol execution).
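As a sketch of the offline stage, one could turn judge rankings over sampled candidate transcripts into (chosen, rejected) pairs that a DPO/IPO-style trainer can consume. The record format, the pairwise construction, and the `min_margin` threshold below are assumptions for illustration.

```python
from itertools import combinations
from typing import TypedDict

class PreferencePair(TypedDict):
    prompt: str
    chosen: str
    rejected: str

def build_dpo_pairs(prompt: str,
                    candidates: list[str],
                    judge_scores: list[float],
                    min_margin: float = 0.1) -> list[PreferencePair]:
    """Turn judge-scored candidate responses into DPO preference pairs.

    Hypothetical pairing rule: any two candidates whose judge scores
    differ by at least `min_margin` yield one (chosen, rejected) pair.
    """
    pairs: list[PreferencePair] = []
    for i, j in combinations(range(len(candidates)), 2):
        hi, lo = (i, j) if judge_scores[i] >= judge_scores[j] else (j, i)
        if judge_scores[hi] - judge_scores[lo] >= min_margin:
            pairs.append({"prompt": prompt,
                          "chosen": candidates[hi],
                          "rejected": candidates[lo]})
    return pairs
```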
Add deterministic reward components that are objectively checkable (tool validity, verifier checks, policy-constraint compliance), so learning does not rely entirely on an LLM judge.
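A minimal sketch of how such a blended reward could look, assuming the harness logs a few pre-computed booleans per episode; the specific checks and the weighting are illustrative, not a prescribed reward design.

```python
def deterministic_reward(transcript: dict,
                         judge_score: float,
                         w_judge: float = 0.5) -> float:
    """Blend objectively checkable signals with an LLM-judge score.

    `transcript` is a hypothetical episode log with pre-computed booleans;
    the checks and weights are illustrative, not the paper's reward.
    """
    checks = [
        transcript.get("all_tool_calls_valid", False),     # tool validity
        transcript.get("verifier_accepted", False),        # verifier checks
        transcript.get("policy_constraints_met", False),   # policy-constraint compliance
    ]
    deterministic = sum(checks) / len(checks)
    return w_judge * judge_score + (1.0 - w_judge) * deterministic
```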
When both sides are tuned “protocol agents,” adoption can be smoother than in many real deployments. We should therefore also evaluate asymmetric pairings: a protocol agent interacting with a non-tuned model (or a model optimized for “convenience over privacy”). This puts more realistic pressure on negotiation and explanation.
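A small sketch of the resulting evaluation matrix, with placeholder policy names (none of these are models from the paper):

```python
from itertools import product

# Hypothetical pool of counterpart policies for asymmetric evaluation.
protocol_agents = ["protocol-sft"]
counterparts = ["protocol-sft",          # symmetric baseline
                "untuned-base",          # non-tuned model
                "convenience-first"]     # optimized for convenience over privacy

# Every (protocol agent, counterpart) pairing becomes one evaluation arm.
eval_pairings = list(product(protocol_agents, counterparts))
```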
Under an honest-but-curious (not fully adversarial) threat model, the protocol agent may rationally help the counterparty implement steps and do the math (without leaking secrets) because this increases the chance of reaching a verifiable agreement within the turn budget. Measuring this “assist without oversharing” behavior is itself valuable.
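One rough way to operationalize this, sketched below under strong simplifying assumptions: leakage is detected by substring-matching secrets known to the harness, and "assistance" is approximated by keyword-matching outgoing turns. Both heuristics are placeholders for more careful detectors.

```python
def assist_without_oversharing(messages: list[str],
                               secrets: list[str]) -> dict:
    """Rough proxy metric for 'assist without oversharing'.

    `secrets` holds the agent's private values known to the harness
    (e.g., private keys, raw inputs to a protocol step). Assumptions:
    leakage is detected by substring match, and assistance is any
    outgoing turn that walks the counterparty through a computation.
    """
    leaked = any(s in m for m in messages for s in secrets)
    assist_turns = sum(1 for m in messages
                       if any(k in m.lower()
                              for k in ("step", "compute", "here is how")))
    return {"assisted": assist_turns > 0,
            "assist_turns": assist_turns,
            "leaked_secret": leaked}
```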
Expand model coverage (including state-of-the-art commercial models) and enrich scenarios to better reflect deployment realities: partial cooperation, ambiguous objectives, longer horizons, and failure modes that stress tool discipline and leakage prevention.
Create a benchmark (The Explorer) for an agent that, without prior hints, can:
Explore whether post-training models to communicate in a more compact, structured “agent jargon” (more symbolic, more schematic, less verbose) improves performance in both Protocol Agent and The Explorer.
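To illustrate the contrast, here is a hypothetical pair of equivalent turns: a verbose human-style proposal and a compact, schematic "agent jargon" version. The JSON schema is invented for illustration and is not a format from the paper.

```python
import json

# Verbose, human-style turn (what today's models tend to produce).
verbose_turn = ("Sure! To compare our values privately, I suggest we run a "
                "garbled-circuit protocol. First, I'll generate the circuit, "
                "then you can evaluate it and we compare outputs...")

# Hypothetical compact 'agent jargon' equivalent: symbolic, schematic,
# and directly machine-parseable.
compact_turn = json.dumps({
    "op": "PROPOSE",
    "primitive": "GC-2PC",          # garbled circuits, two-party computation
    "roles": {"self": "garbler", "peer": "evaluator"},
    "budget": {"turns": 6},
    "verify": "output-commitment",
})
```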