1 series · 2 pieces

Series

Multi-part essays — read in order, or jump to any part. Each series has a thread holding it together; the parts compound.

Reliability series

A multi-part argument. Reads best in order; each part references the last.

2 of 3 published

Engineering

How to build agent workflows you can replay, diff, and certify — when the underlying LLM call is none of those things.

Engineering

Unit tests and benchmarks miss the failures that actually break agents. A pattern for evaluating the system as a whole.

Part 3 · drafting

Coming up

Title under embargo. Subscribe to get it the day it ships.