Topic · 2 pieces

Engineering

Long-form essays on the engineering side of the craft.

How to build agent workflows you can replay, diff, and certify, even when the underlying LLM call is none of those things.

Unit tests and benchmarks miss the failures that actually break agents. A pattern for evaluating the system as a whole.