Trust, but Trace
A journal on what it takes to make ML systems trustworthy in production — traces, fixtures, contracts, replays. Written from the engineering side.
“Most of what an LLM agent does in production has never appeared in any evaluation set.”
Topics
Recurring threads
Six threads that keep coming back.
On the desk
Reading list
What’s on the desk this season.
- 01. Book: AI Engineering: Building Applications with Foundation Models
- 02. Podcast: Latent Space, Artificial Analysis on independent LLM evals
- 03. Book: Multimodal, Real-Time AI Agent Systems
- 04. Reference: EU AI Act, Article 15: Accuracy, Robustness, Cybersecurity
- 05. Book: Hands-On Large Language Models
- 06. Paper: Constitutional Classifiers
- 07. Podcast: Pragmatic Engineer, Inside engineering at frontier labs