I thought we already monitored our models. We had dashboards. We had alerts. We had an on-call rotation that woke someone when latency spiked.
None of it was a post-market monitoring system. Not in the sense Article 72 means.
A dashboard tells you the system is up. Article 72 asks a different question: is it still compliant, and can you prove it for as long as the system runs? Different question, different artifact, different reader. This post maps the distance between the two.
What Article 72 actually says
Article 72 of Regulation (EU) 2024/1689 sits in the post-market chapter. It governs what providers of high-risk AI systems owe after the system ships — after conformity assessment, after the CE mark, for as long as the system stays on the market. Four paragraphs. Each describes a piece of a loop.
Paragraph (1) sets the obligation. Providers shall establish and document a post-market monitoring system, proportionate to the nature of the AI technologies and the risks involved. Note the verb “document”. The deliverable is a documented system, not a monitoring habit. A notified body has to be able to read it.
Paragraph (2) is the engine. The system must actively and systematically collect, document, and analyse relevant data on performance throughout the system’s lifetime. Throughout its lifetime: not at acceptance, not at the last audit. And one clause carries most of this post: “Where relevant, post-market monitoring shall include an analysis of the interaction with other AI systems.” The multi-agent case is written into the law. Not implied. Named. The “where relevant” qualifier narrows when the clause applies, not whether it exists.
Paragraph (3) says the system runs on a plan, and the plan is part of the technical documentation under Annex IV. The Commission was to hand providers a template for that plan — an implementing act, due 2 February 2026. Hold that date. Its fate is a later section.
Paragraph (4) is the reuse clause. Providers already regulated under sectoral EU law can fold these elements into systems they run today, as long as the protection is equivalent. Financial institutions under EU financial-services law get the same accommodation.
Four paragraphs that describe a loop: collect, analyse, document in a plan, feed the result back.
What the ecosystem actually monitors
That’s the loop the law describes. The monitoring stack a team actually runs is narrower, and the distance between them is the rest of this post. Most of that stack falls into four groups, and each answers a real question.
Drift detection. Tools like Evidently and NannyML watch the input and output distributions and flag when they move. Drift detection needs no labels, which is why teams reach for it first. It catches the thing it is built to catch: the data today doesn’t look like the data the model trained on.
Slice metrics. Performance broken down by segment — by cohort, by region, by input type. They surface degradation that an aggregate number hides. Useful, and underused.
Observability stacks. Prometheus, Grafana, OpenTelemetry. Latency, error rate, throughput, cost per request. This is the operational backbone, and it is mature. It answers whether the system is up and what it costs to keep it there.
Tracing. Span-level traces of agent steps and tool calls. The newest of the four, and the least settled. It records what the agent did, step by step.
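To make the drift-detection category concrete, here is a minimal sketch of one common drift statistic, the population stability index (PSI). It is an illustration only: the function name `psi`, the ten-bin default, and the 0.25 rule of thumb are assumptions chosen for this example, not a description of what Evidently or NannyML compute by default.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(sample: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(100)]       # stand-in for training-time inputs
shifted = [0.5 + i / 200 for i in range(100)]   # stand-in for inputs seen in the field
drifted = psi(reference, shifted) > 0.25        # 0.25 is a rule of thumb, not a legal threshold
```

The function needs no labels, only two samples of the same feature. It also returns nothing but a number about distributions, which is the limit of what this category can say.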
Four honest categories. None of them answers the question Article 72 asks. The mismatch comes in three parts.
First, all four are operational tools, not compliance tools. They tell you the system is healthy. They do not tell you whether the appropriate level of accuracy, robustness, and cybersecurity that Article 15(1) makes you define and justify still holds in the field. Healthy and compliant are different claims, and only one of them is in the law.
Second, nothing maps a signal to the articles. A drift alert fires. Nobody — and no tool — connects that alert to the appropriate level Article 15(1) made you justify, or to a line in the Article 9 risk register. The signal exists; the mapping doesn’t.
Third, the interaction clause has no schema. Analysis of the interaction with other AI systems assumes you can say which system influenced which. OpenTelemetry is building agent and multi-agent spans for exactly this, but the GenAI conventions are still marked Development — a draft, not a standard. The clause is in the law today; the schema to satisfy it is not finished.
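The second mismatch, the missing mapping from signal to article, can be sketched as a small data structure. Everything here is hypothetical: the class names `DeclaredLevel` and `Finding`, the `risk_id` key, and the `risk-register:` prefix are assumptions made for illustration, not a schema from the regulation or from any tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredLevel:
    """A level the provider declared and justified (hypothetical schema)."""
    metric: str     # e.g. "accuracy"
    minimum: float  # the level stated in the technical documentation
    article: str    # the provision the level was declared under
    risk_id: str    # hypothetical key into the Article 9 risk register


@dataclass(frozen=True)
class Finding:
    metric: str
    observed: float
    minimum: float
    breached: bool
    references: tuple[str, ...]


def assess(observed: dict[str, float], levels: list[DeclaredLevel]) -> list[Finding]:
    """Turn raw field metrics into findings tied to provisions and risk entries."""
    findings = []
    for level in levels:
        if level.metric not in observed:
            continue  # no field data for this metric; a real plan should flag that gap
        value = observed[level.metric]
        findings.append(Finding(
            metric=level.metric,
            observed=value,
            minimum=level.minimum,
            breached=value < level.minimum,
            references=(level.article, f"risk-register:{level.risk_id}"),
        ))
    return findings


levels = [DeclaredLevel("accuracy", 0.92, "Article 15(1)", "R-007")]
findings = assess({"accuracy": 0.89, "latency_p99_s": 0.4}, levels)
```

The latency reading is ignored because no declared level refers to it. The accuracy reading becomes a finding that carries its article and its risk-register entry with it, and that link is what a bare alert lacks.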
The loop, mapped
Six requirements that follow from Article 72 and its neighbours, against what the monitoring stack delivers today.
Two clean gaps. The rest are partial — present in pieces, complete in none.
Where the toolchain misses
Four places where the distance between the article and the tools is widest.
Compliance signals, not ops metrics. A drift alert is an operations event. It becomes a compliance event only when someone ties it back to the appropriate level Article 15(1) made you justify, and records that judgement in the Article 9 risk-management system — and that step is manual, or it doesn’t happen. The stack produces signals in the vocabulary of SRE: latency, saturation, error rate. The article speaks a different vocabulary: accuracy, robustness, cybersecurity, sustained across the lifecycle. Nothing translates between the two automatically.
The plan is a document, not a dashboard. Paragraph (3) wants a post-market monitoring plan — versioned, dated, sitting in the Annex IV technical documentation, read by a notified body. A Grafana board is none of those things. It is live, ephemeral, and read by an engineer at 3 a.m. Both are useful. They are not the same artifact, and shipping the second does not discharge the obligation to produce the first.
Cross-system causation. This is the clause I promised to come back to. In the Article 15 post I flagged multi-agent coordination as the gap the toolchain hadn’t reached. Article 72(2) names it directly: analyse the interaction with other AI systems. To do that you have to say which system’s output shaped which system’s next action — causation across a boundary with no observability. Picture a triage agent that down-ranks a case because a retrieval agent handed it a stale document. The output looks normal. The metric stays green. Nothing records that the first agent’s answer is why the second one decided as it did. Multiply that across a fleet of agents calling each other, and interaction stops being something you can audit after the fact. OpenTelemetry’s agent spans are the closest thing to a schema, and they are still in development. Until they land, interaction with other AI systems is an obligation you can write down and cannot yet measure.
Closing the loop into Article 9. Monitoring that never updates the risk register is an open loop. Article 72 feeds Article 9 — the risk-management system that runs across the whole lifecycle — and the two are one discipline, not two. The toolchain stops at the dashboard. The article expects the dashboard’s signal to change the risk register, the test plan, and the next release. That last hop, from observation to action recorded in the documentation, is the one no tool makes for you.
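The cross-system causation point above comes down to one missing field: a recorded link from a step to the steps that fed it. The sketch below assumes such a link exists and shows the audit question it would answer. The names `Step`, `caused_by`, and `upstream_systems` are hypothetical and are not OpenTelemetry attributes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str
    system: str   # which AI system produced this step
    summary: str
    caused_by: list[str] = field(default_factory=list)  # ids of upstream steps


def upstream_systems(steps: dict[str, Step], step_id: str) -> set[str]:
    """Return every other system whose output fed into the given step."""
    origin = steps[step_id].system
    seen: set[str] = set()
    stack = list(steps[step_id].caused_by)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(steps[current].caused_by)
    return {steps[s].system for s in seen} - {origin}


steps = {
    "i1": Step("i1", "indexing-agent", "indexed a document without refreshing it"),
    "r1": Step("r1", "retrieval-agent", "returned the stale document", ["i1"]),
    "t1": Step("t1", "triage-agent", "down-ranked the case", ["r1"]),
}
influences = upstream_systems(steps, "t1")
```

With the links recorded, the triage decision traces back to the retrieval and indexing agents. Without them, the same three steps are three unrelated log lines.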
The implementing act that didn’t arrive
The article gave providers one piece of official scaffolding for the plan: a template. Paragraph (3) tasked the Commission with adopting an implementing act — a standard template for the post-market monitoring plan — by 2 February 2026.
It didn’t arrive on those terms. The Digital Omnibus on AI, the Commission’s simplification package, would remove the Commission’s power to adopt that mandatory template. Post-market monitoring stays mandatory; the template would become voluntary guidance, developed over 2026. Parliament and the Council reached a provisional agreement on the package on 7 May 2026. It is not yet formally adopted — endorsement is expected before 2 August 2026, and until then none of it binds.
So the date passed, and the one standardised artifact that would have told you what a compliant plan looks like would be downgraded to guidance. You write the plan anyway, without a template, against an obligation that did not go away.
The clock would move, too. The same package would defer the high-risk obligations for Annex III systems from 2 August 2026 to 2 December 2027. That would be real breathing room. It is not a reprieve from the work. The monitoring system still has to exist, and building it well takes most of the time the deferral buys. As with every moving piece of this regulation: plan against the law as it stands, not the version you hope is coming.
One deadline did not move: the alarm. Article 73 requires reporting a serious incident no later than 15 days after becoming aware of it. The limit is 10 days if a person died, and 2 days for a widespread infringement or a serious and irreversible disruption of critical infrastructure. Those clocks start whether or not your monitoring caught the incident. A post-market monitoring system that cannot detect and escalate inside those windows is not a system. It is a dashboard with a compliance label.
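The three outer limits above are simple enough to encode, which makes them easy to test. This is a sketch under a stated assumption: it counts calendar days from the date of awareness. How the days are counted in law is a question for counsel, and the names `IncidentKind`, `report_deadline`, and `days_left` are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum


class IncidentKind(Enum):
    # Outer limits in days, as described in the text above.
    SERIOUS = 15
    DEATH = 10
    WIDESPREAD_OR_CRITICAL_INFRASTRUCTURE = 2


def report_deadline(aware_on: date, kind: IncidentKind) -> date:
    """Latest reporting date, assuming calendar days from awareness."""
    return aware_on + timedelta(days=kind.value)


def days_left(aware_on: date, kind: IncidentKind, today: date) -> int:
    """Days remaining before the deadline; negative once it has passed."""
    return (report_deadline(aware_on, kind) - today).days


aware = date(2026, 3, 1)
deadline = report_deadline(aware, IncidentKind.DEATH)
remaining = days_left(aware, IncidentKind.DEATH, today=date(2026, 3, 9))
```

Note the input: the date of awareness, not the date of detection. If the monitoring system never surfaces the incident, nothing in this code runs, and the clock runs anyway.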
What to do meanwhile
Four moves you can make now, without waiting for guidance, a template, or a harmonised standard.
Write the plan now, and borrow a template that exists. No official template is coming on the original terms, but you don’t need the Commission’s. Medical-device regulation has run post-market surveillance for years; its plan structure, the MDR and IVDR post-market surveillance plan, is the most mature analogue, and it maps cleanly onto Article 72. Start from that, make the plan a versioned document in your technical file, and revise it on every model, prompt, and dependency change.
Wire OpenTelemetry’s GenAI conventions, draft status and all. They are not finished. They already define spans for model and agent operations — the closest thing to a schema for the interaction clause. Adopt them now and your cross-system traces will exist when the conventions stabilise, instead of starting from zero the day they do.
Build the incident playbook against the Article 73 clocks. Wire detection to triage to report, so the 15-, 10-, and 2-day deadlines are runnable procedures, not paragraphs in a policy nobody has tested. Run the drill once before you need it.
Build the deployer feedback channel. If you are the provider, the deployer holds half your monitoring data. Article 26 requires them to watch the system, keep logs for at least six months, and tell you when something looks wrong. That signal closes your loop. But only if there is a channel for it to arrive on, and a place in the risk register for it to land. Build both.
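The first move, revising the plan on every model, prompt, and dependency change, only works if something notices the change. One way to sketch that is a fingerprint stored with each plan revision. The names `fingerprint`, `PlanRevision`, and `needs_revision` are hypothetical, and hashing these three inputs is an assumption about what counts as a change, not a requirement taken from the regulation.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(model_version: str, prompt_text: str, dependencies: dict[str, str]) -> str:
    """Hash the inputs whose change should trigger a plan review."""
    payload = json.dumps(
        {"model": model_version, "prompt": prompt_text, "deps": dependencies},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PlanRevision:
    version: int
    reviewed_on: str         # ISO date of the last review
    system_fingerprint: str  # fingerprint of the system the plan was reviewed against


def needs_revision(plan: PlanRevision, current_fingerprint: str) -> bool:
    """True when the deployed system no longer matches the reviewed one."""
    return plan.system_fingerprint != current_fingerprint


reviewed = fingerprint("m-1.0", "You are a triage assistant.", {"numpy": "1.26.4"})
plan = PlanRevision(version=3, reviewed_on="2026-01-15", system_fingerprint=reviewed)
deployed = fingerprint("m-1.0", "You are a careful triage assistant.", {"numpy": "1.26.4"})
stale = needs_revision(plan, deployed)
```

A check like this could run in a release pipeline and block a deploy until the plan is reviewed. It detects that a review is due; it does not do the review.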
None of this needs the template to exist. The obligation is to run a loop: observe the system in the field, analyse what you see, write it into a plan, and let it change the next release. The tools give you the observation. The rest is engineering you own.
A monitoring stack that never updates the risk register is an open loop. Article 72 named the loop. The ecosystem built the instrumentation. Closing it is the work.