Remote work, digital products, and AI systems have turned organizations into instrumented environments. Almost everything leaves a trace: clicks, tickets, classifications, recommendations, rankings, escalations, delays.
That instrumentation created a managerial reflex: if we can measure it, we can optimize it; if we can optimize it, we can ship it. A/B testing became the crown jewel of that reflex — fast feedback, clean comparisons, “data-driven” legitimacy.
But as systems became more entangled with decisions that shape people’s trajectories, a subtle category error crept in: we started treating governance like UI. We started testing legitimacy the way we test button copy.
We began calling it experimentation when, in practice, it was policy-making via dashboards. The result is a recurring pattern: metrics improve, trust erodes; efficiency rises, contestability disappears; averages look healthy, cohorts accumulate silent harm.
This article draws a hard line and then builds what comes next: A/B testing for UI, yes; A/B testing for governance, no.
And the replacement is not “more sophisticated A/B.” The replacement is a different unit of design and evaluation: Wisdom Testing — the testing of regimes, licenses, and reversible commitments, not isolated variants.