Flipping the Kill Switch

Fable 5 as political allegory.

Jun 16, 2026

At 5:21 p.m. on Friday, the federal government ordered Anthropic to cut off access to its two most capable AI models for any foreign national, including some of the company’s own employees. Selective compliance was impossible, so the models went dark for everyone, just three days after the safer of the two, Claude Fable 5, had gone on sale.

Set aside whether it was wise and pay attention to what it reveals: A government that — 10 days after it rejected any move to license or preclear new AI — reached for emergency national-security authority to kill a model anyway. You don’t do that to something you believe is harmless. But is it really that dangerous? They haven’t yet shown us the proof, but the people reading the classified briefings acted as if the answer was yes, in the most expensive way available to them. I agree with the decision, and have been arguing for it for months: in the Times in April and again last week.

Does anyone really know why the switch was thrown? Anthropic says the government supplied only verbal evidence of a narrow jailbreak, the kind that asks a model to read a codebase and flag its flaws, reproducible on public models nobody is recalling. The administration’s David Sacks says Anthropic was asked to either fix the jailbreak or pull the model and refused. Same Friday; two different stories. Anthropic has its own stake in how this is remembered, and its account is not gospel — but it is the only one anyone has put on the record. The fight is over who controls the switch, and the people who carry the risk are not in the room to weigh in. It happened in the dark, and the dark is the real story.

Fable’s edge did not die with the model. The same week the switch was thrown, OpenRouter showed that a panel of cheaper models, fused and synthesized by a judge, landed within a point of Fable on a hard research benchmark, with most of the gain stemming from the combination, not any single model. That benchmark measures research, not offense, so it does not rebuild the cyber capability the government feared. It makes a narrower point: the value was never one weight file. You can switch off a model. You can’t switch off a recipe, and the ingredients are on the shelf.
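
For readers who want to see how small the recipe is, here is a minimal sketch of the panel-and-judge pattern. It is not OpenRouter's actual method, which is not described here. The three "models" are hypothetical stand-in functions, and the judge is a toy majority vote where a real one would be another model synthesizing the candidates.

```python
from collections import Counter
from typing import Callable, List

# Assumption: a "model" is any function from prompt to answer.
# These are hypothetical stand-ins, not real API clients.
Model = Callable[[str], str]
Judge = Callable[[str, List[str]], str]


def panel_answer(prompt: str, panel: List[Model], judge: Judge) -> str:
    """Ask every panel member, then let the judge produce one final answer."""
    candidates = [model(prompt) for model in panel]
    return judge(prompt, candidates)


def majority_judge(prompt: str, candidates: List[str]) -> str:
    """Toy judge: return the most common candidate answer."""
    return Counter(candidates).most_common(1)[0][0]


# Three toy stand-ins for cheaper models; one of them is wrong.
def model_a(prompt: str) -> str:
    return "4"


def model_b(prompt: str) -> str:
    return "4"


def model_c(prompt: str) -> str:
    return "5"


final = panel_answer("What is 2 + 2?", [model_a, model_b, model_c], majority_judge)
print(final)
```

The point of the sketch is structural: no single stand-in carries the result, and any one of them can be swapped out without changing the rest.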

Fear the Soft Middle

Let’s start with this: Switching something off is not a plan. If Fable is dangerous enough to justify the government’s action, it is dangerous enough to justify the work required to neutralize it. We know what the work looks like, because we did it once before, even if it’s wrongly remembered as a joke. Y2K is the most successful example of a disaster that never happened. Faced with a flaw threaded through every bank, hospital, and power company, the government signed an executive order, spent some seven and a half billion federal dollars, and coordinated years of unglamorous repair. On January 1, 2000, almost nothing broke. People have called it a hoax ever since, the cruelest possible review. What they didn’t see is that the quiet was the product.

The work this time is the same shape. A spring briefing from the SANS Institute and the Cloud Security Alliance warned that AI has collapsed the time from finding a flaw to exploiting it from weeks to hours. Finding is cheap now. Fixing is the entire job. Anthropic’s newest models found 271 vulnerabilities in Firefox in one release. By late May, its public tally showed nearly 1,600 disclosed across open-source projects and fewer than 100 fixed. (Disclosure: I’m the CTO of Mozilla, which created Firefox.) Discovery is commoditizing; the scarce asset is the capacity to remediate. And that backlog is the business.

The strongest independent test locates where the danger sits. The UK’s AI Security Institute found Mythos to be the first model to run a 32-step corporate intrusion end to end, then noted its test ranges had no live defenders, so it could not say the model would beat a hardened target. The fear is not a model storming a fortress. It is the soft middle: the weakly defended network and the unfunded code under everything, where the defender — if there is one at all — is a single volunteer.

The backlog has a face. Somewhere, a maintainer has spent 20 years patching, for free, open-source code that runs inside software used by billions, with no security team and no budget. Now a firehose of newly found flaws is pointed at the projects that maintainer keeps alive. A few miles away, a hospital runs critical systems with the same exposure and no one to call. Neither was made one bit safer on Friday.

A kill switch makes a visible event and delivers almost no safety. A remediation campaign makes no event and delivers nearly all of it. The work that protects the hospital is boring while the gesture that protects no one for long is dramatic — and we keep reaching for the drama. Come on. We have done this the right way before.

The Transparency We’re Owed. The Defensive Layer We Need.

To be fair, the plan exists on paper. The AI executive order issued on June 2 directed a cybersecurity clearinghouse to coordinate vulnerability remediation, patch distribution, and access to defensive tools for rural hospitals and local utilities, and to examine how to fund the effort, all within 30 days. The federal cyber agency shortened some patch windows to three days. That is a start in the right direction. But it is voluntary, a month away from existing, and unfunded, whereas Friday’s switch was instant and absolute.

In the same stretch of days, the Wall Street Journal reported, officials told the government’s own AI testing office to stop publishing its assessments. They wrote the plan, then reached for the switch less than two weeks later. Today, we are being asked to trust a remediation effort that was made, by design, harder to see. The two sides can’t even agree on what happened, because it happened in the dark.

Y2K hid none of the effort and published none of the broken code. Today, the transparency the country is owed is the scoreboard: the threshold that triggered the action, the reasons in writing, an appeal, and a referee free to report whether the fix is working. I backed the Biden AI approach in 2023 because it was building toward exactly that. Anthropic’s own position is that any halt should run through a process that is transparent, fair, and grounded in fact. The objection isn’t that someone pulled a lever. It is that no one gave us a full explanation as to why.

The administration’s own national-security memo says that no private company should be able to disable an AI system that American soldiers depend on without government approval. The memo is right, but the principle cannot stop at the soldier: the hospital, the public utility, and the lone developer carry the same exposure. A nation that rents its defenses does not own its resilience. So the country needs a defensive layer it can own outright: open-weight models a hospital can run, audit, and keep alive when a vendor changes its terms or an order lands at 5:21 on a Friday.

I recently argued in “Make It Stop” that the ability to stop has to live inside the thing itself, not in someone you trust to say the word. Where that line falls is not a matter of taste. Draw it at demonstrated offensive uplift, judged by the same independent evaluators who raised the alarm — the kind of end-to-end test the UK’s AI Security Institute already runs. The systems that clear that bar can stay gated. The layer beneath it should be open, because you cannot revoke a copy that lives on a machine you hold, just as you can’t revoke a recipe once it’s been published.

How to Build a Resilient System

Critics of an open layer are not wrong: An open model capable enough to defend a hospital is capable enough to attack one. Yes, but that is the world we already live in. The cyber capability that frightened the government is not Anthropic’s alone. Before Friday, the UK’s AI Security Institute found GPT-5.5 was the second model to run that same 32-step intrusion end to end, and deemed offensive cyber a property of the frontier, not one vendor. The day after the switch, a Chinese lab shipped an open-weight frontier model pitched as the alternative to monopolized access. Offensive cyber is coming whether or not our defenders are armed, so withholding the layer disarms only the hospital, never the foreign service that can build its own. One tier down, the security move and the open one are the same. Friday repriced every dependency routed through someone else’s endpoint, and a premium built on gating is rent against a shrinking moat.

One truth runs under all of it: concentration and secrecy make a system fragile, and distribution and daylight make it resilient. Last week, I argued that the repair needs a Y2K-scale mobilization, run as a standing job and not a countdown. Fund the repair, not just the discovery. Pay for the patching, the triage, the maintainers the industry has freeloaded off of. Put real tools in the hands of the hospitals and utilities. Keep the thresholds and the due process. Turn the referee back on. None of it is dramatic. All of it works. And we did none of it on Friday.

If you’re not in charge, you’re not off the hook. If you depend on an open-source project, this is the week you fund or staff the maintainer you have sponged off of. If you ship an agent, write the eval before you trust it, so that a failing check — and not your waning attention — is what catches it. If you run anything you cannot afford to lose, stop betting it on a single copy someone else can switch off: Keep a layer you own, with a kill switch that is entirely yours. Building in this direction? Tell me what you are running, and I will feature it here.

On Friday, in our name, the country was told this is dangerous. I believe us. We have the playbook, and we have run it before. And yet we are intentionally choosing the version that makes news and protects no one for long. Whoever is right about that one model, David Sacks or Dario Amodei, a switch thrown in the dark is not a national plan. The darkness is the tell. If done right, the disaster we are trying to prevent will never arrive and no one will thank us, because it will mean we did our jobs.

So I will stop swallowing the question I have been asking since Friday and ask it at full volume: What, exactly, are we doing?
