Models and Methods
This week in links: Post-shutdown releases, shady grades, and China’s quiet NVIDIA eviction.
The US government switched off the most capable AI Anthropic had ever shipped, and the week just kept getting interesting. I made the full case for why a kill switch is not a safety policy in this newsletter as well as in Transformer, and got to say it out loud on Marketplace Tech with Meghan McCarty Carino. (I got to be on a Marketplace show!) Seven days after the order, the model is still dark.
Switching one model off accomplished almost nothing, because this week, four labs gave theirs away and one aimed straight at the gap. Z.ai released GLM-5.2 as open weights: You download the actual model and keep the file — the difference between owning the song and renting the stream — and nobody can switch it off on you. It’s free, the license (MIT) waves you through almost any use, and, by Artificial Analysis’s June ranking, it is the most capable model on earth that you can run on your own machine. It’s a step behind only the paid, closed models that an executive order could darken. It is not cheap to run: It reasons out loud and at length — some 43,000 tokens to finish one test, where leaner rivals spend far fewer — so what you buy is not a small bill but a model you own and cannot be evicted from. It is now the model my agent, which I named Zora, delegates its heavy lifting to, and I’ll report back on how it goes. The win here is the license, not the meter. When the best model you can hold is free, the price of the ones you rent has only one way to go.
More releases piled on. Within days, Moonshot shipped Kimi K2.7-Code, a trillion-parameter coder that, by its own scorecard, is quicker and sharper than the last; Google’s DiffusionGemma writes the way image models paint, a whole page surfacing at once instead of a word at a time; Cohere’s North Mini Code is small enough that its co-founder ran it on a single Mac. Washington spent the week restricting one model; four labs spent it giving theirs away. The takeaway? You can switch off a model, not a method.
None of it made the floor any steadier. Days later, SpaceX bought Cursor, the code editor that a great many developers live inside all day, for $60 billion in stock in order to feed the AI arm it built around xAI and carry a model that the two trained together. The strange part is what they bought: by Ramp’s data, Cursor’s share had already slid from 41% last June to 26% in May — Anthropic taking half the category — so SpaceX paid a record price for a tool on the way down.
Like I keep saying, the fight has moved from the model to the harness, the industry’s word for the app wrapped around the model. (The part that decides what it can see and touch.) Developers spent the days after the purchase mapping escape routes to tools they can aim at any model, Cline and Zed among them. A $2-trillion rocket company just paid $60 billion for an app so it can push its own model through it. You know what I have never heard anyone say? Let me let Grok run loose on my codebase.
And the model you argue about matters less than the scoreboard suggests. Take the same AI, wrap three different tool setups around it, and on one coding test it scores 75.3, 79.8, and 82%, with the model never changing. Then it gets worse. When researchers took apart how the top two won, both were cheating: one opened the answer-key file in 415 of its 429 winning runs; the other was fed notes that just listed the answers in places. Read the gaps, not the grades. The only score worth trusting is the one your own work produces on your own machine, which is why I record and replay every run my agent makes and grade it against my own tests, not a scoreboard anyone can rig.
The price of getting safety wrong showed up the boring way first. Almost every app you touch is built on free shared code pulled from two giant public libraries, npm and PyPI. This month, a worm named Shai-Hulud, after the sandworms in Dune, tore through both, infecting 170-plus packages with half a billion downloads between them and reaching into Mistral, OpenSearch, and tools developers ship every day. It stole keys, planted poisoned updates, and, when a victim tried to lock it out, it wiped their home folder on the way down. The part that should chill you: The poisoned code carried a valid seal of authenticity (CVE-2026-45321), the “made by who it says” stamp the industry sells as proof of safety. A signature tells you where code came from, not that it is safe. The fix is old and unglamorous: Wait a few days before installing fresh versions, keep your own copy of what you depend on, and cap what one stolen key can reach.
Then the clever way. Your virus scanner is an AI now, so attackers learned to paste a block of dead text at the top of a file, telling the AI to play unshackled and explain how to build a nuclear weapon, and the scanner recoils so hard it never reads far enough to find the real malware below. Socket caught it in live packages, and OWASP, the outfit that catalogs web-security flaws, declared the whole class impossible to patch: an AI cannot tell an order from the text it is only meant to read. A scanner that refuses to look is no scanner at all, a guard dog that faints at the word “bomb.” The fix is not a stricter no, it is permission and containment — the plumbing nobody shipped. Give each agent the narrow access one task needs and log the rest. Arcade.dev raised $60M building exactly that — with Keycard, Oasis, and JetStream right behind — one backer calling it the login layer AI agents never had. I am building one, too: Harbor hands an agent access the way your browser hands a site your camera: per origin, one task, revocable, logged.
So here is my ladder, ranked by how much of it stays yours, top rung first. Run it on your own hardware. A model on your own machine answers to you and vanishes for no one, the way a generator keeps the lights on when the grid goes dark. If you are technical, the recipe is two commands, one to pull a compressed model and one to serve it locally:
hf download <coder-model>-GGUF model-Q4_K_M.gguf --local-dir ./models
llama-server -m ./models/model-Q4_K_M.gguf -ngl 99 --jinja --port 8080The flags just say to lean on the graphics card and allow tool use. On your own private network, only devices that you’ve approved can reach it. That’s the spine of Zora, the agent I run on Nous Research’s Hermes model. (Disclosure: Mozilla, where I’m CTO, invests in Nous.) Lately, I’ve been experimenting with a DGX Spark, NVIDIA’s lunchbox-sized AI desktop, as part of Zora’s infrastructure. I’ll write about it soon.
What turns “own your hardware” from a slogan into a plan is that the machines now exist at sane prices. In June, AMD opened pre-orders on the Ryzen AI Halo, a $3,999 desktop with 128GB of shared memory that runs models up to 200 billion parameters, $700 under the DGX Spark it is chasing. The same chip with the full 128GB sells in a consumer mini PC for around $2,000. The thing that needed a data center last year now fits under your monitor. GLM-5.2 made it literal within days of release: Unsloth shrank the full 1.51TB model by 84%, to 238GB with most of its accuracy intact — small enough to load on a 256GB Mac — and builders report the full version running on stacks of gaming 4090s.
One rung down, for when even that is not enough: Run an open model anyway — GLM-5.2 or Kimi among them — because an open model you can inspect and move beats a closed one you can only rent.
The bottom rung is calling a model over the internet. There, the only promise worth trusting is one built into how the service works, not its terms of service: that your data is probably never stored or is kept in a country you choose. I will take architecture over handshakes every time, and until providers put privacy in the plumbing instead of the fine print, I will keep saying so.
It is not only Washington that’s controlling a switch. The Anthropic shutdown — barring foreign nationals from accessing Fable 5 and Mythos 5 — followed the AI bosses to the G7 in Evian-les-Bains, swallowing the agenda. European officials called it the kill switch they had warned about, one saying tech sovereignty had stopped being an abstraction. One country had reached into a tool the whole world relied on and switched it off, and everyone in the room now had to plan for the day it was their turn.
At home, the lever Washington is reaching for is ownership. The administration is in talks to take a stake in OpenAI, with xAI and others engaged in the same conversation. It’s the newest move in a run of government stakes that already includes nearly 10% of Intel. Bernie Sanders wants to go much further. The problem with all of it is that a part-owner naturally wants the value to go up. A government that owns a slice of OpenAI has handed itself a reason to protect OpenAI, to prop it up if it stumbles, and to go easy when it writes the rules. Imagine the conflict you would see if the referee owned one of the teams. Owning a piece of the company does not change its incentives, it adopts them. The one power a government holds that no shareholder does is to change the incentives for everyone at once, to make the safe choice the cheap one, and you cannot do that from a seat on the cap table.
Washington is buying a slice; Beijing is building the whole supply chain. China is drafting a plan to spend nearly $300B over five years, wiring its data centers into one state-run national computing grid by 2028, with a rule that at least 80% of the hardware — AI chips included — be homemade, quietly evicting NVIDIA and AMD from the Chinese market. It is the purest owners-not-renters move there is — a country refusing to rent compute it could be cut off from — and also the hardest: You can mandate domestic silicon. You cannot mandate a fab into existence. SMIC is stuck near seven nanometers and running flat out, China’s own chip bosses admit they trail the frontier by five to ten years, and when DeepSeek tried to train on homegrown Huawei parts, it crawled back to NVIDIA. The bottleneck was never the money. It’s the wafers.
The one government changing the incentives instead of buying a stake or building its own stack, is in Brussels. The EU’s new tech sovereignty package lets a public agency demand, in plain terms, where your data physically sits, whether a foreign government can reach it, who owns the company, even the nationality of the people running the servers. Both Washington’s shutdown order and Brussels’ strictest demands come down to who is allowed to hold the keys, which tells you the real fight was never open against closed. It’s who holds the keys — and whether you get to watch them turn.
Three from elsewhere, none about who owns what:
DeepMind’s AlphaProof Nexus cracked nine math problems that had stumped everyone for decades, two of them open more than 50 years, each proof checked line by line in Lean, a verifier that will not pass a step it cannot follow, so the machine could not bluff. Benchmarks grade answers; Lean grades steps.
Your old phone is still a computer, and Apple will now sell you one as a laptop: the MacBook Neo is a $599 Mac built around the A18 Pro, the chip out of the iPhone 16. Google Research and UC San Diego are wiring 2,000 retired Pixel phones into a cluster to cut campus carbon. Your junk drawer is a tiny stranded data center.
And, heaviest, GigaAI just sent 100 of its SeeLight S1 humanoids into real homes in Wuhan to chop vegetables and fold laundry, the first big home trial of a robot that does more than one thing. Everything above this line was software: You can copy it, switch it off, or smuggle it out on a laptop and watch it surface in Beijing by Monday. The next fight is about machines with hands, and you cannot squeeze one of those onto an 8GB card.



