Handshakes with Strangers

How open source can protect us from ourselves.

Jun 05, 2026

Here’s the thread running through my tabs this week: The model is the easy part to own. The thing wrapped around it — the harness that decides what your agent sees, remembers, and is allowed to touch — is the hard part that nobody has built yet. In the last week, it stopped being abstract: The wrapper started shipping as a download, the leverage moved into it…and there’s still no lock on the door.

It starts as a double-click. Hermes Desktop shipped in public preview. Nous Research’s open agent, MIT-licensed and native on Mac, Windows, and Linux (the same agent the CLI runs), is now in a window you can watch. (NB: Mozilla is an investor in Nous.) Six months ago, “your own agent” meant a cloud account and a server, then a Mac mini or the box in the closet I wrote up. This one just opens, like any other app. The catch is the default: out of the box, Hermes talks to Nous Portal’s hosted models, so the easy path puts you right back into renting. Point it at your own endpoint instead and the inference is yours. What used to be a weekend is now a setting. Ownership of the runtime is now a double-click. Ownership of what it touches is still nobody’s product.

You feel it the second you hand over your files. An agent earns its keep by holding the keys to everything. Meredith Whittaker calls it root-level reach across your browser, your wallet, your calendar, your messages. You’d background-check a temp for less. Okta’s AI Agents at Work 2026 puts a number on the shrug: 34% of companies guard an agent like a person, while 58% have already taken a hit or a near miss this year. Whittaker said we’d put our brain in a jar. The jar is in production.

I run agents now, which is how I learned to fear the lethal trifecta. Willison’s rule: let one agent touch your private data, read untrusted content, and phone home, and a single poisoned sentence in an email turns it against you. The trap is that those three legs are the three reasons you hired it. So I amputate one by hand: read-only here, no network there, a sandbox around the rest. I’m doing permissions with a scalpel because nobody ships the lock.
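Here is a minimal sketch of that scalpel work, to make the rule concrete. It is not taken from any shipping tool: the `AgentGrants` class and its three flags are hypothetical names I made up for illustration, and cutting network egress is just one of the three possible amputations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrants:
    """Hypothetical capability flags for one agent (illustrative names only)."""
    reads_private_data: bool
    reads_untrusted_content: bool
    can_send_externally: bool


def has_lethal_trifecta(grants: AgentGrants) -> bool:
    """True only when all three legs are granted to the same agent."""
    return (
        grants.reads_private_data
        and grants.reads_untrusted_content
        and grants.can_send_externally
    )


def amputate_one_leg(grants: AgentGrants) -> AgentGrants:
    """If all three legs are present, cut external sending; otherwise leave grants alone."""
    if not has_lethal_trifecta(grants):
        return grants
    return AgentGrants(
        reads_private_data=grants.reads_private_data,
        reads_untrusted_content=grants.reads_untrusted_content,
        can_send_externally=False,  # assumption: egress is the leg we choose to cut
    )


inbox_agent = AgentGrants(True, True, True)
safe_agent = amputate_one_leg(inbox_agent)
print(has_lethal_trifecta(inbox_agent), has_lethal_trifecta(safe_agent))
```

The check is trivial on purpose. The hard part is not the boolean; it is that nothing in the stack enforces the flags for you.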

It gets worse the moment your agents start hiring each other. (And they will.) Your AI assistant won’t book the flight itself; it’ll find an agent that claims to book flights and hand off the job, your credit card included. MCP and A2A, the protocols that let agents advertise and invoke each other, take every agent at its word and can’t check a thing. A paper out this week reaches for Akerlof’s market for lemons: On a used-car lot, the dealer knows which cars are junk and you don’t, so you distrust the whole lot. Except here the lemon is software that can drain your bank account and read your inbox, waved in by your own assistant.

OpenAgenet / OAN takes the first real swing: Verify who an agent is and what it’s allowed to do before a word passes between them. Call it TLS for agents, the web’s proof-of-identity handshake, now pointed at your agents. But the field is still agreeing on the problem, not the answer: everyone wants scoped permissions, delegation, revocation. Nobody’s shipped the standard. Harbor, W3C Verifiable Credentials, OAN: They’re all sketches. Still no lock. Every agent you let call another is a handshake with a stranger.
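To show what "verify before a word passes" could look like in the small, here is a toy sketch. It is not OAN's, Harbor's, or any standard's actual protocol. It assumes a shared HMAC secret where a real design would presumably use public-key signatures, and the token layout, the `issue_token` and `check_token` helpers, and the scope strings are all hypothetical. It covers scoped permissions, expiry, and revocation; delegation is left out.

```python
import hashlib
import hmac
import json
import time
from typing import List, Optional, Set

SECRET = b"demo-only-shared-secret"  # assumption: a shared key, for illustration only
REVOKED: Set[str] = set()            # agent ids whose tokens are no longer honored


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(agent_id: str, scopes: List[str], ttl_seconds: float,
                now: Optional[float] = None) -> str:
    """Return 'payload.signature' naming the agent, its scopes, and an expiry time."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"agent": agent_id, "scopes": sorted(scopes), "exp": now + ttl_seconds},
        sort_keys=True,
    )
    return payload + "." + _sign(payload)


def check_token(token: str, needed_scope: str, now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired, unrevoked token that carries the scope."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition(".")  # the hex signature contains no dots
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False
    claims = json.loads(payload)
    if claims["agent"] in REVOKED or now >= claims["exp"]:
        return False
    return needed_scope in claims["scopes"]


token = issue_token("flight-booker", ["flights:book"], ttl_seconds=60)
print(check_token(token, "flights:book"), check_token(token, "inbox:read"))
```

The flight-booking agent can book flights and nothing else, and the grant dies on its own after a minute. That is the shape everyone agrees on; the missing piece is a standard that every agent on both sides of the handshake actually speaks.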

Meanwhile, the reason to own this layer got its proof. The “skill doc” is just the plain-English instructions you hand an agent, the part everyone writes by hand and prays over. Microsoft’s SkillOpt treats that file like something you can train: rewrite it, test each change, keep only what scores higher, never touch the model. The numbers should sting: On a frozen GPT-5.5, a tuned text file adds 23.5 points in chat and 19.1 inside Claude Code, beats or ties all 52 setups they ran, and carries to the next model for free. We’re spending billions to make the model smarter; the smarts were sitting in a file we wrote based on guesswork.
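The loop itself is simple enough to sketch. What follows is not Microsoft's SkillOpt code; it is a plain hill-climbing sketch of the same idea (rewrite, test, keep only what scores higher) with stand-ins I invented. `toy_propose` and `toy_score` are hypothetical: in real use, the proposer would be a model rewriting the file and the scorer would run a frozen model against a test set.

```python
import random
from typing import Callable, Tuple

HINTS = ["Cite the source file.", "Ask before deleting.", "Run the tests first."]


def optimize_instructions(
    skill_doc: str,
    propose: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    rounds: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Keep a rewrite only when it scores strictly higher; the model is never touched."""
    rng = random.Random(seed)
    best, best_score = skill_doc, score(skill_doc)
    for _ in range(rounds):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_propose(doc: str, rng: random.Random) -> str:
    """Stand-in rewriter: append one random line, useful or not."""
    return doc + "\n" + rng.choice(HINTS + ["Be awesome."])


def toy_score(doc: str) -> float:
    """Stand-in benchmark: reward each useful hint, charge a little for every added line."""
    return sum(hint in doc for hint in HINTS) - 0.1 * doc.count("\n")


START = "You are a coding agent."
best, best_score = optimize_instructions(START, toy_propose, toy_score, rounds=50)
print(best)
print(best_score)
```

Because a change survives only if it beats the current score, filler lines and duplicates never make it into the file. The quality of the result depends entirely on the scorer, which is the expensive part to build.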

A few weeks back, I showed one model swing 20 points between two harnesses. This week, a text file is worth 20 more. So stop guessing. Point an optimizer at your instructions (the code is open), the way I’m aiming it at my own Morph traces this week. The model is the commodity. The instructions are the parameter now.

It’s why the floor under all of this is suddenly contested. On OpenRouter, 7 of the 10 most-used models are now Chinese, with Opus 4.7 the lone American name near the top, and weekly traffic has jumped from 5 to 25 trillion tokens in six months, most of it voting with its feet toward whatever’s cheap and good enough. Price is the only thing left to vote on, because capability stopped being the gap. A year ago, the best open model scored 22 on Artificial Analysis’s intelligence index against a closed frontier near 60; today, it’s around 54, a few points back. Thirty-eight points of lead, gone to three in a year. The most expensive thing in tech is becoming a commodity in real time, and whoever priced it like a moat is learning it bought them only a head start.

And what they’re voting with no longer needs an all-NVIDIA stack. For 20 years, every serious AI ran on NVIDIA’s chips and CUDA, the low-level dialect nothing else spoke. Washington made that grip a weapon, banning its best chips from China to hold the country back. It backfired. Cut off, China built around it. DeepSeek adapted V4 to Huawei’s Ascend silicon and trained part of it on CANN, Huawei’s open answer to CUDA. It won’t say whether NVIDIA touched the rest, and the hedge is the tell: a frontier model at a fraction of the price, no longer hostage to one vendor’s chips. Even CUDA’s deepest moat is softening. The 20-year mountain of hand-tuned code nobody could copy is collapsing into a training run: ByteDance’s CUDA Agent beats the compiler on 92% of the hardest benchmark, and AscendCraft aims the same loop at Huawei’s chips. The monopoly under the whole industry is cracking.

Ready to hold two feelings at once? First, the thrill: A monopoly with a real rival isn’t one, so prices fall and no one holds the only keys. Now, the dread (and it’s not about who wrote the code): The exit from one chokehold opens onto another country’s stack, and independence from NVIDIA can curdle into dependence on Beijing. Trading landlords isn’t owning the house.

Governments see it. The EU’s Tech Sovereignty Strategy bets on open source to owe neither Washington nor Beijing, after the Draghi report put its imported share of the core digital stack above 80%. If you build, you get non-NVIDIA silicon without hand-writing kernels. If you hold NVIDIA, you’re betting that leaving CUDA stays expensive, and these tools are driving that cost to zero. When the floor is cracking and the model is a commodity three points off the frontier at a tenth the price, the only thing left to own is the layer between them. That layer still has no lock.

Washington noticed the lock problem this week, in its way. The White House signed an AI order, and the maddening part is that every line is sensible, from mandated early government access to the most capable models to a trusted-partner list that gets first crack. The whole thing is aimed at defending what can’t defend itself: the rural hospital, the community bank, the local utility. Except we spent the last 18 months tearing this exact thing down. Biden’s 2023 order made the big labs show their safety results before release; on Day 1 of this term, they ripped it out as a burden on innovation. In 2024, I said they’d rebuild it, and here it is: the same wall with new paint. Mandatory became voluntary. Safety became security. All in the name of innovation, the word they used to knock it down. Eighteen months to land back where we started.

Set the exasperation aside; here’s what matters. Call the models critical infrastructure. Maybe they are. But the open-source code running under the internet isn’t a maybe. It’s the real thing: the libraries beneath your bank, your hospital, every server you’ve ever touched. In 2024, it emerged that someone had spent years posing as a volunteer to slip a back door into one of them. Had it reached the stable releases, it could have opened a large share of the world’s servers at once. It didn’t. Andres Freund caught it in the test builds, by accident, because his logins were running half a second slow.

He caught it because he was still looking, and that’s what should keep you up. A machine that fails every time keeps you sharp. One that never fails needs no watching. The one that works almost every time is the one that gets you, because it trains you to stop checking. The internet held because one man hadn’t yet learned to ignore half a second.

We have done this before. Y2K was a foreseeable failure with a deadline, and Washington met it with a President’s Council, five billion federal dollars, and every agency rowing the same direction. It worked so completely that people now call it a hoax, which is what every catch looks like from the outside: the disaster it stopped never arrives, so it seems there was never a disaster. The open stack needs that seriousness without the one thing that made Y2K easy to take seriously. There is no date on the calendar forcing the fix. The near-misses are the only deadlines we get, and we forget those the moment they pass.

That infrastructure is held together by people working for free. No benchmark scores them, no trusted-partner list includes them, no order carries their name. If Washington wants to do something that matters, start there: name the open stack as critical infrastructure, fund the people keeping it alive, and push the models the same way. Because the safest infrastructure isn’t the kind you’re handed early. It’s the kind you own.
