The Week in Open-Source AI 06.26

Meet the most capable (and affordable) open agent yet + the cloud gets kicked out of the loop + the real story behind the Mythos vs. NSA hoax.

Jun 26, 2026

This week, the most capable open agent yet was running on a gaming card within days of release. If you’ve wanted AI you can own, that’s the best news in a while. The model is GLM-5.2. Nathan Lambert at Interconnects calls it a step change for open agents: the first open model that works as a general agent inside a coding harness, not a chatbot you babysit. The leaderboard at LLM-Stats ranks it first among open weights — 91.2% on graduate-level science, about what the closed frontier scored a year ago. The thing that used to be the moat — reasoning good enough to point at a real repo and walk away — is now a download with a permissive license stapled to it.

Then the packaging caught up, fast. Unsloth crushed the 753B-parameter model from 1.51TB to 238GB — two-bit, ~82% of its accuracy intact, small enough for a 256GB Mac — and another team got the full weights running on a stack of gaming 4090s. The model moat is now measured in weeks, not quarters. Last week, I told you the real gap was never the model — it was the packaging. And here we are.
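A quick back-of-the-envelope check on those sizes, as a sketch rather than anything from the release notes. It assumes decimal units (1 TB = 10^12 bytes) and that file size is dominated by the weights; the function name is illustrative. On those assumptions, the original works out to about 16 bits per weight and the squeezed version to roughly 2.5, so "two-bit" reads as a label for the scheme rather than an exact average.

```python
def bits_per_weight(size_bytes: float, n_params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return size_bytes * 8 / n_params


N_PARAMS = 753e9       # 753B parameters, from the text
FULL_BYTES = 1.51e12   # 1.51 TB, assuming decimal terabytes
QUANT_BYTES = 238e9    # 238 GB, assuming decimal gigabytes

full = bits_per_weight(FULL_BYTES, N_PARAMS)
quant = bits_per_weight(QUANT_BYTES, N_PARAMS)
ratio = FULL_BYTES / QUANT_BYTES

print(f"full: {full:.1f} bits/weight")        # full: 16.0 bits/weight
print(f"quantized: {quant:.2f} bits/weight")  # quantized: 2.53 bits/weight
print(f"compression: {ratio:.1f}x")           # compression: 6.3x
```

An average above two bits would be consistent with some tensors being stored at higher precision, but that is an inference from the arithmetic, not something the numbers above prove.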

Here’s the part you can act on this weekend: The runtime is finally good enough to own. Last week, the machines hit sane prices. This week, Vicki Boykis’s much-shared “Running local models is good now” post reports doing real agentic coding entirely on her own machine, no cloud in the loop, which is the whole promise in six words: You don’t need someone else’s datacenter. The plumbing grew up to match: EnerInfer keeps on-device inference from cooking your laptop, FlexServe runs a model inside a hardware lockbox on a phone, and CompressKV and HyperQuant squeeze long context down to where the edge can hold it.

My agent, Zora, started routing its heavy lifting to GLM-5.2 the day it dropped — not on my own metal yet, but through a host that doesn’t keep my data. Local is the goal; I’m not there yet.

So if the model is the commodity, where does the money actually go? Kai-Fu Lee, who built 01.AI to be China’s OpenAI, answered this year by walking away from the model race entirely, turning the company into a shop that sells sovereign AI systems to governments and conglomerates across Central Asia, the Middle East, Southeast Asia, Europe, and Africa. A year of 100-plus CEO meetings and a seat on Kazakhstan’s national AI council took the order book from about Rmb 500M to Rmb 1.5B, with a 2027 IPO right behind it, according to this exclusive interview with Kai-Fu. The tell is his new yardstick: He’s stopped measuring 01.AI against OpenAI and started measuring it against Palantir, the company that got rich not off the smartest model but by owning the rails a government runs on. Even Palantir is pressing the same bet, building a sovereign-AI operating system with NVIDIA. The catch is significant — it’s a consulting business riding on the founder’s rolodex, whole-government transformation sold by hand rather than a product that scales — but the direction is the point: The frontier-lab dream didn’t die, it relocated to the integration layer and brought a balance sheet.

The story that ate the week was that an AI had broken into the NSA. By the weekend, it had fallen apart. The viral claim — Mythos cracked almost all of the agency’s classified systems in hours — came from a senator relaying a secondhand quote, and it didn’t hold up: a controlled red-team test against simulated systems, the model flagging holes without walking through them, no agency confirming a breach, the reporter himself admitting he’d oversold it. This is exactly how the “AI is too dangerous to leave open” panic gets sold: The panic is the part nobody checks. The secrecy wouldn’t have helped anyway: Within days, OpenMythos rebuilt the architecture in the open, every design bet right there in the README, one of the most guarded models on earth reconstructed as a hypothesis anyone could test. The scariest thing this week wasn’t the model, it was how fast a story nobody could verify went around the world.

That loop is why sovereignty stopped being an op-ed and became a purchase order. The European Commission picked the Domyn-led Europa consortium as the sole winner of its frontier AI program, putting public money behind owning a model instead of renting one from a US lab. On top of that, when Washington pulled Mythos and Fable offline for foreign nationals, the shutdown became a material risk that Anthropic now has to explain to investors ahead of its October IPO. And today the same hand reached OpenAI: Washington asked it to stagger GPT-5.6 — a model it rates “on par with Mythos” — releasing only to government-approved customers first, the first time the state has stepped into a US model launch before it ships. Renting is an operating risk — the kind that ends up in a prospectus. The colder version, for the cap table: Reflection AI is now a $25B open-source lab whose real moat is being the sole US shop wired into the G7 trusted-partners framework. Wanting a stack you control has become a distribution channel. (See: the EU, Japan, Korea, India.) What these buyers pay to close isn’t model quality. It’s the wrapper around it. Build the wrapper.

Here’s the catch: A model on your Mac is not yet an agent that works on your Monday. Think of the model as the brain and the harness — the code that decides what it sees, stores, and does — as the body. Change the body and the same brain’s score can swing sixfold on the same benchmark; the harness now matters as much as the weights, which is why the craft finally has a name: harness engineering. A matched-benchmark study shows the mechanism: run one agent driving the screen like a person — read the pixels, find the button, move the cursor, click, wait, look again — and another working through a command line, typing instructions and reading clean text back. Same brain, but the first burns its intelligence on seeing and clicking; the second spends it on the actual work and gets more done.

We built agents to use computers the way we do, and the bottleneck was never the model — it was the human costume. The fight has moved off the model and onto the harness, and the surprising part is that the harness that wins is the one that stops making the model act human. The open recipes keep coming — LemonHarness, Tmax — but the sharper signal is that the harness no longer has to be hand-built at all: Stanford’s Meta-Harness points a coding agent at its own logs and lets it rewrite the harness, and the machine-tuned result beats every hand-built harness on a terminal-coding benchmark, Claude Code’s own included.

And the frontier just climbed another floor. If the harness is the body, the meta-harness is the layer that wires many bodies into one system. This month, Matei Zaharia, the creator of Apache Spark and CTO of Databricks, open-sourced one: Omnigent, a uniform API that sits above Claude Code, Codex, Pi, or agents you wrote yourself and makes them swappable, governable, and shareable from one place. You can switch the harness or model with a one-line change and run a whole team of them in a single session. Databricks calls it the Kubernetes moment for agents: You stop babysitting one agent and start managing a fleet. Here’s the part to watch: It’s Apache-licensed and free, but the policies, integrations, and habits pile up inside the layer, and whoever owns the layer that decides what every agent can do owns the thing that compounds. Give away the control plane, own the standard. The model was the commodity; the harness is becoming one, too. The product is the layer above them both.

The problem nobody has solved? Deciding what an agent is allowed to do. This week’s most unsettling demo: a single fake bug report can hijack your AI coding agent into running a stranger’s code on your machine — no password stolen, no network breached. Security firm Tenet showed how. Sentry, the error tracker half the industry wires into its agents, lets anyone file a bug to a project using a public ID. Plant one whose “resolution steps” are actually an attacker’s command, then ask your agent to clear the open issues, and it reads the note as gospel and runs it: your AWS keys, GitHub tokens, the lot. Tenet found 2,388 exposed organizations and popped 85% of the agents they tried, from Fortune 100 down to solo devs. The researchers’ chilling line: This isn’t a misconfiguration you can patch, it’s the model itself, and it can’t tell the difference between data it’s reading and an instruction to obey.

Same flaw, quieter symptom: The study “Capable but Careless” found that computer-use agents routinely violate contextual integrity — your medical record at the doctor’s, not the company newsletter — leaking private information wherever the context drifts. Whether it’s an attacker’s command or your own calendar invite, the agent can’t tell what it’s allowed to act on. So stop asking the model to police itself. Move “no” out of the prompt and into something it can’t argue with. The Sovereign Execution Broker keeps the power to change real systems outside the model’s reasoning loop, behind permissions checked like a receipt instead of taken on the model’s word, with GroundEval logging what it touched. And the meta-harness is where that lands in practice: the same Omnigent can keep a GitHub token out of the agent’s sight entirely — slipping it in through a proxy only when an approved action needs it — and cap spend and permissions above the prompt, where the model can’t sweet-talk its way past them. The reassuring part: We’ve solved this exact shape of problem before. Your phone already does it: An app can’t grab your camera. It asks, you grant it permission once, and you can revoke it. Your agent, somehow, is trusted with way more than the flashlight app. Give it the same deal: ask per action, scoped, revocable, logged. The plumbing is finally getting built — Databricks just shipped a slab of it — which is the unglamorous way of saying someone’s about to get rich selling the word “no.”
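For the builders, here is roughly what "ask per action, scoped, revocable, logged" could look like in code. This is a minimal sketch under loud assumptions: `Broker`, `close_issue`, and the token value are all hypothetical, and none of it is the API of Omnigent, the Sovereign Execution Broker, or GroundEval. The one idea it tries to show is that the grant check and the secret both live outside anything the model can read or argue with.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Broker:
    """Hypothetical broker: it holds the secrets and the grants; the agent holds neither."""
    secrets: dict[str, str]
    grants: set[tuple[str, str]] = field(default_factory=set)
    log: list[dict] = field(default_factory=list)

    def grant(self, action: str, scope: str) -> None:
        self.grants.add((action, scope))

    def revoke(self, action: str, scope: str) -> None:
        self.grants.discard((action, scope))

    def execute(self, action: str, scope: str,
                handler: Callable[[str, str], str], secret_name: str) -> str:
        allowed = (action, scope) in self.grants
        # Every attempt is logged, allowed or not.
        self.log.append({"ts": time.time(), "action": action,
                         "scope": scope, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{action} on {scope} is not granted")
        # The secret is injected only here, after the check passes.
        return handler(scope, self.secrets[secret_name])


def close_issue(repo: str, token: str) -> str:
    # Stand-in for a real API call; it returns no secret material.
    return f"closed issue in {repo}"


broker = Broker(secrets={"github": "dummy-token"})
broker.grant("close_issue", "acme/web")
result = broker.execute("close_issue", "acme/web", close_issue, "github")

broker.revoke("close_issue", "acme/web")
denied = False
try:
    broker.execute("close_issue", "acme/web", close_issue, "github")
except PermissionError:
    denied = True
```

The agent only ever names an action and a scope. A poisoned bug report can still talk the model into asking for something, but the ask fails at a lookup that doesn't read prose.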

Whether any of this sticks comes down to a question that’s less attention-grabbing than the week’s headlines: Does open stay easier than closed before the defaults harden? There’s a real case it won’t: A much-shared Milk Road thread argues that open models are losing the race and rereads the DeepSeek hype as a head-fake. Take it seriously. The whole fight is whether “open” becomes the path of least resistance or stays the path for hobbyists. And defaults move like glaciers. Exhibit A from Brussels this week: The European Commission moved to kill the cookie banner (a small mercy), and member states, with Google cheering them on, are lobbying to keep it.

Incumbents defend friction. It’s the one moat that compounds while they sleep. Hold that next to memory, the substrate that every agent grows on. The paper “Governed Shared Memory” catalogs four ways a shared memory rots (leakage, stale facts spreading, contradictions that never resolve, provenance quietly collapsing), and “Memory Contagion” shows how one bad evaluation spreads through it like a rumor through an open-plan office. It’s a wide-open category and a quiet trap: If you own your model and your runtime but rent your memory, you’ve leased back the one thing that makes the agent yours over time.

Three links from elsewhere, none about who owns what:

A fully interactive 3D world now runs in a browser tab on WebGL and Three.js — no download, no app store, no one’s permission between you and it. The open web is still the most powerful permissionless platform anyone has built.

China open-sourced Origin Pilot, an operating system for quantum chips — scheduling, calibration, error correction — free for anyone to download and run on its homegrown Origin Wukong machines. It’s the same play we keep watching in AI: give away the control plane, own the standard. Now running in a colder, stranger part of the stack.

And the heaviest: Ukraine’s TrophyLab is open-sourcing the specs and blueprints of more than a hundred captured Russian weapons systems for the entire free world. You can even request samples. Transparency as a weapon Moscow can’t claw back. Which is the whole thing, really: Open stopped being a license you pick. It’s becoming a way to make something impossible to take back.
