AI Agents, Robot Data, and Maxing out Claude
The week in the world of open-source.
Here’s what I’ve been reading — and thinking about — this week.
ClawGUI argues that GUI agents — systems that drive apps through taps and swipes instead of APIs — are stuck less on modeling than on infrastructure. Training environments crash. Nothing reproduces across labs. Agents ship in simulators and never touch a real phone. ClawGUI opens the whole stack: RL training, standardized eval, deployment to Android, HarmonyOS, and iOS. The model isn’t the bottleneck. The scaffolding is. There’s still one piece missing: a permission model for agents acting on your behalf. Harbor sketched one for web agents: scoped, per-origin, revocable. Ported down to the device layer, it might be the primitive GUI agents need to run somewhere real.
Vercel just showed what happens without Harbor. VentureBeat called it “the first major proof case that AI-agent OAuth integrations are a breach class most security programs can’t detect, scope, or contain.” Operational fixes are circulating — admin-managed consent, default-sensitive variables, scope audits — all useful, all patches on a permission model that predates agents. Every broad OAuth grant is a pivot point waiting to be used. Scoped, revocable, per-origin isn’t a nice-to-have. It’s the architectural primitive we needed yesterday.
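To make scoped, per-origin, revocable concrete, here’s a minimal sketch of what such a grant could look like as a data structure. It isn’t Harbor’s spec and all the names are mine; the point is the negative space: no wildcard origin, no long-lived token to pivot from.

```python
# Illustrative only: a scoped, per-origin, revocable grant as a data
# structure. Not Harbor's spec and not any platform's real permission API.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Grant:
    origin: str             # one app or origin, never "*"
    scopes: frozenset       # explicit actions, e.g. {"compose:draft"}
    expires_at: datetime    # short-lived by default
    revoked: bool = False

    def allows(self, origin: str, scope: str) -> bool:
        return (not self.revoked
                and origin == self.origin
                and scope in self.scopes
                and datetime.now(timezone.utc) < self.expires_at)

    def revoke(self) -> None:
        self.revoked = True

# An agent allowed to draft and send one email, in one app, for one hour:
grant = Grant(origin="com.example.mail",
              scopes=frozenset({"compose:draft", "send:single"}),
              expires_at=datetime.now(timezone.utc) + timedelta(hours=1))
assert grant.allows("com.example.mail", "send:single")
assert not grant.allows("com.example.bank", "send:single")   # no cross-app pivot
grant.revoke()
assert not grant.allows("com.example.mail", "send:single")   # revocation is immediate
```

Every line of that sketch is the inverse of the broad, long-lived OAuth grant that made the Vercel incident possible.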
The scaffolding argument shows up in the economics, too. Anthropic’s pricing looks simple until you normalize to revenue per token. The math, according to @exponentialview, lands somewhere uncomfortable: frontier API economics are nonlinear. Cache hits price at a fraction of misses. Long contexts with heavy reuse subsidize themselves; short one-shots don’t. And none of this was designed with agents in mind. A single coding session can burn tokens at rates that would embarrass a month of chat. The sticker price looks stable. The cost per task doesn’t. If your agents are burning through tokens like flash paper, the only durable fix is owning the runtime.
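Here’s a back-of-envelope version of that math. The prices are placeholders, not Anthropic’s actual rates; the shape of the curve is what matters.

```python
# Back-of-envelope cost-per-task model. Prices are placeholders, not
# Anthropic's actual rates; the shape of the math is the point.
PRICE_INPUT = 3.00     # $ per million input tokens (assumed)
PRICE_CACHED = 0.30    # $ per million cache-read tokens (assumed: ~10% of input)
PRICE_OUTPUT = 15.00   # $ per million output tokens (assumed)

def cost_per_task(input_tok, output_tok, cache_hit_rate):
    """Cost of one task given how much of the input is served from cache."""
    cached = input_tok * cache_hit_rate
    fresh = input_tok - cached
    return (fresh * PRICE_INPUT + cached * PRICE_CACHED + output_tok * PRICE_OUTPUT) / 1e6

# A chat turn: small context, no reuse.
print(f"chat turn:      ${cost_per_task(2_000, 500, 0.0):.4f}")
# An agentic coding session: huge context replayed every step, mostly cached.
steps = 40
print(f"coding session: ${steps * cost_per_task(80_000, 1_000, 0.9):.2f}")
```

Set cache_hit_rate to zero and the same session costs roughly four times as much under these assumptions. That gap between the sticker price and the cost per task is the nonlinearity.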
Until then, you’re still on their stack. This power-user guide for Claude quietly reframes what getting value from a frontier model means. It isn’t prompts. It’s persistent context: Projects, custom instructions, skills, Claude Code, Cowork. If you stop pasting context and start structuring it, Claude behaves less like a chatbot and more like a system you’ve configured. Prompt engineering is giving way to workflow engineering. Don’t copy someone else’s setup whole. Pick one real project, bring over one piece at a time, then keep what earns its place. That’s the first step. The structure still lives in their product, not yours. The exit ramp is missing.
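If “structuring context” sounds abstract, the smallest version of it is almost boring. The file names below are hypothetical; the habit is the point: project knowledge lives in files you maintain, and every session starts from the same assembly instead of whatever you remembered to paste.

```python
# A minimal sketch of structured, persistent context. File names are
# hypothetical; the point is that context is maintained, not re-pasted.
from pathlib import Path

def build_context(project_dir: Path) -> str:
    """Assemble the project's context files, in a fixed order, into one block."""
    pieces = []
    for name in ("instructions.md", "glossary.md", "style.md"):
        path = project_dir / name
        if path.exists():
            pieces.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(pieces)

# Start with one file for one real project; add the next only when it earns its place.
context = build_context(Path("my-project/.context"))
```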
If you’ve been thinking about memory as longer context windows, Plastic Labs is going in a different direction. (Disclosure: It’s a Mozilla.vc portfolio company; I’m CTO at Mozilla.) Their Honcho is the exit ramp the platforms haven’t built. It treats memory as a reasoning layer, not storage — patterns learned across users, agents, and sessions, carrying state across tools and providers. Less chat history, more shared context layer.
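I don’t want to misrepresent Honcho’s actual API, so read this only as a sketch of the shape: agents write what they observed, and every tool queries a derived picture of the user instead of replaying chat history.

```python
# Not Honcho's API, just a sketch of the shape a shared context layer takes:
# agents write what they observed, and every tool queries a derived picture
# of the user instead of replaying chat history. All names are illustrative.
class ContextLayer:
    def __init__(self):
        self.observations = []   # list of (source, text) tuples

    def record(self, source: str, text: str) -> None:
        """Any agent or tool appends what it saw; no single app owns the history."""
        self.observations.append((source, text))

    def ask(self, question: str) -> str:
        """Stand-in for a real reasoning step over everything recorded:
        return inferences about the user, not a transcript."""
        keywords = [w.strip("?.,!").lower() for w in question.split() if len(w) > 3]
        hits = [text for _, text in self.observations
                if any(k in text.lower() for k in keywords)]
        return "; ".join(hits) or "no signal yet"

ctx = ContextLayer()
ctx.record("calendar-agent", "prefers meetings after 10am")
ctx.record("mail-agent", "writes terse emails, no pleasantries")
print(ctx.ask("How should I draft this email?"))   # state written by a different tool
```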
Zoom out and the same pattern repeats. World models are having their closed-vs.-open moment, almost on schedule. Google’s Genie 3 is still in research preview — no weights, no API. World Labs’ Marble ships commercially at $20-$95 a month. Meanwhile, the open side has been quietly filling in: Ant Group’s LingBot-World dropped in January under Apache 2.0, with fast-model updates in April; Tencent followed with HY World 2.0 in mid-April, explicitly benchmarking against Marble. It’s the same playbook as coding models: closed labs meter access, Chinese labs open the weights and seed the field. Which means capability isn’t the gate anymore. Control is — whose worlds, running on whose hardware, with weights you can actually inspect.
But speed costs something. A link that’s been making the rounds reminded me of something I wrote last year: the most important thing we can do for data is preserve it. As more of the internet moves behind APIs, logins, and AI interfaces, the default shifts from public-but-fragile to private-and-inaccessible. And the scope of what needs preserving just got bigger. A decade ago, it was pages and datasets. Now it’s agent interactions — the conversations, the tool calls, the reasoning traces that are starting to stand in for what used to be documented decisions. An agent does the research, drafts the reply, negotiates the terms. If nobody captures the session, the decision exists nowhere at all. The individual version of this is why Morph exists — capture the trace, keep the receipts. It’s what I called a RAID array for civilization: redundancy across drives, geographies, institutions, and funding models, so no single hard drive, agency, or administration can silently lose the only copy. Preserving the open web stopped being passive. Preserving agent memory hasn’t even started yet.
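The cheapest version of “capture the trace” is almost embarrassingly simple. This isn’t how Morph works; it’s just the shape of the receipt: an append-only log of what the session did and what it decided.

```python
# Illustrative: the cheapest possible version of "capture the trace", an
# append-only JSONL log of everything an agent session did. Not Morph's
# implementation; just the shape of the receipt.
import json
import time
from pathlib import Path

TRACE = Path("agent-traces/sessions.jsonl")     # hypothetical location
TRACE.parent.mkdir(parents=True, exist_ok=True)

def log_event(session_id: str, kind: str, payload: dict) -> None:
    """Append one event (a message, a tool call, a decision) with a timestamp."""
    event = {"ts": time.time(), "session": session_id, "kind": kind, **payload}
    with TRACE.open("a") as f:
        f.write(json.dumps(event) + "\n")

log_event("negotiate-lease-042", "tool_call",
          {"tool": "search", "query": "comparable rents, 2br, same zip"})
log_event("negotiate-lease-042", "decision",
          {"summary": "countered at lower rent, cited three comps"})
# Redundancy is the other half: replicate the file across drives,
# institutions, and funding models, so no single copy is the only copy.
```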
And while the machines learn, someone is teaching them. This clip is getting framed as new. It isn’t. It’s how humanoid robotics gets trained now. Pi and Google run teleop farms. Tesla pays people in motion-capture suits. Sunday ships capture gloves. And in lower-cost labor markets, workers repeat simple tasks for hours so models can learn how to move. There’s no Common Crawl for robot data; it has to be made. The same labor arbitrage that trained AI on text is showing up in physical space. The output isn’t words anymore. It’s bodies.
Bailey Pumfleet’s post reads as provocation first, argument second. The observation is real: AI makes scanning and exploiting code cheaper at scale, and security pressure on open projects is rising. But going from “audits are expensive” to “open source is dead” is a leap, and I think Bailey is absolutely wrong. (Disclosure: Thunderbird and Thunderbolt are projects at Mozilla.) The same forces making attacks cheaper make audits, forks, and alternatives faster. Last year, Thunderbird committed on the record that it is and always will be free and open source, even as the team builds paid services around it. This month, the same team shipped Thunderbolt, a self-hostable open-source enterprise AI client pitched directly against Microsoft Copilot, ChatGPT Enterprise, and Claude Enterprise. Open source isn’t dying. It’s showing up in the places where closed platforms have gotten too comfortable.
To end on something completely different. This is a great episode for the nerds who haven’t listened to The Daily yet: the Satoshi teardown. I’m an avid listener. When I was recording Technically Optimistic, my producer and I had a running refrain every time a take landed too earnest or too probing: Sounds too much like Michael Barbaro. Which is another way of saying The Daily’s format makes narrative journalism look easy, and it isn’t: whoever’s hosting on a given day is working inside Barbaro’s template, and on most days you can hear the effort. This episode isn’t “Who is Satoshi?” It’s a methodical teardown of a 17-year mystery using stylometry, timelines, and Bayesian reasoning to narrow thousands of candidates down to one plausible person. What makes it land is that even after all that evidence, it’s still not definitive. You walk away not quite convinced the mystery is solved, but with a gorgeous story of how much of the modern internet was built by people who could disappear completely.



