The Complete ChatGPT Algorithm in 200 Lines + Minecraft’s Huge Secret
What I've been thinking about in open source and AI
Here’s what I’ve been reading, thinking about, and playing with. Open weights stopped being the headline this week. The stack around them — runtime, harness, audit, distribution — is where the action moved.
Andrej Karpathy — formerly of OpenAI and Tesla — published microgpt.py: the entire architecture behind ChatGPT compressed into 200 lines of pure Python, no dependencies. The header reads: “This file is the complete algorithm. Everything else is just efficiency.” The internet took it as a dare. Five thousand stars. Two thousand forks. Ports in Rust, OCaml, Julia, JavaScript, CUDA, plus a pure-C version. Then someone shipped it to silicon.
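To get a feel for what fits in those 200 lines, here is a minimal sketch of a single causal attention head in dependency-free Python — the shape of the idea, not Karpathy's actual code:

```python
import math

def attention(xs, Wq, Wk, Wv):
    """One causal self-attention head in pure Python.
    xs: list of token vectors; Wq/Wk/Wv: weight matrices as lists of rows.
    A sketch of the idea, not microgpt.py's actual code."""
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    d = len(qs[0])
    out = []
    for t, q in enumerate(qs):  # each position attends to itself and the past
        scores = [sum(a * b for a, b in zip(q, ks[i])) / math.sqrt(d)
                  for i in range(t + 1)]
        m = max(scores)  # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * vs[i][j] for i, w in enumerate(weights))
                    for j in range(len(vs[0]))])
    return out
```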
A footnote: Karpathy said in a recent No Priors interview that he tried to get an agent to write microgpt for him. It can’t do it, he said. So he wrote it himself. A data point about who’s still doing the cognitive work at the bottom of the stack.
Back to silicon. People build custom silicon — special hardware for a single purpose — because it’s supposed to be faster than your CPU, which has to be general-purpose. Luthira Abeykoon’s TALOS-V2 is the same Karpathy model running on an FPGA — a programmable chip you wire up for one job — on a $350 hobby development board. The writeup at v2.talos.wtf is one of the best pedagogical hardware-design docs I’ve read this year. A few weeks ago, Alex Cheema’s benchmark put TALOS-V2 head-to-head with a MacBook running the same model multiple ways. Tokens per second, slowest to fastest:
MacBook, MLX (Apple’s GPU framework): 3,300
MacBook, pure Python: 7,400
MacBook, NumPy: 40,000
TALOS-V2 (FPGA): 53,000
MacBook, hand-tuned C on one CPU core: 3,760,000
The MLX result is the surprise. GPUs win by doing thousands of math operations in parallel — but every batch has to be shipped over with some fixed setup cost. On a normal model, that cost is invisible. On a model this tiny (~4,000 multiplications per token), the setup is bigger than the work. The GPU sits there, waiting. The same story explains why Python and NumPy lose to the FPGA, and why hand-tuned C — almost zero overhead — wins by 71x. The FPGA's remaining moat is form factor: it runs off a battery on something the size of a credit card, and a MacBook does not. The right question for tiny on-device AI isn't "How fast is your custom chip?" It's "Do you actually need the form factor?" If not, the laptop you already own, running plain C, beats everything.
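The back-of-envelope is worth running yourself. A toy sketch — the ~300 µs dispatch cost and the CPU throughput are my assumptions, chosen to roughly reproduce the measured figures above, not measurements:

```python
# When does GPU dispatch overhead dominate? All constants are
# illustrative assumptions fitted to the benchmark, not measurements.
FLOPS_PER_TOKEN = 4_000      # tiny model, per the writeup
GPU_DISPATCH_S  = 300e-6     # fixed cost to ship each batch to the GPU
GPU_FLOPS       = 1e12       # throughput once the kernel is running
CPU_FLOPS       = 1.5e10     # one tuned CPU core, no dispatch cost

gpu_time = GPU_DISPATCH_S + FLOPS_PER_TOKEN / GPU_FLOPS  # overhead + work
cpu_time = FLOPS_PER_TOKEN / CPU_FLOPS

print(f"GPU: {1/gpu_time:>12,.0f} tok/s "
      f"(dispatch is {GPU_DISPATCH_S/gpu_time:.1%} of the time)")
print(f"CPU: {1/cpu_time:>12,.0f} tok/s")
```

With these numbers the GPU lands near 3,300 tok/s — almost all of it dispatch — and the CPU near 3.75M, which is the whole benchmark in two lines of arithmetic.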
Most “I got this running locally” stories are bloat-stripping stories. Quantization, custom runtimes, dropping the GPU when the GPU is overhead — same move at different layers. What open weights buy you isn’t the weights. It’s the right to choose your abstraction tax.
But what about a translator that handles 1,056 language directions, runs offline on your phone, and outperforms Microsoft Translator, Doubao, and open models 20–40x its size (Tower-Plus-72B, Qwen3-32B) on Flores-200? That model is 440MB, fits on a USB stick, and Tencent open-sourced it as Hy-MT1.5-1.8B-1.25bit last week.
How they got there is the part you’ll want to steal: 1.25 bits per weight. Most weights end up at one of three states, and the model still wins. Tencent paired the new compression with a custom mobile-CPU runtime fast enough to actually use on the hardware in your pocket. Model + compression + data + runtime + open-source packaging is a five-layer practice that didn’t cohere as an industry discipline 18 months ago. The model is the headline. The discipline is the alpha. If your app currently sends translation traffic to a commercial API, that’s now a build-vs.-buy decision rather than an obvious buy.
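Three states per weight is log2(3) ≈ 1.58 raw bits; getting to 1.25 presumably takes entropy coding over the mostly-zero states — that part is my assumption. Here's a minimal sketch of the generic ternary recipe, not Tencent's actual scheme:

```python
import numpy as np

def ternary_quantize(W, thresh=0.7):
    """Snap a weight matrix to {-1, 0, +1} * scale. Generic ternary
    recipe, not Tencent's 1.25-bit scheme. thresh trades sparsity
    (more zeros) against fidelity."""
    scale = np.mean(np.abs(W), axis=1, keepdims=True)  # one scale per row
    Q = np.zeros_like(W, dtype=np.int8)
    Q[W >  thresh * scale] =  1
    Q[W < -thresh * scale] = -1
    return Q, scale

def dequantize(Q, scale):
    return Q.astype(np.float32) * scale

W = np.random.randn(4, 8).astype(np.float32)
Q, s = ternary_quantize(W)
print(Q)  # three states per weight; most entries end up zero
print(f"MSE {np.mean((W - dequantize(Q, s)) ** 2):.4f}")
```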
Training stays on the device, too. Federated learning lets a million phones (or hospitals, or banks) collaborate to train a shared model without any of their raw data ever leaving the device. You probably typed something today using one: Gboard's next-word prediction has been trained this way since 2017, on billions of phones, without Google ever seeing what anyone typed. The catch is that it falls over in production because real devices are slow, flaky, or both. MIT's FTTE, out this past week, handles that at scale: 81% faster convergence, 80% lower on-device memory, 69% lower communication overhead, validated on four Raspberry Pi 5s with up to 500 simulated clients, 90% of them stragglers. Privacy-preserving AI moves from research demo to deployable shape. If your domain has a privacy requirement that's been keeping AI off the table, the deployment shape just got real.
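The core loop is simple enough to sketch. This is vanilla federated averaging with a straggler cutoff — the shape of the problem FTTE addresses, not FTTE's actual method, and the local update is a placeholder:

```python
import random

def local_update(weights, device_id, lr=0.01):
    """Placeholder for a device's local training step (hypothetical)."""
    return [w - lr * random.gauss(0, 1) for w in weights]  # stand-in gradient

def fedavg_round(global_w, devices, deadline_fraction=0.8):
    """One round of vanilla FedAvg with a straggler cutoff: wait for the
    fastest fraction of devices, average their updates, drop the rest."""
    random.shuffle(devices)
    arrived = devices[: int(len(devices) * deadline_fraction)]  # stragglers miss the deadline
    updates = [local_update(global_w, d) for d in arrived]
    return [sum(ws) / len(ws) for ws in zip(*updates)]  # coordinate-wise average

global_w = [0.0] * 4
devices = list(range(100))  # 100 simulated clients
for _ in range(10):
    global_w = fedavg_round(global_w, devices)
print(global_w)
```

Every dropped straggler is wasted compute and a biased sample; FTTE's numbers above are about clawing that back.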
Your dependence on closed top-tier APIs is now a procurement choice rather than a technical one. Ant Group's Ling-2.6-1T (MIT license) is the reason. A trillion parameters, eight H100s minimum to load — roughly $250K of GPUs, or ~$50/hour to rent inference — post-trained for intelligence-per-token rather than benchmark-per-token. Fewer process tokens burned per useful answer, which is the metric that shows up on your inference bill. Until this quarter, swapping a closed-frontier API for an open model meant accepting a meaningful capability drop. Ling closes that gap, and it plugs into the agent runtimes builders are already using (Claude Code, OpenClaw, OpenCode). The swap is one config line.
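The one-line swap works when the open model is served behind an OpenAI-compatible endpoint, which is how most of these runtimes are wired. A sketch with the openai Python client — the URL and model name are placeholders, not real:

```python
from openai import OpenAI

# Point the same client at an open model served behind an
# OpenAI-compatible endpoint. URL and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="ling-2.6-1t",  # was: a closed frontier model
    messages=[{"role": "user", "content": "Summarize this diff."}],
)
print(resp.choices[0].message.content)
```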
SenseTime open-sourced SenseNova-U1 the same week. Today’s multimodal AIs are chains — one model sees, another reads, a third writes — and detail leaks out at every handoff. SenseNova-U1 fuses them. The 8B variant fits on consumer hardware. Closed labs still own the frontier. They’ve lost their monopoly on how to think about it.
The open stack now runs from $0 to $1M of hardware, with a real choice at every price point.
Open weights without a deployable harness don’t change the build-vs.-buy math — that’s a gap a lot of my Mozilla work points at. The harness is the software that wraps the model: filesystem, bash, memory, scheduled re-prompting, skills as just-in-time tools. LangChain’s Anatomy of an Agent Harness is the cleanest map of what a harness actually is, and the slogan is the right one: if you’re not the model, you’re the harness.
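Stripped to its skeleton, a harness is a loop: hand the model a toolbox, execute what it asks for, feed the result back. A minimal sketch — the tool set is illustrative and call_model() is a placeholder for whatever LLM API you use:

```python
import subprocess

# Minimal harness skeleton: the model proposes tool calls, the harness
# executes them and feeds results back into the context.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "bash":      lambda cmd: subprocess.run(cmd, shell=True,
                                            capture_output=True, text=True).stdout,
}

def run(task, call_model, max_steps=20):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)  # -> {"tool": ..., "arg": ...} or {"done": ...}
        if "done" in action:
            return action["done"]
        result = TOOLS[action["tool"]](action["arg"])
        history.append({"role": "tool", "content": result})
```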
On Terminal Bench 2.0, the identical model — Claude Opus 4.6 — spreads nearly 22 points: 58.0 in Claude Code, 79.8 in ForgeCode. Models are commoditizing. Harness engineering is where the leverage is. The missing piece in LangChain's catalog is a real permission model: what the agent can do without asking, what requires confirmation, what's forbidden. That's the gap Harbor (sketched in an earlier post and in last week's piece) takes a first run at.
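That permission layer could be as small as a policy table sitting between the loop above and its tools. My sketch of the idea, not Harbor's actual design:

```python
# Three-tier permission check between the agent loop and its tools.
# My sketch of the idea, not Harbor's actual design.
ALLOW, CONFIRM, DENY = "allow", "confirm", "deny"

POLICY = {
    "read_file": ALLOW,    # safe without asking
    "bash":      CONFIRM,  # human approves each call
    "send_mail": DENY,     # forbidden outright
}

def guarded(tool, arg, tools, ask_user):
    verdict = POLICY.get(tool, DENY)  # default-deny unknown tools
    if verdict == DENY:
        raise PermissionError(f"{tool} is forbidden")
    if verdict == CONFIRM and not ask_user(f"Run {tool}({arg!r})?"):
        raise PermissionError(f"user declined {tool}")
    return tools[tool](arg)
```

Routing every call in the harness loop through guarded() instead of TOOLS directly is the whole integration.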
Five steps. That's all it took to put an AI agent on a Mac. wexare-ai/openbrowserclaw fits an entire agent into one browser tab: Web Worker for the server, IndexedDB for the database, OPFS for files, a Linux VM compiled to JavaScript for bash.
Solstone is open source, runs locally, and captures what you see and hear into a searchable timeline. The interesting part is corporate, not technical. Solstone’s parent, Sol PBC, wrote, “We don’t sell, license, or lease your data” into its charter, and bound any future acquirer to the same terms. Marketing usually says, “You own your data.” Solstone made it legally enforceable.
A chatbot’s mistakes cost tokens. A robot’s mistakes break things. The body and simulator side of robotics has been opening up (Newton, Asimov, RoboParty, tracked here last month). Three pieces released this past month do the same for methodology and data, the layers that matter more before you let a robot loose in your house.
NVIDIA’s RoboLab is the first serious audit benchmark: 120 tasks deliberately built so the model can’t just memorize objects from the most popular training set. RAI Institute’s ExpertGen takes 200 imperfect human demos and turns them into 80% real-world success on a manipulation task — where standard imitation training on the same data lands at…0%. Peking University’s LDA-1B trains a 1.6B-parameter robot model on 30,000 hours of mixed-quality data and shows that adding the noisy data makes the model better, not worse — same instinct as FTTE, applied to a body. The audit surface for a robot is its evals, its methods, and its data. Next time you scope a robotics vendor, that’s the spec sheet to ask for.
To end on something completely different: Reporters Without Borders re-opened the Uncensored Library in March with a new US wing: banned and censored journalism distributed inside Minecraft, because Minecraft is the rare piece of internet infrastructure no government has bothered to block. Over a million visits. Ten million books read. RSF has been smuggling press freedom into authoritarian countries through a video game since 2020. It’s the same instinct as the rest of this letter, applied to a sandbox where 12-year-olds build castles. The gatekeepers can’t reach a layer they don’t take seriously. That’s the layer to build on.
What have you been diving into this week? What have I missed? Leave a comment below.