skip to content
vinny neves
back

the gemma lives on my laptop. the fable lasted three days.

6 min read
cinematic cyberpunk illustration: a bald, bearded man with glasses holds a black laptop at the center of a cyan energy sphere covered in japanese characters, a white shih tzu sitting at his feet; behind him a nighttime city and, in the upper right, a dark figure made of digital fragments dissolving, bound by a chain and a glowing red padlock
the model that's mine glows around me; the most powerful one in the world dissolves, chained to a padlock. sovereignty stopped being a slide.

in under two weeks, two launches told the same story from opposite angles.

early in june google put gemma 4 12b on my laptop — multimodal, apache 2.0, running local. the following week anthropic launched fable 5, the most powerful model ever made available to the public. three days later it was already offline.

one landed on my machine to stay. the other landed in the cloud and got switched off before i could even test it.

what gemma unlocked

the launch got its quota of walkthroughs — the most-shared was addy osmani’s, and it’s a good roundup: genuinely multimodal (text, image and audio on the same backbone, no separate encoder), runs local, apache 2.0.

i won’t repeat the whole list. the detail that matters for anyone who writes code isn’t the model. it’s a single line of cli:

litert-lm serve

that spins up an openai-compatible endpoint on your machine. then you point opencode, aider, continue — anything that speaks the openai dialect — at localhost:9379 and you’re done. your code never leaves the laptop.

plugging in a local agent wasn’t born yesterday. ollama, lm studio, llama.cpp already did this. what changed isn’t the possibility — it’s the model on localhost finally being good enough that you’d trust it with a real task.

before you copy the “16GB” figure that went around: that’s ram or unified memory, not vram. if it were vram, half the macs (which are exactly the use case) would be out. in 4-bit unsloth’s docs cite something close to 8gb. the comfortable floor is 16. the real minimum is lower.

hold onto that, because the other half of the week changes the weight of that “runs local.”

sovereignty stopped being a slide

i was going to write here that digital sovereignty is a nice, abstract principle. i’ve talked about it on hipsters in that same tone.

then the abstraction knocked on the door.

fable 5 launched on a tuesday, the 9th. by friday, the 12th, it was already offline. it wasn’t a bug: it was an export-control directive from the us government, citing national security, and anthropic had to suspend access to fable 5 and mythos 5. me, in braga, i lost the most capable model in the world three days after it existed. without having done anything. (times brasil/cnbc)

and it wasn’t hype. i used fable in those three days — it’s not a souped-up opus, it’s another tier. it closed out long tasks that opus got stuck on. it burns a lot more tokens, sure, but the result paid for it. what was taken from me wasn’t a marginal upgrade. it was real capability.

note that anthropic isn’t the villain here. they were ordered. amodei reportedly pushed back and still had to comply. that’s exactly the point: not even the lab that trained the model holds sovereignty over access to it.

and it doesn’t stop at the government. microsoft — which resells fable 5 to its own customers via azure — blocked its own employees’ internal access to the same model, because of the data-retention policy (reported by the verge). the people selling it don’t trust it enough to use it at home.

and even when the model is up, you don’t always control which model answers you: fable’s safeguards silently swap to opus 4.8 in certain areas. the capability you think you’re using may not be the one that’s running.

put it all together: government, vendor legal, silent classifier. three layers of decision you don’t control and don’t negotiate, any one of them able to change what you get overnight.

it’s rented. and the owner can switch it off.

tiering, not replacement

so i’m not going to tell you to drop the cloud. that’d be dumb — for the hard problem, it still wins by a mile.

what i will tell you is to stop calling in the crane to hang a picture frame.

renaming a variable, a commit message, the first review pass, boilerplate — routine load. runs local, free, no latency, no burning expensive context. the architecture, the bug that won’t reproduce, the ten-file refactor — that goes up to the cloud, when the problem pays the bill.

you take off the expensive tier what never needed to be there. savings, day to day.

but fable’s week adds a second reason, a harder one: the local tier is the only one nobody switches off. no government, no vendor legal, no classifier. it’s not the best tier. it’s the tier that’s left when the others vanish.

you don’t drop the cloud. you stop depending on it alone.

how i’d use it tomorrow

the rule is simple: if the task forgives a mediocre result, it goes local. if it doesn’t, it goes up to the cloud.

local (gemma 12b, no network, never switched off):

cloud (claude, when the problem pays the bill):

the price nobody mentions

having a tier nobody switches off has a cost. and the cost is you.

a smaller model doesn’t forgive a loose harness. claude is that senior who gets what you meant even when you explain it badly. gemma 12b is the diligent intern: it does exactly what’s written. ask for a rename and it nails it. ask for a refactor crossing five modules and it starts to lose the thread along the way. an ambiguous instruction that claude fixes on its own tends to come back from the 12b as a literal error.

if your CLAUDE.md is vague, the output comes out vague — and that’s on you, not on it.

in other words: the sovereign tier doesn’t lower the bar on your context engineering. it raises it. all that care with the harness that the big model let slide becomes a requirement.

for two years the conversation was quality: which model codes better, reasons better, costs less. last week added a variable almost nobody was measuring: availability.

the best model in the world doesn’t help you when someone can switch it off. that’s why the local tier matters — not because it’s the most capable (it isn’t), but because it’s the tier nobody switches off.

and the price of that tier is you. a model that’s truly yours won’t cover for your laziness with context — it forces you to write the instruction. which is, in the end, the one skill that doesn’t change its name when the model of the year changes.


share this post:

testing in production

an occasional newsletter on claude code, ai-assisted dev, and what it's like to teach code for a living. hosted on linkedin — you subscribe there and it lands in your feed and inbox.

subscribe on linkedin

next post
what survives when everything changes