on llms (Ⅴ)
Again starting with my reading list (since last time):
what i've read these last three months (would you look at that? it's three months to the day!)
- Transformers Represent Belief State Geometry in their Residual Stream
- Simulators
- RFC 406i - The Rejection of Artificially Generated Slop (RAGS)
- Obligatory AI Opinions Post
- Yes, and…
- Everything changes, and nothing changes
- On the Problem of LLM-assisted Contributions to Open Source Projects
- activation energy
- You don’t have to if you don’t want to.
- Giving LLMs a personality is just good engineering
- Does that use a lot of energy?
- AI And The Ship of Theseus
- I don’t know if my job will still exist in ten years
- Newsletter 1: Catching folks up on agents
- LLM time
- I Created My First AI-assisted Pull Request and I Feel Like a Fraud
- Linux kernel czar says AI bug reports aren’t slop anymore
- “I agree with the article that you generally shouldn’t use LLMs for writing, but this is a really bad reason why.”
- I used AI. It worked. I hated it.
- Programming (with AI agents) as theory building
- Mouthwords
- Legibility is Ruining You
- Thoughts on slowing the fuck down
- Eight years of wanting, three months of building with AI
- The machines are not, in fact, fine.
- Where is it like to be a language model?
- The AI water issue is fake
- Taking Jaggedness Seriously
- Let’s talk about LLMs
- The Center Has a bias
- I don’t want a screenshot of your Claude conversation
- Many anti-AI arguments are conservative arguments
- Multiple things can be true at the same time
- BEWARE OF SOFTWARE BRAIN
- Contributor Poker and Zig’s AI Ban
- Why I Don’t Vibe Code
- The age of fast food
- Agentic Coding is a Trap
- Content for Content’s Sake
- How LLMs Distort our Written Language
- Tokens and Dreams
- Why hasn’t longer-horizon training slowed AI progress?
- Add an LLM policy for `rust-lang/rust`
- Profession (novella)
- What’s left to say
Iris Meredith’s What’s left to say (speaking to concepts now seemingly alien like virtue, noting that hardline positions have sheared from reality, identifying that slop being in fact broadly desired explains its proliferation more than the tools: we have plenty of machinery to create many other things en masse, and much more cheaply, and yet do not) is what has made now feel like a good time to checkpoint my own feelings again.
What is left to say? I think I have little left to comment on capabilities: they’re spiky (jagged, per Helen Toner[1]), quite decent at many things, and extremely inconsistent. The right harness/prompt/context increasingly one-shots larger and larger things (but see also: “extremely inconsistent”). Even smaller changesets carry decisions you have to ask about if you want to understand their premises; often ones that do not fully hold, sometimes ones that don’t hold at all. And despite this you will tend, over time, to ask less and less often than you know you should. Gotta go fast!
Overall I have continued to use them at work, as is increasingly measured/expected/required (and I mean at my company, but broadly true of the industry too!), and still I don’t feel there is a moral penalty in an individual’s use. But as for work: the pitfalls continue to show underfoot, and I find myself having to contort and constrain my use of them to avoid them invisibly infecting the outcome of the next thing I do. This is as an expert in my domain, with full ownership of it across the entire codebase! It gets really bad when well-meaning people use these things in areas they don’t understand fully[2][3], and it seems not a far stretch to make this (extremely well-supported) observation and then think: hmm. Perhaps there are deleterious effects yet, even when the understanding is there?
I feel like I digress; am I still talking about capability, on which I said I had little left to comment? In some ways yes, but I want to get at that which isn’t, so let’s adjust our perspective: assume these tools are perfectly capable. They make the correct change, write the feature perfectly, get the answer right 100% of the time. What then?
So you don’t need to learn to program (or whatever your field is), you can just have the thing. What thing is it you needed? Is it satisfactory, having the thing?
We recently had this experience: enbi is the lovely Operator which builds the things our cluster runs. I was super proud of our efforts here (+1 Kubebuilder), and like, it works solidly. Only taken the cluster down a few times[4]! It’s incredibly helpful for me, and just, yeah. Gosh.
While evaluating the current era of agentic coding, one thing I thought would be a well-scoped task was to add a web dashboard to enbi; show NixBuilds in the system, live refresh, stream logs from builds in progress, that kind of thing.
And, with not much effort, it worked. I’ve used it ever since then and it’s needed zero effort. While most of it was rote work (add listener, fight html/template, cache in SQLite, &c.), and the resulting interface forgettable, I had no idea how to do the log streaming thing, and it just made it happen!
This was, overall, a bad outcome. Despite reading the diffs closely at the time, I did not gain an iota of understanding, and while I now know, in theory, how to do the thing (though, not having actually gone through the motions myself, could not recall to you now how that is!), what I actually know (of) is this one specific way that happens to do the thing, but cannot speak to any tradeoffs about it whatsoever. I don’t know if it’s a good way to do the thing, a bad way, the only way; I don’t know at all. I don’t know if there was some obviously bad decision made in the implementation. I could speak for the system as a whole before; now there was a sizeable component that was opaque without considerable work, and any such work done to understand it post-hoc—and any outcomes of such work—would necessarily be of completely different character to that if I’d instead actually done the thing myself to begin with.
Now I’m back to `k enbi ga` and `k enbi l -f pod/podwatcher-859ffb94cf-859ffb94cf-8xj27` with the pod name selected by `C-t P RET`, and if that’s friction, that’s good: to be felt until I care enough to fix it myself, to understand the shape of the problem- and solution-spaces, and the topography surrounding[5] them ..
It turns out, for me at least, that having the thing isn’t satisfactory. Even if the tool was 100% capable! Like, remove all the negative “I don’t knows” from two paragraphs earlier:
This was, overall, a bad outcome. Despite reading the diffs closely at the time, I did not gain an iota of understanding, and while I now know, in theory, how to do the thing (though, not having actually gone through the motions myself, could not recall to you now how that is!), what I actually know (of) is this one specific way that happens to do the thing, but cannot speak to any tradeoffs about it whatsoever. ~~I don’t know if it’s a good way to do the thing, a bad way, the only way; I don’t know at all. I don’t know if there was some obviously bad decision made in the implementation.~~ I could speak for the system as a whole before; now there was a sizeable component that was opaque without considerable work, and any such work done to understand it post-hoc, and any outcomes of such work, would necessarily be of completely different character to that if I’d instead actually done the thing myself to begin with.
Is that better? Do trade-offs actually cease to exist; cease to need consideration? Are we starting to think that maybe the concept of a tool which can be “100% capable” is deliriously underspecified? Ignoring the immensity of questions which must needs go unanswered in this hypothetical, suppose the tool now says:
To do this next thing you’ve asked of me, I’ll have to rearchitect the whole thing due to earlier decisions as were made necessarily without unlimited foresight—and I’ll do it immediately now, perfectly.
Are we still happy, as programmers, as thinkers, as crafts[non]people and living beings? Do we press “allow” and consider no more? Did we `--dangerously-skip-permissions` when deciding to forgo consideration as a whole? There’s a pigeon sitting on the roof outside, not struggling with this question at all.
My hobby/craft/profession/practice/way of being isn’t about the things I’ve made; it is not about value, (the) economy, outcomes or output. It is about me; about what is inside of me.
And speaking of “me”: my computing system extends me in thought and ability; how careful I should be when delegating any part of that practice of self to another! How unthinkable to delegate to a trillion-parameter black box!
[1] hi helen!
[2] avoiding using an example from work for obvious reasons, but they aren’t few!
[3] that story at least did have a happy ending.
[4] ← did not GC the PV which stores its Nix cache, no alerting on disk space; now intimately familiar with disk pressure node taint … :S
[5] I recall seeing some discussion of using `fzf` for changeset selection recently; seems like a really good direction to try.