on llms (Ⅴ)

23 May 2026

Again starting with my reading list (since last time):

what i've read these last three months (would you look at that? it's three months to the day!)

Iris Meredith’s What’s left to say—speaking to concepts now seemingly alien like virtue, noting that hardline positions have sheared from reality, identifying that slop being in fact broadly desired explains its proliferation more than the tools! —we have plenty of machinery to create many other things en masse —and much more cheaply!— and yet do not— —is what has made now feel like a good time to checkpoint my own feelings again.

aside on punctuation

I also thought I might excessively parenthesise using em-dashes, as (a) I’ve been reading Brontë and Austen and wow, they don’t give a fuck, do they?, and (b) I see increasingly hardline commentary that treats typography as a declaration of origin, with statements such as:

This commentary in the bundled translation of zlib cites an RFC section with §, which is (IME) rare in human-authored code but frequently used by LLMs:

Dear reader, if we have discussed CommonMark and I have linked you a section, there is a 99% chance I have referred to “CommonMark § 6.2 Emphasis and strong emphasis” or similar. I use em-dashes quite a lot, particularly in comments. There’s another certain kind of autist who loves to italicise the names of media (such as when talking about Jane Eyre) and while that’s not for me (for whatever reason it actually irks me really badly!?), it is just one more trait. Any given model is going to pick some traits and not others. I am not going to contort my writing style to avoid theirs despite my incredible disdain for slop; if anything I will dig my heels in, not be pushed on out of myself.

What is left to say? I think I have little left to comment on capabilities—they’re spiky (jagged, per Helen Toner¹), quite decent at many things, extremely inconsistent, the right harness/prompt/context increasingly one-shots larger and larger things (but see also: “extremely inconsistent”), even smaller changesets carry decisions you have to ask about if you want to understand the premises of those decisions; often ones that do not fully hold, sometimes ones that don’t hold at all, and despite this you will tend, over time, to ask less and less often than you know you should. Gotta go fast!

Overall I have continued to use them at work, as is increasingly measured/expected/required (and I mean at my company, but broadly true of the industry too!), and still I don’t feel there is a moral penalty in an individual’s use. But as for work: the pitfalls continue to show underfoot, and I find myself having to contort and constrain my use of them to avoid them invisibly infecting the outcome of the next thing I do. This is as an expert in my domain, with full ownership of it across the entire codebase! It gets really bad when well-meaning people use these things in areas they don’t understand fully²³, and it doesn’t seem a far stretch to make this (extremely well-supported) observation and then think: hmm. Perhaps there are deleterious effects yet, even when the understanding is there?

I feel like I digress; am I still talking about capability, on which I said I had little left to comment? In some ways yes, but I want to get at that which isn’t, so let’s adjust our perspective: assume these tools are perfectly capable. They make the correct change, write the feature perfectly, get the answer right 100% of the time. What then?

So you don’t need to learn to program (or whatever your field is), you can just have the thing. What thing is it you needed? Is it satisfactory?, having the thing?

We recently had this experience: enbi is the lovely Operator which builds the things our cluster runs. I was super proud of our efforts here (+1 Kubebuilder), and like, it works solidly. Only taken the cluster down a few times⁴! It’s incredibly helpful for me, and just, yeah. Gosh.

While evaluating the current era of agentic coding, one thing I thought would be a well-scoped task was to add a web dashboard to enbi; show NixBuilds in the system, live refresh, stream logs from builds in progress, that kind of thing.

screenshot of jj log showing commits over an evening building the feature out

And, with not much effort, it worked. I’ve used it ever since then and it’s needed zero effort. While most of it was rote work (add listener, fight html/template, cache in SQLite, &c.), and the resulting interface forgettable, I had no idea how to do the log streaming thing, and it just made it happen!

This was, overall, a bad outcome. Despite reading the diffs closely at the time, I did not gain an iota of understanding, and while I now know, in theory, how to do the thing (though, not having actually gone through the motions myself, could not recall to you now how that is!), what I actually know (of) is this one specific way that happens to do the thing, but cannot speak to any tradeoffs about it whatsoever. I don’t know if it’s a good way to do the thing, a bad way, the only way; I don’t know at all. I don’t know if there was some obviously bad decision made in the implementation. I could speak for the system as a whole before; now there was a sizeable component that was opaque without considerable work, and any such work done to understand it post-hoc—and any outcomes of such work—would necessarily be of completely different character to that if I’d instead actually done the thing myself to begin with.

git commit "undebase myself a bit", diffstat shows 1978 deletions, 4 insertions

Now I’m back to k enbi ga and k enbi l -f pod/podwatcher-859ffb94cf-859ffb94cf-8xj27 with the pod name selected by C-t P RET, and if that’s friction, that’s good: to be felt until I care enough to fix it myself, to understand the shape of the problem- and solution-spaces, and the topography surrounding⁵ them ..

It turns out, for me at least, that having the thing isn’t satisfactory. Even if the tool was 100% capable! Like, remove all the negative “I don’t knows” from two paragraphs earlier:

This was, overall, a bad outcome. Despite reading the diffs closely at the time, I did not gain an iota of understanding, and while I now know, in theory, how to do the thing (though, not having actually gone through the motions myself, could not recall to you now how that is!), what I actually know (of) is this one specific way that happens to do the thing, but cannot speak to any tradeoffs about it whatsoever. I don’t know if it’s a good way to do the thing, ~~a bad way,~~ the only way; I don’t know at all. ~~I don’t know if there was some obviously bad decision made in the implementation.~~ I could speak for the system as a whole before; now there was a sizeable component that was opaque without considerable work, and any such work done to understand it post-hoc—and any outcomes of such work—would necessarily be of completely different character to that if I’d instead actually done the thing myself to begin with.

Is that better? Do trade-offs actually cease to exist; cease to need consideration? Are we starting to think that maybe the concept of a tool which can be “100% capable” is deliriously underspecified? Ignoring the immensity of questions which must needs go unanswered in this hypothetical, suppose the tool now says:

To do this next thing you’ve asked of me, I’ll have to rearchitect the whole thing due to earlier decisions as were made necessarily without unlimited foresight—and I’ll do it immediately now, perfectly.

Are we still happy, as programmers, as thinkers, as crafts[non]people and living beings? Do we press “allow” and consider no more? Did we --dangerously-skip-permissions when deciding to forgo consideration as a whole? There’s a pigeon sitting on the roof outside, not struggling with this question at all.

My hobby/craft/profession/practice/way of being isn’t about the things I’ve made; it is not about value, (the) economy, outcomes or output. It is about me; about what is inside of me.

And speaking of “me”: my computing system extends me in thought and ability; how careful I should be when delegating any part of that practice of self to another! How unthinkable to delegate to a trillion-parameter black box!

¹ hi helen!
² avoiding using an example from work for obvious reasons, but they aren’t few!
³ that story at least did have a happy ending.
⁴ ← did not GC the PV which stores its Nix cache, no alerting on disk space; now intimately familiar with disk pressure node taint … :S
⁵ I recall seeing some discussion of using fzf for changeset selection recently; seems like a really good direction to try.