Hacker News | ColinEberhardt's comments

I know it’s a minor point, but it bugs me every time this form pops up…

Captive (noun): a person or animal whose ability to move or act freely is limited by being kept in a space; a prisoner, especially a person held by the enemy during a war.

Not an ideal term to use from a user perspective.


Nice work!

D3fc maintainer here. A few years back we added WebGL support to D3fc (a component library for people building their own charts with D3), allowing it to render 1m+ datapoints:

https://blog.scottlogic.com/2020/05/01/rendering-one-million...
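
For anyone curious what that looks like in code, the gist is roughly this (simplified and from memory, so treat the exact accessor names as approximate and see the post above for the real implementation; the '#chart' selector and the random data are just placeholders):

    import * as d3 from 'd3';
    import * as fc from 'd3fc';

    // placeholder data: one million random points
    const data = Array.from({ length: 1_000_000 }, () => ({
      x: Math.random(),
      y: Math.random()
    }));

    // a WebGL point series renders every point on the GPU
    const series = fc
      .seriesWebglPoint()
      .crossValue((d: { x: number; y: number }) => d.x)
      .mainValue((d: { x: number; y: number }) => d.y);

    // a cartesian chart using the WebGL series as its plot area
    const chart = fc
      .chartCartesian(d3.scaleLinear().domain([0, 1]), d3.scaleLinear().domain([0, 1]))
      .webglPlotArea(series);

    // '#chart' is a placeholder element on the page
    d3.select('#chart').datum(data).call(chart);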


An important point here is that it isn't doing a one-shot implementation; it is solving the problem iteratively, with a closed feedback loop.

Create the right agentic feedback loop and a reasoning model can perform far better through iteration than it does on its first one-shot attempt.

This is very human. How much code can you reliably write without any feedback? Very little. We iterate, guided by feedback (compiler, linter, executing and exploring).
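
To make that concrete, here is a minimal sketch of such a loop. The generatePatch and runChecks functions are hypothetical stand-ins: the first wraps whichever LLM API you use, the second wraps your compiler, linter or test runner.

    interface CheckResult {
      ok: boolean;
      feedback: string; // compiler errors, lint warnings, failing test output
    }

    // Hypothetical stand-in for an LLM call; in practice this would hit a model API.
    async function generatePatch(task: string, feedback: string): Promise<string> {
      return `// patch for: ${task}\n// previous feedback: ${feedback}`;
    }

    // Hypothetical stand-in for your compiler / linter / test runner.
    async function runChecks(patch: string): Promise<CheckResult> {
      return { ok: patch.length > 0, feedback: '' };
    }

    async function solveWithFeedback(task: string, maxIterations = 5): Promise<string | null> {
      let feedback = '';
      for (let i = 0; i < maxIterations; i++) {
        const patch = await generatePatch(task, feedback); // revised attempt, given feedback so far
        const result = await runChecks(patch);             // verify with real tooling
        if (result.ok) return patch;                       // verified, working code
        feedback = result.feedback;                        // feed the errors back into the next attempt
      }
      return null; // give up after maxIterations and hand back to a human
    }

The cap on iterations matters: without it, an agent that never converges will happily burn tokens forever.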


Looking at the commit log, almost every commit references an AI model:

https://github.com/nibzard/awesome-agentic-patterns/commits/...

Unfortunately it isn’t possible to detect whether AI was being used in an assistive fashion, or whether it was the primary author.

Regardless, a skim read of the content reveals it to be quite sloppy!


The flow was: me finding an interesting pattern -> Claude ingesting the reference and putting it in a template -> me figuring out if it makes sense -> push


Ironic, yet practical. This is so that the issue can be 'pinned' to the top of the Issues page, making it more visible:

https://github.com/ghostty-org/ghostty/issues


Nolan shares some interesting reflections at the end of this post. Here are a few direct quotes:

> "I’m somewhat horrified by how easily this tool can reproduce what took me 20-odd years to learn"

> "I’ve decided that my role is not to try to resist the overwhelming onslaught of this technology, but instead to just witness and document how it’s shaking up my worldview and my corner of the industry"

> "I have no idea what coding will look like next year (2026), but I know how my wife will be planning our next vacation."


It’s still around, and tends to be adopted by big enterprises. It’s generally a decent product, but is facing a lot of equally powerful competition and is very expensive.


“When an LLM can generate a working high-quality implementation in a single try, that is called one-shotting. This is the most efficient form of LLM programming.”

This is a good article, but it misses one of the most important advances this year: the agentic loop.

There are always going to be limits to how much code a model can one-shot. Give it the ability to verify its changes and iterate, and you massively increase its ability to write sizeable chunks of verified, working code.


Oh, so _that_ is what a sub-agent is. I have been wondering about that for a while now!


Likewise. I have a nasty feeling that most AI agent deployments happen with nothing more than some cursory manual testing. Going with the 'vibes' (to use an overused term in the industry).


I can confirm this after hundreds of talks on the topic over the last 2 years. 90% of cases are simply not high-volume or high-stakes enough for the devs to care. I'm a founder of an evaluation automation startup, and our challenge is spotting teams right as their usage starts to grow and quality issues are about to escalate. Since that's tough, we're trying to make getting to a first set of evals so simple that teams can start building the mental models before things get out of hand.
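
For what it's worth, a first eval really can be tiny. Something along these lines is often enough to start building those mental models; callModel is a hypothetical wrapper around whichever LLM API you use, and the cases and the substring check are placeholders you'd replace with your own data (or with an LLM-as-judge) later:

    interface EvalCase {
      prompt: string;
      mustContain: string; // crude pass/fail check; swap for an LLM-as-judge later
    }

    // Hypothetical wrapper around whichever LLM API you use.
    async function callModel(prompt: string): Promise<string> {
      return `stubbed response to: ${prompt}`;
    }

    // Placeholder cases; replace with real prompts drawn from your own traffic.
    const cases: EvalCase[] = [
      { prompt: 'Summarise our refund policy in one sentence.', mustContain: 'refund' },
      { prompt: 'Which currencies do we support?', mustContain: 'USD' }
    ];

    async function runEvals(): Promise<void> {
      let passed = 0;
      for (const c of cases) {
        const output = await callModel(c.prompt);
        const ok = output.toLowerCase().includes(c.mustContain.toLowerCase());
        if (ok) passed++;
        console.log(`${ok ? 'PASS' : 'FAIL'}: ${c.prompt}`);
      }
      console.log(`${passed}/${cases.length} cases passed`);
    }

    runEvals();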


A lot of "generative" work is like this. While you can come up with benchmarks galore, at the end of the day how a model "feels" only seems to come out from actual usage. Just read /r/localllama for opinions on which models are "benchmaxed" as they put it. It seems to be common knowledge in the local LLM community that many models perform well on benchmarks but that doesn't always reflect how good they actually are.

In my case I was until recently working on TTS, and this was a huge barrier for us. We used all the common signal-quality and MOS-simulation models that judged so-called "naturalness", "expressiveness" and the like. But we found that none of these really helped us much in deciding when one model was better than another, or when a model was "good enough" for release. Our internal evaluations correlated poorly with them, and we even disagreed quite a bit within the team on the quality of the output. This made hyperparameter tuning, as well as commercial planning, extremely difficult and we suffered greatly for it. (Notice my use of the past tense here...)

Having good metrics is really key, and I'm now at the point where I'd go as far as to say that if good metrics don't exist, it's almost not worth working on something at all. (Almost.)

