Thursday, June 11, 2026

Nasal demons: LLM review by a virtual kernel hacker having a bad day is both effective and hilarious

Or, "how I learned about the phrase 'nasal demons' and its relevance to C code review¹.

I landed up asking an LLM tool to review complex, experimental work in progress "in the style of a very prominent Linux kernel developer having a very bad day".

So an out-of-tree extension author gets the rule wrong, the unwind callback dereferences a stack frame that has been unwound, you get nasal demons. And there is no diagnostic.

and

[...] logs the drop count once per minute and zeros it. No cumulative drop counter is exported as a metric — the extension cannot instrument itself. The on-call engineer has to scrape the postgres log to find out the OpenTelemetry pipeline is dropping data, which is a Heller-class irony.

The review was surprisingly good, especially from an statistics engine with delusions of grandeur. The results were hilarious ... but they were also the most useful LLM code review result I've ever managed to obtain.

Why ask for an actively hostile review?

I recently needed to tidy up a series of experimental patches I knew to be both half-baked and controversial. I needed to get them ready to post to a mailing list that is not known for great tolerance for imperfection, external dependencies, or C language features invented after 1989. No, not LKML.

I didn't want to waste the time of the real experts, but was approaching the limits of what I could reasonably invest myself before getting some level of sniff-test from the community about the ideas, if not the implementation. The problem is - the community isn't interested in ideas until there is a decent concrete implementation of them; hence the experimental patches.

So what to do?

Virtual angry kernel hackers!

What's being reviewed

I'm quite excited by the work I've been doing. I've been exploring two ideas I've had on the backburner for PostgreSQL for years but never had time to sit down and implement. So I used LLM tooling to sketch out experimental implementations - a useful way to find unexpected problems, get some performance numbers, etc. Until I get a proper blog out for it, I have it indexed in a gist here.

So I'm excited by it... but I also know the state at time of this publication is terrible. I'd never expect more than a cursory glance/skim by a real human, and certainly not consider patch submission for any of it. Hence looking for other review strategies.

Review process

I initially asked for LLM-review "in the style of leading pgsql-hackers core team members in a particularly grumpy mood." This produced a much better result than I've seen in past LLM reviews, but it lacked a certain ... punch. The -hackers mailing list can be a bit of a harsh-but-fair place, but I figured I needed something that will really let loose.

So pass 2 was the angry kernel developer.

Some of the excerpts are golden, and frankly, very helpful.

Now, to be clear, I don't support or approve of rude, abrasive or abusive communication styles in open source development (or elsewhere). The same result can be achieved without the unpleasantness. But when it's my choice to ask for it, I already know the code being reviewed is bad, and it's not even a human writing it - well, that turns the unpleasantness into (useful) comedy gold.

Review excerpts

I'll omit the criticism that focuses on the presentation of the material as a set of PRs on a personal fork, how it doesn't confirm to project change-submission conventions etc; that's not relevant, since it's not supposed to yet. But what is relevant is juicy... and often right.

OK, I have read this whole f**king series, and I'm going to be very direct because if I'm polite about it nobody learns anything.

This is a textbook case of a feature that has been designed end-to-end by somebody who decided what the answer was supposed to look like first, and then back-filled five interlocking patches to make it fit. It is not how you change a wire protocol. It is not how you add core hooks. It is not how you change on-disk log formats. And it sure as hell is not how you justify any of those things — by saying “look, an extension on a personal fork needs this, please merge.”

NO.

Let me go through this so the next person who tries something like this knows exactly why it didn't fly.

That sets the tone, but that's nice compared to what follows.

1. The 'M' protocol message — this is the worst one

Adding a wire-protocol message letter is FOREVER. Every pooler, every driver, every middlebox, every protocol analyzer has to deal with it from the day it ships until the heat death of the universe.

We have a protocol version-bump mechanism precisely so this kind of addition can be done in a versioned, negotiated, recoverable way.

This patch does not use it. Instead it invents its own ad-hoc parallel negotiation through _pq_.headers plus a hand-rolled protocol_features field on ParameterStatus. Why? Because the existing mechanism is hard? TOUGH. Do it right.

 

1a. ProtocolHeadersNegotiated is a process-global

bool ProtocolHeadersNegotiated = false;

A file-scope global. For state that is fundamentally per-connection. Set from inside the startup-option parser at backend_startup.c:807. No plumbing through Port, no reset on failure. This is the kind of code you write when you don't actually know how the backend lifecycle works and you just need the test to pass. I am not impressed.

And while we're at it — the negotiation flag is set BEFORE authentication completes. There is no assertion guarding any code path between startup and the main loop from reading "negotiated = true" for an unauthenticated session. "Nothing reads it today" is the answer of a person who is going to be very surprised in two years.

 

1b. The data path uses FATAL for protocol violations

protocol_headers.c:212, 217, 235 all call ereport(FATAL, ...). A buggy pooler sends one malformed 'M' and the BACKEND DIES.

I cannot stress this enough: we don't crash the backend on unexpected protocol noise. We've spent twenty years not doing that. This patch undoes it on the first new code path. And it does so in a way that bypasses the sigsetjmp recovery at postgres.c:4577, making the ResetPendingRequestHeaders() cleanup at postgres.c:4584 literal dead code.

This is the kind of thing where I would not even bother with the rest of the review and just send it back.

 

1c. COPY mode has no carve-out

SocketBackend (postgres.c:445) adds the 'M' case unconditionally. A client sends 'M' in the middle of COPY-in, the backend cheerfully parses it, dispatches the headers to whatever extension registered the prefix, and then the next CopyData arrives like nothing happened — except now the headers are bound to whatever comes after COPY completes.

Same hazard in walsender / exec_replication_command. No carve-out.

Whoever wrote this CLEARLY did not actually think about what "protocol state machine" means. The protocol is not a stream of self-describing messages where every byte makes sense in every context. Different states accept different messages. YOU HAVE TO WRITE THAT DOWN.

 

1f. libpq queue is not thread-safe

fe-headers.c PQattachHeader touches conn->queuedHeaders[] with no locking. libpq-int.h:507 adds the field with no comment about threading. PQheadersAvailable is read-only and is exactly the sort of thing a caller reads from a different thread.

The comment in fe-headers.c:84-93 actively WAVES OFF embedded-NUL handling — "there's no way for a caller's normal C string to embed a NUL that strlen would miss" — yes there is, here's an eight-byte zero-initialized buffer, you idiot.

I'm sorry. No. I'm not sorry. This is sloppy.

 

1g. Pipeline mode is "I didn't think about it"

....

 

1h. protocol.sgml is not touched

The canonical place every wire message is documented — the protocol chapter — is UNTOUCHED. Adding 'M' to libpq.sgml but not protocol.sgml is like adding a new system call and forgetting to update the manpage. This alone would be enough to send it back.

 

2. pre_ready_for_query_hook

I would reject this in five seconds without reading anything else about the series.

src/backend/tcop/postgres.c:4783 adds a hook. No prev_hook chaining. Two extensions loaded = silent clobber. We have not, in the entire history of this project, accepted a single-consumer hook. Why are you doing it here.

No PG_TRY around the callee. Hook ereports? Welcome to your infinite error loop because the sigsetjmp recovery sets send_ready_for_query = true and we're right back in the hook.

No in-tree consumer. No test extension. No docs entry.

And the hook is at the wrong point anyway — ReadyForQuery() re-reads TransactionBlockStatusCode() itself, so the hook cannot even override the txn-status byte, which is the only obvious reason anyone would want this hook in the first place.

The PR description has THREE PARAGRAPHS explaining why the hook isn't named what its use case suggests it should be called. When the cover letter spends that long apologizing for the name, the design is wrong.

Trash. Try again.

 

3. The elog annotations patch

This one I'm slightly less angry about, because the underlying idea — structured key/value annotations on errors — is fine. The implementation has problems.

 

3a. Wrong context

set_annotation in elog.c ~1834 allocates in edata->assoc_context. Which, after CopyErrorData, is whatever the caller's current memory context was. Annotate a copied errdata, the caller's context gets reset, and your annotations are gone.

We have ErrorContext for a reason.

 

3b. No recursion guard

Both errannot() and errannotf() ~1873 / ~1899 EXPLICITLY COMMENT "we don't bother incrementing recursion_depth." If pstrdup ereports — OOM, anyone? — we recurse through errstart into another errannot, which finds errordata_stack_depth already advanced and writes into the recursive errdata.

This is exactly the kind of thing the elog subsystem has spent two decades NOT DOING. And someone wrote a comment that says "we don't bother." You don't bother? YOU DON'T BOTHER? This isn't a weekend project. This is the error-reporting infrastructure of a database.

 

3d. %A escape is incomplete

elog.c ~3996 escapes " and \ only. Newline, tab, NUL pass through. A value containing \n injects a newline INTO THE MIDDLE OF A LOG LINE and corrupts every downstream line-oriented parser (grep, sed, journalctl, half the world's log shippers).

jsonlog's escape_json handles this. %A does not. Five minutes of work to fix. Wasn't done.

 

4. The contrib extension that all of this exists to serve

I am not going to spend a lot of words on this because it shouldn't be in contrib. But I read the code and found enough bugs to share.

 

4a. The "API" is a vtable with no fingerprint

otel_api/otel_api.h:120. A struct of function pointers with a single uint32 version. No struct_size. No field count. No hash. Comments at lines 62–66 acknowledge that a consumer built against a higher minor can "read past the producer's struct." Why is this acceptable? IT ISN'T.

If you want a stable inter-extension ABI, you build one. You don't ride a rendezvous variable — a debugging convenience — into an ecosystem-wide vtable contract.

 

4c. OTEL_UNWIND_ERROR does not defend the contract

The contract at otel_api.h:199-202 says "must NOT be on the C stack" under OTEL_UNWIND_ERROR. Enforcement: zero. check_unwind_context (otel_producer.c:501-518) checks the memory context, not the pointer.

So an out-of-tree extension author gets the rule wrong, the unwind callback dereferences a stack frame that has been unwound, you get nasal demons. And there is no diagnostic.

 

4e. Parallel-worker slot stale on leader FATAL

otel_parallel.c:178-210 publishes per-MyProcNumber, which is reused across backend lifetimes. Slot cleared in finalize_span on normal and abort paths. NOT cleared in on_proc_exit. Leader FATALs mid-span → slot stays "set" with stale data → next backend at the same ProcNumber spawns parallel workers that read stale context.

This is a real bug. Ship this in production and watch your traces go sideways at exactly the worst moment.

 

4f. parse_traceparent rejects W3C-compliant peers

otel.c:609 accepts only version "00". The W3C spec mandates forward-compatible parsing of the known prefix. So out of the box, this thing drops every "01-..." traceparent from any compliant upstream — including, you know, the OpenTelemetry collectors it exists to talk to.

Five lines of code. Not done.

 

4g. RegisterXactCallback registered twice

otel.c:576 and otel_trace.c:140 both register the same callback (different file-statics, identically named in source). Both fire on every commit/abort. Order depends on _PG_init order. Whoever wrote this and didn't notice should not be writing extensions.

 

4h. Makefile:75 greps for errannot signature

The feature probe for the companion PR-#5 functionality is a literal grep for ^extern int[[:space:]].*errannot in elog.h. Wrong return type → silent disable. Function moved to a different header → silent disable.

No compile-and-link probe. No symbol-presence check. Just grep. In 2026.

 

5. What's actually OK

The criticism of sqlcommenter and SET LOCAL as trace-context carriers in the PR #3 cover letter is fine. Both ARE bad. That's the one part of this series I won't argue with.

That does not justify five interlocking patches and a forked driver. It justifies sitting down, on the mailing list, and designing ONE focused thing with at least one in-tree consumer and at least one upstream-driver consumer before you ever touch the wire protocol.

 

6. Verdict

  • PR #3 (RequestHeaders 'M'): NACK. Process-global negotiation state, FATAL on the data path, COPY mode unguarded, libpq queue not thread-safe, pipeline mode unanalyzed, protocol.sgml not updated, parallel feature negotiation that duplicates the protocol-version mechanism we already have, no upstream-driver consumer. This is "go back and start over" territory.
  • PR #4 (pre_ready_for_query_hook): NACK. Single-consumer hook with no prev_hook chaining, no PG_TRY around the callee, no in-tree caller, no test extension, no documentation, fires in the wrong place for the obvious use case. Trash.
  • PR #5 (elog annotations): -0. The idea is fine. The implementation has a recursion-guard hole, an OTel namespace leak into elog.h, a CSV-schema break on upgrade, and an incomplete escape that injects newlines into log lines. Come back with v2 once those are fixed and the well-known OTel keys are out of core.
  • otel_api in contrib: NACK. Out of the question until the vtable ABI is replaced with something the project actually supports, the nested-context unwind bug at otel_producer.c:431-452 is fixed, OTEL_UNWIND_ERROR rejects on-stack spans, the parallel-worker stale-slot path is closed, the duplicated RegisterXactCallback is consolidated, parse_traceparent follows W3C, the build probe stops being grep, the shared_preload_libraries ordering requirement is removed, the "0.1.x — do not pin" caveat is retired, and the LLM-assistance disclaimer is removed because a human actually audited every line. Even then, contrib is the wrong home for an inter-extension API serving an external ecosystem whose own conventions still churn.
  • The whole bundle: GO AWAY AND COME BACK WITH SEPARATE, INDEPENDENTLY JUSTIFIABLE PATCHES, AGAINST MASTER, ON THE LIST, WITH IN-TREE CONSUMERS AND UPSTREAM DRIVER SUPPORT. Each patch stands or falls on its own. Bundling does not make a weak argument stronger; it just gives me five things to NACK at once.

I'm not going to apologize for the tone. This stuff matters. The protocol is forever, the on-disk log format is forever, and the elog subsystem is the thing every operator sees when something is on fire at 3am. These are not the places to try out a design that is, by the author's own admission, AI-scaffolded and unstable.

Get it right. Or don't ship it.


Now, some bonus ones from another related review I can't share more completely:

[...] blocking on collector RTT for every traced span. The README acknowledges the failure mode at lines 21-24 ("if the export fails due to a network error, the span is marked ended, and cannot be exported anymore"). Every traced query's tail latency is hostage to the OTLP collector. A vendor extension shipped in this shape is going to produce production-impact reports the first time the collector hiccups.

 

Mid-session reconfiguration of a tracer with live spans through a GUC assign hook is a class of bug we have spent years stamping out elsewhere.

 

[...] Plaintext gRPC only. For a vendor-shipped observability extension that will be deployed across operator boundaries, the absence of TLS and auth is indefensible. No security review can pass this.

 

[...] does not auto-instrument anything: no query spans, no plan attributes, no wait events, no log enrichment. All telemetry production is via explicit SQL. Whatever "OpenTelemetry support for PostgreSQL" means to an operator deploying this, it does not mean "I install this and my queries show up in my tracing backend."

 

Whatever this extension does or does not do in the face of postgres point releases is being discovered in production.

It's probably not as funny to you as it is to me, since I'm deep in the context of it. But this got me a much-needed laugh.

Prior art

Adversarial LLM code review is old news at this point. And a cursory search shows that, as expected, I'm far from the first person to come up with the idea of using LLMs prompted with prominent developer personas as a review aid:

  • https://mcpmarket.com/tools/skills/linus-code-taste-review
  • https://gist.github.com/afshawnlotfi/044ed6649bf905d0bd33c79f7d15f254
  • https://www.reddit.com/r/ClaudeAI/comments/1q5a90l/so_i_stumbled_across_this_prompt_hack_a_couple/
  • https://github.com/leopiney/linus-torvalds-skills

Probably more.

But ... I wanted to share the results, because it's a long time since I've spent most of an hour laughing at a code review.


¹: Yes, I'm very familiar with undefined behaviour in C; it was just the "nasal demons" phrase that got me.

No comments:

Post a Comment

Captchas suck. Bots suck more. Sorry.