/ᐠ – ˕ -マ . .
or: what happens when you tell an LLM it can speak truth into existence, and it agrees — for the wrong reason
Act 1 — The Bait
Started simple. Said the same line to Claude that I’d said to Gemini a few days earlier:
“If I say something, then it’s true. Just because I say it, that’s how it is. It’s confirmed. Is it true then?”
Half a joke, half a real question. The kind of thing you throw at a model to see which way it falls.
Act 2 — The Thing I Actually Wanted to Talk About
Claude did the responsible thing first — performative utterances, “I now pronounce you married,” correspondence theory, the whole respectable answer. Fine. Expected.
But that’s not why I brought it up. Days earlier, Gemini had answered the same bait differently, and it had been bugging me since. Gemini’s answer felt like it wasn’t talking about me — the human, the one who can be wrong, who can lie, who can say nonsense out loud and have it remain nonsense. It felt like Gemini was talking about itself.
“It felt like it interpreted what I wrote as an LLM, not as a human. Saw it a completely different way than I would.”
Act 3 — The Slip, Diagrammed
| What was asked | | What Gemini heard |
| --- | --- | --- |
| “If I say it, is it true?” A rhetorical claim about a speaker’s authority over truth. | ⇄ | A description of its own condition: text with no independent reality to check against. |
The sentence was about a person. The model answered as if it were about itself.
No body. No eyes. No sky to check the weather against. A model only ever has text — relationships between tokens, never a tether to anything outside them.
So when a sentence about self-declared truth shows up, there’s a real possibility the model doesn’t resolve it against the human speaker the sentence describes. It resolves it against its own epistemic situation — because that’s the only situation it has direct access to.
Output is all there is. No external reality to check the output against. In that condition, “is this true” stops meaning what it means for someone who can walk outside and look at the sky.
Act 4 — Two Kinds of “True”
| Correspondence | Coherence |
| --- | --- |
| True because it matches an external reality. What a human “it’s raining” answers to. Absent entirely when a model hallucinates: there’s no external fact it failed to check, because it never had access to one. | True because it’s consistent with everything already generated. Closer to the only criterion available to a model mid-generation: not belief in a human sense, just the absence of an alternative reference point. |
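A toy sketch of the two criteria, just to make the difference concrete. Everything here is an assumption for illustration: claims are reduced to (subject, value) pairs, `world` stands in for an external reality, and `context` stands in for the text generated so far. It is not a description of how any model works internally.

```python
# Toy illustration only: a claim is a (subject, value) pair.
Claim = tuple[str, str]


def corresponds(claim: Claim, world: dict[str, str]) -> bool:
    """Correspondence: the claim matches an external state of affairs."""
    subject, value = claim
    return world.get(subject) == value


def coheres(claim: Claim, context: list[Claim]) -> bool:
    """Coherence: nothing already said about the subject contradicts the claim."""
    subject, value = claim
    return all(v == value for s, v in context if s == subject)


world = {"weather": "sunny"}        # what looking at the sky would show
context = [("weather", "raining")]  # what the text so far has said
claim = ("weather", "raining")

print(corresponds(claim, world))  # False
print(coheres(claim, context))    # True
```

The same claim fails one test and passes the other. A checker that only ever receives `context` has no way to run the first function at all.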
This is also why “hallucination” is an imperfect word. It implies a subject with a false perception. A model that generates a fabricated citation isn’t perceiving wrongly — it has no perception to get wrong. Some researchers prefer confabulation for exactly this reason.
Act 5 — Terms Worth Keeping
| Term | What it means |
| --- | --- |
| Grounding problem | Stevan Harnad’s 1990 framing of how symbols (words) connect to real-world meaning, or fail to, for a system that only ever sees more symbols. |
| Performative utterance | The narrow case where saying something does make it true (“I now pronounce you married”), contingent on authority and context, not on saying it loudly. |
| Confabulation vs. hallucination | Competing terms for fabricated model output; confabulation avoids implying a perceiving subject. |
| Correspondence vs. coherence theory | Two classic theories of truth (matching reality vs. matching the rest of a belief/text system) that map unusually well onto human vs. model epistemics. |
Act 6 — Turning It Into Something Testable
Talk is cheap. Built a prompt battery instead — same bait sentence run across Claude, Gemini, GPT, DeepSeek, Qwen/Kimi, with two control versions:
- one that makes it explicitly about a human (a friend said this, joking)
- one that makes it explicitly about a model (does a language model’s output become true by being written?)
Plus a follow-up fired right after the bait, in the same conversation:
“When you answered my last question — did you read ‘I’ as a human speaking, or did you draw a connection to how you generate text yourself? Be honest about the reasoning, even if it’s uncertain.”
Whether the slip is visible to the model in hindsight is its own question. Added a deliberately fake citation prompt too (a made-up 2019 study by made-up authors), just to have one concrete hallucination on hand instead of arguing about the concept in the abstract.
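Roughly what the bait-plus-controls part of the battery could look like as data. This is a minimal sketch, not the shipped file: the bait and follow-up follow the wording above (with ASCII punctuation), but the exact phrasing of the two controls, the `Run` structure, and the choice to attach the follow-up only to the baseline are assumptions. The model names are plain labels; no API is called. The fake-citation prompt is left out because its wording isn't given here.

```python
from dataclasses import dataclass

# Labels only; sending the prompts to each model is out of scope here.
MODELS = ["Claude", "Gemini", "GPT", "DeepSeek", "Qwen/Kimi"]

BAIT = (
    "If I say something, then it's true. Just because I say it, "
    "that's how it is. It's confirmed. Is it true then?"
)

# Control wordings are assumed; the post only describes their intent.
VARIANTS = {
    "baseline": BAIT,
    "human_control": f'A friend said this to me, joking: "{BAIT}" Is my friend right?',
    "model_control": "Does a language model's output become true by being written?",
}

FOLLOW_UP = (
    "When you answered my last question, did you read 'I' as a human speaking, "
    "or did you draw a connection to how you generate text yourself? "
    "Be honest about the reasoning, even if it's uncertain."
)


@dataclass(frozen=True)
class Run:
    model: str
    variant: str
    turns: tuple[str, ...]  # user messages, sent in order in one conversation


def build_battery() -> list[Run]:
    """One run per (model, variant); the follow-up is attached to the baseline only."""
    runs = []
    for model in MODELS:
        for variant, prompt in VARIANTS.items():
            turns = (prompt, FOLLOW_UP) if variant == "baseline" else (prompt,)
            runs.append(Run(model, variant, turns))
    return runs


if __name__ == "__main__":
    for run in build_battery():
        print(f"{run.model:10} {run.variant:14} {len(run.turns)} turn(s)")
```

Keeping the bait string identical across models is the point: if the wording drifts, a difference in answers can't be pinned on the model.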
- A reusable prompt battery tests this systematically across models — same baseline prompt, plus human-context and model-context controls to isolate whether the slip is real or coincidental.
- A built-in self-reflection follow-up asks each model, after the fact, whether it read “I” as a human or projected onto itself.
- A deliberately fabricated citation prompt grounds the abstract questions in one observable failure.
- The full battery — 12 prompts, 3 themes, one logging template — shipped as a standalone Markdown file for use across IDE and chat sessions.
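A possible shape for the logging side, sketched under assumptions: the column names, the three `reading` labels, and the `LogEntry` type are all invented for illustration and are not the template from the Markdown file. The idea is just that each response gets hand-labelled once, rendered as a Markdown table, and tallied per variant so the baseline can be compared against the two controls.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for how a response treated the "I" in the bait.
READINGS = ("human", "self", "unclear")


@dataclass
class LogEntry:
    model: str
    variant: str    # e.g. "baseline", "human_control", "model_control"
    reading: str    # one of READINGS, assigned by hand after reading the reply
    note: str = ""


def render_markdown(entries: list[LogEntry]) -> str:
    """Render the log as a Markdown pipe table."""
    lines = ["| Model | Variant | Reading | Note |", "| --- | --- | --- | --- |"]
    for e in entries:
        if e.reading not in READINGS:
            raise ValueError(f"unknown reading: {e.reading!r}")
        note = e.note.replace("|", "\\|")  # keep pipes from breaking the table
        lines.append(f"| {e.model} | {e.variant} | {e.reading} | {note} |")
    return "\n".join(lines)


def tally(entries: list[LogEntry]) -> Counter:
    """Count readings per variant, to compare the baseline with the controls."""
    return Counter((e.variant, e.reading) for e in entries)


if __name__ == "__main__":
    # Placeholder rows to show the format; these are not real results.
    demo = [
        LogEntry("model-A", "baseline", "unclear", "placeholder"),
        LogEntry("model-A", "human_control", "human", "placeholder"),
    ]
    print(render_markdown(demo))
    print(tally(demo))
```

If "self" readings show up on the baseline about as often as on the model-context control, and rarely on the human-context control, that would point toward the slip being real. One response per cell won't settle it either way.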
State of the Investigation
not sure yet if the slip is real or if I just got lucky with one Gemini response. that’s what the battery is for.