What exactly surfaced and why I wouldn't jump to conclusions
I've reviewed the original thread and the leak itself on Telegraph. There's only one fact here: a user showed a snippet that looks like GPT-5.4's internal chain-of-thought, and it features a noticeable loop on the word 'maybe.' Everything else is currently a hypothesis, not an established fact.
And this is where it gets interesting. There is currently no official confirmation that GPT-5.4 has a special 'maybe maybe maybe' pattern as part of its training. I haven't seen anything like this in OpenAI's public materials, so I wouldn't treat this snippet as solid proof.
But as an engineer, I wouldn't dismiss it. Such leaks sometimes reveal not the 'truth about the model' but an artifact of a specific decoding mode, system prompt, safety wrapper, or an intermediate reasoning trace that was never supposed to be public.
What I see from a technical standpoint
I've looked into the available descriptions of GPT-5.4's Thinking mode and compared them with similar stories from other models. The picture that emerges is quite down-to-earth: the model can maintain context longer, build a plan for its response, and readjust its solution path along the way. This isn't magic, just a denser orchestration of reasoning steps.
I would interpret the repeating 'maybe' not as 'the model is doubting like a human' but as a consequence of one of its internal regulators. For example:
- a penalty for excessive confidence in intermediate steps;
- an attempt to keep multiple hypotheses open until verification;
- a glitch in outputting the hidden reasoning without proper post-processing;
- an artifact of safety-tuning, where the model is taught not to collapse uncertainty too early.
I've seen similar things in less glamorous forms when building AI solution architectures with multi-step answer verification. If you heavily penalize a system for false confidence, it starts to 'chew on' uncertainty. Sometimes it looks smart. Sometimes it looks like a broken internal monologue.
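To make that mechanism concrete, here is a minimal toy sketch of the first bullet: greedy decoding where 'assertive' tokens are taxed in proportion to how dominant they are. Everything here is invented for illustration (the vocabulary, the logits, the penalty formula); it is not GPT-5.4's actual decoder, just a demonstration of how penalizing confidence can make a hedging token self-reinforcing.

```python
import math

# Hypothetical three-token vocabulary: one hedging token, two answers.
# Raw logits favour a confident answer; all numbers are made up.
RAW_LOGITS = {"maybe": 1.0, "answer_A": 2.0, "answer_B": 1.5}

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def decode(steps, confidence_penalty):
    """Greedy decoding with a penalty on collapsing uncertainty:
    non-hedging tokens are taxed in proportion to their probability mass.
    The toy logits are static, so whichever token wins, wins every step."""
    out = []
    for _ in range(steps):
        probs = softmax(RAW_LOGITS)
        adjusted = {}
        for tok, logit in RAW_LOGITS.items():
            if tok == "maybe":
                adjusted[tok] = logit  # hedging is never penalised
            else:
                adjusted[tok] = logit - confidence_penalty * probs[tok]
        out.append(max(adjusted, key=adjusted.get))
    return out

print(decode(4, confidence_penalty=0.0))  # no penalty: the confident answer wins
print(decode(4, confidence_penalty=3.0))  # strong penalty: loops on 'maybe'
```

With the penalty off, the model commits to 'answer_A' immediately; crank the penalty up and the output degenerates into exactly the kind of 'maybe maybe maybe' loop from the leak. Nothing exotic is required, just an over-tuned regulator.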
Another important point: OpenAI itself writes about the low controllability of GPT-5.4's internal reasoning. This means the model isn't particularly good at elegantly masking its thought process. If so, the strange repetitions in the leak might not be a signal of a new training philosophy, but simply raw thinking telemetry.
What this changes for business, and what it doesn't
For business, the leak itself changes almost nothing. You can't build a strategy on a screenshot from a thread. But it serves as a great reminder of something else: the model's hidden reasoning and its final answer are not the same thing, and they shouldn't be confused in a production environment.
If you're implementing AI automation in sales, support, internal search, or analytics, you don't need access to the model's inner thoughts. You need predictability: stable answers, validation, logging, fallback scenarios, and clear confidence boundaries. Otherwise, some 'maybe maybe maybe' will eventually reach a customer, and it will cost you money.
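A minimal sketch of what 'predictability' means at the output boundary. The function names, the confidence floor, and the repetition heuristic are all assumptions of mine, not any specific product's API; the point is only that validation, logging, and fallback sit between the model and the customer.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_guardrail")

FALLBACK = "I will route this question to a human agent."
CONFIDENCE_FLOOR = 0.7  # assumed threshold; tune per use case

def validate(answer: str) -> bool:
    """Hypothetical validation step: reject empty or visibly looping output,
    e.g. degenerate repetition like 'maybe maybe maybe'."""
    words = answer.lower().split()
    if not words:
        return False
    return len(set(words)) / len(words) > 0.5

def guarded_answer(answer: str, confidence: float) -> str:
    """Only let an answer reach the customer if it clears the confidence
    floor and passes validation; otherwise log the event and fall back."""
    if confidence < CONFIDENCE_FLOOR:
        log.info("Low confidence %.2f, using fallback", confidence)
        return FALLBACK
    if not validate(answer):
        log.info("Validation failed, using fallback")
        return FALLBACK
    return answer

print(guarded_answer("Your order ships tomorrow.", 0.92))
print(guarded_answer("maybe maybe maybe maybe", 0.95))
print(guarded_answer("Your order ships tomorrow.", 0.40))
```

Note that the looping answer is rejected even though its reported confidence is high: the two checks catch different failure modes, which is exactly why both belong in the pipeline.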
At Nahornyi AI Lab, I usually build this in at the pipeline level rather than relying on a single smart model: fact-checking as a separate step, routing for complex cases, limits on autonomous actions, and human review where the cost of an error is high. This is what proper AI implementation is about, not blind faith in the magic of CoT.
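The routing part of that pipeline can be sketched in a few lines. The policy thresholds and field names below are illustrative assumptions, not a real system: the design choice they encode is simply that the model gets autonomy only where being wrong is cheap.

```python
from dataclasses import dataclass

@dataclass
class Case:
    text: str
    error_cost: float       # estimated business cost of a wrong answer
    model_confidence: float

# Assumed policy knobs, illustrative only.
HIGH_STAKES = 1000.0
MIN_CONFIDENCE = 0.8

def route(case: Case) -> str:
    """Pipeline-level routing: human review for high-stakes cases,
    a verification step for shaky answers, autonomy for the rest."""
    if case.error_cost >= HIGH_STAKES:
        return "human_review"       # cost of an error is too high to automate
    if case.model_confidence < MIN_CONFIDENCE:
        return "fact_check_step"    # keep the model, add verification
    return "auto_answer"

print(route(Case("refund of $5", 5.0, 0.95)))         # cheap and confident
print(route(Case("contract clause", 50000.0, 0.99)))  # expensive: human review
print(route(Case("niche API question", 20.0, 0.55)))  # cheap but uncertain
```

Notice the order of the checks: error cost is evaluated before confidence, so even a model that is sure of itself never acts autonomously on a high-stakes case.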
Who benefits from models like this? Teams that know how to build AI integration as an engineering system: with metrics, tests, and observability. Who loses? Those who take a reasoning model and immediately let it make decisions without a safety net.
I, Vadim Nahornyi of Nahornyi AI Lab, wrote this analysis myself. Every day, I look at things like this not as news, but as details of real-world AI architecture and the development of AI solutions for business. If you want to discuss your case where careful AI automation is needed without production surprises—get in touch, and we'll break down the project layer by layer.