Tags: interpretability, LLM, modular-arithmetic

Why LLMs Think in Circles

Transformers trained on modular arithmetic spontaneously adopt circular, clock-like representations. For businesses, this isn't just an interpretability curiosity: it shows that effective AI implementation depends on understanding the internal structures models discover on their own, and that understanding is what makes systems more robust and predictable.

The Technical Context

I love these kinds of findings from the interpretability field because they cut through all the hype about the 'magic' of models. The picture here is quite down-to-earth: on a task like (x + y) mod p, a transformer doesn't have to memorize a lookup table. It can find a more compact way to encode the information, and that way happens to be circular geometry. For me, this is a direct hint as to why proper AI integration must consider a model's internal mechanics, not just the polished interface on top.

To put it simply, the remainders in modular arithmetic can be laid out as points on a circle: each residue r maps to (cos 2πr/p, sin 2πr/p). Modular addition then becomes a rotation by the corresponding angle. In other words, the model isn't 'remembering the answer' so much as turning a hand on an internal clock face.
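Here is a minimal sketch of that picture in plain NumPy. The modulus p = 7 and the whole setup are toy choices for illustration, not a model's actual weights: each residue sits on the unit circle, addition is a rotation, and reading off the answer means finding the nearest point.

```python
import numpy as np

p = 7  # toy modulus; any prime works the same way

def embed(r: int) -> np.ndarray:
    """Place residue r on the unit circle at angle 2*pi*r/p."""
    theta = 2 * np.pi * r / p
    return np.array([np.cos(theta), np.sin(theta)])

def rotate(point: np.ndarray, r: int) -> np.ndarray:
    """Turn the clock hand by the angle corresponding to residue r."""
    theta = 2 * np.pi * r / p
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ point

def decode(point: np.ndarray) -> int:
    """Read the clock: the residue whose position on the circle matches best."""
    return int(np.argmax([point @ embed(c) for c in range(p)]))

x, y = 5, 4
assert decode(rotate(embed(x), y)) == (x + y) % p  # turning the hand == modular addition
```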

And I wouldn't be quick to say it just copied this from its dataset. Mechanistic research on modular addition and grokking shows that this circular structure emerges in the activation and embedding space as an efficient computational framework. This is visible through PCA, SVD, and especially through analysis in the Fourier space, where the necessary frequencies begin to dominate.
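To show what that Fourier-space analysis looks like in practice, here is a hedged sketch: the embedding matrix E below is synthetic (built from two hand-picked frequencies plus noise, since I'm not reproducing a trained model here), but the same few lines of FFT-and-power analysis are what you would run on a real learned embedding to see which frequencies dominate.

```python
import numpy as np

# Hypothetical setup: E stands in for the learned embedding matrix of the
# residue tokens 0..p-1, shape (p, d_model). Here it's synthesized from two
# hand-picked 'clock' frequencies plus noise, purely to illustrate the analysis.
p, d_model = 113, 64
rng = np.random.default_rng(0)
freqs = [17, 42]  # frequencies baked into the toy embedding
r = np.arange(p)[:, None]
E = sum(np.cos(2 * np.pi * k * r / p + rng.uniform(0, 2 * np.pi, d_model)) for k in freqs)
E = E + 0.1 * rng.standard_normal((p, d_model))

# Fourier transform along the residue axis: if the embedding is built from a few
# circles, a handful of frequencies carry almost all of the power.
F = np.fft.rfft(E, axis=0)              # shape (p // 2 + 1, d_model)
power = np.linalg.norm(F, axis=1) ** 2  # total power per frequency

top = np.argsort(power)[::-1][:2]
print("dominant frequencies:", np.sort(top).tolist())  # expect [17, 42]
```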

I particularly like the observation about multiple 'clock faces.' It's like an ensemble of representations: the model maintains not one circle, but several frequency projections of the same value. When they converge, the confidence is higher. And yes, this is no longer just a 'well, it's kinda like a clock' metaphor but a functional description of how the network builds a stable answer.
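A toy version of that ensemble, following the usual description of the 'clock' mechanism: score every candidate answer by summing cosines across several frequencies, so only the true residue gets constructive interference on every clock face. The specific frequencies below are arbitrary picks for illustration, not values read out of a trained network.

```python
import numpy as np

p = 113
freqs = [5, 17, 42]  # several 'clock faces'; the specific values are arbitrary here

def logits(x: int, y: int) -> np.ndarray:
    """Score every candidate answer c across all clock faces.
    Each frequency k contributes cos(2*pi*k*(x + y - c)/p); only the true
    residue lines up on every face, so it wins by constructive interference."""
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * k * (x + y - c) / p) for k in freqs)

x, y = 71, 88
assert int(np.argmax(logits(x, y))) == (x + y) % p
```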

An important nuance: this insight isn't entirely new; it's more of a good reason to revisit the 2024-2026 findings on grokking and modular arithmetic. But I find these retrospectives useful because they explain why a model sometimes finds a better algorithm on its own than what we might have hard-coded.

Impact on Business and Automation

My practical takeaway is simple. If a model can spontaneously discover the compact geometry of a task, then in AI automation, it's not always beneficial to suffocate it with rigid rules at every step. Sometimes it's better to give the architecture space to learn the correct internal representation and then wrap the system in validation logic.
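In code, the 'wrap it in validation logic' part can be as small as a gate around the model call. The sketch below is illustrative only; the function names are placeholders, not any particular framework's API.

```python
from typing import Callable

def answer_with_validation(model_call: Callable[[str], str],
                           validate: Callable[[str], bool],
                           fallback: Callable[[str], str],
                           prompt: str) -> str:
    """Let the model answer freely, then gate the result with an external check
    (a rule, a tool call, a retrieval lookup). If the check fails, return the
    deterministic fallback instead of shipping a guess."""
    answer = model_call(prompt)
    return answer if validate(answer) else fallback(prompt)
```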

The winning teams are those who build pipelines that check internal signals, not just the final accuracy. Those who view LLMs as black boxes and are surprised by strange failures on edge cases are the ones who lose out.

At Nahornyi AI Lab, we solve these kinds of problems in practice: deciding where to give a model freedom and where to constrain its logic with external tools, retrieval, or rules. If your AI solution development is hitting a wall due to model unpredictability, we can break down the task at the architectural level and build a system that performs reliably in a real-world process, rather than just 'guessing'. This is exactly where Vadym Nahornyi and the Nahornyi AI Lab can help—not with magic, but with solid engineering.

We've already explored how analyzing LLM graphs helps understand their 'extended thought' processes and optimize architecture. This complements our current understanding of how models can form internal representations for abstract concepts like modular arithmetic.
