Technical Context
I read through the new part of Nikolai Yudina's work on manifold features, and what held my attention was not the pretty pictures but a more unsettling thought: many of the geometric effects seem to survive a change of architecture. For anyone implementing AI systems, this is not an abstraction. It is a direct hint that some model behaviors can be detected and put to use before they break a production pipeline.
The second part covers four algorithms, and replications on toy Mamba-2 and Kimi Linear models appeared in the discussion almost immediately. I particularly liked the finding that, in a Mamba-like implementation, the div-geometry shows up right at the SSM output, before the gate, norm, and output projection. For add, the picture is different: the signal appears to be assembled not from a single simple circle but from a mixture of frequencies.
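To make the "mixture of frequencies" idea concrete, here is a minimal sketch of the well-known trigonometric construction for modular addition. It is my own illustration, not the mechanism reported in the replications: the modulus P, the frequency set FREQS, and all function names are hypothetical choices. Each operand is placed on several circles at once, the angles are added with the usual sum identities, and every candidate answer is scored by how well it lines up across all the circles.

```python
import math

P = 13             # toy modulus (assumption, chosen only for illustration)
FREQS = [1, 2, 5]  # hypothetical mixture of frequencies, one circle per entry


def embed(x, k):
    """Place x on the circle for frequency k; returns (cos, sin)."""
    angle = 2 * math.pi * k * x / P
    return math.cos(angle), math.sin(angle)


def add_logits(a, b):
    """Score every candidate c by summing cos(2*pi*k*(a + b - c) / P) over k."""
    logits = [0.0] * P
    for k in FREQS:
        ca, sa = embed(a, k)
        cb, sb = embed(b, k)
        # Angle-addition identities give the point for (a + b) on this circle.
        cos_sum = ca * cb - sa * sb
        sin_sum = sa * cb + ca * sb
        for c in range(P):
            cc, sc = embed(c, k)
            # cos(x - y) = cos(x)cos(y) + sin(x)sin(y)
            logits[c] += cos_sum * cc + sin_sum * sc
    return logits


def mod_add(a, b):
    """Return the candidate with the highest combined score."""
    logits = add_logits(a, b)
    return max(range(P), key=logits.__getitem__)


print(mod_add(7, 9), (7 + 9) % P)
```

In this toy a single frequency would already put the maximum in the right place, but only by a small margin over the neighboring answers. Summing several frequencies makes the correct answer stand out more clearly, which is one plausible reading of why a learned solution might not look like one clean circle.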
This is where it gets interesting for more than just researchers. If the same patterns appear in Transformers, Mamba, and linear variants, the conversation shifts from "which architecture will win" to "where exactly is the feature encoded, and how early can it be extracted." And yes, the author himself states it directly in the thread: architecture doesn't matter, this works everywhere.
Another point from the discussion that I wouldn't skip is the observation about how "malleable" models are when given conflicting knowledge. The older Qwen turned out to be more suggestible, GPT-3.5 was more stable, and Llama also gave way. This ties manifold features not only to interpretability but also to self-improvement without labeled data, knowledge mixing, and the stability of internal memory.
Impact on Business and Automation
For applied teams, the conclusion is simple: I would treat work like this as a debugging tool, not as another beautiful theory. If I can see where in the layers the needed feature first emerges, I can design the architecture, filters, checks, and cheap probes more precisely instead of fine-tuning blindly.
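As a rough sketch of what a "cheap probe" can look like, the snippet below fits a difference-of-means linear probe on each layer and reports the first layer where a binary feature becomes decodable. Everything here is an assumption for illustration: the activations are synthetic Gaussian noise with a planted signal, and the function names, the 0.9 threshold, and the layer layout are hypothetical. With a real model you would replace make_layer with activations captured from each layer.

```python
import random


def make_layer(n, dim, signal, rng):
    """Synthetic activations: unit Gaussian noise plus signal * label on axis 0."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal * y
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_mean_diff_probe(xs, ys):
    """Linear probe: direction = difference of class means, boundary at the midpoint."""
    dim = len(xs[0])
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    mu_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mu_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    mid = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += (1 if score > 0 else -1) == y
    return hits / len(xs)


def earliest_decodable_layer(signals, threshold=0.9, seed=0):
    """Probe each layer on held-out data; return the first layer above threshold."""
    rng = random.Random(seed)
    accs = []
    for s in signals:
        train = make_layer(400, 8, s, rng)
        test = make_layer(400, 8, s, rng)
        w, b = fit_mean_diff_probe(*train)
        accs.append(accuracy(w, b, *test))
    first = next((i for i, a in enumerate(accs) if a >= threshold), None)
    return first, accs


# Hypothetical layout: the feature is absent in layers 0-1 and present from layer 2.
layer, accs = earliest_decodable_layer([0.0, 0.0, 3.0, 3.0])
print(layer, [round(a, 2) for a in accs])
```

The probe is deliberately trivial: if a feature cannot be read out even this way at a given layer, that is a useful signal about where to place checks, and it costs far less than a fine-tuning run.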
The teams that benefit are those building complex multi-model pipelines, especially where reliability and explainability matter. The ones that lose out still believe that simply getting a better model will magically fix problems with memory, bias, and unstable output.
At Nahornyi AI Lab, we ground these concepts in practical scenarios: where to place an interpretability probe, when it's better not to touch the weights at all, and how to build AI automation without extra retraining costs. If your model is behaving strangely and the production deadline is looming, let's analyze the architecture and put together a development plan for an AI solution that fits your real task flow, not someone else's demo.