Technical Context
I view IBM z17 not as “just another hardware release,” but as a highly pragmatic move: IBM is cementing AI directly inside the mainframe, where money, risk, and regulatory compliance reside for many companies. This aligns with field estimates that put COBOL's share of enterprise core workloads at around 20–25%; rough numbers, but they explain the motivation.
The key element of z17 is the Telum II processor with second-generation on-chip AI accelerators. According to IBM's official materials, the focus is on real-time performance: over 450 billion inferences per day and sub-1ms response times for transactional scenarios. For me, this is a signal that IBM is targeting “in-stream” tasks rather than offline analytics.
Technically, I like the push toward dense integration: Telum II claims 8x more AI cores compared to the previous generation, increased cache, and higher clock frequency. At the same time, IBM emphasizes efficiency (lower power consumption, a compact core), which directly impacts TCO in the data center.
The second piece of the puzzle is the optional IBM Spyre Accelerator in a PCIe card format, announced for availability in Q4 2025. I interpret Spyre as an attempt to “pull” generative AI (LLMs/agents) into a space previously dominated by scoring models and detectors. Importantly, this is not a replacement for Telum, but an expansion of the stack for different model classes.
Impact on Business and Automation
The most valuable aspect of z17 for the enterprise isn't core-count numbers, but the ability to perform inference “where the transaction lives,” without hauling data out to a separate AI environment. When I design AI implementations in banking, insurance, or logistics, it is precisely data transfer and security approvals that most often kill timelines. Here, IBM is selling a reduction of that pain.
Companies with a large existing z/OS landscape and heavy reliance on COBOL, CICS, DB2, and strict SLAs stand to win. For them, AI automation stops being a project of “rewrite half the system and build a data lake” and becomes a project of “embed inference into the transactional loop and add model quality control.”
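To make “embed inference into the transactional loop” concrete, here is a minimal Python sketch of the pattern I mean. All names are hypothetical; a real deployment would call the platform's inference runtime. The key design point is that scoring never blocks the transaction past its latency budget:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutTimeout

BUDGET_SECONDS = 0.05   # generous for this sketch; transactional budgets are sub-millisecond
FALLBACK_SCORE = 0.0    # conservative default when the model misses its budget

def model_score(txn: dict) -> float:
    # Stand-in for an on-chip inference call (hypothetical logic).
    return 0.97 if txn.get("amount", 0) > 10_000 else 0.02

_pool = ThreadPoolExecutor(max_workers=4)

def score_in_path(txn: dict) -> float:
    """Score a transaction inside the processing loop without risking the SLA."""
    future = _pool.submit(model_score, txn)
    try:
        return future.result(timeout=BUDGET_SECONDS)
    except FutTimeout:
        future.cancel()
        return FALLBACK_SCORE  # degrade gracefully to rules-based checks
```

The fallback branch is where “model quality control” starts: a missed budget is an observable event, not a stalled transaction.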
Those who hoped that simply “connecting a cloud LLM” would suffice to rapidly build AI automation around legacy systems will lose out. In practice, such schemes hit walls of latency, egress/ingress costs, PII/PCI compliance, and the fact that the business logic remains in COBOL. z17 weakens the case for “outsourcing the brain,” because the brain can now sit next to the ledger.
In our projects at Nahornyi AI Lab, I see typical requests: anti-fraud, anomaly detection, next-best-action, and assistants for operators and engineers. With z17, I would more often plan a two-layer AI architecture: fast on-chip inference for transactions, plus a separate environment for training and heavy LLM workloads that doesn't disrupt the main flow.
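The two layers can be sketched as a simple router. This is an illustrative heuristic, not a product API; the 5 ms threshold is an assumption I picked for the example:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tier(Enum):
    ON_CHIP = auto()   # fast path: scoring/detection inside the transaction
    OFFLOAD = auto()   # slow path: LLM/agent work, training, batch analytics

@dataclass
class Task:
    kind: str                 # e.g. "fraud_score", "next_best_action", "assistant"
    in_transaction: bool      # does it sit in the transactional critical path?
    latency_budget_ms: float

def route(task: Task) -> Tier:
    """Route a task to the fast on-chip layer or the offload layer.

    Heuristic: anything in the critical path with a tight budget stays
    on-chip; everything else goes to the second layer.
    """
    if task.in_transaction and task.latency_budget_ms <= 5.0:
        return Tier.ON_CHIP
    return Tier.OFFLOAD
```

In practice the router lives in the serving layer, so the same request surface can hide both tiers from callers.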
Strategic Vision and Deep Dive
My non-obvious conclusion: IBM is turning the mainframe from “legacy that slows down innovation” into a “platform that dictates data access rules.” If inference is done on the platform, then control over data, policies, and observability also remains on the platform. This changes IT's negotiating position regarding security and audit.
I also expect that in 2026–2027, the market will start dividing not by “who has an LLM,” but by “who has inference built into the critical path without latency compromises.” For transactional businesses, this means a model that cannot operate reliably within the SLA is not considered implemented, even if it looks beautiful in a pilot.
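That “reliably within the SLA” criterion can be made operational with a tail-latency gate. A minimal sketch, using the nearest-rank p99 (the percentile and budget here are illustrative):

```python
import math

def p99(samples_ms: list[float]) -> float:
    """99th-percentile latency via the nearest-rank method."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.99 * len(ordered))   # 1-based nearest rank
    return ordered[rank - 1]

def meets_sla(samples_ms: list[float], budget_ms: float) -> bool:
    """A model counts as 'implemented' only if its tail latency fits the budget."""
    return p99(samples_ms) <= budget_ms
```

Averages hide exactly the requests that breach an SLA, which is why the gate looks at the tail rather than the mean.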
If you are viewing z17 as a chance to “get into IBM territory,” I would act through applied use cases rather than hardware procurement. At Nahornyi AI Lab, I usually start with a process map: where the event occurs, where the decision is made, and what a millisecond costs. Then I pin down requirements for data and inference placement: on-platform, near-platform, or cloud.
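The placement decision at the end of that checklist can be captured as a small function. The thresholds are assumptions for illustration; every real engagement tunes them to the client's SLAs and data policies:

```python
def placement(handles_pii: bool, latency_budget_ms: float, egress_cost_high: bool) -> str:
    """Pick where inference runs: 'on-platform', 'near-platform', or 'cloud'.

    Illustrative heuristic: regulated data and millisecond paths stay on the
    platform; cost- or latency-sensitive work stays in the same data center;
    tolerant batch/LLM work can go to the cloud.
    """
    if handles_pii or latency_budget_ms < 5.0:
        return "on-platform"
    if egress_cost_high or latency_budget_ms < 100.0:
        return "near-platform"
    return "cloud"
```

The point is not the exact numbers but that placement becomes an explicit, reviewable decision instead of a default.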
And lastly: integrating artificial intelligence into a mainframe does not cancel engineering discipline. You will still need MLOps/LLMOps, drift monitoring, data shift testing, prompt/context control for assistants, and a rollback plan. It's just that now, all of this can be built closer to the source of truth—the transactions.
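Drift monitoring, for instance, needs nothing exotic to start. A common baseline is the Population Stability Index between a reference sample and live data; a sketch (the 0.2 alert threshold is a widely used rule of thumb, not a standard):

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample.

    PSI above ~0.2 is a common rule-of-thumb trigger for a retraining review.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a degenerate reference

    def hist(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        total = len(xs)
        return [max(c / total, 1e-6) for c in counts]  # avoid log(0)

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run it per feature and per model score on a schedule, and wire the threshold breach into the same alerting the transactional platform already has.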
This analysis was prepared by Vadim Nahornyi—Lead Expert at Nahornyi AI Lab on AI architecture, implementation, and AI automation in the real sector. I invite you to discuss exactly how to embed inference and assistants into your z/OS landscape or hybrid scheme without risking SLAs and compliance—write to me, and we will assemble a target architecture and implementation plan for your processes.