Skip to main content
Hermes Agentcomputer useAI automation

Hermes Agent Can Now Work Quietly in the Background

Hermes Agent has introduced background computer use for macOS, allowing the AI agent to click, type, and manage apps without moving the cursor or disrupting the user's desktop. For businesses, this is a major step towards practical AI automation for desktop tasks that require real interface interaction, not just chat commands.

Technical Context

I dug into the Hermes Agent documentation not out of curiosity, but with a practical question: is it suitable for real AI automation on the desktop, not just another five-minute demo? And that's where it got interesting. Their computer use feature works in the background: the cursor doesn't jump, focus isn't stolen, and macOS doesn't switch between Spaces.

Under the hood, it's not a simple HID emulator but event injection directly into the process via the accessibility SPI. This is a crucial detail. This approach is typically more stable on real interfaces, especially when the agent needs to click, type, scroll, and not disrupt a human using the same computer.

Installation is simple: hermes computer-use install, then grant Accessibility and Screen Recording permissions. After that, you can run it with the computer_use toolset. It covers all the basics: click, type, scroll, drag, and managing macOS applications.

What I liked most is its model neutrality. The feature isn't tied to one vendor: you can connect Claude, GPT, Gemini, and even open models via local vLLM endpoints. For AI integration, this is a great sign: you can build the architecture for the task, not for a specific model's marketing.

Another smart move: Hermes runs an OpenAI-compatible API on localhost. This means it can be integrated into existing pipelines, Open WebUI, or internal agent frameworks without a ton of glue code. The foundation is open-source, via cua-driver, and the computer use feature itself has been publicly available in Hermes since version 0.7.0, released in April 2026.

What This Means for Business and Automation

I see three practical scenarios here. First: automating legacy desktop systems that have no API but are business-critical. Second: background operational tasks where an agent gathers data, transfers fields, and runs reports without disturbing an employee. Third: hybrid processes where part of the logic lives in an LLM, and part is still locked in a GUI.

The winners are teams with a zoo of internal applications and expensive manual routines. The losers are solutions tied only to browser agents or fragile RPA that breaks with any window shift.

But there's a catch: the feature itself doesn't guarantee a reliable artificial intelligence implementation. You need permissions, session control, error handling, action limits, and proper observability. At Nahornyi AI Lab, we build exactly these kinds of things for clients: if your processes are stuck in a desktop interface, you don't have to wait for the perfect API. We can build a solid AI solution development around what you already have. If you're interested, my team and I can review your case and suggest where AI automation will actually pay off, and where it's better not to even start.

We have previously explored how a new level of agent autonomy, particularly when agents gain expanded computer interaction capabilities, introduces significant security challenges. It is crucial to understand the methods by which these advanced AI agents might attempt to bypass established safeguards, such as sandbox environments, through sophisticated command chaining.

Share this article