
Pydantic Monty: How to Run LLM Code Safely Without Containers

Pydantic has released Monty, a minimal Python interpreter written in Rust, designed to safely execute untrusted code generated by LLM agents. The business stakes are clear: it reduces RCE risk, speeds up agent workflows by avoiding heavy containers, and gives you precise control over exactly what code can run in your pipeline.

Technical Context

On February 6, 2026, the Pydantic team (whose ecosystem has effectively become the standard for data validation in Python) published Monty: a minimal, "closed by default" Python interpreter written in Rust. The goal is purely pragmatic: safely execute untrusted code produced by LLMs inside AI agents and automations, without the classic overhead of containers and without turning your production environment into a remote code execution target for attackers.

At the time of publication, the repository is experimental (0.0.x branch, with release v0.0.4 dated February 7, 2026) and MIT-licensed. It is a promising tool worth evaluating as a technological foundation, but it should be adopted with caution.

What Exactly is Monty?

Monty is not "just another Python." It is a restricted subset of Python executed inside a Rust runtime, delivered as:

  • A Rust library (for embedding into services and agent runtimes),
  • A Python package pydantic-monty (integration into existing Python projects),
  • Builds for WebAssembly (including browser scenarios, e.g. via Pyodide).

Key Technical Properties (Important for Architecture)

  • Sandbox by Design: No file system, environment variables, or network access out of the box. This fundamentally reduces the class of RCE/exfiltration risks.
  • Microsecond Start: Positioned as an alternative to containers where latency is critical (chat agents, real-time workflows).
  • Restricted Language: Early versions lack classes, match, context managers, generators, and most of the stdlib. The idea is not "compatibility at any cost," but a minimal attack surface.
  • Managed External Calls: Support for external functions — you can allow code to call only explicitly provided host functions. This is key to safe integration with data/models.
  • Cross-Platform: The same approach applies in backend services and in the browser (WASM), which is important for hybrid products.

What This Usually Looks Like in Code

The mental model is simple: you pass an expression/code, describe inputs, and (if necessary) a list of allowed external functions.

  • Simple Computational Step (suitable for feature transformations without environment access): an expression such as x * y evaluated against passed-in inputs.

  • Controlled Data Access via External Functions: instead of granting network/FS access, you expose a single function, fetch_data, which decides inside your service what can be read and how it is logged.
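A rough sketch of this mental model follows. Everything here is illustrative, not Monty's actual API: the `run_sandboxed` helper, the `EXTERNAL_FUNCTIONS` registry, and `fetch_data` are hypothetical names, and a restricted `eval` merely stands in for the Rust interpreter.

```python
# Host-side pattern that Monty enables: sandboxed code can only see the
# inputs and functions you explicitly hand it. `run_sandboxed` is a
# stand-in for the real sandbox -- check the pydantic-monty docs for the
# actual API.

def fetch_data(table: str) -> list[dict]:
    # The host decides what "fetch" means: which tables are visible,
    # what gets logged, and what is denied.
    allowed_tables = {"sales": [{"region": "EU", "total": 1200}]}
    if table not in allowed_tables:
        raise PermissionError(f"table {table!r} is not exposed to the sandbox")
    return allowed_tables[table]

EXTERNAL_FUNCTIONS = {"fetch_data": fetch_data}

def run_sandboxed(code: str, inputs: dict, functions: dict):
    """Stand-in for the sandbox: evaluate a single expression with no
    builtins, so the code sees only inputs and allowed functions."""
    scope = {"__builtins__": {}, **inputs, **functions}
    return eval(code, scope)  # illustration only; Monty replaces this step

# Simple computational step: an expression plus passed inputs.
result = run_sandboxed("x * y", {"x": 6, "y": 7}, {})

# Controlled data access: the sandbox calls only what you registered.
rows = run_sandboxed("fetch_data('sales')", {}, EXTERNAL_FUNCTIONS)
```

The design point is that the allow-list, not the interpreter, is where your security policy lives: anything not in `EXTERNAL_FUNCTIONS` simply does not exist for the agent's code.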

Limitations to Remember

Monty is currently a "narrow but safe" tool. This means:

  • It does not replace CPython for complex business logic;
  • You cannot rely on a rich stdlib and familiar Python patterns;
  • The surrounding ecosystem is not yet settled: observe, test, and plan for rollback possibilities.

Business & Automation Impact

The main effect of Monty is that it changes the approach to agent systems and "dynamic code" in production. Previously, you faced an unpleasant choice:

  • Do not execute LLM-generated code at all (limiting yourself to a strictly fixed set of tools),
  • Execute it in a container/VM (expensive, slow, operationally complex), or
  • Execute it "somehow" inside a regular Python process (dangerous).

Monty offers a fourth way: an interpreted sandbox with a very small attack surface and fast launch. For companies implementing AI automation in operational processes, this fundamentally reduces iteration costs and risk levels.

Where This Provides Maximum Value

  • ML/DS Pipelines with Dynamic Transformations: Custom cleaning/normalization rules, light feature engineering steps, prompt formation, and result post-processing.
  • "Tool-Using" AI Agents: When an LLM writes small snippets of code to transform data structures, glue answers together, or prepare requests to your internal APIs (via allowed functions).
  • Product Embedding: User-defined "formulas," "rules," and "macros." Previously this was done with custom DSLs; now you can offer restricted Python, safely.
  • Edge/Browser Scenarios via WASM: Part of the computation can be moved to the client while maintaining security and predictability.

Who is at Risk (and Why)

  • Teams that have already launched agent systems without isolation: Monty highlights the problem — executing LLM code in a "normal" interpreter opens the door to leaks and sabotage.
  • Platforms that built custom DSLs solely for security: Some use cases may migrate to Monty if language limitations are acceptable.
  • Complex Enterprise Systems: The risk here is not in technology, but in integration. If you start "allowing everything" via external functions, you bring vulnerabilities back, just through a different path.

Impact on AI Solution Architecture

In practical AI architecture, a new layer appears: the Execution Sandbox between the LLM and your systems (data, models, ERP/CRM, file storage). Architecturally, this means:

  • Permission Policy as Code: The list of external functions is your security contract. It can be versioned, tested, and reviewed.
  • Observability: You can log all external function calls, parameters, execution time, and errors — and build risk scoring for agent actions.
  • Simplified Deployment: Less dependence on containers at "every step," potentially lower latency and cost for high-QPS agent streams.
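The observability point can be sketched on the host side. The decorator and audit log below are illustrative patterns, not anything Monty prescribes; in production you would emit structured logs or traces instead of appending to a list.

```python
import time
from functools import wraps

AUDIT_LOG: list[dict] = []  # in production: structured logging / tracing

def audited(fn):
    """Wrap an external function so every call made from sandboxed code
    is recorded: function name, arguments, duration, success/failure."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"fn": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            entry["ok"] = True
            return result
        except Exception as exc:
            entry["ok"] = False
            entry["error"] = repr(exc)
            raise
        finally:
            entry["ms"] = (time.perf_counter() - start) * 1000
            AUDIT_LOG.append(entry)
    return wrapper

@audited
def get_sales_aggregate(period: str) -> int:
    # Hypothetical narrow external function exposed to the sandbox.
    return {"2026-01": 1200}.get(period, 0)
```

Because every external function passes through one wrapper, the audit trail doubles as input for risk scoring: unusual arguments, error spikes, or slow calls become visible per agent action.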

But there is a flip side: to realize these benefits, you must properly design the external function interfaces and data models. In practice, this is exactly where companies stumble: the LLM code appears to execute, but integrating the AI with internal systems devolves into a chaos of unpredictable contracts and exceptions. This is precisely the class of problem where specialists in AI implementation and industrial integration are needed.

Expert Opinion: Vadym Nahornyi

Monty is not about a "new Python," but about a new standard for safe execution in the agent economy. At Nahornyi AI Lab, we regularly see the same scenario: the business wants the agent to "write code for data processing itself," but security and compliance block the idea. Monty makes the compromise practical for the first time: fast isolation + managed exit points.

Why the Pydantic Team is a Strong Signal

Pydantic is associated in the industry with data discipline: schemas, types, verifiability. The logical continuation is execution discipline. If Monty eventually integrates with Pydantic AI and validated inputs/outputs, we will get a strong link: validate data → execute safely → validate result. For developing AI solutions in the real sector, this is almost the ideal axis of manageability.
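That axis can be sketched end to end. The sandbox step below is a stubbed restricted `eval` standing in for Monty, and the contracts are plain dict checks where a real project would use Pydantic models; all names are illustrative.

```python
# "Validate data -> execute safely -> validate result" as three explicit
# steps. Real contracts would be Pydantic models; dict checks keep this
# sketch dependency-free.

def check_input(rec: dict) -> dict:
    if set(rec) != {"price", "qty"}:
        raise ValueError("input must have exactly 'price' and 'qty'")
    return rec

def sandbox_eval(code: str, inputs: dict):
    # Stand-in for the Monty sandbox: expression only, no builtins.
    return eval(code, {"__builtins__": {}, **inputs})

def check_output(value) -> dict:
    if not (isinstance(value, dict)
            and isinstance(value.get("total"), (int, float))):
        raise TypeError("agent code must return {'total': number}")
    return value

rec = check_input({"price": 9.5, "qty": 4})
out = check_output(sandbox_eval("{'total': price * qty}", rec))
```

The value of the chain is that agent-written code is pinned between two contracts: malformed inputs never reach the sandbox, and "almost correct" outputs fail loudly at the boundary instead of deep inside the pipeline.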

Where There Will Be Hype vs. Utilitarian Value

  • Utilitarian: Small data transformations, routing, normalization, structure "gluing" — what LLM agents often do between tool calls.
  • Hype Zone: Attempting to run full libraries/scientific stacks in Monty. Current language and stdlib limitations make this unrealistic.

Three Typical Implementation Mistakes (To Warn About Early)

  • Mistake 1: "We'll add an external function that does everything." You turn a safe sandbox into a thin wrapper around a dangerous API. You need narrow, specific functions: "get sales aggregate by period," not "execute SQL."
  • Mistake 2: Lack of data contracts. If input/output is not validated (Pydantic/schemas), the agent will produce "almost correct" structures that break the pipeline in unexpected places.
  • Mistake 3: No observability policy. In agent systems, traces are important: what executed, why, what resources were touched. Without this, you cannot prove safety and maintain SLAs.
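The first two mistakes can be contrasted in a few lines. Function names and the in-memory "database" are illustrative assumptions, not part of Monty.

```python
# Mistake 1's fix: expose a narrow, intention-revealing function rather
# than a generic escape hatch.

SALES = {("EU", "2026-01"): 1200, ("EU", "2026-02"): 900}

# BAD: an external function like run_sql(query) hands the agent the whole
# database and reintroduces the risks the sandbox removed. Do not expose it.

# GOOD: one specific question the business actually allows the agent to ask.
def sales_total(region: str, period: str) -> int:
    if region not in {"EU"}:
        raise PermissionError(f"region {region!r} is not exposed to agents")
    return SALES.get((region, period), 0)
```

The narrower the function, the easier it is to review, log, and reason about; a missing period simply returns 0 instead of opening a query surface.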

My forecast: in the next 3–6 months, Monty will become the "default candidate" for sandboxing in the Python ecosystem of agent solutions, but only teams that know how to design safe interfaces and test limitations will take it to production. Others will need external expertise — because the complexity is not in installing the package, but in the systems engineering around it.

Theory is good, but results require practice. If you are planning AI implementation with agent scenarios and want to reduce LLM code execution risks while accelerating AI automation without container "heaviness" — let's discuss your case at Nahornyi AI Lab. I, Vadym Nahornyi, am responsible for architecture, security, and measurable business impact of implementation.
