SLMs + Code Mode: make small models do big toolchains (on-prem, governed)
Small Language Models (SLMs) can orchestrate complex, multi-tool work if you stop asking them to improvise hop-by-hop. With Code Mode, the agent emits a plan, your model compiles that plan into code, and that code runs in isolates that can call only policy-approved MCP tools.

Short version: stop asking SLMs to improvise hop-by-hop. Let the agent emit a plan, compile that plan into code, and run the code in isolates that can call only policy-approved MCP tools. The result: higher first-try success, fewer tokens, and lower p95, while keys and data stay local and you avoid vendor lock-in. Cloudflare popularized this "typed API + sandbox" pattern for MCP; Anthropic's engineering team explains why code execution with MCP reduces context bloat, latency, and errors.
Why SLMs struggle with tool calling—and what changes with Code Mode
SLMs are fast and economical, but they're not great at reasoning across multi-step chains in long prompts. Traditional sequential tool-calling forces the model to re-decide each hop, carry bulky intermediate data through the context window, and remember every tool schema in-prompt. That's a lot of cognitive load for a small model.
Code Mode flips the script. The SLM produces a compact function plan ("what to do"), then compiles that plan into code using the tool types—not pages of prose reasoning. Palma.ai executes that code inside isolates (no FS/Net), and the only capability exposed is calling allow-listed MCP tools through governed bindings. The orchestration is deterministic, not improvised, so even small models can finish complex workflows on the first try. Anthropic calls out these exact efficiency wins—load tools on demand, transform data in execution, return just the final result—which especially benefit SLMs.
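To make that concrete, here is a minimal sketch of the pattern with hypothetical tool names and shapes (in a real deployment the bindings come from your own MCP servers, not these examples): the tool types are the contract the SLM codes against, and the compiled function is what the isolate executes.

```typescript
// Hypothetical typed bindings for two allow-listed MCP tools (names are illustrative).
interface CrmTool {
  getAccount(input: { accountId: string }): Promise<{ id: string; domain: string }>;
}
interface EnrichTool {
  lookupCompany(input: { domain: string }): Promise<{ industry: string; employees: number }>;
}

// The compiled plan: deterministic orchestration, produced once by the SLM.
// Intermediates (account, profile) stay in the isolate and never re-enter the model's context.
export async function run(
  tools: { crm: CrmTool; enrich: EnrichTool },
  accountId: string
) {
  const account = await tools.crm.getAccount({ accountId });
  const profile = await tools.enrich.lookupCompany({ domain: account.domain });
  // Only the final, governed result is returned to the agent.
  return { accountId: account.id, industry: profile.industry, employees: profile.employees };
}
```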
Cloudflare + Anthropic set the pattern—Palma.ai makes it enterprise-ready
Cloudflare's "Code Mode: the better way to use MCP" shows developer ergonomics: expose MCP servers as typed APIs; let the model write code against those APIs; execute safely in a sandbox. Anthropic's "Code execution with MCP" explains the operational math: fewer tokens and round-trips when you keep intermediates out of the prompt and run complex logic in one step. Palma.ai brings both together for enterprises: isolates, policy at the tool boundary, RBAC/scopes/quotas, full audit, and FinOps—all deployable on-prem with BYO models to avoid lock-in.
How Palma.ai makes SLMs great at tool calling
In Palma.ai, the flow is simple: plan → compile to code → execute in isolates → call approved tools → return one governed result (a sketch of the plan format follows the list below). Because the orchestration is compiled once and executed once, the SLM doesn't re-reason each hop. Prompts stay tiny (plan + needed tool types), retries collapse, and throughput stabilizes.
- Accuracy rises: a single, auditable unit of work encodes ordering, validation, and idempotency.
- Tokens drop: you don't drag every intermediate through the model.
- Latency improves: fewer model round-trips; p95 steadies as retries disappear.
- Privacy by locality: code generation can run with your on-prem SLMs; keys and data stay in your environment.
- Model freedom: start with SLMs for orchestration; call bigger models only where they add real value—no vendor lock-in.
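To illustrate the plan half of that flow, here is one hypothetical shape an SLM's compact plan could take; the exact representation may differ in practice, but the point is a few structured steps rather than pages of reasoning.

```typescript
// Hypothetical plan format: ordered steps that a compiler turns into code like the earlier sketch.
type PlanStep = {
  tool: string;                  // allow-listed MCP tool to call
  args: Record<string, unknown>; // may reference inputs or earlier results
  saveAs?: string;               // name later steps can use to reference this result
};

const plan: PlanStep[] = [
  { tool: "crm.getAccount", args: { accountId: "$input.accountId" }, saveAs: "account" },
  { tool: "enrich.lookupCompany", args: { domain: "$account.domain" }, saveAs: "profile" },
  { tool: "crm.updateAccount", args: { accountId: "$account.id", industry: "$profile.industry" } },
];
// The compiler walks these steps once and emits deterministic code; the isolate executes it
// end to end, and the model is not consulted again between hops.
```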
Cloudflare's post demonstrates the code-first API approach; Anthropic details why it's efficient with MCP. Palma.ai operationalizes both with enterprise controls and on-prem portability.
Patterns where SLMs shine with Code Mode
SLMs excel when they compile and execute rather than improvise:
- Enrich → transform → validate → write: structured ETL across MCP tools with schema checks and idempotent writes.
- Human-in-the-loop: insert approval gates for sensitive steps; the compiled code resumes deterministically after approval.
- Batch + pagination: loops and backoff live in code, not in a chain of fragile prompts (see the sketch after this list).
- Hybrid composition: the SLM orchestrates; specific steps call specialized vision/OCR/summarization models via governed tools.
These match Anthropic's guidance: execute complex logic in a single step, keep only what's needed in context, and rely on code execution with MCP for efficiency and reliability.
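As an example of the batch + pagination pattern, the compiled code might look roughly like this; the tool names and shapes are hypothetical, but the point stands: retries, backoff, and paging never touch the model's context.

```typescript
// Hypothetical paginated export: the loop, backoff, and writes live in code, not prompts.
interface TicketsTool {
  list(input: { cursor?: string; pageSize: number }): Promise<{
    items: { id: string; status: string }[];
    nextCursor?: string;
  }>;
}
interface WarehouseTool {
  upsert(input: { rows: { id: string; status: string }[] }): Promise<{ written: number }>;
}

// Retry a tool call with exponential backoff; the model is never re-consulted on failure.
async function withBackoff<T>(call: () => Promise<T>, retries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
}

export async function exportTickets(tools: { tickets: TicketsTool; warehouse: WarehouseTool }) {
  let cursor: string | undefined;
  let written = 0;
  do {
    const page = await withBackoff(() => tools.tickets.list({ cursor, pageSize: 200 }));
    const result = await tools.warehouse.upsert({ rows: page.items });
    written += result.written;
    cursor = page.nextCursor;
  } while (cursor);
  return { written }; // only the summary goes back to the agent
}
```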
Governance, portability, and on-prem control (the enterprise bits)
For SLM-forward teams, the real unlock is control without lock-in. Palma.ai centralizes policy at the tool boundary, enforces RBAC/scopes/quotas, validates schemas, and records audit-grade traces of each compiled run. Because Palma.ai is self-hostable, you can deploy it on-prem or inside your VPC, integrate with IdP/SIEM/secret stores, and isolate teams or environments cleanly. And since orchestration rides on MCP, your tool catalog and chains remain portable across agent clients and model vendors.
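Purely as an illustration of what policy at the tool boundary can express (a hypothetical shape, not Palma.ai's actual configuration format):

```typescript
// Hypothetical policy declaration: who may call which MCP tools, and under what limits.
type ToolPolicy = {
  tool: string;                 // allow-listed MCP tool
  roles: string[];              // RBAC: roles permitted to invoke it
  scopes?: string[];            // narrower permissions within the tool
  quota?: { calls: number; per: "minute" | "hour" | "day" };
  validateSchema: boolean;      // reject calls whose arguments fail the tool's schema
  requireApproval?: boolean;    // human-in-the-loop gate for sensitive steps
};

const policies: ToolPolicy[] = [
  { tool: "crm.getAccount", roles: ["sales-agent"], validateSchema: true, quota: { calls: 500, per: "hour" } },
  { tool: "warehouse.upsert", roles: ["data-eng"], scopes: ["tickets"], validateSchema: true },
  { tool: "payments.refund", roles: ["finance"], validateSchema: true, requireApproval: true },
];
```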
What Palma.ai tracks (so you can prove the SLM gains)
Palma.ai ships FinOps-grade visibility:
- First-try success rate for multi-tool chains
- Tokens per completed task
- Time-to-finish (p50/p95) and queue vs. run time
- Cost per successful action (with showback/chargeback)
SLM orchestration typically improves all four because Code Mode eliminates hop-by-hop uncertainty and context bloat.
What is "Code Mode" for SLMs?
A pattern where an SLM compiles an agent's plan into code that runs in isolates, with the code allowed to call only policy-approved MCP tools. This reduces token use, context churn, and latency while increasing first-try success—especially with on-prem SLMs. Cloudflare popularized the API-plus-sandbox approach; Anthropic explains why code execution with MCP is more efficient. Palma.ai makes it enterprise-ready with governance, audit, FinOps, and on-prem portability.
FAQs
Can SLMs really replace larger models for orchestration?
For many multi-tool workflows, yes. With Code Mode, an SLM compiles a plan into deterministic code and lets the runtime manage loops, branching, and validation. Call bigger models only where they add value—Palma.ai enforces policy either way. Anthropic's article outlines the efficiency mechanics behind this.
How does Palma.ai keep SLM tool calls safe?
Execution happens in isolates with no filesystem or network access; the only capability is calling allow-listed MCP tools under RBAC/scopes/quotas with schema validation and full audit.
What if we're on-prem and can't send data to a cloud model?
Use client-side codegen with your on-prem SLMs. Keys and sensitive data stay local; Palma.ai's gateways and policies run inside your perimeter.
Are we locked into a vendor SDK?
No. MCP is an open protocol. Palma.ai is vendor-neutral and BYO-model—portable across clouds and agent clients.
Ready to make SLMs your default orchestrators?
- Book a 20-min demo — We'll run your workflow with an SLM in Code Mode and show the delta in first-try success, tokens, and p95 in Palma.ai.
- Explore Code Mode — Learn how Palma.ai compiles plans to code and enforces policy at the tool boundary.
- Further reading: Cloudflare, "Code Mode: the better way to use MCP"; Anthropic, "Code execution with MCP: building more efficient AI agents."