Agentic Layers · Gateway

Your agent's whole pipeline. One gateway.

Inference, memory, tools, verifiers, traces — every call your agent makes runs through Mubit Gateway. One client, not five vendor SDKs stitched together.

In practice / 01

One product where your agent's pipeline actually lives.

01 · ONE PIPELINE, NOT FIVE

Pick your stack. One SDK runs all of it.

Pick your models, point at your tools, and your agent has inference, memory, verifiers, and traces from the first call. Swap models, memory backends, or tool catalogs later — the agent code doesn't change.
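A minimal sketch of what "the agent code doesn't change" can look like, assuming the agent depends only on a small interface. `ModelBackend`, `EchoBackend`, `UpperBackend`, and `run_agent` are hypothetical names chosen for illustration; they are not the Mubit SDK's API.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in for one provider."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperBackend:
    """Stand-in for a different provider."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def run_agent(backend: ModelBackend, task: str) -> str:
    # The agent depends only on the interface, so swapping
    # backends requires no change to this function.
    return backend.complete(f"task: {task}")
```

Calling `run_agent(EchoBackend(), "ping")` and `run_agent(UpperBackend(), "ping")` exercises the same agent code against two different backends.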

PROVIDERS

OpenAI

Anthropic

Google

Cohere

Microsoft

02 · LEARN BETWEEN RUNS

Lessons from the last run, applied to the next call.

When a run finishes, the lesson is stored. The next call reads it before generating. No vector DB to provision, no glue between your memory store and your model client.
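The store-then-read loop described here can be sketched roughly as follows. This is an in-memory stand-in under stated assumptions: `LessonStore`, `record`, `recall`, and `build_prompt` are hypothetical names, and a real memory service would likely rank lessons by relevance rather than recency.

```python
from collections import defaultdict


class LessonStore:
    """Hypothetical in-memory stand-in for a managed memory service."""

    def __init__(self) -> None:
        self._lessons: dict[str, list[str]] = defaultdict(list)

    def record(self, agent_id: str, lesson: str) -> None:
        # Called when a run finishes.
        self._lessons[agent_id].append(lesson)

    def recall(self, agent_id: str, limit: int = 3) -> list[str]:
        # Most recent lessons first (an assumption, not a ranking model).
        return list(reversed(self._lessons[agent_id]))[:limit]


def build_prompt(store: LessonStore, agent_id: str, task: str) -> str:
    # Read stored lessons before generating the next call's prompt.
    lessons = store.recall(agent_id)
    if not lessons:
        return task
    notes = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"Lessons from earlier runs:\n{notes}\n\nTask: {task}"
```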

[Diagram: the last run (Run 01) writes lessons to memory; the next call (Run 02) reads them.]

03 · ONE TRACE PER RUN

The full agentic pipeline in a single timeline.

Every LLM call, memory read, tool invocation, and verifier outcome lands in the same run trace. No correlating spans across Helicone, LangSmith, and Datadog — your agent's full causal chain is one query.
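To illustrate why a single per-run timeline makes the causal chain "one query", here is a small sketch. The `Event` and `RunTrace` types and the event kinds are assumptions made for this example, not Gateway's actual trace schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    kind: str  # e.g. "llm", "memory_read", "tool", "verifier"
    name: str
    ok: bool = True


@dataclass
class RunTrace:
    """Hypothetical single timeline holding every step of one run."""

    run_id: str
    events: list[Event] = field(default_factory=list)

    def log(self, kind: str, name: str, ok: bool = True) -> None:
        self.events.append(Event(kind, name, ok))

    def first_failure(self) -> Optional[Event]:
        return next((e for e in self.events if not e.ok), None)

    def chain_to_failure(self) -> list[str]:
        # Every step up to and including the first failing one, in order.
        chain = []
        for event in self.events:
            chain.append(f"{event.kind}:{event.name}")
            if not event.ok:
                break
        return chain
```

Because all step kinds share one ordered list, finding what led up to a failed verifier is a single pass over one structure rather than a join across tools.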

"found it without three dashboards"

Why it matters / 02

Less to integrate. More coordination.

Inference, memory, tools, verifiers, and traces share a code path. Each piece has access to the others.

Agents that learn between calls.

Each completion writes a lesson to memory. The next call retrieves it. No app code in between.

One trace covers the whole run.

LLM call, memory read, tool invocation, verifier outcome: every step lands in the same trace.

Providers you can swap without a migration.

Switch models, memory backends, or tool registries behind the same client. Your agent code doesn't change.

One surface for the whole stack.

One auth, one bill, one SLA. One place to look when something breaks.

FAQ

How is this different from a model router like OpenRouter or LiteLLM?

Model routers connect one piece of the stack — your app to multiple LLMs. Gateway is the seam for the whole agentic pipeline: inference, memory, tools, verifiers, traces, and audit, all through one SDK and one observability surface. Routing across LLM providers is one feature, not the product.

What exactly does Gateway consolidate?

Five things that usually live in five products: LLM inference (OpenAI, Anthropic, Google, etc.), execution memory (lesson capture + retrieval), tool registry + invocation, verifier outcomes, and the run-level trace that ties it all together. One client surfaces all of them.

Does Gateway add latency?

Typical Gateway overhead is 6–9 ms at p95, negligible against any LLM round-trip. Requests stream through; we don't buffer the response. Memory writes happen asynchronously after the response is returned.
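The streaming and deferred-write behavior can be sketched with `asyncio`. This is an illustration under stated assumptions, not Gateway's implementation: `upstream`, `relay`, `write_memory`, and `demo` are hypothetical names, and a plain list stands in for the memory store.

```python
import asyncio
from typing import AsyncIterator


async def upstream(chunks: list[str]) -> AsyncIterator[str]:
    """Stand-in for a provider's streaming response."""
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk


async def write_memory(memory: list[str], text: str) -> None:
    """Stand-in for a memory write that takes some time."""
    await asyncio.sleep(0)
    memory.append(text)


async def relay(
    source: AsyncIterator[str], memory: list[str], pending: list
) -> AsyncIterator[str]:
    seen: list[str] = []
    async for chunk in source:
        seen.append(chunk)
        yield chunk  # forwarded as it arrives, not held until the end
    # Scheduled only after the last chunk was delivered; not awaited here.
    pending.append(asyncio.create_task(write_memory(memory, "".join(seen))))


async def demo() -> tuple[list[str], list[str], list[str]]:
    memory: list[str] = []
    pending: list = []
    received = [c async for c in relay(upstream(["Hel", "lo"]), memory, pending)]
    snapshot = list(memory)  # the write has not run yet at this point
    await asyncio.gather(*pending)
    return received, snapshot, memory
```

In `demo`, the caller holds the full response before the memory write has executed, which is the ordering the answer above describes.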

Can I keep my existing LLM keys?

Yes. Bring-your-own-keys is the default for every provider — Gateway uses your keys, so you keep your existing rate limits, billing, and SLA terms. Mubit-managed keys are available for providers we resell.

Does this replace my vector database?

It can. Mubit Memory ships with managed embeddings and retrieval — point Gateway at your agent ID and you're done. Or run Gateway as a passthrough to your existing vector store; the SDK doesn't care.

What about my existing observability tools?

Gateway exports OTel spans natively — pipe them to Datadog, Honeycomb, LangSmith, or your own collector. The unified Mubit trace is an additional surface, not a replacement.

Can I self-host the Gateway?

Yes. Run the gateway plane in your own VPC — same wire format, same SDK, same APIs. Useful when traffic must not leave your network, or you want all keys to stay in-house.

Run your agent's whole pipeline through Gateway.