Argy LLM Gateway, the multi-provider AI gateway
One secure, OpenAI-compatible entry point to reach LLM providers, enforce policies (PII, secrets, prompt injection), route with automatic fallbacks, audit every request, and keep costs predictable with quotas and credits.
Useful links: deployment options · security model · Argy Code · Argy Chat · Pricing · Shadow AI

The LLM Gateway is the central layer: Argy Chat, Argy Code, and agents call it, and all policies are enforced there.
The diagnosis
Without a gateway, each team ships its own SDK, keys, logs, and limits. You get duplication and blind spots.
Vendor dependency
Lock-in to a single provider, imposed pricing, and costly switching as the market evolves.
Security & compliance
Sensitive data sent to models without filtering and without an audit trail security teams can rely on.
Operational complexity
Divergent SDKs and configs per team, no consolidated view of cost and usage.
What Argy LLM Gateway does
The governed entry point to LLMs, without client-side keys and with predictable costs.
OpenAI-compatible API
Reuse your existing integrations: the gateway exposes an OpenAI-compatible API, as in the sketch below.
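If you already use the official OpenAI SDK, pointing it at the gateway can be as small as changing the base URL. A minimal Python sketch, assuming a hypothetical endpoint and token (the URL, credential, and model name are placeholders, not your actual configuration):

```python
# Minimal sketch with the official OpenAI Python SDK (v1+).
# base_url and api_key are placeholders for your gateway endpoint
# and an Argy-issued credential (no provider keys on the client).
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical gateway URL
    api_key="YOUR_ARGY_TOKEN",                  # gateway token, not a provider key
)

response = client.chat.completions.create(
    model="gpt-4o",  # requested model; routing may substitute an equivalent
    messages=[{"role": "user", "content": "Summarize our Q3 incident report."}],
)
print(response.choices[0].message.content)
```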
Multi-provider orchestration
Unify access to OpenAI, Anthropic, Google, Mistral, and xAI behind one API and avoid vendor lock-in.
Intelligent routing + fallback
Auto/Quality/Budget/Explicit strategies, capability-based selection (chat/code/agent/RAG/OCR), and multi-level fallback chains.
Configurable security
Filters for PII, secrets, prompt injection, and forbidden topics; masking or blocking modes; plus outbound filtering on model outputs.
Encrypted auditing
Requested/effective model, consumption, enforced policies, RAG usage, and latency, with encryption and export options.
Quotas, credits and limits
Tenant and org budgets, usage limits, and alerts to keep costs predictable.
Integrated RAG
RAG (Retrieval-Augmented Generation) grounds answers on your documents by augmenting prompts with retrieved passages, with access control and traceable citations.
Intelligent routing
The routing engine selects the best model based on strategy (quality, cost, latency) and required capabilities, with automatic fallback when providers fail. A request sketch follows the strategies below.
Auto
Balances quality, cost, and latency for standard production use.
Quality
Prefers premium models for complex analysis and audits.
Budget
Uses more economical models for high-volume workloads.
Explicit
User-selected model for benchmarks or specific cases.
Capability-based selection
Routing only selects models compatible with the requested capability.
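How a caller picks a strategy depends on your deployment. A hedged sketch, assuming a hypothetical routing alias and request header (neither is a documented interface; check your gateway's configuration):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # same placeholder endpoint as above
    api_key="YOUR_ARGY_TOKEN",
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias that delegates model choice to the router
    messages=[{"role": "user", "content": "Classify these support tickets by urgency."}],
    # Hypothetical header; the SDK forwards extra_headers verbatim.
    extra_headers={"X-Argy-Routing-Strategy": "budget"},
)
```

The audit trail then shows both the requested alias and the effective model the fallback chain actually served.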
Security & data protection
Every request goes through an inbound filtering pipeline before reaching providers. Outputs can be analyzed as well.
No filtering
Configurable option when you don’t want filters applied.
Masking
Replace sensitive values with markers to prevent leakage (a conceptual sketch follows the filter list).
Reversible tokenization
Replace sensitive values with reversible tokens to minimize exposure while keeping a smooth end-user experience.
Blocking
Block the request when a rule is violated or sensitive data is detected.
Supported filters
PII, secrets, prompt injection, and forbidden topics.
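To make the masking mode concrete, a conceptual Python sketch of a single PII detector; the real pipeline runs inside the gateway and covers many more patterns than this one regex:

```python
import re

# Conceptual only: one email detector standing in for the gateway's
# full PII/secrets/prompt-injection pipeline.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_pii(prompt: str) -> str:
    """Replace detected values with stable markers before they leave your boundary."""
    return EMAIL.sub("<PII:EMAIL>", prompt)

print(mask_pii("Contact jane.doe@acme.io about the renewal."))
# -> Contact <PII:EMAIL> about the renewal.
```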
Built for demanding environments: GDPR controls and enterprise compliance needs.
Audit & traceability
Every call is traceable, enabling visibility, cost allocation, and anomaly detection.
What gets recorded
Encrypted audit trail, with export options depending on your needs; a hypothetical record shape follows the list below.
- Who requested what (tenant, user) and when.
- Requested model vs. effective model (routing + fallback).
- Consumption (tokens, credits) for budget control.
- Enforced policies (filters, RAG) for auditable evidence.
- Latency and errors for observability and continuous improvement.
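As a mental model, a single record might look like the sketch below; the field names are illustrative, not the export schema:

```python
# Hypothetical audit record mirroring the fields listed above.
audit_record = {
    "tenant": "acme",
    "user": "jane.doe",
    "timestamp": "2025-01-15T10:32:07Z",
    "requested_model": "gpt-4o",         # what the client asked for
    "effective_model": "claude-sonnet",  # what routing/fallback served
    "tokens": {"prompt": 412, "completion": 186},
    "credits": 0.37,
    "policies": ["pii:masked", "secrets:clean"],
    "rag": {"used": True, "sources": 3},
    "latency_ms": 1840,
    "status": "ok",
}
```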
Extended capabilities
Beyond chat: a gateway that supports practical enterprise AI workloads.
Image generation
Generation parameters (size, quality, count) with cost tracking.
OCR
Text extraction from images and documents, easy to integrate into workflows.
Embeddings
Vectorization for semantic search, with fallback across providers (see the sketch after this list).
Agent support
Multi-step reasoning and tools for governed agents and integrations.
Smart cache
Reduce latency and costs on repeated requests, without sacrificing governance.
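These capabilities ride on the same OpenAI-compatible client. A sketch with placeholder model names (substitute models enabled for your tenant):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder endpoint
    api_key="YOUR_ARGY_TOKEN",
)

# Embeddings for semantic search; provider fallback happens server-side.
vectors = client.embeddings.create(
    model="text-embedding-3-small",  # placeholder model name
    input=["refund policy", "shipping delays"],
)
print(len(vectors.data[0].embedding))

# Image generation with explicit parameters (size, count) tracked for cost.
image = client.images.generate(
    model="gpt-image-1",  # placeholder model name
    prompt="A simple architecture diagram of an LLM gateway",
    size="1024x1024",
    n=1,
)
```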
Multi-tenancy, BYOK, and deployment
Tenant isolation, per-tenant provider keys (BYOK), isolated quotas, and an encrypted audit trail.
SSO (OIDC)
Enterprise single sign-on via OpenID Connect (OIDC).
PAT
Tokens for agents, CI/CD, and automation.
Signed calls
Signed inter-service calls for internal security.
SaaS or on-premises
Shared or dedicated gateway depending on sovereignty constraints.
Flexible deployment
SaaS (shared gateway) or on-premises (dedicated gateway). Docker/Kubernetes-ready deployments.
FAQs
Common questions about the LLM Gateway.
Why use an LLM Gateway in the enterprise?
To centralize LLM usage behind a single API and apply governance and visibility: multi-provider routing, quotas, security filters, and full auditability.
Is it compatible with existing OpenAI SDKs?
Yes. The LLM Gateway exposes an OpenAI-compatible API, so you can switch integrations with minimal changes.
How is sensitive data protected?
With an inbound filtering pipeline (PII, secrets, prompt injection, forbidden topics) and configurable modes (masking, reversible tokenization, blocking), plus outbound filtering on model outputs.
What is recorded in the audit trail?
Every request is traced end-to-end (tenant, user, requested/effective model, tokens, credits, applied filters, RAG usage, latency), with encryption and export options.
Can it run on-premises?
Yes. The LLM Gateway supports SaaS (shared) and on-premises (dedicated) deployments depending on sovereignty and network constraints.
European SaaS
GDPR compliant & hosted in EU
No Lock-in
Built on open standards
API-First
Everything is automatable
Ready to get started with Argy?
Start with the Free plan. Upgrade when you're ready, or contact us for an enterprise rollout.