Origin Docs

Gateway Architecture and Inference Flow

How ORGN Gateway routes OpenAI-compatible requests through TEE or ZDR execution environments — control plane, data plane, and attestation for verifiable confidential inference.

ORGN Gateway's architecture provides verifiable confidential inference when you choose TEE models, and policy zero retention when you choose ZDR models — while keeping model selection firmly with you.

Gateway separates request orchestration, secure execution, and verification. It does not perform automatic model selection or dynamic routing. The model specified in your request is executed.

High-level components

Client application

Your application sends requests using an OpenAI-compatible API, explicitly specifying the model to use.

The client:

  • Selects the model in code or request parameters
  • Sends prompts and inference parameters
  • Receives model responses
  • Receives attestation metadata on TEE requests (not on ZDR)

Gateway does not modify, override, or substitute the requested model.

Gateway router (control plane)

The Gateway router is a secure orchestration layer, responsible for:

  • Authenticating requests (sk-ollm-* API keys)
  • Validating model availability and permissions
  • Enforcing security and execution constraints
  • Coordinating attestation data for TEE routes

The router does not choose models, does not inspect prompt or response content, and does not perform inference.

Execution environments (data plane)

Gateway routes requests to one of two execution environments depending on the model ID you send.

TEE models (near_*, phala_*) run on NEAR and Phala infrastructure inside hardware-backed Trust Domains:

  • Hardware-enforced memory isolation from host OS, hypervisor, and infrastructure
  • Encryption in use via Intel TDX confidential VMs and NVIDIA H100 GPU attestation
  • Cryptographic attestation receipt generated per request

ZDR models (vercel_*) run on Vercel AI Gateway infrastructure under zero data retention provider agreements:

  • No storage or logging of prompts and responses by Vercel or the underlying model provider
  • Broad access to frontier closed-weight and multimodal models
  • No hardware isolation or attestation receipt

The model identifier you specify determines which environment is used. Gateway does not select or override the execution path.

Attestation and verification layer

For TEE model requests, the execution environment produces attestation artifacts that prove:

  • The specified model ran inside a valid Trust Domain
  • The execution environment matched expected measurements
  • The response was generated within the trusted boundary

These artifacts are inspectable in ORGN Scanner, enabling independent verification of secure execution.

ZDR model requests do not produce attestation artifacts. Privacy is enforced through Vercel's zero data retention agreements with model providers.

Request lifecycle

Request submission

The client sends a request to https://api.gateway.orgn.com/v1, explicitly specifying the model ID (for example near_glm_4_7 or vercel_claude_sonnet_4_6).

Request validation

Gateway authenticates the request and verifies that the specified model is available and supported.

Inference execution

The request is forwarded to the selected model's execution environment — TEE (NEAR/Phala) or ZDR (Vercel).

Attestation generation (TEE only)

For TEE models, hardware attestation data is generated as part of execution.

Response delivery

The model output is returned. TEE responses include verification metadata for Scanner.

Gateway does not alter your model choice. For TEE routes, prompts and responses remain inside the Trust Domain during inference — Gateway does not access plaintext inference content outside that boundary.

Trust boundaries and guarantees

GuaranteeDetail
Model choice is user-controlledNo automatic routing or model substitution
Gateway does not retain inference contentZero retention for prompts and outputs
Security depends on model tierTEE: hardware + attestation. ZDR: policy retention via Vercel

This architecture lets teams run sensitive LLM workloads with full control over model selection, choosing cryptographic proof (TEE) or frontier catalog access (ZDR) per request.

On this page