Exoscale’s Managed Inference


Managed Inference is our usage-based AI inference offering. It gives you direct access to a curated catalog of production-ready models through a managed API, so you can integrate AI features quickly.

There are no instances to size and no idle costs. You simply connect to our API and pay only for what you consume. It’s the perfect operating model for rapid prototyping, unpredictable traffic, and teams that want immediate AI capabilities on a secure, sovereign European cloud.
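Connecting to the API could look like the sketch below. This is an illustration only: the base URL, model identifier, and the OpenAI-compatible request shape are all assumptions, not a published contract — check the launch documentation for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint — the real base URL, authentication scheme,
# and model identifiers will be documented at launch.
API_BASE = "https://inference.example.exoscale.com/v1"

def build_chat_request(prompt, model="general-purpose-assistant", api_key="YOUR_API_KEY"):
    """Assemble an HTTP request for a chat completion, assuming an
    OpenAI-compatible request shape (an assumption, not a confirmed contract)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize our Q3 support tickets.")
# Once you have real credentials and an endpoint, sending the request is:
# with urllib.request.urlopen(req) as resp:
#     answer = json.load(resp)["choices"][0]["message"]["content"]
```

Because there is no infrastructure to provision, this request is the entire integration surface: no cluster, no GPU sizing, no deployment step.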

Why Choose Managed Inference on Exoscale?

EU & Switzerland availability

All models are planned for deployment in both EU and Swiss locations, giving you the flexibility to choose where your Managed Inference runs while maintaining full sovereignty and compliance.

Curated production-ready model catalog

Instead of hundreds of experimental models, access a focused catalog covering common enterprise AI needs such as general-purpose assistants, coding, small models, embeddings, OCR, and speech workloads.

Reliable performance by design

Built on proven infrastructure and optimized for inference workloads, delivering consistent performance and low-latency responses for your AI applications.

Usage-Based Pricing

Pay exclusively for what you use. No GPU sizing, no capacity planning, no infrastructure operations.

Clear model lifecycle

Get transparency on model availability and lifecycle so you can plan adoption, integration, and future updates with more confidence.

Integrated with the Exoscale platform

Connect Managed Inference with the rest of your sovereign Exoscale AI portfolio, including vector databases, storage, and Kubernetes.

Shape Our Production-Ready Catalog

We’re building a curated, production-ready model catalog for our upcoming pay-per-use Managed Inference product. Tell us which models you need most and help us prioritize the roadmap.

Share Your Input

Designed for common AI workloads

Managed Inference API endpoints are optimized for fast adoption, experimentation, and workloads with variable demand where operating dedicated GPUs would add unnecessary complexity.

Rapid prototyping & experimentation


Test models, prompts, and use cases quickly without provisioning infrastructure. Ideal for early-stage projects and proof-of-concept.

Variable or unpredictable traffic


Handle bursty workloads and changing demand patterns with token-based consumption, without committing to fixed GPU capacity.

Multi-team or decentralized usage


Provide a shared AI capability across teams via API access, without requiring each team to manage infrastructure or deployments.

Gradual path to production


Start with managed models and move to dedicated deployments when workloads become predictable or mission-critical.

Combine Managed Inference with…

…other Exoscale products. It pairs easily with Dedicated Inference, Managed pgvector, and Managed Vector search.

Dedicated Inference

Fully managed, secure, and production-ready API endpoints for any open-weight AI model.

Discover
Managed pgvector

PostgreSQL with pgvector extension. Perfect for hybrid workloads needing both relational and vector data.

Discover
Managed Vector search

OpenSearch-based vector database. Optimized for pure AI workloads and large-scale semantic search.

Discover

Explore More Exoscale Services

Boost your Managed Inference solution by adding complementary offerings to help you achieve greater availability and performance, as well as expert support for any workload. Exoscale has the right service to support your project’s growth.

Kubernetes

Scalable Kubernetes Service

Deploy containerized applications on a production-ready Kubernetes cluster in under two minutes. Use SKS as the control layer for your virtual machine instances, with support for CLI, API, Terraform, and other DevOps tools.

Discover
Simple Object Storage

Use a highly scalable and S3-compatible storage solution for unstructured data. Ideal for storing backups, logs, static assets, or media, fully integrated with Exoscale regions and access-controlled via API.

Discover
Support Plans

Get the help you need to run your infrastructure with confidence through flexible support plans, designed to provide expert guidance, faster response times, and dedicated assistance tailored to your business.

Discover

Frequently Asked Questions about Managed Inference

What is a Managed Inference service?

Managed Inference is Exoscale’s token-based AI inference service. It provides access to a curated catalog of production-ready models through a managed API, without requiring you to operate GPUs or model-serving infrastructure.

How is Managed Inference different from Dedicated Inference?

Managed Inference is designed for flexible, usage-based consumption (you pay per token) of a curated model catalog. Dedicated Inference is designed for customers who want to deploy and operate their own chosen model on managed, dedicated GPU-backed infrastructure.

Will I be able to choose from multiple model categories?

Yes. The Managed Inference service is planned to cover several common enterprise AI categories, including general-purpose models, coding, small models, embeddings, OCR, and speech workloads.

How does the token pricing work?

Pricing is calculated based on the number of tokens processed. A token represents a piece of a word (roughly 4 characters in English). You are billed based on input tokens (the prompt you send) and output tokens (the response generated by the model). Detailed pricing tiers per 1M tokens per model will be published upon launch.
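As a worked example of that billing model, the sketch below estimates the cost of a single request. The per-1M-token rates used here are hypothetical placeholders — actual per-model prices will only be published at launch.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1m, price_out_per_1m):
    """Cost of one request: input and output tokens are billed separately,
    each at its own per-1M-token rate."""
    return (input_tokens * price_in_per_1m + output_tokens * price_out_per_1m) / 1_000_000

def rough_token_count(text):
    # Rule of thumb from the FAQ: one token is roughly 4 characters of English text.
    return max(1, len(text) // 4)

# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
# A 1,200-token prompt producing a 300-token answer would cost:
cost = estimate_cost(1200, 300, price_in_per_1m=0.50, price_out_per_1m=1.50)
# (1200 * 0.50 + 300 * 1.50) / 1_000_000 = 0.00105, i.e. about a tenth of a cent
```

The same arithmetic extends to monthly estimates: multiply expected requests by average input and output token counts, then apply the published rates.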

Where will the Managed Inference solution run?

The service is planned for deployment on Exoscale infrastructure in Europe and Switzerland, with clear location transparency.