Exoscale’s Managed Inference


Managed Inference is our usage-based AI inference offering. It gives you direct access to a curated catalog of production-ready models through a managed API, so you can integrate AI features quickly.

There are no instances to size and no idle costs. You simply connect to our API and pay only for what you consume. It’s the perfect operating model for rapid prototyping, unpredictable traffic, and teams that want immediate AI capabilities on a secure, sovereign European cloud.
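Connecting to the API could look like the sketch below. This is an illustration only: the base URL, model identifier, and the OpenAI-compatible request shape are all assumptions, not a published contract — check the launch documentation for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint — the real base URL, authentication scheme,
# and model identifiers will be documented at launch.
API_BASE = "https://inference.example.exoscale.com/v1"

def build_chat_request(prompt, model="general-purpose-assistant", api_key="YOUR_API_KEY"):
    """Assemble an HTTP request for a chat completion, assuming an
    OpenAI-compatible request shape (an assumption, not a confirmed contract)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Summarize our Q3 support tickets.")
# Once you have real credentials and an endpoint, sending the request is:
# with urllib.request.urlopen(req) as resp:
#     answer = json.load(resp)["choices"][0]["message"]["content"]
```

Because there is no infrastructure to provision, this request is the entire integration surface: no cluster, no GPU sizing, no deployment step.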

Why Choose Managed Inference on Exoscale?

EU & Switzerland availability

All models are planned for deployment in both EU and Swiss locations, giving you the flexibility to choose where your Managed Inference runs while maintaining full sovereignty and compliance.

Curated production-ready model catalog

Instead of hundreds of experimental models, access a focused catalog covering common enterprise AI needs such as general-purpose assistants, coding, small models, embeddings, OCR, and speech workloads.

Reliable performance by design

Built on proven infrastructure and optimized for inference workloads, delivering consistent performance and low-latency responses for your AI applications.

Usage-Based Pricing

Pay exclusively for what you use. No GPU sizing, no capacity planning, no infrastructure operations.

Clear model lifecycle

Get transparency on model availability and lifecycle so you can plan adoption, integration, and future updates with more confidence.

Integrated with the Exoscale platform

Connect Managed Inference with the rest of your sovereign Exoscale AI portfolio, including vector databases, storage, and Kubernetes.

Shape Our Production-Ready Catalog

We’re building a curated, production-ready model catalog for our upcoming pay-per-use Managed Inference product. Tell us which models you need most and help us prioritize the roadmap.

Share Your Input

Designed for common AI workloads

Managed Inference API endpoints are optimized for fast adoption, experimentation, and workloads with variable demand where operating dedicated GPUs would add unnecessary complexity.

Rapid prototyping & experimentation


Test models, prompts, and use cases quickly without provisioning infrastructure. Ideal for early-stage projects and proof-of-concept.

Variable or unpredictable traffic


Handle bursty workloads and changing demand patterns with token-based consumption, without committing to fixed GPU capacity.

Multi-team or decentralized usage


Provide a shared AI capability across teams via API access, without requiring each team to manage infrastructure or deployments.

Gradual path to production


Start with managed models and move to dedicated deployments when workloads become predictable or mission-critical.

Combine Managed Inference with…

…other Exoscale products. It pairs easily with Dedicated Inference, Managed pgvector, and Managed Vector search.

Dedicated Inference

Fully managed, secure, and production-ready API endpoints for any open-weight AI model.

Discover
Managed pgvector

PostgreSQL with pgvector extension. Perfect for hybrid workloads needing both relational and vector data.

Discover
Managed Vector search

OpenSearch-based vector database. Optimized for pure AI workloads and large-scale semantic search.

Discover

Explore More Exoscale Services

Boost your Managed Inference solution by adding complementary offerings to help you achieve greater availability and performance, as well as expert support for any workload. Exoscale has the right service to support your project’s growth.

Kubernetes

Scalable Kubernetes Service

Deploy containerized applications on a production-ready Kubernetes cluster in under two minutes. Use SKS as the control layer for your virtual machine instances, with support for CLI, API, Terraform, and other DevOps tools.

Discover
Simple Object Storage

Use a highly scalable and S3-compatible storage solution for unstructured data. Ideal for storing backups, logs, static assets, or media, fully integrated with Exoscale regions and access-controlled via API.

Discover
Support Plans

Get the help you need to run your infrastructure with confidence through flexible support plans, designed to provide expert guidance, faster response times, and dedicated assistance tailored to your business.

Discover

Frequently Asked Questions about Managed Inference

What is a Managed Inference service?

Managed Inference is Exoscale’s token-based AI inference service. It provides access to a curated catalog of production-ready models through a managed API, without requiring you to operate GPUs or model-serving infrastructure.

How is Managed Inference different from Dedicated Inference?

Managed Inference is designed for flexible, usage-based consumption (you pay per token) of a curated model catalog. Dedicated Inference is designed for customers who want to deploy and operate their own chosen model on managed, dedicated GPU-backed infrastructure.

Will I be able to choose from multiple model categories?

Yes. The Managed Inference service is planned to cover several common enterprise AI categories, including general-purpose models, coding, small models, embeddings, OCR, and speech workloads.

How does the token pricing work?

Pricing is calculated based on the number of tokens processed. A token represents a piece of a word (roughly 4 characters in English). You are billed based on input tokens (the prompt you send) and output tokens (the response generated by the model). Detailed pricing tiers per 1M tokens per model will be published upon launch.
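As a worked example of that billing model, the sketch below estimates the cost of a single request. The per-1M-token rates used here are hypothetical placeholders — actual per-model prices will only be published at launch.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1m, price_out_per_1m):
    """Cost of one request: input and output tokens are billed separately,
    each at its own per-1M-token rate."""
    return (input_tokens * price_in_per_1m + output_tokens * price_out_per_1m) / 1_000_000

def rough_token_count(text):
    # Rule of thumb from the FAQ: one token is roughly 4 characters of English text.
    return max(1, len(text) // 4)

# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
# A 1,200-token prompt producing a 300-token answer would cost:
cost = estimate_cost(1200, 300, price_in_per_1m=0.50, price_out_per_1m=1.50)
# (1200 * 0.50 + 300 * 1.50) / 1_000_000 = 0.00105, i.e. about a tenth of a cent
```

The same arithmetic extends to monthly estimates: multiply expected requests by average input and output token counts, then apply the published rates.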

Where will the Managed Inference solution run?

The service is planned for deployment on Exoscale infrastructure in Europe and Switzerland, with clear location transparency.