Scaling Prometheus with Thanos: Long-Term Storage, HA, and Global Queries

Prometheus is the default metrics and alerting stack for Kubernetes and most cloud-native deployments. One binary, a few scrape configs, and a team gets reliable metrics and alerts within an afternoon.
At a single-cluster scale, that simplicity is the whole point. Once an infrastructure spans several clusters or needs more than a few weeks of history, three limits show up: local disk caps retention, high cardinality pressures RAM, and the per-instance scope hides the global view.
Thanos is the open-source project most teams reach for to address those limits. It sits next to Prometheus, uploads metric blocks to object storage, and exposes a single query endpoint across instances. This article walks through what breaks at scale, how Thanos fixes it, and how Managed Thanos on Exoscale removes the operational burden.
Quick summary of this article
- Prometheus is excellent at single-cluster scale, but local disk caps retention, high cardinality pressures RAM, and each instance only sees its own metrics.
- Thanos extends Prometheus from the outside: it uploads metric blocks to object storage, exposes one global query endpoint across instances, and deduplicates across replicas.
- Object storage decouples retention from disk capacity, so long-term history no longer competes with local SSD space.
- Thanos is a set of focused components (Receiver, Querier, Store Gateway, Compactor), each solving one part of the scaling problem.
- Managed Thanos on Exoscale runs the server-side components for you. You keep your Prometheus, point remote_write at the receiver endpoint, and keep querying with PromQL.
Why Prometheus works so well at first
Prometheus became popular for good reasons. It is easy to run: you deploy one binary, configure a few scrape targets, and immediately get metrics and alerts. The data model is simple: time series identified by labels. PromQL is powerful and flexible. And the ecosystem is large, with hundreds of exporters for databases, applications, and infrastructure components.
For a single cluster or a small setup, Prometheus is often more than enough. You get fast queries, reliable alerting, and full control over your metrics.
Problems usually start when the system grows.
Prometheus without Thanos: Limits at scale
Prometheus long-term storage
Prometheus stores metrics on a local disk. By default, retention is set to 15 days. This design is simple and efficient, but it comes with a clear limit: retention is bounded by disk size.
In practice, this means:
- You keep metrics for days or weeks, not months or years
- Increasing retention often requires resizing disks or moving data
- Historical analysis becomes difficult or impossible
Prometheus was designed for recent data and fast access, not for long-term storage. A single Prometheus server can typically handle one to two million active time series before memory becomes a concern. As cardinality grows (more labels, more targets, more dimensions), Prometheus uses more RAM, and out-of-memory (OOM) crashes become a real risk. Once you need to analyze trends over several months or compare today’s metrics with last year’s, Prometheus alone is not enough.
This is one of the most common Prometheus limits at scale.
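One way to see the memory limit coming is to alert on the number of in-memory series. The rule file below is a minimal sketch: the file name, alert name, and the 1.5 million threshold are assumptions chosen for illustration, and it presumes Prometheus scrapes its own `/metrics` endpoint so that `prometheus_tsdb_head_series` is available.

```yaml
# rules/cardinality.yml (hypothetical file name)
groups:
  - name: prometheus-capacity
    rules:
      - alert: PrometheusHighActiveSeries
        # prometheus_tsdb_head_series is Prometheus's own gauge of series held in memory.
        # The threshold is an assumption; tune it to the RAM available on your server.
        expr: prometheus_tsdb_head_series > 1.5e6
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus {{ $labels.instance }} holds more than 1.5M active series"
```

The file would be referenced from `rule_files` in prometheus.yml. The alert only gives early warning; it does not reduce cardinality by itself.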
No global view: Prometheus multi-cluster monitoring
Prometheus works on a per-instance basis. One Prometheus server gives you visibility into one environment. As soon as you run multiple Kubernetes clusters, regions, or environments, things get complicated:
- Each cluster has its own Prometheus
- Dashboards need to be duplicated
- Queries only show part of the picture
Prometheus federation exists, but it quickly becomes complex. Federation was not designed as a global query layer. It adds configuration overhead and does not solve long-term storage or high availability issues.
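To illustrate the configuration overhead, here is roughly what a federation job looks like on a central Prometheus. It is a sketch only: the target host names and the `match[]` selectors are placeholders, and every additional cluster or metric family means editing this block again.

```yaml
scrape_configs:
  - job_name: federate
    scrape_interval: 15s
    honor_labels: true          # keep the labels set by the source Prometheus
    metrics_path: /federate
    params:
      'match[]':
        # Only series matching these selectors are pulled; everything else stays invisible.
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - prometheus-cluster-a.example.internal:9090   # hypothetical hosts
          - prometheus-cluster-b.example.internal:9090
```

The central server still stores the federated series on its own local disk, so the retention and availability limits described above remain.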
Teams often end up asking a simple question with no easy answer:
“Where can I see all my metrics in one place?”
Prometheus high availability
A standalone Prometheus instance is a single point of failure. If the node crashes or the disk fails:
- Metrics are lost
- Alerts may stop firing
- Historical data disappears
High availability is possible, but it is not built in. You need to run multiple Prometheus instances, manage replicas, and handle deduplication yourself. This adds complexity and operational risk.
At a small scale, this may be acceptable. At a larger scale, it becomes a real problem.
Prometheus vs Thanos: How Thanos addresses Prometheus limits
Thanos extends Prometheus from the outside. Each Prometheus instance keeps scraping and alerting as before. Thanos adds a layer that uploads metric blocks to object storage, queries them globally, and deduplicates across replicas.
Here is a quick comparison:
| Capability | Prometheus alone | Prometheus + Thanos |
|---|---|---|
| Retention | Days to weeks (local disk) | Months to years (object storage) |
| Global view | Per-instance only | Unified query across all clusters |
| High availability | Manual replica management | Built-in deduplication and failover |
| Downsampling | None | Automatic for older data |
| Storage cost | Bound to local SSD/disk | Cheap object storage (S3, GCS, etc.) |
The table reflects a layered design: Prometheus stays the source of truth for scraping, Thanos handles long-term storage and the cross-instance view.
Thanos architecture: key components
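Thanos is composed of focused components, each solving a specific problem:
- Receiver: accepts metrics pushed by Prometheus via remote_write.
- Sidecar: runs alongside a Prometheus instance and uploads its TSDB blocks to object storage.
- Querier: exposes a single PromQL endpoint across instances and deduplicates replica data.
- Store Gateway: serves historical data from object storage.
- Compactor: compacts blocks and downsamples older data.
This modular design means you only deploy what you need.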
Long-term metrics storage
Thanos stores metrics in object storage (S3-compatible stores like Exoscale Simple Object Storage, GCS, Azure Blob) instead of local disk.
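For a self-hosted Thanos, the bucket is described in a small object storage configuration file passed to each component. The sketch below assumes an S3-compatible store; all values are placeholders to replace with your own, and with a managed service this file is handled for you.

```yaml
# objstore.yml (hypothetical file name), passed via --objstore.config-file
type: S3
config:
  bucket: "<bucket-name>"
  endpoint: "<s3-compatible-endpoint>"   # host only, without the https:// scheme
  access_key: "<access-key>"
  secret_key: "<secret-key>"
```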
Object storage decouples retention from disk capacity:
- Metrics can be kept for months or years
- Storage scales independently from compute
- Historical analysis becomes practical
Instead of worrying about disk size and retention windows, teams can focus on using their data. The Compactor automatically downsamples older data, which keeps queries fast and storage efficient. See retention and downsampling for configuration details.
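For teams running the Compactor themselves, retention per resolution is set with command-line flags. The fragment below is a sketch of container arguments in a Kubernetes manifest; the paths and retention durations are assumptions picked for illustration, not recommendations.

```yaml
# Fragment of a container spec for a self-hosted Thanos Compactor
args:
  - compact
  - --wait                                            # keep running instead of exiting after one pass
  - --data-dir=/var/thanos/compact                    # hypothetical scratch directory
  - --objstore.config-file=/etc/thanos/objstore.yml   # hypothetical path to the bucket config
  - --retention.resolution-raw=30d                    # raw samples
  - --retention.resolution-5m=180d                    # 5-minute downsampled data
  - --retention.resolution-1h=2y                      # 1-hour downsampled data
```

Keeping raw data for a shorter period than downsampled data is what lets long-range dashboards stay fast while storage stays modest.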
For reference, Zapier stores over 130 TB of metrics with Thanos and still serves queries within seconds.
Thanos directly solves the Prometheus long-term storage problem.
Global query across clusters
The Thanos Querier provides a single endpoint that aggregates metrics from multiple Prometheus instances.
From the user’s point of view this means:
- One PromQL endpoint
- One set of dashboards
- One global view
Metrics from different clusters, regions, or environments can be queried together, without manual federation. PromQL stays exactly the same. This makes Prometheus multi-cluster monitoring much simpler and more reliable. To set this up, follow the guide on connecting Prometheus to Thanos.
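A global view only works if series from different clusters can be told apart. A common convention, shown here as a sketch with illustrative label names and values, is to give each Prometheus a distinct external label.

```yaml
# prometheus.yml on the first cluster (values are illustrative assumptions)
global:
  external_labels:
    cluster: prod-eu-1
---
# prometheus.yml on the second cluster
global:
  external_labels:
    cluster: prod-eu-2
```

With labels like these in place, a query such as `sum by (cluster) (up)` against the global endpoint would return one result per cluster.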
Built-in high availability
Thanos supports running Prometheus replicas and automatically deduplicates metrics during both querying and compaction.
If one Prometheus instance goes down:
- Queries still work
- Metrics remain available
- Alerts continue to function
High availability becomes part of the system instead of an extra configuration burden. Combined with the remote write protocol for ingestion, this removes one of the biggest operational risks of running Prometheus at scale.
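Deduplication relies on a label that differs only between replicas. The sketch below shows one replica of a pair; the label name `replica` and its values are assumptions, and the label must match whatever the Querier is configured to treat as the replica label (the `--query.replica-label` flag on a self-hosted Querier).

```yaml
# prometheus.yml on the first replica; the second is identical except for replica: prometheus-1
global:
  external_labels:
    replica: prometheus-0
remote_write:
  - url: https://<your-thanos-endpoint>/api/v1/receive
```

Both replicas scrape the same targets and push the same series. At query time the replica label is dropped and overlapping samples are merged into one series.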
Thanos vs Cortex vs Grafana Mimir vs VictoriaMetrics
Thanos is not the only option for scaling Prometheus. Other projects address similar problems:
Approach: Provides a centralized, multi-tenant metrics backend. Ingests metrics via remote write and stores them in object storage.
Best for: Teams that prefer a fully centralized architecture managed by a dedicated platform team.
The main difference: Thanos keeps Prometheus at the center and extends it from the outside. Cortex, Mimir, and VictoriaMetrics replace the storage layer entirely with their own backends.
When Prometheus and Thanos make sense
Thanos is not required for every Prometheus setup. It usually makes sense when:
- You run multiple Kubernetes clusters or environments
- You need to keep metrics for longer than a few weeks
- You want a single global view of your metrics
- You want high availability without complex setups
If you are starting with Prometheus, Thanos may not be necessary right away. But as systems grow, it becomes a natural next step.
Prometheus and Thanos on Exoscale
The diagram shows the ingestion and query flow of Managed Thanos on Exoscale. On the left, metric sources push data via the Prometheus remote write protocol: standalone Prometheus instances, Kubernetes clusters running Prometheus Agent, and other Exoscale DBaaS services. In the center, Managed Thanos receives and stores those metrics in Exoscale Simple Object Storage, running compaction and downsampling automatically. On the right, Managed Grafana queries the Thanos Querier endpoint to visualize metrics across all sources in a single dashboard.
On Exoscale, Prometheus remains exactly what users expect: a metrics collector and alerting engine. Managed Thanos provides the scalable backend:
- Long-term metrics storage
- Global querying
- High availability
- Automatic maintenance
Prometheus instances connect to Thanos using the Prometheus remote write protocol. Metrics are stored securely in European data centers using Exoscale Simple Object Storage, and queried using standard PromQL. Visualization works seamlessly with Managed Grafana.
Connecting an existing Prometheus instance requires a single remote_write block in prometheus.yml:
    remote_write:
      - url: https://<your-thanos-endpoint>/api/v1/receive
        basic_auth:
          username: <user>
          password: <password>
        queue_config:
          capacity: 10000
          max_samples_per_send: 5000
          max_shards: 30

The endpoint URL and credentials are available in the Exoscale Portal once the service is provisioned. For a Kubernetes setup using Prometheus Operator, see the full deployment guide.
Get started with the Thanos quick start guide.
Frequently asked questions
What is Thanos in Prometheus?
Thanos is an open-source project that extends Prometheus with long-term storage, a global query view across multiple instances, and high availability. It sits alongside existing Prometheus deployments without replacing them, using S3-compatible object storage as its backend for metric retention spanning months or years.
Is Thanos part of Prometheus?
No. Thanos is a separate open-source project, maintained independently at thanos.io. It integrates with Prometheus through the remote write protocol and the TSDB block format, but it has its own release cycle and components.
Why use Thanos over Prometheus?
Prometheus alone is limited to local disk retention (15 days by default), has no global view across multiple instances, and requires manual work for high availability. Thanos addresses all three: it ships metric blocks to object storage for configurable long-term retention, provides a single PromQL endpoint across all Prometheus instances, and deduplicates data from replicas automatically. Teams typically add Thanos when they run more than one cluster or need more than a few weeks of metric history.
What is the difference between Prometheus and Thanos?
Prometheus collects metrics and stores them locally on disk with short-term retention. Thanos adds a distributed storage layer using object storage, a global query view across multiple Prometheus instances, and automatic deduplication for high availability.
How does Thanos handle long-term storage?
Thanos supports two ingestion models. In the Sidecar model, a Thanos Sidecar runs alongside each Prometheus instance and uploads TSDB blocks to object storage. In the Receiver model (used by Managed Thanos on Exoscale), Prometheus pushes metrics via remote_write to a central Receiver, with no sidecar required. In both cases, the Store Gateway serves historical data and the Compactor downsamples older blocks to keep queries fast and storage costs low.
Can I use Thanos with multiple Kubernetes clusters?
Yes. This is one of the primary use cases. Each cluster runs its own Prometheus configured to push metrics via remote_write. A central Thanos Querier aggregates metrics from all clusters into a single PromQL endpoint. See the deployment guide for a Kubernetes setup using Prometheus Operator.
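As a rough sketch of what that looks like with Prometheus Operator, the custom resource below runs two replicas in one cluster and pushes to a receiver. The resource name, namespace, cluster label, retention value, and the Secret `thanos-remote-write` are all hypothetical; a real deployment also needs service accounts and monitor selectors that are omitted here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                 # hypothetical name
  namespace: monitoring     # hypothetical namespace
spec:
  replicas: 2
  retention: 2d             # short local retention; long-term history lives in Thanos
  externalLabels:
    cluster: prod-eu-1      # distinguishes this cluster in global queries
  remoteWrite:
    - url: https://<your-thanos-endpoint>/api/v1/receive
      basicAuth:
        # Credentials are read from a Kubernetes Secret (assumed to exist)
        username:
          name: thanos-remote-write
          key: username
        password:
          name: thanos-remote-write
          key: password
```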
What is the difference between Thanos and Cortex?
Thanos extends Prometheus from the outside, keeping existing Prometheus instances intact. Cortex replaces the storage layer with a centralized multi-tenant backend. Thanos is often preferred when teams want to keep their existing Prometheus setup. Cortex suits centralized, managed architectures.
Is Thanos compatible with Grafana?
Yes. The Thanos Querier exposes a standard Prometheus-compatible API. You can point Grafana directly at the Thanos Querier endpoint, and all existing dashboards and PromQL queries work without changes. See the Grafana integration guide.
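For a self-managed Grafana, the data source can be declared in a provisioning file. This is a minimal sketch: the data source name, the query endpoint URL, and the credentials are placeholders. A managed Grafana may configure this through its own interface instead.

```yaml
# provisioning/datasources/thanos.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus        # the Querier speaks the Prometheus HTTP API
    access: proxy
    url: https://<your-thanos-query-endpoint>
    basicAuth: true
    basicAuthUser: <user>
    secureJsonData:
      basicAuthPassword: <password>
```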
How many time series can Prometheus handle?
A single Prometheus server can typically handle one to two million active time series on commodity hardware. Beyond that, high cardinality, caused by too many label combinations, leads to increased memory usage and potential out-of-memory (OOM) crashes. Thanos helps by letting you split the load across several smaller Prometheus instances while still querying them through one endpoint.
What is the difference between Thanos and Grafana Mimir?
Thanos extends existing Prometheus instances from the outside, using either a sidecar or remote write. Grafana Mimir replaces the storage backend entirely with a centralized, multi-tenant architecture. Thanos is a better fit when you want to keep your current Prometheus setup. Mimir suits teams that want a fully managed, horizontally scalable write path integrated with the Grafana ecosystem.
Can Prometheus store metrics for years?
Not on its own. Prometheus stores metrics on local disk with a default retention of 15 days. For long-term storage spanning months or years, you need an external solution. Thanos solves this by uploading Prometheus metric blocks to object storage and serving them through its Store Gateway.
Clovis Genevard
