Unlocking Enhanced Observability: The Power of Thanos in Multi-Cluster Kubernetes Environments

Mohammed
Jul 29, 2023

Introduction

In this article we will look at the limitations of a Prometheus-only monitoring stack and see why moving to a Thanos-based stack can improve metrics retention and reduce overall infrastructure cost.

The config files and chart used for this demo are available here.

Kubernetes Prometheus Stack

This stack is designed to collect, store, query, and visualize metrics from Kubernetes clusters, providing insights into the health and performance of the applications and infrastructure.

Simple Multi Cluster Architecture

While this is a popular and effective monitoring solution, it also has some caveats and limitations that one should be aware of:

  • It does not scale out well as you increase the number of clusters from which you want to collect metrics.
  • Each cluster has its own Grafana with its own set of dashboards, which can be a pain to maintain.
  • Prometheus is designed for short-term data retention by default. It stores metrics for a limited period, typically a few weeks, so historical data beyond the retention period, which may be crucial for certain use cases, is not available for analysis. On top of that, the block storage backing Prometheus can become expensive if you keep terabytes of data on it.

Thanos

Thanos works as an extension to Prometheus, enhancing its capabilities to address the challenges of long-term data retention, scalability, and global querying. It introduces several components and architectural changes to achieve these goals. Here’s an overview of how Thanos works:

Prometheus Data Collection:

  • Just like in a regular Prometheus setup, each Prometheus instance scrapes metrics from various targets (applications, services, Kubernetes components, etc.) using the pull-based model.
  • Prometheus stores the collected metrics locally in its time-series database.

Thanos Sidecar:

  • The Thanos Sidecar runs alongside each Prometheus server and acts as a connector between Prometheus and Thanos.
  • It continuously uploads the local Prometheus data to a remote object storage system, like Amazon S3, Google Cloud Storage, or any other compatible storage backend.
  • By pushing the data to remote storage, the Sidecar offloads the long-term data storage responsibility from Prometheus.
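
For reference, a minimal sketch of how the sidecar is typically started next to Prometheus. The kube-prometheus-stack chart used later in this article wires this up for you, so the paths and addresses here are purely illustrative:

# illustrative sidecar invocation (the chart configures these flags automatically)
# --tsdb.path points at the Prometheus TSDB volume shared with the sidecar
# --objstore.config-file points at an object storage config like the thanos.yaml secret created below
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=/etc/thanos/thanos.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902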

Thanos Store Gateway:

  • The Thanos Store component is a read-only component that exposes the metrics data stored in the remote object storage.
  • It acts as a gateway, allowing Prometheus and other Thanos components to query the historical metrics data from the object storage, even though the data is physically stored in a different location.
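
A similarly minimal sketch of how the Store Gateway is usually run; the data directory is only a local cache, and the flag values are illustrative (in this demo the store component is created from store-service.yaml later on):

# illustrative store gateway invocation
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=/etc/thanos/thanos.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902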

Global Querying with Thanos Querier:

  • The Thanos Querier component allows for cross-cluster querying and operates as a single entry point for querying metrics data from multiple Prometheus instances and Thanos Store gateways.
  • When a query is issued, the Querier aggregates data from various sources (Prometheus servers and Thanos Store gateways) and presents a unified view of metrics from different clusters.
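
Because every Prometheus instance in this setup carries a cluster external label, one query against the Querier can span all clusters at once; for example, an illustrative PromQL query that compares CPU usage per cluster:

sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))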

Compaction and Deduplication:

  • Thanos periodically runs the Compactor component, which performs data compaction in the object storage, reducing storage space by removing unnecessary data.
  • Thanos also deduplicates data to optimize storage efficiency and reduce redundancy, further reducing the storage costs.
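
This demo does not deploy a Compactor, but for illustration, a minimal sketch of how one is typically run against the same object storage, with downsampling retention periods chosen arbitrarily here:

# illustrative compactor invocation (retention values are examples, not recommendations)
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/thanos.yaml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=1y \
  --wait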

High Availability:

  • Thanos achieves high availability by replicating data across multiple object storage instances.
  • In case a Thanos Store or object storage instance becomes unavailable, the data remains accessible through other replicas, ensuring fault tolerance and avoiding data loss.
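
In practice, high availability on the collection side means running more than one Prometheus replica per cluster; their series differ only by a replica label, and the Querier collapses them (label values below are illustrative):

# both replicas expose the same series, differing only in prometheus_replica:
#   up{cluster="cluster-n", prometheus_replica="replica-0"}
#   up{cluster="cluster-n", prometheus_replica="replica-1"}
# with --query.replica-label=prometheus_replica (used in the querier deployment below),
# the Querier returns one deduplicated series per target instead of one per replica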

Ruler:

  • Thanos Ruler evaluates alerting rules across all the Prometheus servers and Thanos Store gateways in the Thanos-based monitoring stack. This allows for a consistent and unified approach to alerting across all clusters.
  • Alerting rules can be defined in any of the Prometheus instances or Thanos Store gateways. Thanos Ruler collects and aggregates these rules, ensuring that each rule is evaluated appropriately.
  • Thanos Ruler works in conjunction with Prometheus Alertmanager to handle alert notifications. When an alert is triggered, it is sent to Alertmanager, which then takes care of grouping, deduplicating, and routing alerts to the appropriate receivers.
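
Rules evaluated by Thanos Ruler use the standard Prometheus rule-file format; a minimal illustrative example (the alert name and threshold are made up for this sketch, but the metric comes from kube-state-metrics):

groups:
- name: multi-cluster.rules
  rules:
  - alert: HighContainerRestartRate
    expr: increase(kube_pod_container_status_restarts_total[15m]) > 5
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Containers restarting frequently in cluster {{ $labels.cluster }}"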

Multi Cluster Architecture

This example runs on Azure with a 3-cluster setup on private AKS (Azure Kubernetes Service):

Ops-Cluster: Centralized cluster for monitoring and operations, exposed with the external label ops-cluster.
Cluster N: Application clusters, exposed with the external label cluster-n.

Our deployment uses the official Kube-prometheus-stack.

Cluster Setup:

Set up the monitoring namespace
$: kubectl create ns monitoring

Create a secret containing the blob container configuration used for storing Prometheus metrics; update the access key in thanos.yaml.

type: AZURE
config:
  storage_account: 'centralmetricstore' # azure blob storage account name
  storage_account_key: 'xxxxxxxxxxxxxxxxxxxx' # azure blob storage key
  container: 'metricsthanos' # azure blob container name

# Create Secret using
$: kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos.yaml

Deploy the Prometheus stack — make the changes below in the values file cluster-n.yaml

# update on line 2750
externalLabels:
  cluster: cluster-n
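
The sidecar also needs to know about the object storage secret so it can upload blocks. With kube-prometheus-stack this is normally wired through prometheusSpec.thanos, roughly as in the sketch below; the exact values layout depends on the chart version, so treat this as an assumption and check the demo values file:

# sketch only: passed through to the Prometheus CR's ThanosSpec
prometheus:
  prometheusSpec:
    thanos:
      objectStorageConfig:
        name: thanos-objstore-config
        key: thanos.yaml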

Install helm chart

$: cd multi-cluster-thanos/monitoring-setup
$: helm dependency build
$: cd ../
$: helm install prometheus-stack monitoring-setup/ --values ./cluster-n.yaml -n monitoring

# check pods
$: kubectl get po -n monitoring

# check services
$: kubectl get svc -n monitoring

The following components are deployed:

  • Prometheus
  • Thanos sidecar
  • Kube-state-metrics

NOTE: repeat these steps for the other application clusters.

Ops-cluster Setup:

Create a namespace for the monitoring stack
$: kubectl create ns monitoring

Create a secret containing the blob container configuration used for storing Prometheus metrics; update the access key in thanos.yaml.

type: AZURE
config:
  storage_account: 'centralmetricstore' # azure blob storage account name
  storage_account_key: 'xxxxxxxxxxxxxxxxxxxx' # azure blob storage key
  container: 'metricsthanos' # azure blob container name

# Create Secret using
$: kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos.yaml

Deploy the Prometheus stack — make the changes below in the values file ops-cluster.yaml

# update slack config for alerts
slack_configs:
- channel: 'channel-name'
  api_url: 'https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXX'
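
For context, this slack_configs block normally sits inside a receiver in the chart's Alertmanager configuration; a minimal sketch, where the route, receiver name, and grouping labels are illustrative assumptions:

alertmanager:
  config:
    route:
      receiver: 'slack-notifications'
      group_by: ['alertname', 'cluster']
    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - channel: 'channel-name'
        api_url: 'https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXX'
        send_resolved: true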

# update external label in value file
externalLabels:
  cluster: ops-cluster # line 2749

# on line 3840 update alertmanager config, this is used by thanos ruler to fire alerts
extraSecret:
  name: "thanos-alertmanager-config"
  data:
    alertmanager-configs.yaml: |
      alertmanagers:
      - static_configs: ["prometheus-stack-kube-prom-alertmanager.monitoring.svc.cluster.local:9093"]
        scheme: http
        timeout: 30s

$: cd multi-cluster-thanos/
$: helm install prometheus-stack monitoring-setup/ --values ./ops-cluster.yaml -n monitoring

# check pods
$: kubectl get po -n monitoring

# check services
$: kubectl get svc -n monitoring

The following components are deployed:

  • Prometheus
  • Grafana
  • Alertmanager
  • Thanos sidecar
  • Thanos ruler
  • Kube-state-metrics

Now we need to deploy the Store Gateway and the Querier.

# update XX.XX.XX.XX with the node IPs; we are using a NodePort to expose the sidecar services
# Find the node IPs using: kubectl get nodes -o wide && kubectl get svc -n monitoring
# querier-deployment.yaml
args:
- 'query'
- '--log.level=debug'
- '--query.replica-label=prometheus_replica'
- '--store=XX.XX.XX.XX:30901' # cluster-a
- '--store=XX.XX.XX.XX:30901' # cluster-n
- '--store=prometheus-stack-kube-prom-thanos-external.monitoring.svc.cluster.local:10901' # ops-cluster
- '--store=thanos-store-svc.monitoring.svc.cluster.local:10901' # store svc
$: kubectl create -f querier-deployment.yaml
# expose service
$: kubectl create -f querier-service-servicemonitor.yaml

# deploy store service which connects to blob storage and fetches historical data
$: kubectl create -f store-service.yaml
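
To verify that the querier can reach every endpoint, you can port-forward it and check its Stores page; the service name and port come from the Grafana step below, while the port-forward itself is an extra step not included in the demo manifests:

# forward the querier locally, then open http://localhost:9090/stores in a browser
$: kubectl -n monitoring port-forward svc/thanos-query 9090:9090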

Connecting using Grafana:

Add Prometheus as a data source in Grafana and point it at the querier service endpoint, which is thanos-query.monitoring.svc.cluster.local:9090.
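
If you prefer to provision the data source declaratively instead of through the UI, a minimal sketch using Grafana's standard data source provisioning format (the data source name is illustrative):

apiVersion: 1
datasources:
- name: Thanos-Query
  type: prometheus
  access: proxy
  url: http://thanos-query.monitoring.svc.cluster.local:9090
  isDefault: true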

Conclusion:

Thanos is a powerful open-source project that extends Prometheus and addresses some of the key challenges faced in monitoring and observability, particularly in multi-cluster Kubernetes environments. By adding Thanos to a monitoring stack, organizations can achieve long-term data retention, improved scalability, and efficient global querying capabilities.
