Enable backups

This page applies to SUSE® Observability {release-version} or newer. If you are upgrading from a version prior to {release-version}, see Migrating to S3Proxy for important information about changes in configuration of the backup and restore solution.

Overview

SUSE® Observability has a built-in backup mechanism that can be configured to store backups to AWS S3, Azure Blob Storage, or a Kubernetes Persistent Volume.

Most backups are enabled with a single Helm value, global.backup.enabled. Settings backups are always enabled, but where they are stored depends on that value, as described under Backup scope.

Backup scope

The following data can be automatically backed up:

  • Settings (StackPacks, monitors, views, tokens) - always enabled:

    • When global.backup.enabled is false: backups are stored via the S3 Proxy on a Kubernetes Persistent Volume

    • When global.backup.enabled is true: backups are stored via the S3 Proxy on your configured storage backend

  • Topology data and Settings stored in StackGraph - enabled when global.backup.enabled is true

  • Metrics stored in SUSE® Observability’s Victoria Metrics instance(s) - enabled when global.backup.enabled is true

  • Telemetry data stored in SUSE® Observability’s Elasticsearch instance - enabled when global.backup.enabled is true

  • OpenTelemetry data stored in SUSE® Observability’s ClickHouse instance - enabled when global.backup.enabled is true

Storage options

Backups use [S3 Proxy](https://github.com/gaul/s3proxy) as an S3-compatible gateway to your storage backend. S3 Proxy is automatically deployed by the Helm chart.

It can be configured to store the backups in one of three locations: AWS S3, Azure Blob Storage, or a Kubernetes Persistent Volume.

Enable backups

AWS S3

Encryption

Amazon S3-managed keys (SSE-S3) should be used when encrypting S3 buckets that store the backups.

⚠️ Encryption with AWS KMS keys stored in AWS Key Management Service (SSE-KMS) isn’t supported. This will result in errors such as this one in the Elasticsearch logs:

Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Unable to verify integrity of data upload. Client calculated content hash (contentMD5: ZX4D/ZDUzZWRhNDUyZTI1MTc= in base 64) didn’t match hash (etag: c75faa31280154027542f6530c9e543e in hex) calculated by Amazon S3. You may need to delete the data stored in Amazon S3. (metadata.contentMD5: null, md5DigestStream: com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream@5481a656, bucketName: suse-observability-elasticsearch-backup, key: tests-UG34QIV9s32tTzQWdPsZL/master.dat)

Using separate S3 buckets

To enable scheduled backups to separate AWS S3 buckets (one per datastore), add the following YAML fragment to the Helm values.yaml file used to install SUSE® Observability:

global:
  backup:
    enabled: true
  s3proxy:
    credentials:
      # Credentials used internally in the cluster to access the backup storage (S3Proxy).
      accessKey: YOUR_ACCESS_KEY
      secretKey: YOUR_SECRET_KEY
backup:
  storage:
    backend:
      s3:
        enabled: true
        # Specify AWS region (can also be set via environment variable instead)
        region: "eu-west-1"
        # Choose ONE of the following authentication options.
        # Option 1: Use explicit credentials
        accessKey: AWS_ACCESS_KEY
        secretKey: AWS_SECRET_KEY

        # Option 2: Use IAM role / IRSA
        # Leave accessKey and secretKey empty

        # Option 3: Use credentials from an external secret
        # (a Kubernetes secret containing backendAccessKey and backendSecretKey)
        # fromExternalSecret: "my-aws-credentials"

        # Optional: Custom S3-compatible endpoint (e.g., MinIO)
        # endpoint: "https://minio.example.com"
  stackGraph:
    bucketName: AWS_STACKGRAPH_BUCKET
  elasticsearch:
    bucketName: AWS_ELASTICSEARCH_BUCKET
  configuration:
    bucketName: AWS_CONFIGURATION_BUCKET
victoria-metrics-0:
  backup:
    bucketName: AWS_VICTORIA_METRICS_BUCKET
victoria-metrics-1:
  backup:
    bucketName: AWS_VICTORIA_METRICS_BUCKET
clickhouse:
  backup:
    bucketName: AWS_CLICKHOUSE_BUCKET

Replace the following values:

  • YOUR_ACCESS_KEY and YOUR_SECRET_KEY are the credentials that will be used to secure the S3 Proxy instance. The automatic backup jobs and the restore jobs use these to authenticate. They’re also required if you want to manually access the S3 Proxy instance.

    • YOUR_ACCESS_KEY should contain 5 to 20 alphanumerical characters.

    • YOUR_SECRET_KEY should contain 8 to 40 alphanumerical characters.

  • AWS_ACCESS_KEY and AWS_SECRET_KEY are the AWS credentials for the IAM user that has access to the S3 buckets where the backups will be stored. See below for the permission policy that needs to be attached to that user.

  • AWS_*_BUCKET are the names of the S3 buckets where the backups should be stored.

    AWS S3 bucket names are globally unique across all of AWS. The default bucket names (sts-elasticsearch-backup, sts-configuration-backup, sts-stackgraph-backup, sts-victoria-metrics-backup and sts-clickhouse-backup) are therefore probably already taken, so choose your own names.
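
If you use the fromExternalSecret option from the example above, the referenced Kubernetes secret could look like the following sketch. The secret name and the key names backendAccessKey and backendSecretKey come from the values fragment above. The namespace is an assumption: replace it with the namespace where SUSE® Observability is installed.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-aws-credentials
  # Assumption: the namespace used for the SUSE Observability installation
  namespace: suse-observability
type: Opaque
stringData:
  # Key names as listed in the comment next to fromExternalSecret
  backendAccessKey: AWS_ACCESS_KEY
  backendSecretKey: AWS_SECRET_KEY
```

Using stringData lets you write the values in plain text; Kubernetes stores them base64-encoded.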

Using a single S3 bucket with prefixes

Instead of using separate buckets for each datastore, you can use a single S3 bucket with different prefixes:

global:
  backup:
    enabled: true
  s3proxy:
    credentials:
      # Credentials used internally in the cluster to access the backup storage (S3Proxy).
      accessKey: YOUR_ACCESS_KEY
      secretKey: YOUR_SECRET_KEY
backup:
  storage:
    backend:
      s3:
        enabled: true
        region: "eu-west-1"
        # For other authentication options and custom endpoint, see the previous example
        accessKey: AWS_ACCESS_KEY
        secretKey: AWS_SECRET_KEY
  elasticsearch:
    bucketName: BUCKET
    s3Prefix: elasticsearch
  stackGraph:
    bucketName: BUCKET
    s3Prefix: stackgraph
  configuration:
    bucketName: BUCKET
    s3Prefix: configuration
victoria-metrics-0:
  backup:
    bucketName: BUCKET
    s3Prefix: victoria-metrics-0
victoria-metrics-1:
  backup:
    bucketName: BUCKET
    s3Prefix: victoria-metrics-1
clickhouse:
  backup:
    bucketName: BUCKET
    s3Prefix: clickhouse

Replace BUCKET with your S3 bucket name. The backups for different datastores are organized using the configured s3Prefix values. The same YOUR_ACCESS_KEY, YOUR_SECRET_KEY, AWS_ACCESS_KEY, and AWS_SECRET_KEY values from the previous section apply here.

AWS S3 Permissions

The IAM user identified by AWS_ACCESS_KEY and AWS_SECRET_KEY must be configured with the following permission policy to access the S3 buckets:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListBackupBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::AWS_STACKGRAPH_BUCKET",
                "arn:aws:s3:::AWS_ELASTICSEARCH_BUCKET",
                "arn:aws:s3:::AWS_VICTORIA_METRICS_BUCKET",
                "arn:aws:s3:::AWS_CLICKHOUSE_BUCKET",
                "arn:aws:s3:::AWS_CONFIGURATION_BUCKET"
            ]
        },
        {
            "Sid": "AllowWriteBackupBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::AWS_STACKGRAPH_BUCKET/*",
                "arn:aws:s3:::AWS_ELASTICSEARCH_BUCKET/*",
                "arn:aws:s3:::AWS_VICTORIA_METRICS_BUCKET/*",
                "arn:aws:s3:::AWS_CLICKHOUSE_BUCKET/*",
                "arn:aws:s3:::AWS_CONFIGURATION_BUCKET/*"
            ]
        }
    ]
}

Azure Blob Storage

Using separate containers

To enable backups to separate Azure Blob Storage containers (one per datastore), add the following YAML fragment to the Helm values.yaml file used to install SUSE® Observability:

global:
  backup:
    enabled: true
  s3proxy:
    credentials:
      # Credentials used internally in the cluster to access the backup storage (S3Proxy).
      accessKey: YOUR_ACCESS_KEY
      secretKey: YOUR_SECRET_KEY

backup:
  storage:
    backend:
      azure:
        enabled: true
        accountName: "mystorageaccount"
        # Choose ONE of the following authentication options.
        # Option 1: Use explicit credentials
        accountKey: "your-storage-account-key"

        # Option 2: Use managed identity
        # Leave accountKey empty

        # Option 3: Use credentials from an external secret
        # (a Kubernetes secret containing azureAccountName and azureAccountKey)
        # fromExternalSecret: "my-azure-credentials"

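Replace the following values:

  • YOUR_ACCESS_KEY and YOUR_SECRET_KEY are the credentials that will be used to secure the S3 Proxy instance. The automatic backup jobs and the restore jobs use these to authenticate. They’re also required if you want to manually access the S3 Proxy instance.

  • mystorageaccount and your-storage-account-key are the name and the access key of the Azure storage account where the backups will be stored.

The StackGraph, settings (configuration), Elasticsearch, Victoria Metrics, and ClickHouse backups are stored in blob containers called sts-stackgraph-backup, sts-configuration-backup, sts-elasticsearch-backup, sts-victoria-metrics-backup, and sts-clickhouse-backup respectively. These names can be changed by setting the Helm values backup.stackGraph.bucketName, backup.configuration.bucketName, backup.elasticsearch.bucketName, victoria-metrics-0.backup.bucketName, victoria-metrics-1.backup.bucketName and clickhouse.backup.bucketName.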
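
As an illustration, the container names could be overridden with a fragment like the one below. The container names are placeholders, not defaults, and backup.configuration.bucketName is taken from the AWS S3 example earlier on this page.

```yaml
backup:
  stackGraph:
    bucketName: my-stackgraph-backups        # placeholder container name
  configuration:
    bucketName: my-settings-backups          # placeholder container name
  elasticsearch:
    bucketName: my-elasticsearch-backups     # placeholder container name
victoria-metrics-0:
  backup:
    bucketName: my-victoria-metrics-backups  # placeholder container name
victoria-metrics-1:
  backup:
    bucketName: my-victoria-metrics-backups  # placeholder container name
clickhouse:
  backup:
    bucketName: my-clickhouse-backups        # placeholder container name
```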

Using a single container with prefixes

Instead of using separate containers for each datastore, you can use a single Azure Blob Storage container with different prefixes:

global:
  backup:
    enabled: true
  s3proxy:
    credentials:
      # Credentials used internally in the cluster to access the backup storage (S3Proxy).
      accessKey: YOUR_ACCESS_KEY
      secretKey: YOUR_SECRET_KEY

backup:
  storage:
    backend:
      azure:
        enabled: true
        accountName: "mystorageaccount"
        # For other authentication options, see the previous example
        accountKey: "your-storage-account-key"
  elasticsearch:
    bucketName: CONTAINER
    s3Prefix: elasticsearch
  stackGraph:
    bucketName: CONTAINER
    s3Prefix: stackgraph
  configuration:
    bucketName: CONTAINER
    s3Prefix: configuration
victoria-metrics-0:
  backup:
    bucketName: CONTAINER
    s3Prefix: victoria-metrics-0
victoria-metrics-1:
  backup:
    bucketName: CONTAINER
    s3Prefix: victoria-metrics-1
clickhouse:
  backup:
    bucketName: CONTAINER
    s3Prefix: clickhouse

Replace the following values:

  • YOUR_ACCESS_KEY and YOUR_SECRET_KEY are the credentials that will be used to secure the S3 Proxy instance. The automatic backup jobs and the restore jobs use these to authenticate. They’re also required if you want to manually access the S3 Proxy instance.

  • CONTAINER is the name of your Azure Blob Storage container. The backups for different datastores are organized using the configured s3Prefix values.

The same accountName and accountKey values from the previous section apply here.

Kubernetes Persistent Volume

Using Kubernetes Persistent Volumes for backups has significant limitations:

  • Expensive - Cloud providers typically use block storage (EBS/Azure Block) which is costly for large backups

  • No disaster recovery - PVs are destroyed if the cluster is deleted

  • Not portable - Cannot restore backups to a different cluster

Recommendation: Use AWS S3 or Azure Blob Storage instead for production environments.

Basic configuration

To store backups on cluster-local storage, enable backups with the PVC storage backend by adding the following YAML fragment to the Helm values.yaml file used to install SUSE® Observability:

global:
  backup:
    enabled: true
  s3proxy:
    credentials:
      # Credentials used internally in the cluster to access the backup storage (S3Proxy).
      accessKey: YOUR_ACCESS_KEY
      secretKey: YOUR_SECRET_KEY
backup:
  storage:
    backend:
      pvc:
        enabled: true
        size: 500Gi  # Size for main backup storage, size depends on the expected combined backups size

Replace the following values:

  • YOUR_ACCESS_KEY and YOUR_SECRET_KEY are the credentials that will be used to secure the S3 Proxy instance. The automatic backup jobs and the restore jobs use these to authenticate. They’re also required if you want to manually access the S3 Proxy instance.

Configuration and topology data (StackGraph)

Configuration and topology data (StackGraph) backups are full backups, stored in a single file with the extension .graph. Each file contains a full backup and can be moved, copied or deleted as required.

Backup schedule

By default, the StackGraph backups are created daily at 03:00 AM server time.

The backup schedule can be configured using the Helm value backup.stackGraph.scheduled.schedule, specified in Kubernetes cron schedule syntax (kubernetes.io).
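
A minimal sketch of a custom schedule, assuming you want StackGraph backups at 01:30 every day. The time is only an example; the five cron fields are minute, hour, day of month, month, and day of week.

```yaml
backup:
  stackGraph:
    scheduled:
      # minute hour day-of-month month day-of-week
      schedule: '30 1 * * *'  # example: every day at 01:30
```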

Backup retention

By default, the StackGraph backups are kept for 30 days. As StackGraph backups are full backups, this can require a lot of storage.

The backup retention delta can be configured using the Helm value backup.stackGraph.scheduled.backupRetentionTimeDelta, specified in the format of GNU date --date argument. The default is 30 days ago. See Relative items in date strings for more examples.
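
For example, to reduce the storage used by full backups, the retention could be shortened to two weeks. The value 14 days ago is an illustrative choice written as a GNU date relative item.

```yaml
backup:
  stackGraph:
    scheduled:
      # Example value: keep StackGraph backups for 14 days instead of 30
      backupRetentionTimeDelta: '14 days ago'
```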

Disable scheduled backups

To disable scheduled StackGraph backups, set the backup schedule to a date far in the past using the Helm value backup.stackGraph.scheduled.schedule:

backup:
  stackGraph:
    scheduled:
      schedule: '0 0 1 1 1970'  # January 1, 1970 (epoch start)

Settings

Settings (previously called Configuration) include installed instances of StackPacks with their configuration and other customizations created by the user, such as monitors, custom views, and service tokens.

Settings backups are lightweight (typically only several megabytes) and quick to restore with minimal downtime. After a settings backup is restored, new data will be processed as before, recreating topology, health states, and alerts. However, topology history (including health) is not preserved in settings backups - for that purpose, use the StackGraph backup described above.

Settings backups are always enabled, regardless of the global.backup.enabled value:

  • When global.backup.enabled is true: Settings backups are stored, via the S3 Proxy, to your configured storage backend (AWS S3, Azure Blob Storage, or Kubernetes Persistent Volume)

  • When global.backup.enabled is false: Settings backups are stored, via the S3 Proxy, to a dedicated Kubernetes Persistent Volume

Backup schedule

By default, settings backups are created daily at 04:00 AM server time.

The backup schedule can be configured using the Helm value backup.configuration.scheduled.schedule, specified in Kubernetes cron schedule syntax (kubernetes.io).
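
A minimal sketch, assuming a weekly settings backup is sufficient for your environment. The chosen day and time are examples only.

```yaml
backup:
  configuration:
    scheduled:
      # minute hour day-of-month month day-of-week (0 = Sunday)
      schedule: '0 4 * * 0'  # example: every Sunday at 04:00
```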

Backup retention

Backup retention depends on the global.backup.enabled setting:

When global.backup.enabled is true (backups stored via S3 Proxy):

  • By default, settings backups are kept for 365 days

  • Configure retention using backup.configuration.scheduled.backupRetentionTimeDelta - specified in the format of GNU date --date argument. The default is 365 days ago
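
As a sketch, a shorter retention for settings backups stored on the configured storage backend could be set as follows. The value 90 days ago is an example, not a recommendation.

```yaml
# Applies only when global.backup.enabled is true
backup:
  configuration:
    scheduled:
      # Example value: keep settings backups for 90 days instead of 365
      backupRetentionTimeDelta: '90 days ago'
```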

When global.backup.enabled is false (backups stored to a dedicated PV):

  • By default, the last 10 backup files are kept on the Persistent Volume

  • Configure the maximum number of files using backup.configuration.maxLocalFiles (default: 10)

  • Configure the PV size using backup.configuration.scheduled.pvc.size (default: 2Gi)

Example configuration for dedicated PV storage:

backup:
  configuration:
    maxLocalFiles: 10
    scheduled:
      pvc:
        size: '2Gi'

Disable scheduled backups

To disable scheduled settings backups, set the backup schedule to a date far in the past using the Helm value backup.configuration.scheduled.schedule:

backup:
  configuration:
    scheduled:
      schedule: '0 0 1 1 1970'  # January 1, 1970 (epoch start)

Metrics (Victoria Metrics)

Victoria Metrics uses a two-tier smart backup strategy:

  • Hourly incremental — takes a snapshot and uploads only changed blocks to a fixed latest path in the backup storage.

  • Daily snapshot — performs a server-side copy from latest to a timestamped folder (YYYYMMDDHHmmss), creating a point-in-time restore point.

High Availability deployments

High Availability deployments use two instances of Victoria Metrics (victoria-metrics-0 and victoria-metrics-1). Backups are configured independently for each instance.

Backup schedule

Victoria Metrics backups run on two schedules:

  • Hourly incremental backups:

    • victoria-metrics-0 — 25 minutes past the hour

    • victoria-metrics-1 — 35 minutes past the hour

  • Daily snapshot backups:

    • victoria-metrics-0 — at 00:55 daily

    • victoria-metrics-1 — at 01:05 daily

The backup schedules can be configured using the Helm values victoria-metrics-0.backup.scheduled.hourly and victoria-metrics-0.backup.scheduled.daily (and the corresponding victoria-metrics-1 values).
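
The documented default times can be written as cron expressions as shown in the sketch below. The exact strings used by the Helm chart are an assumption here; treat the fragment as a starting point for your own offsets.

```yaml
victoria-metrics-0:
  backup:
    scheduled:
      hourly: '25 * * * *'  # 25 minutes past every hour
      daily: '55 0 * * *'   # every day at 00:55
victoria-metrics-1:
  backup:
    scheduled:
      hourly: '35 * * * *'  # 35 minutes past every hour
      daily: '5 1 * * *'    # every day at 01:05
```

Keeping the two instances a few minutes apart, as the defaults do, avoids both backups running at the same moment.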

Backup retention

Daily snapshot backups are retained using a two-tier retention policy:

  1. The most recent daily backups are kept unconditionally (default: 7, configured via keepLastDaily).

  2. From older backups, one backup per 7-day period is kept (default: up to 4 weekly backups, configured via keepLastWeekly).

  3. Everything else is deleted automatically.

The retention can be configured using the following Helm values:

victoria-metrics-0:
  backup:
    keepLastDaily: 7   # Number of most recent daily backups to retain
    keepLastWeekly: 4  # Number of weekly backups to retain (beyond the daily window)
victoria-metrics-1:
  backup:
    keepLastDaily: 7
    keepLastWeekly: 4

Disable scheduled backups

To disable scheduled Victoria Metrics backups, set the backup schedules for both instances to a date far in the past:

victoria-metrics-0:
  backup:
    scheduled:
      hourly: '0 0 1 1 1970'  # January 1, 1970 (epoch start)
      daily: '0 0 1 1 1970'   # January 1, 1970 (epoch start)
victoria-metrics-1:
  backup:
    scheduled:
      hourly: '0 0 1 1 1970'  # January 1, 1970 (epoch start)
      daily: '0 0 1 1 1970'   # January 1, 1970 (epoch start)

OpenTelemetry (ClickHouse)

ClickHouse uses both incremental and full backups. The old backups are deleted based on the configured retention policy.

Backup schedule

The ClickHouse backups are created:

  • Full Backup - at 00:45 every day

  • Incremental Backup - at 45 minutes past the hour, every hour from 03:45 to 23:45
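
The schedule above could be expressed with the Helm values clickhouse.backup.scheduled.full_schedule and clickhouse.backup.scheduled.incremental_schedule, as in the following sketch. It assumes that the incremental window covers the hours 3 through 23; the exact default strings in the Helm chart may differ.

```yaml
clickhouse:
  backup:
    scheduled:
      full_schedule: '45 0 * * *'            # full backup every day at 00:45
      # Assumption: incremental backups run at 03:45, 04:45, ... 23:45
      incremental_schedule: '45 3-23 * * *'
```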

Backup retention

By default, the tooling keeps the last 308 backups (full and incremental), which corresponds to roughly 14 days of backups.

The backup retention can be configured using the Helm value clickhouse.backup.config.keep_remote.
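
As an illustration, the retention could be halved to about one week. The figure assumes 22 backups per day (1 full and 21 incremental), which is consistent with the default of 308 backups covering about 14 days.

```yaml
clickhouse:
  backup:
    config:
      # Example value: 7 days x 22 backups per day = 154 backups
      keep_remote: 154
```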

Disable scheduled backups

To disable scheduled ClickHouse backups, set both full and incremental backup schedules to a date far in the past:

clickhouse:
  backup:
    scheduled:
      full_schedule: '0 0 1 1 1970'         # January 1, 1970 (epoch start)
      incremental_schedule: '0 0 1 1 1970'  # January 1, 1970 (epoch start)

Telemetry data (Elasticsearch)

Elasticsearch snapshots are incremental.

Snapshot schedule

The Elasticsearch snapshots are created daily at 03:00 AM server time.

Snapshot retention

By default, Elasticsearch snapshots are kept for 30 days, with a minimum of 5 snapshots and a maximum of 30 snapshots.

The retention time and number of snapshots kept can be configured using the following Helm values:

  • backup.elasticsearch.scheduled.snapshotRetentionExpireAfter, specified in Elasticsearch time units (elastic.co).

  • backup.elasticsearch.scheduled.snapshotRetentionMinCount

  • backup.elasticsearch.scheduled.snapshotRetentionMaxCount
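
A sketch of these three values, written out with the documented defaults so they can be adjusted in one place. The value 30d uses the Elasticsearch time unit notation for 30 days.

```yaml
backup:
  elasticsearch:
    scheduled:
      snapshotRetentionExpireAfter: '30d'  # delete snapshots older than 30 days
      snapshotRetentionMinCount: 5         # always keep at least 5 snapshots
      snapshotRetentionMaxCount: 30        # never keep more than 30 snapshots
```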

Disable scheduled snapshots

To disable scheduled Elasticsearch snapshots, set the snapshot schedule to a date far in the past using the Helm value backup.elasticsearch.scheduled.schedule:

backup:
  elasticsearch:
    scheduled:
      schedule: '0 0 1 1 1970'  # January 1, 1970 (epoch start)