
use_environment_credentials is not working when using IRSA

Open jasondavindev opened this issue 1 year ago • 25 comments

ClickHouse server version: 24.1.2.5
ClickHouse backup version: 2.6.2

In my ClickHouse setup I set use_environment_credentials to true for the s3 disk, but when using remote backup it cannot use the service account credentials:

  <storage_configuration>
    <disks>
      <s3_backup>
        <type>s3</type>
        <endpoint>https://xxxxxxxxxxx.s3.amazonaws.com/</endpoint>
        <use_environment_credentials>true</use_environment_credentials>
      </s3_backup>

      <!--
        default disk is special, it always exists even if not explicitly configured here,
        but you can't change its path here (you should use <path> on top level config instead)
      -->
      <default>
        <!--
          You can reserve some amount of free space on any disk (including default) by adding
          keep_free_space_bytes tag.
        -->
        <keep_free_space_bytes>10485760</keep_free_space_bytes>
      </default>
      <s3>
        <type>s3</type>
        <endpoint>https://xxxxxxxxxxxx.s3.amazonaws.com/data2/</endpoint>
        <use_environment_credentials>true</use_environment_credentials>
      </s3>
    </disks>
  </storage_configuration>
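As a side note, ClickHouse can also bind static credentials explicitly, reading them from environment variables via the `from_env` config attribute. A hedged sketch for testing with static keys (the endpoint is a placeholder matching the disk above):

```xml
<!-- sketch: static credentials read from environment variables at server start;
     the endpoint is a placeholder -->
<s3>
  <type>s3</type>
  <endpoint>https://xxxxxxxxxxxx.s3.amazonaws.com/data2/</endpoint>
  <access_key_id from_env="AWS_ACCESS_KEY_ID"/>
  <secret_access_key from_env="AWS_SECRET_ACCESS_KEY"/>
</s3>
```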

The following warning is shown:

2024-10-10 21:40:38.196 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3_backup doesn't contains <access_key_id> and <secret_access_key> environment variables will use
2024-10-10 21:40:38.200 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3 doesn't contains <access_key_id> and <secret_access_key> environment variables will use

And the following error is shown:

2024-10-10 21:40:55.236 FTL cmd/clickhouse-backup/main.go:658 > error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-10-full/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/vkw/nyfgwkxlxfhshaxogyccexradjzrf -> xxxxxxxxxxx/s3/2024-10-10-full/s3/vkw/nyfgwkxlxfhshaxogyccexradjzrf return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: DQCDY8K4EBJDPEN6, HostID: t/U9Ut73DraD/sbHxG6xLKitulhU867kZV8TQOxJ4tvWhI7CmlUv62nzRdKVfi9vafyt9p+v4Rs=, api error AccessDenied: Access Denied"
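For background (not part of the original report): a server-side S3 CopyObject is authorized against both buckets, so the caller needs at least s3:GetObject on the source objects and s3:PutObject on the destination objects. A minimal policy sketch with placeholder bucket names:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::source-data-bucket/*"
    },
    {
      "Sid": "WriteBackupObjects",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::backup-bucket/*"
    }
  ]
}
```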

My config

general:
  remote_storage: s3
  max_file_size: 1073741824
  backups_to_keep_local: -1

  backups_to_keep_remote: 0

  log_level: info
  allow_empty_backups: false

  download_concurrency: 8
  upload_concurrency: 8

  download_max_bytes_per_second: 0
  upload_max_bytes_per_second: 0

  object_disk_server_side_copy_concurrency: 32

  allow_object_disk_streaming: false

  restore_schema_on_cluster: "cluster"
  upload_by_part: true
  download_by_part: true
  use_resumable_state: true

  restore_database_mapping: {}

  restore_table_mapping: {}

  retries_on_failure: 3
  retries_pause: 5s

  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: "shard{shard}-{type}-{time:20060102150405}"

  sharded_operation_mode: none

  cpu_nice_priority: 15
  io_nice_priority: "idle"

  rbac_backup_always: true
  rbac_resolve_conflicts: "recreate"
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - default.*

  timeout: 6h
  freeze_by_part: false
  freeze_by_part_where: ""
  secure: false
  skip_verify: true
  sync_replicated_tables: true
  log_sql_queries: false
  debug: false
  config_dir: "/etc/clickhouse-server"
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  backup_mutations: true
  restore_as_attach: true
  check_parts_columns: true
  max_connections: 0
s3:
  bucket: "xxxxxxxxxxxxx"
  endpoint: ""
  region: us-east-1

  acl: private
  assume_role_arn: ""
  force_path_style: false
  path: ""
  object_disk_path: "backups/"
  disable_ssl: false
  compression_level: 1
  compression_format: tar

  disable_cert_verification: true
  use_custom_storage_class: false
  storage_class: STANDARD
  concurrency: 1
  part_size: 0
  max_parts_count: 10000
  allow_multipart_download: false
  checksum_algorithm: ""

For test purposes I selected just one table for backup, and it worked:

  • selected 1 table that does not contain data on s3 (tiered) - works
  • selected 1 table that contains data on s3 (tiered) - works

But when I selected a set of tables, the AccessDenied error is shown.

Output of a successful backup (written to S3):

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml list
backup5      64.95GiB   10/10/2024 18:01:35   remote      tar, regular
2024-10-10   10.00GiB   10/10/2024 21:15:30   remote      tar, regular

The following is the set of selected tables for which the backup does not work:

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml tables
signoz_logs.logs                                              173.72GiB   default,s3  full
signoz_traces.signoz_index_v2                                 160.41GiB   default,s3  full
signoz_logs.logs_v2                                           65.64GiB    default     full
signoz_traces.durationSort                                    52.24GiB    default,s3  full
signoz_traces.signoz_spans                                    21.50GiB    default,s3  full
signoz_metrics.samples_v2                                     11.96GiB    default     full
signoz_metrics.samples_v4                                     10.01GiB    default,s3  full
signoz_metrics.samples_v4_agg_5m                              4.71GiB     default     full
signoz_metrics.samples_v4_agg_30m                             1.34GiB     default     full
signoz_metrics.time_series_v4                                 1.22GiB     default,s3  full
signoz_metrics.time_series_v4_6hrs                            1017.92MiB  default,s3  full
signoz_metrics.time_series_v4_1day                            889.61MiB   s3,default  full
signoz_metrics.time_series_v2                                 873.61MiB   default     full
signoz_metrics.time_series_v4_1week                           832.37MiB   default     full
signoz_traces.span_attributes                                 775.90MiB   default     full
signoz_logs.tag_attributes                                    692.19MiB   default     full
signoz_traces.dependency_graph_minutes_v2                     225.38MiB   s3,default  full
signoz_traces.dependency_graph_minutes                        140.90MiB   default     full
signoz_traces.signoz_error_index_v2                           113.24MiB   default,s3  full
signoz_logs.logs_v2_resource                                  8.15MiB     default     full
signoz_logs.distributed_logs                                  1.18MiB     default     full
signoz_logs.distributed_logs_v2                               691.83KiB   default     full
signoz_metrics.distributed_samples_v4                         526.60KiB   default     full
signoz_logs.distributed_tag_attributes                        264.70KiB   default     full
signoz_metrics.distributed_samples_v2                         257.77KiB   default     full
signoz_analytics.rule_state_history                           56.81KiB    default     full
signoz_traces.usage_explorer                                  55.57KiB    default,s3  full
signoz_logs.distributed_logs_v2_resource                      42.04KiB    default     full
signoz_metrics.distributed_time_series_v4                     17.99KiB    default     full
signoz_metrics.usage                                          12.06KiB    default     full
signoz_logs.usage                                             10.00KiB    default     full
signoz_traces.usage                                           9.23KiB     default     full
signoz_traces.top_level_operations                            7.23KiB     default     full
signoz_metrics.distributed_time_series_v2                     5.51KiB     default     full
signoz_traces.span_attributes_keys                            5.37KiB     default     full
signoz_logs.logs_resource_keys                                1.08KiB     default     full
signoz_traces.schema_migrations                               1.00KiB     default     full
signoz_logs.schema_migrations                                 719B        default     full
signoz_logs.logs_attribute_keys                               708B        default     full
signoz_metrics.schema_migrations                              598B        default     full
signoz_logs.resource_keys_string_final_mv                     0B          default     full
signoz_metrics.distributed_samples_v4_agg_30m                 0B          default     full
signoz_metrics.distributed_samples_v4_agg_5m                  0B          default     full
signoz_logs.distributed_usage                                 0B          default     full
signoz_metrics.distributed_time_series_v3                     0B          default     full
signoz_logs.distributed_logs_resource_keys                    0B          default     full
signoz_metrics.distributed_time_series_v4_1day                0B          default     full
signoz_metrics.distributed_time_series_v4_1week               0B          default     full
signoz_metrics.distributed_time_series_v4_6hrs                0B          default     full
signoz_metrics.distributed_usage                              0B          default     full
signoz_metrics.exp_hist                                       0B          default     full
signoz_logs.distributed_logs_attribute_keys                   0B          default     full
signoz_logs.attribute_keys_string_final_mv                    0B          default     full
signoz_logs.attribute_keys_float64_final_mv                   0B          default     full
signoz_metrics.samples_v4_agg_30m_mv                          0B          default     full
signoz_logs.attribute_keys_bool_final_mv                      0B          default     full
signoz_metrics.samples_v4_agg_5m_mv                           0B          default     full
signoz_analytics.distributed_rule_state_history               0B          default     full
signoz_metrics.time_series_v3                                 0B          default     full
signoz_metrics.time_series_v4_1day_mv                         0B          s3,default  full
signoz_metrics.time_series_v4_1week_mv                        0B          default     full
signoz_metrics.time_series_v4_6hrs_mv                         0B          s3,default  full
signoz_traces.dependency_graph_minutes_db_calls_mv            0B          default     full
signoz_traces.dependency_graph_minutes_db_calls_mv_v2         0B          default,s3  full
signoz_traces.dependency_graph_minutes_messaging_calls_mv     0B          default     full
signoz_traces.dependency_graph_minutes_messaging_calls_mv_v2  0B          default,s3  full
signoz_traces.dependency_graph_minutes_service_calls_mv       0B          default     full
signoz_traces.dependency_graph_minutes_service_calls_mv_v2    0B          s3,default  full
signoz_traces.distributed_dependency_graph_minutes            0B          default     full
signoz_traces.distributed_dependency_graph_minutes_v2         0B          default     full
signoz_traces.distributed_durationSort                        0B          default     full
signoz_traces.distributed_signoz_error_index_v2               0B          default     full
signoz_traces.distributed_signoz_index_v2                     0B          default     full
signoz_traces.distributed_signoz_spans                        0B          default     full
signoz_traces.distributed_span_attributes                     0B          default     full
signoz_traces.distributed_span_attributes_keys                0B          default     full
signoz_traces.distributed_top_level_operations                0B          default     full
signoz_traces.distributed_usage                               0B          default     full
signoz_traces.distributed_usage_explorer                      0B          default     full
signoz_traces.durationSortMV                                  0B          default,s3  full
signoz_traces.root_operations                                 0B          default     full
signoz_traces.signoz_error_index                              0B          default     full
signoz_traces.signoz_index                                    0B          default     full
signoz_traces.sub_root_operations                             0B          default     full
signoz_traces.usage_explorer_mv                               0B          default,s3  full
signoz_metrics.distributed_exp_hist                           0B          default     full

When I selected just signoz_metrics.samples_v4, which contains data on the local disk and remote (s3), the backup was successful.

  • I am running clickhouse-backup on the same host server
  • For tests I set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env vars, but it didn't work
  • I ran clickhouse-backup with --env S3_ACCESS_KEY=xxx and --env SECRET_KEY=xxx, and it didn't work

Note: my IAM role has full access to S3.

jasondavindev avatar Oct 10 '24 21:10 jasondavindev

Another test

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml tables
signoz_metrics.samples_v2  11.97GiB  default     full
signoz_metrics.samples_v4  10.01GiB  default,s3  full
chi-signoz-tools-cluster-clickhouse-cluster-0-0-0:~$ ./clickhouse-backup -c config.yml create_remote partial
2024-10-10 21:55:48.653 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-10 21:55:48.925 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-10 21:55:49.845 INF pkg/backup/create.go:324 > done progress=9/215 table=signoz_metrics.samples_v2
2024-10-10 21:55:50.179 INF pkg/backup/create.go:324 > done progress=10/215 table=signoz_metrics.samples_v4
2024-10-10 21:55:50.197 INF pkg/backup/create.go:336 > done duration=2.128s operation=createBackupLocal version=2.6.2
2024-10-10 21:57:27.083 INF pkg/backup/upload.go:171 > done duration=1m36.326s operation=upload_data progress=2/2 size=10.01GiB table=signoz_metrics.samples_v4 version=2.6.2
2024-10-10 21:57:36.590 INF pkg/backup/upload.go:171 > done duration=1m45.832s operation=upload_data progress=1/2 size=11.97GiB table=signoz_metrics.samples_v2 version=2.6.2
2024-10-10 21:57:36.632 INF pkg/backup/upload.go:240 > done backup=partial duration=1m46.434s object_disk_size=0B operation=upload upload_size=21.98GiB version=2.6.2
2024-10-10 21:57:37.056 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/partial'
2024-10-10 21:57:37.142 INF pkg/backup/delete.go:166 > done backup=partial duration=496ms location=local operation=delete

The previous warning is not shown (the following log is from my previous post):

2024-10-10 21:40:38.196 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3_backup doesn't contains <access_key_id> and <secret_access_key> environment variables will use
2024-10-10 21:40:38.200 WRN pkg/storage/object_disk/object_disk.go:361 > /var/lib/clickhouse/preprocessed_configs/config.xml -> //storage_configuration/disks/s3 doesn't contains <access_key_id> and <secret_access_key> environment variables will use

jasondavindev avatar Oct 10 '24 21:10 jasondavindev

Thanks for the detailed report

Did you set up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside the clickhouse-backup container?

Try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY or --env AWS_ROLE_ARN.

Could you share your current pod manifest with sensitive credentials replaced by XXX? kubectl -n <your-namespace> get pod chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 -o yaml

When you use IRSA, which serviceAccount do you use? In this case, the serviceAccount token is mounted into the pod and some environment variables are injected into the env section.
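For reference, with IRSA the serviceAccount is typically annotated with the IAM role ARN so that the EKS webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE and mounts the token. A sketch using the names from this thread (the account ID is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: signoz-tools-cluster-clickhouse
  namespace: signoz
  annotations:
    # placeholder account ID; the EKS webhook derives the AWS_* env vars from this
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ClickhouseEKSRole
```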

Slach avatar Oct 11 '24 05:10 Slach

    path: ""
    object_disk_path: "backups/"

better to replace it

    path: "backups"
    object_disk_path: "object_disks_backups"
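In context, a sketch of how the suggested values would sit in the s3 section of the clickhouse-backup config (the bucket name is a placeholder); path holds regular tar backups, while object_disk_path holds server-side copies of object-disk parts:

```yaml
s3:
  bucket: "my-backup-bucket"   # placeholder
  region: us-east-1
  path: "backups"                           # regular (tar) backup data
  object_disk_path: "object_disks_backups"  # server-side copies of s3-disk parts
```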

Slach avatar Oct 11 '24 05:10 Slach

The warning and error will only show if you have data parts on the s3 disk.

Slach avatar Oct 11 '24 05:10 Slach

related code fragment https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/object_disk/object_disk.go#L354-L367

Slach avatar Oct 11 '24 05:10 Slach

Did you setup AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY inside clickhouse-backup container?

I'm running the clickhouse-backup binary inside the clickhouse-server container. The service account in use works for normal clickhouse-server workloads (s3 disk as cold storage) with full S3 access.

try --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY or --env AWS_ROLE_ARN

I tried, but it didn't work.

path: "" object_disk_path: "backups/"

better to replace it

I changed the path, but there were no changes in the S3 structure, as if the config was ignored.

I'm using the SigNoz Helm chart with the ClickHouse dependency, 3 shards and 1 replica per shard.

Generated ClickHouse pod manifest:


apiVersion: v1
kind: Pod
metadata:
  annotations:
    signoz.io/path: /metrics
    signoz.io/port: "9363"
    signoz.io/scrape: "true"
  labels:
    app.kubernetes.io/component: clickhouse
    app.kubernetes.io/instance: signoz-tools-cluster
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: clickhouse
    app.kubernetes.io/version: 24.1.2
    apps.kubernetes.io/pod-index: "0"
    argocd.argoproj.io/instance: signoz-tools-cluster
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: signoz-tools-cluster-clickhouse
    clickhouse.altinity.com/cluster: cluster
    clickhouse.altinity.com/namespace: signoz
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    helm.sh/chart: clickhouse-24.1.6
    statefulset.kubernetes.io/pod-name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  name: chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  namespace: signoz
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-signoz-tools-cluster-clickhouse-cluster-0-0
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values:
            - zookeeper
            - clickhouse
        topologyKey: kubernetes.io/hostname
  containers:
  - command:
    - /bin/bash
    - -c
    - /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse
    ports:
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9000
      name: client
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    - containerPort: 9000
      name: tcp
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "4"
        memory: 12Gi
      requests:
        cpu: "3"
        memory: 8Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse
      name: data-volumeclaim-template
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /etc/clickhouse-server/functions
      name: custom-functions-volume
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  initContainers:
  - command:
    - sh
    - -c
    - |
      set -x
      wget -O /tmp/histogramQuantile https://github.com/SigNoz/signoz/raw/develop/deploy/docker/clickhouse-setup/user_scripts/histogramQuantile
      mv /tmp/histogramQuantile  /var/lib/clickhouse/user_scripts/histogramQuantile
      chmod +x /var/lib/clickhouse/user_scripts/histogramQuantile
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::xxxxxxxxxxx:role/ClickhouseEKSRole
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: docker.io/alpine:3.18.2
    imagePullPolicy: IfNotPresent
    name: signoz-tools-cluster-clickhouse-udf-init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/clickhouse/user_scripts
      name: shared-binary-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hn6tq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  nodeSelector:
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/provisioner-name: observability-stack-provisioner
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsGroup: 101
    runAsUser: 101
  serviceAccount: signoz-tools-cluster-clickhouse
  serviceAccountName: signoz-tools-cluster-clickhouse
  subdomain: chi-signoz-tools-cluster-clickhouse-cluster-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: ObservabilityStackOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: data-volumeclaim-template
    persistentVolumeClaim:
      claimName: data-volumeclaim-template-chi-signoz-tools-cluster-clickhouse-cluster-0-0-0
  - emptyDir: {}
    name: shared-binary-volume
  - configMap:
      defaultMode: 420
      name: signoz-tools-cluster-clickhouse-custom-functions
    name: custom-functions-volume
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-configd
    name: chi-signoz-tools-cluster-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-common-usersd
    name: chi-signoz-tools-cluster-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
    name: chi-signoz-tools-cluster-clickhouse-deploy-confd-cluster-0-0
  - name: kube-api-access-hn6tq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace 

jasondavindev avatar Oct 11 '24 14:10 jasondavindev

I changed the /var/lib/clickhouse/preprocessed_configs/config.xml file, adding AWS credentials, and the warning is no longer shown, but the AccessDenied error remains.

2024-10-11 14:40:56.684 INF pkg/backup/create.go:170 > done createBackupRBAC size=0B
2024-10-11 14:40:56.735 WRN pkg/backup/backuper.go:118 > MAX_FILE_SIZE=1073741824 is less than actual 17035327904, please remove general->max_file_size section from your config
2024-10-11 14:41:14.253 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied table=signoz_logs.logs
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.durationSort
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_spans
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v2
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_5m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.samples_v4_agg_30m
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4
2024-10-11 14:41:14.254 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_6hrs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1day
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v4_1week
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.dependency_graph_minutes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_v2_resource
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v2
2024-10-11 14:41:14.255 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.255 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_tag_attributes
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_v2_resource
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_analytics.rule_state_history
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage_explorer
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v2
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.usage
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.top_level_operations
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.span_attributes_keys
2024-10-11 14:41:14.256 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_attribute_keys
2024-10-11 14:41:14.256 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.logs_resource_keys
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_logs.schema_migrations
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.schema_migrations
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_30m
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_samples_v4_agg_5m
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.resource_keys_string_final_mv
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v3
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_usage
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1day
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_1week
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_time_series_v4_6hrs
2024-10-11 14:41:14.257 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_usage
2024-10-11 14:41:14.257 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.exp_hist
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_resource_keys
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.distributed_logs_attribute_keys
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_string_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_30m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_float64_final_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.samples_v4_agg_5m_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_logs.attribute_keys_bool_final_mv
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_analytics.distributed_rule_state_history
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_metrics.time_series_v3
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1day_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_1week_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.time_series_v4_6hrs_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_db_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_messaging_calls_mv_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.dependency_graph_minutes_service_calls_mv_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_dependency_graph_minutes_v2
2024-10-11 14:41:14.258 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.258 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_durationSort
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_error_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_index_v2
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_signoz_spans
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_span_attributes_keys
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_top_level_operations
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.distributed_usage_explorer
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.durationSortMV
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_error_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:278 > b.AddTableToLocalBackup error: context canceled table=signoz_traces.signoz_index
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.sub_root_operations
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_traces.usage_explorer_mv
2024-10-11 14:41:14.259 WRN pkg/backup/create.go:741 > supports only schema backup backup=2024-10-11-remote2 engine=Distributed operation=create table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:296 > b.ch.GetInProgressMutations error: can't get in progress mutations: context canceled table=signoz_metrics.distributed_exp_hist
2024-10-11 14:41:14.259 ERR pkg/backup/create.go:139 > backup failed error: one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:185 > cleanBackupObjectDisks deleted 0 keys backup=2024-10-11-remote2 duration=35ms
2024-10-11 14:41:14.525 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3_backup/backup/2024-10-11-remote2'
2024-10-11 14:41:14.613 INF pkg/backup/delete.go:157 > remove '/var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2'
2024-10-11 14:41:14.618 INF pkg/backup/delete.go:166 > done backup=2024-10-11-remote2 duration=359ms location=local operation=delete
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/shadow
2024-10-11 14:41:14.733 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3_backup/shadow
2024-10-11 14:41:14.741 INF pkg/backup/delete.go:43 > /var/lib/clickhouse/disks/s3/shadow
2024-10-11 14:41:14.741 FTL cmd/clickhouse-backup/main.go:658 > error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/ftx/jovjgrbdopnfqtkvwcgomhssdxifi -> my-bucket/backups/2024-10-11-remote2/s3/ftx/jovjgrbdopnfqtkvwcgomhssdxifi return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: 1K78RWBZEA6DMSVK, HostID: MkVUQCZEHvUFrbZAMUM+gn5mZMFuw8tHNmfLmJRMSv256nJiUKzfsiglbhhtgkzKq+bWMqqmPfs=, api error AccessDenied: Access Denied"

Note: the IAM role has S3 full access and the AWS credentials are for my AWS user (admin access)

jasondavindev avatar Oct 11 '24 14:10 jasondavindev

See the code block here

Is the srcBucket variable empty?

Compare the output log

error="one of createBackupLocal go-routine return error: one of uploadObjectDiskParts go-routine return error: b.dst.CopyObject in /var/lib/clickhouse/disks/s3/backup/2024-10-11-remote2/shadow/signoz_logs/logs/s3 error: S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk return error: operation error S3: CopyObject, https response error StatusCode: 403, RequestID: AS1YYQCZF4KJ8QZY, HostID: /8ntBN2alKtBkXTy9YcODvCAnEb/bDf8KbJH1mOL0OlTJwChCkH3bysFHih4k9x+cVqKOST3Pd0=, api error AccessDenied: Access Denied"
S3->CopyObject data2/rky/guvjhazneieklouevfhijqiaduqlk -> my-bucket/backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk
               /\
               ||

The log shows only the key, but not the source bucket

jasondavindev avatar Oct 11 '24 15:10 jasondavindev

The s3 logs

2024-10-11 15:16:27.034 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
GET /?versioning= HTTP/1.1
Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 338f4e3a-00cd-4f38-a096-ea36114e0b97
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/xxxxxxxxxxxxx-cold-storage-tools/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Date: 20241011T151627Z


2024-10-11 15:16:27.051 INF pkg/storage/s3.go:49 > [s3:DEBUG] request failed with unretryable error https response error StatusCode: 0, RequestID: , HostID: , request send failed, Get "https://data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com/?versioning=": dial tcp: lookup data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com on 10.205.0.10:53: no such host
2024-10-11 15:16:27.071 INF pkg/storage/s3.go:49 > [s3:DEBUG] Request
PUT /backups/2024-10-11-remote2/s3/rky/guvjhazneieklouevfhijqiaduqlk?x-id=CopyObject HTTP/1.1
Host: xxxxxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com
User-Agent: m/F aws-sdk-go-v2/1.30.5 os/linux lang/go#1.22.7 md/GOOS#linux md/GOARCH#arm64 api/s3#1.61.2
Content-Length: 0
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 3736572a-701b-4537-978c-7d8d3b1d54e5
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=**********/20241011/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-security-token;x-amz-storage-class, Signature=xxxxxxxxxxx
X-Amz-Content-Sha256: xxxxxxxxxxx
X-Amz-Copy-Source: data2/rky/guvjhazneieklouevfhijqiaduqlk
X-Amz-Date: 20241011T151627Z
X-Amz-Security-Token: xxxxxxxxxxx
X-Amz-Storage-Class: STANDARD

The bucket key path (data2/) ended up inside the S3 host: Host: data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com. Is that correct??

xxxxxxxxxxxxx-cold-storage-tools is the s3 disk set in config.xml
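For reference, a hypothetical Go sketch (not clickhouse-backup's actual code) of how a virtual-hosted-style S3 endpoint can be split into bucket, region, and key prefix. The region segment is optional in the hostname, and that ambiguity is exactly what can lead a client to rebuild a bogus host such as data2.s3.xxxxxxxxxxxxx-cold-storage-tools.amazonaws.com:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitVirtualHostedEndpoint extracts bucket, region and key prefix from a
// virtual-hosted-style S3 endpoint such as
// https://my-bucket.s3.us-east-1.amazonaws.com/data2/.
// When the region segment is missing (my-bucket.s3.amazonaws.com), region
// comes back empty; a client that assumes the regional form is present can
// mis-split the host at the wrong dot and build a nonexistent hostname.
func splitVirtualHostedEndpoint(endpoint string) (bucket, region, prefix string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", "", err
	}
	host := u.Host
	i := strings.Index(host, ".s3.")
	if i < 0 {
		return "", "", "", fmt.Errorf("not a virtual-hosted S3 endpoint: %s", endpoint)
	}
	bucket = host[:i]
	// Whatever sits between ".s3." and ".amazonaws.com" is the region, if any.
	rest := strings.TrimSuffix(host[i+len(".s3."):], ".amazonaws.com")
	if rest != "" && rest != "amazonaws.com" {
		region = rest
	}
	prefix = strings.Trim(u.Path, "/")
	return bucket, region, prefix, nil
}

func main() {
	b, r, p, _ := splitVirtualHostedEndpoint("https://my-bucket.s3.us-east-1.amazonaws.com/data2/")
	fmt.Println(b, r, p) // my-bucket us-east-1 data2
}
```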

jasondavindev avatar Oct 11 '24 15:10 jasondavindev

The S3 endpoint string format was wrong.

I changed https://xxxxxx-storage-test-tools.s3.amazonaws.com/data/ to https://xxxxxx-storage-test-tools.s3.us-east-1.amazonaws.com/data/ and it worked
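For anyone hitting the same AccessDenied error, a sketch of the corrected disk definition in config.xml using the region-qualified virtual-hosted endpoint form (bucket name, region, and prefix are placeholders):

```xml
<s3>
  <type>s3</type>
  <!-- region-qualified virtual-hosted-style endpoint: <bucket>.s3.<region>.amazonaws.com -->
  <endpoint>https://my-bucket.s3.us-east-1.amazonaws.com/data2/</endpoint>
  <use_environment_credentials>true</use_environment_credentials>
</s3>
```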

jasondavindev avatar Oct 11 '24 16:10 jasondavindev

Does the image xxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/docker-hub/clickhouse/clickhouse-server:24.1.2-alpine contain the clickhouse-backup binary?

Slach avatar Oct 11 '24 18:10 Slach

No. I installed the clickhouse-backup binary manually in the container

jasondavindev avatar Oct 11 '24 18:10 jasondavindev

Unfortunately, https://github.com/SigNoz/charts/blob/main/charts/clickhouse/templates/clickhouse-instance/clickhouse-instance.yaml#L202 doesn't allow running a second container with clickhouse-backup

In this case I would like to propose using the standard BACKUP and RESTORE SQL commands, which are available in modern clickhouse-server versions

Look at the details in https://clickhouse.com/docs/en/operations/backup

You can just create a kind: CronJob which executes something like clickhouse-client -h chi...-0-0 --user ... --password ... -q "BACKUP ALL ON CLUSTER '{cluster}' TO S3(...)", and for restore a kind: Job which executes something like clickhouse-client -h chi...-0-0-0 --user ... --password ... -q "RESTORE ALL ON CLUSTER '{cluster}' FROM S3(...)"
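A minimal sketch of such a CronJob; the schedule, image tag, host name, secret name, and bucket are assumptions to adapt, not values from the SigNoz chart:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clickhouse-embedded-backup
spec:
  schedule: "0 2 * * *"   # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: clickhouse/clickhouse-server:24.8
              command:
                - clickhouse-client
                - -h
                - chi-signoz-tools-cluster-clickhouse-cluster-0-0
                - --user=default
                - --password=$(CLICKHOUSE_PASSWORD)
                - -q
                - BACKUP ALL ON CLUSTER '{cluster}' TO S3('https://my-bucket.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')
              env:
                - name: CLICKHOUSE_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: clickhouse-credentials
                      key: password
```

A restore would be a one-off kind: Job with the same container, running RESTORE ALL ON CLUSTER '{cluster}' FROM S3(...) instead.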

Slach avatar Oct 11 '24 19:10 Slach

I'm thinking of forking the chart and customizing it to provide sidecar containers for clickhouse-server.

Regarding the embedded backup suggestion: I tried it, but the backup fails for clustered workloads, which is why I use clickhouse-backup.

Another option is to run a CronJob that connects to the clickhouse-server pod through a kubectl command and runs the backup inside it.

jasondavindev avatar Oct 11 '24 19:10 jasondavindev

Which failure do you get with BACKUP ALL ON CLUSTER?

Did you check SELECT * FROM system.backup_log?

Slach avatar Oct 11 '24 19:10 Slach

When backing up using the ON CLUSTER flag I would have to do a lot of parts synchronization, and we do not have deep knowledge about this. We are new ClickHouse users and are still learning. clickhouse-backup manages this in depth, so I prefer it.

Before the ClickHouse BACKUP/RESTORE features, we used Velero. But during recovery we had too many parts and other errors to handle.

jasondavindev avatar Oct 11 '24 19:10 jasondavindev

I think "too many parts" is not related to the backup tool used ;) it is usually caused by a wrong INSERT pattern and row batch size, which produce a lot of small data parts

BACKUP ALL ... ON CLUSTER should work very well on clickhouse-server:24.8

ON CLUSTER means the part uploads to S3 are just spread between the replicas inside each shard, so there is not as much parts synchronization as you think

Slach avatar Oct 11 '24 19:10 Slach

Example error

Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D2%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 has part 20240927_1_1_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum '5d2c4cb2a3959b040da2e13c398090fb' on replica chi-signoz-tools-cluster-clickhouse-cluster-0-0 != checksum 'a234ebfd4d43dbb6639eccbb5e286882' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)

When we changed from 1 shard to 2 shards, organic replication was used and no manual steps were done. I don't know if additional steps are necessary.

jasondavindev avatar Oct 11 '24 19:10 jasondavindev

has part 20240927_1_1_0 different from the part on replica

When we changed from 1 shard to 2 shards, the organic replication was used

Hm, could you share

SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'

To fix your issue, I would propose running kubectl exec chi-signoz-tools-cluster-clickhouse-cluster-0-0 -- clickhouse-client --receive-timeout=86400 -q "OPTIMIZE TABLE signoz_logs.logs_v2 PARTITION 20240927 FINAL"

and try BACKUP again

Slach avatar Oct 12 '24 03:10 Slach

I ran the OPTIMIZE TABLE command for the above partition, and before each BACKUP execution I needed to run OPTIMIZE TABLE again for whichever partition it reported. Finally, I ran it for all partitions (no PARTITION arg in the OPTIMIZE command), but the mismatched part error is still shown.

Curiously, some OPTIMIZE executions showed these errors:

Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [chi-signoz-tools-cluster-clickhouse-cluster-2-0:9000]: Code: 53. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH) (version 24.1.2.5 (official build)). (TYPE_MISMATCH)
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: Type mismatch in IN or VALUES section. Expected: Date. Got: Float64. (TYPE_MISMATCH)

Hm, could you share

SELECT hostName(), engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

jasondavindev avatar Oct 14 '24 15:10 jasondavindev

Did you receive the errors above when executing the BACKUP command, or something else? Could you share the full stacktrace in that case?

Moreover, let's compare the uuid

SELECT hostName(), uuid, engine_full FROM cluster('all-sharded',system.tables) WHERE database='signoz_logs' AND table='logs_v2'

Slach avatar Oct 14 '24 15:10 Slach

upgrade your clickhouse-server version to 24.8

Slach avatar Oct 14 '24 15:10 Slach

SELECT
    hostName(),
    uuid,
    engine_full
FROM cluster('all-sharded', system.tables)
WHERE (database = 'signoz_logs') AND (table = 'logs_v2')

┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-0-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-2-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─hostName()────────────────────────────────────────┬─uuid─────────────────────────────────┬─engine_full────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ chi-signoz-tools-cluster-clickhouse-cluster-1-0-0 │ c111787f-3753-4163-936e-89c8ffca0867 │ ReplicatedMergeTree('/clickhouse/tables/c111787f-3753-4163-936e-89c8ffca0867/{shard}', '{replica}') PARTITION BY toDate(timestamp / 1000000000) ORDER BY (ts_bucket_start, resource_fingerprint, severity_text, timestamp, id) TTL toDateTime(timestamp / 1000000000) + toIntervalSecond(1296000) SETTINGS ttl_only_drop_parts = 1, index_granularity = 8192 │
└───────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

yes, the error shown above came from the BACKUP command

chi-signoz-tools-cluster-clickhouse-cluster-0-0-0.chi-signoz-tools-cluster-clickhouse-cluster-0-0.signoz.svc.cluster.local :) BACKUP ALL ON CLUSTER 'cluster' TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')

BACKUP ALL ON CLUSTER cluster TO S3('https://xxxxxxxxxx-backup-tools.s3.us-east-1.amazonaws.com/EMBED_BACKUP/')

Query id: 8bce120e-15cc-4051-ae34-21c8fe3adf6a


Elapsed: 7.454 sec. 

Received exception from server (version 24.1.2):
Code: 647. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dsignoz%2Dtools%2Dcluster%2Dclickhouse%2Dcluster%2D1%2D0:9000. DB::Exception: Table signoz_logs.logs_v2 on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 has part 20240929_2_2_0 different from the part on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0 (checksum 'b87066065558b8e0f1790072f9d48853' on replica chi-signoz-tools-cluster-clickhouse-cluster-1-0 != checksum '80a4583914d9af71921f01fa326978ab' on replica chi-signoz-tools-cluster-clickhouse-cluster-2-0). (CANNOT_BACKUP_TABLE)

upgrade your clickhouse-server version to 24.8

What is the motivation for that?

jasondavindev avatar Oct 14 '24 16:10 jasondavindev

ok, the UUID is the same, so replication works

let's check how many parts have the same name but different hashes

SELECT groupArray(h) AS all_hosts, name, database, table, groupArray(hash_of_all_files) AS all_hashes
FROM (
    SELECT hostName() AS h, name, database, table, hash_of_all_files
    FROM cluster('all-sharded', system.parts)
    WHERE engine ILIKE '%Replicated%'
)
GROUP BY name, database, table
HAVING length(arrayDistinct(all_hashes)) > 1
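A narrower variant of the same check, restricted to the table and active parts relevant to the CANNOT_BACKUP_TABLE error in this thread (my own narrowing, not verified against this cluster; `groupUniqArray` and the `active` column are standard ClickHouse):

```sql
-- Only signoz_logs.logs_v2, active parts only; a part such as
-- 20240929_2_2_0 from the error should appear here if still diverged
SELECT
    name,
    groupArray(hostName()) AS hosts,
    groupUniqArray(hash_of_all_files) AS distinct_hashes
FROM cluster('all-sharded', system.parts)
WHERE database = 'signoz_logs' AND table = 'logs_v2' AND active
GROUP BY name
HAVING length(distinct_hashes) > 1
```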

Slach avatar Oct 17 '24 06:10 Slach

> upgrade your clickhouse-server version to 24.8
> What is the motivation for that?

this is an LTS release; I hope it has a more stable implementation of BACKUP

moreover, let's apply OPTIMIZE TABLE signoz_logs.logs_v2 ON CLUSTER '{cluster}' FINAL ? did you check that it finished successfully via SELECT hostName(), * FROM cluster('all-sharded', system.mutations) WHERE query ILIKE '%OPTIMIZE%FINAL%' FORMAT Vertical
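Written out as separate statements (table name and cluster macro as used in this thread). Since OPTIMIZE ... FINAL executes as a merge, its progress can also be watched in system.merges while it runs; this is a sketch, not verified on this cluster:

```sql
-- Force a full merge on every replica so the part sets converge
OPTIMIZE TABLE signoz_logs.logs_v2 ON CLUSTER '{cluster}' FINAL;

-- While it runs, in-progress merges are visible here (one row per merge)
SELECT hostName(), database, table, elapsed, progress
FROM cluster('all-sharded', system.merges)
WHERE database = 'signoz_logs' AND table = 'logs_v2';
```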

Slach avatar Oct 17 '24 06:10 Slach

I'll close the issue because the initial problem was solved.

I'm using clickhouse-backup instead of the embedded backup.

jasondavindev avatar Oct 22 '24 13:10 jasondavindev