Please add support for Tiered Storage in the Helm Chart

Open frankjkelly opened this issue 5 years ago • 6 comments

Is your feature request related to a problem? Please describe.
We'd love to see easier integration with Pulsar Tiered Storage, including the S3 bucket settings and AWS credentials.

Describe the solution you'd like
Please provide a standard template / values for S3 integration (with appropriate AWS credentials / STS role).

Describe alternatives you've considered
From Addison in Slack: https://apache-pulsar.slack.com/archives/CJ0FMGHSM/p1602085159071600

To enable it with the official charts you can just set the following properties:
    # enable tiered storage
    managedLedgerOffloadDriver: aws-s3
    s3ManagedLedgerOffloadBucket: <bucket name>
    s3ManagedLedgerOffloadRegion: <region>
    managedLedgerOffloadAutoTriggerSizeThresholdBytes: "262144000"
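
As a rough illustration of where these properties go, the sketch below writes them into a values override file under broker.configData (the nesting used for the GCS example later in this thread). The file name, bucket, and region are placeholders chosen for the example, and AWS credentials are not covered here. The threshold quoted above works out to 250 MiB.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name and placeholder values; replace with your own.
VALUES_FILE="tiered-storage-values.yaml"
BUCKET="my-offload-bucket"   # assumption: an existing S3 bucket
REGION="us-east-1"           # assumption: the region of that bucket

# 250 MiB expressed in bytes, matching the threshold quoted above.
THRESHOLD_BYTES=$((250 * 1024 * 1024))

# Write the override file; the keys are the ones listed in the issue.
cat > "$VALUES_FILE" <<EOF
broker:
  configData:
    managedLedgerOffloadDriver: aws-s3
    s3ManagedLedgerOffloadBucket: ${BUCKET}
    s3ManagedLedgerOffloadRegion: ${REGION}
    managedLedgerOffloadAutoTriggerSizeThresholdBytes: "${THRESHOLD_BYTES}"
EOF

cat "$VALUES_FILE"
```

The resulting file could then be passed to Helm with `-f tiered-storage-values.yaml` on `helm install` or `helm upgrade`.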

Additional context
It appears that the Kafkaesque Helm chart currently supports it (https://github.com/kafkaesque-io/pulsar-helm-chart#tiered-storage), but we'd rather be on the "official" variant.

frankjkelly avatar Oct 07 '20 15:10 frankjkelly

As a comment: for GCS, it did not work to just add

    broker:
      configData:
        managedLedgerOffloadDriver: google-cloud-storage
        gcsManagedLedgerOffloadBucket: pulsar-tiered
        gcsManagedLedgerOffloadRegion: us-east1
        managedLedgerOffloadAutoTriggerSizeThresholdBytes: "262144000"

because the service account key file was missing, and it needs to be mounted. I added the following volumes and volumeMounts to the broker statefulset by hand, and it worked after also adding this field:

    gcsManagedLedgerOffloadServiceAccountKeyFile: "/pulsar/gcp-service-account/{{ .Values.broker.configData.gcsServiceAccountJsonFile }}"

+++ b/charts/pulsar/templates/broker-statefulset.yaml
@@ -225,7 +225,19 @@ spec:
           {{- end }}
           {{- end }}
           {{- include "pulsar.broker.certs.volumeMounts" . | nindent 10 }}
+          {{- if .Values.broker.configData.managedLedgerOffloadDriver }}
+          {{- if eq .Values.broker.configData.managedLedgerOffloadDriver "google-cloud-storage" }}
+          - name: gcp-service-account
+            readOnly: true
+            mountPath: /pulsar/gcp-service-account
+          {{- end }}
+          {{- end }}
       volumes:
+      {{- if eq .Values.broker.configData.managedLedgerOffloadDriver "google-cloud-storage" }}
+      - name: gcp-service-account
+        secret:
+          secretName: {{ .Values.broker.configData.gcsServiceAccountSecret }}
+      {{- end }}
       {{- if .Values.auth.authentication.enabled }}
       {{- if eq .Values.auth.authentication.provider "jwt" }}
       - name: token-keys

axel-sirota avatar Nov 16 '20 13:11 axel-sirota

Is it possible to know how you applied the new b/charts/pulsar/templates/broker-statefulset.yaml? Did you use helm upgrade?

DonghunLouisLee avatar Nov 17 '20 07:11 DonghunLouisLee

No, no. Once you change the chart templates you cannot use the published chart as is, so I repackaged it locally and applied that! Not as fancy, haha. Ideally these changes should go into the official charts! If I find time I will submit a PR. I think is following this path but adding AWS and Azure? @sijie ?
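
The repackage-and-apply step described above can be sketched roughly as follows. The release name, namespace, and output directory are placeholders, and the `run` helper is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set, so the sketch is safe to try outside a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: all of these values are placeholders for illustration.
RELEASE="${RELEASE:-my-pulsar}"
NAMESPACE="${NAMESPACE:-pulsar}"
CHART_DIR="${CHART_DIR:-charts/pulsar}"
DIST_DIR="${DIST_DIR:-dist}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command; also execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Package the locally modified chart into a .tgz archive.
run helm package "$CHART_DIR" --destination "$DIST_DIR"

# The archive name depends on the chart version, so resolve it at run time.
if [ "$DRY_RUN" = "1" ]; then
  CHART_TGZ="$DIST_DIR/pulsar-<version>.tgz"
else
  CHART_TGZ="$(ls "$DIST_DIR"/pulsar-*.tgz | head -n 1)"
fi

# Install or upgrade the release from the packaged archive.
run helm upgrade --install "$RELEASE" "$CHART_TGZ" --namespace "$NAMESPACE"
```

Packaging is optional: `helm upgrade` also accepts a chart directory such as charts/pulsar directly, which is the form used later in this thread.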

axel-sirota avatar Nov 17 '20 16:11 axel-sirota

Thanks, that's what i thought too.

As far as I know, tiered storage support for Azure will be available in Pulsar 2.7.0, so I guess you could first submit a PR for GCP and AWS? Although AWS works fine without any other configuration. Cheers

DonghunLouisLee avatar Nov 18 '20 02:11 DonghunLouisLee

I am trying to get GCS tiered storage to work with the Pulsar Helm charts 2.7.1. The above pointers are too difficult for me. Can you help me answer a couple of questions I have?

  • Where do I put the key file so it gets mounted properly?
  • Do I need to set broker.configData.gcsServiceAccountSecret and broker.configData.gcsServiceAccountJsonFile in values.yaml as well?
  • How do I package and install the Helm chart with these changes?

I think I got the last question answered, but my statefulset fails with the following error:

create Pod production-pulsar-broker-0 in StatefulSet production-pulsar-broker failed error: Pod "production-pulsar-broker-0" is invalid: [spec.volumes[0].secret.secretName: Required value, spec.containers[0].volumeMounts[0].name: Not found: "gcp-service-account"]

thanks in advance for the help!
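
The error above suggests that the patched template rendered an empty secretName, which would happen when broker.configData.gcsServiceAccountSecret is not set in values.yaml. A minimal pre-flight sketch follows, assuming a plain-text scan of the values file is good enough. `check_gcs_values` is a hypothetical helper, not part of the chart, and it does not verify that the keys are nested under broker.configData.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds only if each GCS-related key used in this
# thread appears in the given file with something after the colon.
# Limitation: an explicitly empty string such as "" still counts as a value.
check_gcs_values() {
  local file="$1" key missing=0
  for key in gcsServiceAccountSecret gcsServiceAccountJsonFile \
             gcsManagedLedgerOffloadServiceAccountKeyFile; do
    # Match "key: value" with at least one non-space character after the colon.
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]" "$file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a throwaway file that omits the secret name and the key file path.
demo="$(mktemp)"
printf 'broker:\n  configData:\n    gcsServiceAccountJsonFile: "serviceaccount.json"\n' > "$demo"
check_gcs_values "$demo" || echo "values file is incomplete"
rm -f "$demo"
```

Running the helper against the real values.yaml before `helm upgrade` would flag the missing key without waiting for the statefulset to fail.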

truthtrap avatar May 11 '21 19:05 truthtrap

Well, I got it to work. The broker.configData part of values.yaml looks like this:

    # tiered storage to gcs
    managedLedgerOffloadDriver: google-cloud-storage
    gcsManagedLedgerOffloadBucket: pulsar
    gcsManagedLedgerOffloadRegion: europe-west1
    managedLedgerOffloadAutoTriggerSizeThresholdBytes: "262144000"
    gcsServiceAccountSecret: "pulsar-broker-service-account"
    gcsServiceAccountJsonFile: "serviceaccount.json"
    gcsManagedLedgerOffloadServiceAccountKeyFile: "/pulsar/gcp-service-account/serviceaccount.json"

For this to work with the diff above, you need to create the service account JSON as per the documentation, then add the resulting JSON file with the credentials (serviceaccount.json) as a secret to your k8s cluster (make sure to add it to the right namespace):

$ kubectl -n pulsar create secret generic pulsar-broker-service-account --from-file=serviceaccount.json

In case of an existing release, upgrade your Helm deployment and restart your broker statefulset:

$ helm upgrade <your-release-name> charts/pulsar
$ kubectl -n pulsar rollout restart statefulset production-pulsar-broker

truthtrap avatar May 15 '21 08:05 truthtrap