
Transparent PGP Encryption

mark-weghorst opened this issue 10 months ago • 1 comment

Background

When we transfer our datasets outbound from z/OS, our information security policies require that the datasets first be PGP encrypted prior to transfer.

When the datasets arrive in our large data platform, we first decrypt them and then run our Spark job. It would be a more secure solution if Cobrix could decrypt the data in the byte stream, as opposed to decrypting the file in situ and then running the Spark job against the decrypted data.
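The "decrypt in the byte stream" idea can be sketched as a thin wrapper around the input stream, so plaintext never lands on disk. This is a minimal Python sketch; the repeating-key XOR is only a stand-in for real PGP decryption, and `DecryptingStream` is a hypothetical name, not an existing Cobrix API.

```python
import io

class DecryptingStream(io.RawIOBase):
    """Wrap an encrypted byte stream and decrypt as bytes are read,
    so the decrypted data exists only in memory.

    The repeating-key XOR below is a stand-in for a PGP decryptor;
    the point is the streaming shape, not the cipher.
    """

    def __init__(self, inner, key: bytes):
        self._inner = inner
        self._key = key
        self._pos = 0  # absolute offset, keeps the keystream aligned across reads

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        data = self._inner.read(size)
        out = bytes(
            b ^ self._key[(self._pos + i) % len(self._key)]
            for i, b in enumerate(data)
        )
        self._pos += len(data)
        return out

# Round trip: "encrypt" a payload, then read it back through the wrapper.
key = b"secret"
plaintext = b"RECORD-01RECORD-02"
ciphertext = bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

decrypted = DecryptingStream(io.BytesIO(ciphertext), key).read()
```

Because the wrapper tracks its absolute offset, reads of any chunk size produce the same plaintext, which is what a record-oriented reader like Cobrix would need.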

Feature

Add support to enable Cobrix to read a PGP-encrypted dataset when provided with a valid encryption key.

Ideally, this feature should not allow the key to be read from a filesystem or embedded in code; it should only support keys stored in a secure key vault.

As for the key vaults that should be supported, I would suggest the following list, which covers the most commonly used commercial solutions:

  • Amazon Web Services Key Manager
  • Azure Key Vault
  • Google Cloud Platform Secret Manager (my own needs)
  • Hashicorp Vault
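Whatever the final backend list, the vault integration could hide behind one small interface. This is a sketch under stated assumptions: all names here are hypothetical (not existing Cobrix options), and the in-memory provider stands in for real SDK calls to the vaults listed above.

```python
from abc import ABC, abstractmethod

class KeyProvider(ABC):
    """One implementation per vault backend; the reader only sees get_key."""

    @abstractmethod
    def get_key(self, key_id: str) -> bytes:
        ...

class InMemoryKeyProvider(KeyProvider):
    """Test stand-in. A real backend would call the vault's SDK here
    (e.g. hvac for HashiCorp Vault, boto3 for AWS)."""

    def __init__(self, keys: dict):
        self._keys = dict(keys)

    def get_key(self, key_id: str) -> bytes:
        return self._keys[key_id]

# A registry keyed by a URI-style scheme lets users select the backend
# with a single string such as "vault://pgp/private".
PROVIDERS = {"memory": InMemoryKeyProvider({"pgp/private": b"---KEY---"})}

def resolve_key(ref: str) -> bytes:
    scheme, _, key_id = ref.partition("://")
    return PROVIDERS[scheme].get_key(key_id)
```

This keeps the key out of code and off the filesystem: only an opaque reference string is ever configured.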

mark-weghorst avatar Apr 02 '25 21:04 mark-weghorst

Hi @mark-weghorst ,

This is an interesting request. A couple of questions.

  1. Is there a Spark source that supports this? This would let us look at the implementation and potentially do something similar.

  2. Usually, secrets are managed externally to Spark, and provided as options. Something like:

    spark.read.option("secret", "xyz").parquet("s3://bucket/path")
    

    (more a suggestion than a question)
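The secrets-as-options pattern above can be sketched end to end. All names here are assumptions: the `pgp_private_key` option does not exist in Cobrix, an environment variable stands in for a vault lookup, and the Spark call is shown as a comment since it needs a live session.

```python
import os

def resolve_secret(ref: str) -> str:
    """Resolve a secret reference of the form 'scheme:name'.

    Only an env-var scheme is implemented here as a stand-in;
    real schemes would call the respective vault's SDK.
    """
    scheme, _, name = ref.partition(":")
    if scheme == "env":
        return os.environ[name]
    raise ValueError(f"unsupported secret scheme: {scheme!r}")

# The key never appears in code or on the filesystem; Spark would
# receive it as an option, e.g. (hypothetical option name):
#
#   spark.read
#        .format("cobol")
#        .option("pgp_private_key", resolve_secret("env:PGP_KEY"))
#        .load("s3://bucket/path")
```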

yruslan avatar Apr 10 '25 13:04 yruslan