feat: remove storage account name if present from azure path

Open endre-seqera opened this issue 2 years ago • 5 comments

Closes: #4683

Remove storage account name if present from azure path
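For illustration only, here is a minimal sketch of the intended behavior, not the code changed in this PR. The function name `strip_storage_account` and the optional `account_name` check are hypothetical. It relies on the fact that Azure container names cannot contain a dot, so a dot in the first path segment is treated as an `<account>.<container>` prefix.

```python
from typing import Optional


def strip_storage_account(path: str, account_name: Optional[str] = None) -> str:
    """Drop a leading '<account>.' from the container segment of an az:// path.

    Hypothetical helper: it only mirrors the behavior described in this PR.
    """
    scheme = "az://"
    if not path.startswith(scheme):
        return path  # not an Azure path, leave it untouched

    rest = path[len(scheme):]
    # First segment is the container (possibly prefixed), the remainder is the blob key.
    container, slash, key = rest.partition("/")
    prefix, dot, name = container.partition(".")
    if not dot:
        return path  # no storage account prefix present

    # Assumption: if the configured account is known, a mismatch is an error.
    if account_name is not None and prefix != account_name:
        raise ValueError(
            f"Path refers to storage account '{prefix}', expected '{account_name}'"
        )
    return f"{scheme}{name}{slash}{key}"


print(strip_storage_account("az://nfazurestore.eend-test"))
print(strip_storage_account("az://eend-test/results"))
```

Only the first path segment is inspected, so dots in blob names further down the path are left alone.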

Tests

Tested using nextflow-publishdir pipeline.

Before:

nextflow run https://github.com/endre-seqera/nextflow-publishdir -r main --outdir "az://nfazurestore.eend-test" #FAILS

ERROR ~ Error executing process > 'TEST_PUBLISH_DIR (3)'

Caused by:
  /nfazurestore.eend-test: Unable to determine if root directory exists


 -- Check '.nextflow.log' file for details

After:

 ./launch.sh run -c ~/azure-nextflow.config https://github.com/endre-seqera/nextflow-publishdir -r main --outdir "az://nfazurestore.eend-test"
[00/6cfd76] process > TEST_PUBLISH_DIR (3) [100%] 3 of 3 ✔

endre-seqera avatar Jan 26 '24 16:01 endre-seqera

Deploy Preview for nextflow-docs-staging canceled.

Name Link
Latest commit 4704d9579a5aed209f992de43844366d333b338a
Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/65bcbd679c18c90007d0dc7a

netlify[bot] avatar Jan 26 '24 16:01 netlify[bot]

While this may work, I'm a bit concerned about altering the container (aka bucket) name under the hood. This could result in Nextflow reporting a different object file path from the one specified by the user in the trace files, reports, and other provenance records.

It may be worth adding full support for it.

pditommaso avatar Jan 29 '24 11:01 pditommaso

I understand your concern, but the bucket (container) name is not altered; only the additional information (the storage account name) is removed. So the object file path will still be correct, just without this extra information.

This extra information is also redundant in a way: the user already has to configure the storage account name alongside their Azure credentials in nextflow.config, so there should be no doubt about which storage account is being used:

 azure {
   storage {
     accountName = '<YOUR STORAGE ACCOUNT NAME>'
   }
 }

The "convention" used in Seqera Platform, "${providerSchema}://${storageName}.${bucket}", is arbitrary. It only exists to produce a unique path and to differentiate between potentially duplicated bucket (container) names in different storage accounts (as in Data Explorer, where results from multiple credentials can show up).

But no one expects Azure paths in this format, and for a given Nextflow configuration (or for one specific credential) the storage account name is not necessary.
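To make the convention quoted above concrete, here is a small sketch of how such a qualified path could be assembled. The function name `qualified_path` and the second account name `otherstore` are hypothetical; only the `${providerSchema}://${storageName}.${bucket}` template comes from this thread.

```python
def qualified_path(provider_schema: str, storage_name: str, bucket: str) -> str:
    """Build a path following the '<schema>://<storage>.<bucket>' template."""
    return f"{provider_schema}://{storage_name}.{bucket}"


# Same container name in two storage accounts: the prefix keeps the paths distinct.
same_container = "eend-test"
first = qualified_path("az", "nfazurestore", same_container)
second = qualified_path("az", "otherstore", same_container)  # hypothetical account

print(first)
print(second)
```

With a single configured storage account the prefix adds nothing, which is why dropping it does not change which container is addressed.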

endre-seqera avatar Jan 29 '24 12:01 endre-seqera