feat: remove storage account name, if present, from Azure path
Closes: #4683
Remove the storage account name, if present, from Azure paths of the form az://<storage account>.<container> (for example, az://nfazurestore.eend-test).
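A minimal sketch of the transformation, written in Java because Nextflow runs on the JVM. The class and method names are hypothetical and this is not the code in this PR. It assumes the account prefix is separated from the container by the first '.' in the first path segment, which works because Azure container names may only contain lowercase letters, digits, and hyphens, never dots.

```java
import java.util.Objects;

// Hypothetical helper, not Nextflow's implementation: removes an optional
// "<account>." prefix from the first segment of an az:// path.
final class AzurePathSketch {

    private static final String SCHEME = "az://";

    private AzurePathSketch() {}

    static String stripAccountName(String path) {
        Objects.requireNonNull(path, "path");
        if (!path.startsWith(SCHEME)) {
            return path; // not an Azure path: leave it untouched
        }
        String rest = path.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        // First segment is the container, optionally prefixed by "<account>."
        String first = slash < 0 ? rest : rest.substring(0, slash);
        // Blob path (if any) is kept verbatim, including any dots it contains
        String tail = slash < 0 ? "" : rest.substring(slash);
        int dot = first.indexOf('.');
        if (dot < 0) {
            return path; // no account prefix present
        }
        // Assumes container names never contain '.', so everything after
        // the first dot is the container name
        return SCHEME + first.substring(dot + 1) + tail;
    }
}
```

For example, az://nfazurestore.eend-test/results becomes az://eend-test/results, while az://eend-test/results and non-Azure paths are returned unchanged. The container name and blob path are never modified; only the account prefix is dropped.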
Tests
Tested using the nextflow-publishdir pipeline.
Before:
nextflow run https://github.com/endre-seqera/nextflow-publishdir -r main --outdir "az://nfazurestore.eend-test" #FAILS
ERROR ~ Error executing process > 'TEST_PUBLISH_DIR (3)'
Caused by:
/nfazurestore.eend-test: Unable to determine if root directory exists
-- Check '.nextflow.log' file for details
After:
./launch.sh run -c ~/azure-nextflow.config https://github.com/endre-seqera/nextflow-publishdir -r main --outdir "az://nfazurestore.eend-test"
[00/6cfd76] process > TEST_PUBLISH_DIR (3) [100%] 3 of 3 ✔
While this may work, I'm a bit concerned about altering the container (aka bucket) name under the hood. This could result in Nextflow reporting an object file path different from the one specified by the user in the trace files, reports, and other provenance records.
It may be worth adding full support for this path format.
I understand your concern, but the bucket (container) name is not altered; only the additional information (the storage account name) is removed. The object file path will therefore still be correct, just without this extra information.
Moreover, this extra information is redundant. The user already knows it, since the storage account name must be configured along with the Azure credentials in nextflow.config, so there is no doubt about which storage account is being used:
azure {
    storage {
        accountName = '<YOUR STORAGE ACCOUNT NAME>'
    }
}
The "convention" used in Seqera Platform, "${providerSchema}://${storageName}.${bucket}", is arbitrary: it only provides a unique path that distinguishes identically named buckets (containers) in different storage accounts (for example in Data Explorer, where results from multiple credentials can appear together).
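To illustrate why that convention exists, here is a hypothetical sketch (the record and method names are illustrative, not Seqera Platform or Nextflow code) that models a container root path under "${providerSchema}://${storageName}.${bucket}". Two containers both named results in different storage accounts stay distinct when qualified with the account, but collapse to the same az://results once the account is dropped.

```java
import java.util.List;

// Hypothetical model of the "${providerSchema}://${storageName}.${bucket}"
// convention; names are illustrative only.
record QualifiedAzurePath(String scheme, String account, String container) {

    // Container root path with the account prefix, e.g. "az://accounta.results"
    String qualified() {
        return scheme + "://" + account + "." + container;
    }

    // Same container without the account prefix, e.g. "az://results"
    String unqualified() {
        return scheme + "://" + container;
    }

    // Parses a container root path "<scheme>://<account>.<container>" (no blob
    // path), assuming the container name itself contains no '.'
    static QualifiedAzurePath parse(String path) {
        int sep = path.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing scheme: " + path);
        }
        String authority = path.substring(sep + 3);
        int dot = authority.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("missing account prefix: " + path);
        }
        return new QualifiedAzurePath(
                path.substring(0, sep),
                authority.substring(0, dot),
                authority.substring(dot + 1));
    }

    // Counts distinct path strings, rendered with or without the account prefix
    static long countDistinct(List<QualifiedAzurePath> paths, boolean withAccount) {
        return paths.stream()
                .map(p -> withAccount ? p.qualified() : p.unqualified())
                .distinct()
                .count();
    }
}
```

With az://accounta.results and az://accountb.results, countDistinct returns 2 when the account is kept and 1 when it is dropped.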
However, no one expects Azure paths in this format, and for a given Nextflow configuration (or a single credential) this information (the storage account name) is unnecessary.