Consistent handling of paths containing protocols
For my use case I generally have a full path including the protocol, like abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-within-container>.
I've found that different methods of AzureBlobFileSystem seem to handle the protocol information in different ways. Some examples:
- ls: Doesn't work when provided a full path including the protocol.
- glob: Strips the protocol information, then returns paths of the form <container-name>/<path-within-container>. Personally I would find it a lot more helpful if it returned paths using the same format that I provided, which included the protocol.
- rm: Works as I would expect. It strips off the protocol information to operate, but it doesn't return a path, so it doesn't have the same issue that glob has.
The reason I like to use fully qualified paths with the protocol information is that it allows interacting with a local file system or blob storage with exactly the same code. The only thing I need to change is the path that I provide.
I will probably implement some kind of wrapper around AzureBlobFileSystem as a workaround, but I think it would be best to resolve this at the source.
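For illustration, here is a rough sketch of what such a wrapper could look like for glob. It assumes the behaviour described above (glob accepts the full path and returns paths of the form <container-name>/<path-within-container>). The helper names split_abfss, requalify and glob_qualified are hypothetical and not part of any library; fs is any object with a glob method.

```python
def split_abfss(url):
    """Split scheme://<container>@<host>/<path> into its four parts.

    Hypothetical helper, written with plain string operations so that
    glob characters such as '*' or '?' in the path are left untouched.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"path has no protocol: {url!r}")
    netloc, _, path = rest.partition("/")
    container, at, host = netloc.partition("@")
    if not at:
        raise ValueError(f"expected <container>@<host> in: {url!r}")
    return scheme, container, host, path


def requalify(stripped_paths, original_url):
    """Put the protocol and host from original_url back onto stripped paths.

    Assumes each stripped path looks like <container>/<path-within-container>.
    """
    scheme, container, host, _ = split_abfss(original_url)
    prefix = container + "/"
    qualified = []
    for p in stripped_paths:
        p = p.lstrip("/")
        if p == container:
            rel = ""
        elif p.startswith(prefix):
            rel = p[len(prefix):]
        else:
            raise ValueError(f"{p!r} is not inside container {container!r}")
        qualified.append(f"{scheme}://{container}@{host}/{rel}")
    return qualified


def glob_qualified(fs, pattern):
    """Call fs.glob and return results in the same format as the pattern."""
    return requalify(fs.glob(pattern), pattern)
```

With this, glob_qualified(fs, "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<pattern>") would return fully qualified paths again. The same requalify step could be applied to the output of ls once it accepts full paths.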
My opinion on how it should ideally work:
- All methods should accept full paths with the protocol information.
- Methods that return paths, e.g. glob and ls, should return the same format that was received.