`fs.info()` and `fs.ls(detail=True)` return different etag formats

Open pmrowla opened this issue 1 year ago • 0 comments

I'm not sure this needs to be addressed in adlfs itself, but it's worth noting that the etags returned by `fs.info()` and `fs.ls(detail=True)` have different formats. With `fs.info()` the etag is quoted (wrapped in `"` characters), whereas with `fs.ls()` it is not. As a result, the caller has to normalize the quoting before comparing etags to check whether a file has been modified.

With adlfs 2024.2.0:

>>> from adlfs import AzureBlobFileSystem
>>> fs = AzureBlobFileSystem()
>>> fs.info("az://test-deletion/2024.2.0/foo", refresh=True)["etag"]
'"0x8DC2DFC9E520378"'
>>> [(f["name"], f["etag"]) for f in fs.ls("az://test-deletion/2024.2.0/", detail=True, refresh=True)]
[('test-deletion/2024.2.0/foo', '0x8DC2DFC9E520378')]
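
A minimal sketch of the caller-side workaround, assuming the two formats differ only by one pair of surrounding `"` characters, as in the output above. `normalize_etag` and `etags_match` are hypothetical helper names for illustration, not part of adlfs or fsspec:

```python
def normalize_etag(etag: str) -> str:
    """Strip one pair of surrounding double quotes, if present.

    Hypothetical helper (not an adlfs API). Assumes quoting is the only
    difference between the etag formats being compared.
    """
    etag = etag.strip()
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag


def etags_match(a: str, b: str) -> bool:
    """Compare two etags while ignoring the quoting difference."""
    return normalize_etag(a) == normalize_etag(b)


# Values copied from the session above.
info_etag = '"0x8DC2DFC9E520378"'  # format returned by fs.info()
ls_etag = "0x8DC2DFC9E520378"      # format returned by fs.ls(detail=True)

print(etags_match(info_etag, ls_etag))  # True
```

Normalizing both sides, rather than quoting the unquoted value, means the comparison does not depend on which call produced which etag.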

I did some investigation, and it looks like this comes from differences in what the Azure API returns for different calls. With `BlobClient.get_blob_properties()`, the returned `etag` property is wrapped in `"` quotes. With `ContainerClient.walk_blobs()`, however, the `etag` property of each iterated blob is not quoted.

pmrowla, Feb 15 '24 08:02