databricks-sdk-py

[ISSUE] dbutils.fs.ls has different behaviour than Databricks clusters

Open: william-conti opened this issue 1 year ago • 3 comments

Description

When running ws.dbutils.fs.ls, the output differs from the output of the same command run in a notebook on a Databricks cluster.

Reproduction example

Output of ws.dbutils.fs.ls (SDK):
FileInfo(path='dbfs:/mnt/things/a/b/0x7l', name='0x7l', size=0, modificationTime=xxx)
FileInfo(path='dbfs:/mnt/things/a/b/dadf', name='dadf', size=0, modificationTime=xxx)
FileInfo(path='dbfs:/mnt/things/a/b/xxx.csv', name='xxx.csv', size=0, modificationTime=xxx)

Output of dbutils.fs.ls in a notebook on a Databricks cluster:
FileInfo(path='dbfs:/mnt/things/a/b/0x7l/', name='0x7l/', size=0, modificationTime=xxx)
FileInfo(path='dbfs:/mnt/things/a/b/dadf/', name='dadf/', size=0, modificationTime=xxx)
FileInfo(path='dbfs:/mnt/things/a/b/xxx.csv', name='xxx.csv', size=68, modificationTime=xxx)

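On the cluster, directory entries carry a trailing slash in both path and name; the SDK output drops it (and also reports size=0 for xxx.csv, where the cluster reports 68). Because of this behaviour, it is inconvenient to tell whether a given path is a folder, all the more so because the SDK's FileInfo lacks the isDir() utility method available on clusters.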
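Until ws.dbutils.fs.ls exposes the flag, one possible workaround is to go to the DBFS list API directly. In databricks-sdk-py, WorkspaceClient().dbfs.list(path) yields databricks.sdk.service.files.FileInfo objects that carry an is_dir attribute, which the dbutils wrapper does not pass through. The sketch below assumes that behaviour; the helper name split_dirs_and_files is hypothetical, and it is duck-typed so it can be exercised with a stub instead of a live workspace.

```python
from typing import Any, List, Tuple


def split_dirs_and_files(dbfs: Any, path: str) -> Tuple[List[str], List[str]]:
    """Partition the entries directly under `path` into (directories, files).

    Hypothetical helper. `dbfs` is assumed to behave like WorkspaceClient().dbfs:
    its list(path) yields FileInfo objects with `path` and `is_dir` attributes.
    """
    dirs: List[str] = []
    files: List[str] = []
    for info in dbfs.list(path):
        # is_dir is Optional in the SDK dataclass; None is treated as "not a directory".
        if info.is_dir:
            dirs.append(info.path)
        else:
            files.append(info.path)
    return dirs, files
```

Against a real workspace this would be called roughly as split_dirs_and_files(WorkspaceClient().dbfs, "dbfs:/mnt/things/a/b"), assuming dbfs.list accepts the same path form as ws.dbutils.fs.ls.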

william-conti, Apr 18 '24 11:04

~~I wonder if this is an issue with the API, not the Python SDK...~~

Never mind. I see the problem.

cezhunter, Apr 18 '24 18:04

The maintainers would need to update https://github.com/databricks/databricks-sdk-py/blob/c2367af3bf4d5d3636aab8ea8672e48d5539d36b/databricks/sdk/dbutils.py#L16 by adding an is_dir field to the FileInfo named tuple, and correspondingly update https://github.com/databricks/databricks-sdk-py/blob/c2367af3bf4d5d3636aab8ea8672e48d5539d36b/databricks/sdk/dbutils.py#L57 to pass f.is_dir when building each entry.
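The change proposed in this comment could look roughly like the following sketch. It is not the maintainers' code: the field name is_dir follows the comment, while the isDir()/isFile() methods (mirroring the notebook-side FileInfo), the None default, and the to_file_info helper are assumptions added for illustration. It expects input objects shaped like databricks.sdk.service.files.FileInfo (path, file_size, modification_time, is_dir).

```python
import posixpath
from collections import namedtuple
from typing import Any


class FileInfo(
    namedtuple(
        "FileInfo",
        ["path", "name", "size", "modificationTime", "is_dir"],
        defaults=(None,),  # assumption: keeps the old four-argument construction working
    )
):
    """Hypothetical FileInfo carrying the proposed is_dir field."""

    __slots__ = ()

    def isDir(self) -> bool:
        # Notebook-style convenience method; None (unknown) counts as "not a directory".
        return bool(self.is_dir)

    def isFile(self) -> bool:
        return not self.isDir()


def to_file_info(f: Any) -> FileInfo:
    """Build a FileInfo from an SDK files.FileInfo-like object, passing f.is_dir through."""
    # DBFS paths use "/" separators, so posixpath is used regardless of the local OS.
    return FileInfo(
        f.path,
        posixpath.basename(f.path),
        f.file_size,
        f.modification_time,
        f.is_dir,
    )
```

Giving is_dir a default means callers that construct FileInfo with four positional arguments keep working, but code that unpacks a FileInfo into exactly four names would break, since the tuple now has five elements.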

cezhunter, Apr 18 '24 19:04