
Does s3fs support unix-like os.path operations?

makquel opened this issue 4 years ago · 3 comments

I've tested a snippet that reads a set of Avro partitioned files, and to check for folder types and perform path operations I'm using methods such as isdir() and join(). I've read through the s3fs docs and couldn't find whether these methods are compatible with the s3 file system. Is there a method for this purpose? The snippet below summarizes mostly what I'm doing on my local file system.

    try:
        folder_list = list(
            filter(
                lambda folder: os.path.isdir(
                    os.path.join(dump_avro_path, folder)
            ),
                os.listdir(dump_avro_path),
            )
        )
    except FileNotFoundError:
        logger.debug(f"Avro path {dump_avro_path} does not exist")
        folder_list = list()
    finally:
        folder_list_ = [
            folder
            for folder in folder_list
            if folder >= sorted_folder_list[index] and folder <= end_epoch
        ]
    avro_folder_list = [
        (os.path.join(dump_avro_path, folder) + "/*.avro") for folder in folder_list_
    ]

makquel · Jul 21 '21 12:07

Yes, you have the methods isfile, isdir, and exists. The other method you'll need is ls (which is also aliased as listdir). However, there is no join - just use "/".join([...]).

Note that on s3 there are no real directories, only implied ones when a key exists with the given prefix; i.e., creating "mybucket/path/file" implies the existence of "mybucket/path".
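As a concrete illustration of the methods above (a minimal sketch: "my-bucket/avro-dump" and the helper names s3_join/list_subdirs are hypothetical, and fs stands for an s3fs.S3FileSystem instance):

```python
import posixpath

def s3_join(*parts):
    # s3fs has no join(); S3 keys always use "/" as the separator,
    # so posixpath.join (or "/".join) is the usual substitute for
    # os.path.join.
    return posixpath.join(*parts)

def list_subdirs(fs, path):
    # fs is an s3fs.S3FileSystem. ls() with detail=False returns plain
    # key strings; isdir() reports True for "implied" directories, i.e.
    # prefixes under which at least one key exists.
    return [key for key in fs.ls(path, detail=False) if fs.isdir(key)]

print(s3_join("my-bucket/avro-dump", "2021-07-21"))
# → my-bucket/avro-dump/2021-07-21
```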

martindurant · Jul 21 '21 13:07

@martindurant, does s3fs somehow represent a path as a dictionary? I wrote the following snippet to build a list of sub-folders inside a bucket path:

    listdir_fn = s3.listdir if dump_avro_path.startswith("s3://") else os.listdir
    isdir_fn = s3.isdir if dump_avro_path.startswith("s3://") else os.path.isdir
    folder_list = list(
        filter(
            lambda folder: isdir_fn("/".join([dump_avro_path, folder])),
            listdir_fn(dump_avro_path),
        )
    )

This works well on my local file system; however, once I run it against my s3 bucket I get the following error:

TypeError: sequence item 1: expected str instance, dict found

makquel · Aug 04 '21 11:08

I'm not sure which line is causing your error, but you should be aware that listdir/ls returns a list of dicts by default, and you need detail=False to get a list of path strings.

Also, you should know that s3 does not really have directories; we emulate them, following the s3 REST API, but the act of creating a directory is actually a no-op, as you can always write to any arbitrary key in a bucket.
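To make the shape of the return values concrete, here is a minimal sketch of the detail=False fix; FakeFS is a hypothetical stand-in for an s3fs.S3FileSystem, used only so the example runs without a real bucket:

```python
def list_avro_folders(fs, dump_avro_path):
    # With detail=False, each entry is already a full key string
    # ("bucket/prefix/folder"), so no join is needed before isdir().
    return [key for key in fs.ls(dump_avro_path, detail=False)
            if fs.isdir(key)]

class FakeFS:
    # Mimics the two s3fs calls used above.
    def ls(self, path, detail=True):
        entries = [f"{path}/2021-07-20", f"{path}/2021-07-21",
                   f"{path}/stray.avro"]
        if detail:
            # The default: a list of metadata dicts, which is what made
            # "/".join raise "expected str instance, dict found".
            return [{"Key": e} for e in entries]
        return entries

    def isdir(self, path):
        return not path.endswith(".avro")

print(list_avro_folders(FakeFS(), "my-bucket/avro-dump"))
# → ['my-bucket/avro-dump/2021-07-20', 'my-bucket/avro-dump/2021-07-21']
```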

martindurant · Aug 04 '21 12:08