
Generate Presigned URLs

Open kabirkhan opened this issue 3 years ago • 4 comments

Overview

It's nice to be able to generate presigned URLs for download/upload so you don't need to worry about passing around auth credentials. It's often a little non-obvious how to do these with the downstream clients and each client handles the operation pretty differently so this feels like something that could fit nicely into cloudpathlib.

Proposed Interface / Implementation

I'd imagine the code looking something like the implementation below. The S3Path implementation works and is tested; the others (GSPath/AzureBlobPath) aren't actually tested but show roughly what the code would look like.


from datetime import datetime, timedelta

class CloudPath:
    def generate_presigned_url(self, expire_seconds: int = 300):
        raise NotImplementedError


class S3Path(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:])  # Everything after the bucket name
        url = self.client.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": self.bucket, "Key": object_key},
            ExpiresIn=expire_seconds,
        )
        return url


class GSPath(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:])  # Everything after the bucket name
        gs_client = self.client.client
        creds = gs_client.credentials
        gs_bucket = gs_client.get_bucket(self.bucket)
        gs_blob = gs_bucket.blob(object_key)
        url = gs_blob.generate_signed_url(
            version="v4",
            expiration=timedelta(seconds=expire_seconds),
            service_account_email=creds.service_account_email,
            access_token=creds.token,
            method="GET",
        )
        return url



from azure.storage.blob import BlobSasPermissions, generate_blob_sas


class AzureBlobPath(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:]) # Everything after the bucket name
        az_client = self.client.client
        sas_token = generate_blob_sas(
            az_client.account_name,
            container_name=self.container,
            blob_name=object_key,
            account_key=az_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(seconds=expire_seconds)
        )
        url = f"https://{az_client.account_name}.blob.core.windows.net/{self.container}/{object_key}?{sas_token}"
        return url

The above implementation only handles downloads. Uploading via presigned urls/tokens is a bit weird across the different clouds but still doable. Happy to research that step more if it's of interest.

Happy to contribute the implementation, just want to make sure it's on track with the goals of the project.

kabirkhan avatar Jun 02 '22 20:06 kabirkhan

Yeah, this is awesome.

Curious for your take on a couple related design decisions:

  • Does it make sense to have a generic url function that can return a presigned URL but also the vanilla one if you're in an auth'd context? (I think #21 was trying to get at that)
  • Does it make sense, while we're researching these URL formats, to also implement something like CloudPath.from_url constructors (or just support it in __init__) that can parse out the region/container, etc., and return the cloud path? This came up in #157. Doing that might be sufficient to support the upload scenario, where you could do something like CloudPath.from_url(presigned_url).upload_from(my_local_path)

pjbull avatar Jun 02 '22 21:06 pjbull

  1. The as_url method described in that issue sounds good to me. I think in order to do either you'll need to be in an authed context, so an explicit presign bool parameter would work well:

    S3Path("s3://bucket/file.txt").as_url()
    # > https://bucket.amazons3.com/file.txt

    S3Path("s3://bucket/file.txt").as_url(presign=True, expire_seconds=100)
    # > https://bucket.amazons3.com/file.txt?presign=accesstokenwithexpiry

  2. The issue with from_url is that S3, for example, returns a multi-part request when you want to presign an upload: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#generating-a-presigned-url-to-upload-a-file Also, I think it'd need to be an explicit upload_file rather than upload_from for S3, since you need to pass a Bucket and Key as part of the signature.
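To make the multi-part point concrete: a presigned S3 upload isn't a single URL but a URL plus a set of form fields. The sketch below shows the general shape of the response from boto3's generate_presigned_post (linked above); the field values here are illustrative placeholders, not real signatures, and the exact set of fields depends on the signing version.

```python
# Illustrative shape of a boto3 generate_presigned_post response.
# All values are placeholders, not real credentials or signatures.
presigned_post = {
    "url": "https://bucket.s3.amazonaws.com/",
    "fields": {
        "key": "file.txt",
        "policy": "<base64-encoded policy document>",
        "signature": "<signature over the policy>",
    },
}

# The uploader then sends a multipart POST with those fields plus the file,
# e.g. with requests:
#   requests.post(presigned_post["url"], data=presigned_post["fields"],
#                 files={"file": open("local_file.txt", "rb")})
```

So a from_url-based upload path would have to carry the extra fields along somehow, which is why a plain URL constructor doesn't map cleanly onto the upload case.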

I'm sure there's a way to design a good API for this, but maybe we take this in 2 steps and start with generating presigned urls in the as_url method?
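As a rough sketch of what that first step could look like (the class, the example-cloud URL format, and the _presign hook are all hypothetical stand-ins, not cloudpathlib's actual API):

```python
from urllib.parse import urlparse


class SketchCloudPath:
    """Toy stand-in for CloudPath, illustrating the proposed as_url API."""

    def __init__(self, cloud_path: str):
        parsed = urlparse(cloud_path)
        self.bucket = parsed.netloc           # e.g. "bucket"
        self.key = parsed.path.lstrip("/")    # e.g. "file.txt"

    def as_url(self, presign: bool = False, expire_seconds: int = 300) -> str:
        # Vanilla public-style URL by default; presigned on request.
        public = f"https://{self.bucket}.example-cloud.com/{self.key}"
        if presign:
            return self._presign(public, expire_seconds)
        return public

    def _presign(self, url: str, expire_seconds: int) -> str:
        # A real implementation would delegate to the provider Client here.
        return f"{url}?presign=fake-token&expires={expire_seconds}"
```

The presign dispatch lives behind a single hook, so each provider only has to supply its own signing logic.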

kabirkhan avatar Jun 02 '22 23:06 kabirkhan

Great, let's start with the as_url (maybe we just call it url?).

On the implementation, we like to isolate all communication with the provider to the *Client classes themselves, so let's implement the core logic on those objects and then just have a thin wrapper on the Path classes themselves.

One other small note is that instead of object_key = "/".join(list(self.parts)[2:]), you can get keys either with self.key which most cloud providers have or generically with self._no_prefix_no_drive (which, IIRC, may contain a leading slash depending on providers).
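To illustrate the two approaches at the string level (these helpers are simplified stand-ins for the path attributes mentioned above, not cloudpathlib's actual implementation):

```python
def key_via_parts(cloud_path: str, prefix: str = "s3://") -> str:
    # Mimics "/".join(list(self.parts)[2:]) where parts is roughly
    # (prefix, bucket, *key components).
    parts = (prefix,) + tuple(cloud_path[len(prefix):].split("/"))
    return "/".join(parts[2:])


def no_prefix_no_drive(cloud_path: str, prefix: str = "s3://") -> str:
    # Strip the scheme prefix, then drop the bucket/drive component.
    no_prefix = cloud_path[len(prefix):]   # "bucket/dir/file.txt"
    _, _, rest = no_prefix.partition("/")  # "dir/file.txt"
    # As noted above, the real attribute may keep a leading slash on some
    # providers, so callers would lstrip("/") defensively.
    return rest.lstrip("/")
```

Both yield the same key here; the attribute-based approach just avoids re-deriving it from parts at every call site.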

pjbull avatar Jun 03 '22 00:06 pjbull

Nice! That object_key thing I was doing was obviously bad, I just hadn't looked through the source code too much. Started a Draft PR with the majority of the implementation in the respective Client classes.

kabirkhan avatar Jun 03 '22 01:06 kabirkhan