Generate Presigned URLs
Overview
It's nice to be able to generate presigned URLs for download/upload so you don't need to worry about passing around auth credentials. It's often non-obvious how to do this with the downstream clients, and each client handles the operation pretty differently, so this feels like something that could fit nicely into cloudpathlib.
Proposed interface/implementation
I'd imagine the code for this looking something like the below implementation. The S3Path implementation works and is tested, the others (GSPath/AzureBlobPath) aren't actually tested but are about what the code would look like.
from datetime import datetime, timedelta


class CloudPath:
    def generate_presigned_url(self, expire_seconds: int = 300):
        raise NotImplementedError
class S3Path(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:])  # Everything after the bucket name
        url = self.client.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": self.bucket, "Key": object_key},
            ExpiresIn=expire_seconds,
        )
        return url
class GSPath(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:])  # Everything after the bucket name
        gs_client = self.client.client
        creds = gs_client.credentials
        gs_bucket = gs_client.get_bucket(self.bucket)
        gs_blob = gs_bucket.blob(object_key)
        url = gs_blob.generate_signed_url(
            version="v4",
            expiration=timedelta(seconds=expire_seconds),
            service_account_email=creds.service_account_email,
            access_token=creds.token,
            method="GET",
        )
        return url
from azure.storage.blob import BlobSasPermissions, generate_blob_sas


class AzureBlobPath(CloudPath):
    def generate_presigned_url(self, expire_seconds: int = 300):
        object_key = "/".join(list(self.parts)[2:])  # Everything after the container name
        az_client = self.client.client
        sas_token = generate_blob_sas(
            az_client.account_name,
            container_name=self.container,
            blob_name=object_key,
            account_key=az_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(seconds=expire_seconds),
        )
        url = f"https://{az_client.account_name}.blob.core.windows.net/{self.container}/{object_key}?{sas_token}"
        return url
The above implementation only handles downloads. Uploading via presigned urls/tokens is a bit weird across the different clouds but still doable. Happy to research that step more if it's of interest.
Happy to contribute the implementation, just want to make sure it's on track with the goals of the project.
Yeah, this is awesome.
Curious for your take on a couple related design decisions:
- Does it make sense to have a generic url function that can get a presigned URL but can also get the vanilla one if you're in an auth'd context? (I think #21 was trying to get at that.)
- Does it make sense, while we're researching these URL formats, to also implement something like CloudPath.from_url constructors (or just support it in __init__) that can parse out the region/container, etc. and return the cloud path? This came up in #157. Doing that might be sufficient to support the upload scenario, where you can do something like CloudPath.from_url(presigned_url).upload_from(my_local_path).
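To make the from_url idea concrete, here is a minimal sketch of the parsing step for S3 only. The helper name s3_url_to_uri is hypothetical (it is not part of cloudpathlib), and it assumes virtual-hosted-style hosts of the form bucket.s3.amazonaws.com or bucket.s3.region.amazonaws.com; other URL styles and other providers would need their own handling.

```python
from urllib.parse import unquote, urlparse


def s3_url_to_uri(url: str) -> str:
    """Hypothetical helper: convert a virtual-hosted-style S3 HTTPS URL to an s3:// URI.

    Assumes hosts shaped like <bucket>.s3.amazonaws.com or
    <bucket>.s3.<region>.amazonaws.com. Any query string (for example a
    presigned signature) is dropped.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split the host at the first ".s3." to separate the bucket from the endpoint.
    bucket, sep, rest = host.partition(".s3.")
    if not sep or not rest.endswith("amazonaws.com"):
        raise ValueError(f"Not a recognized virtual-hosted S3 URL: {url}")
    # The object key is the URL path without its leading slash, percent-decoded.
    key = unquote(parsed.path.lstrip("/"))
    return f"s3://{bucket}/{key}"


uri = s3_url_to_uri("https://bucket.s3.amazonaws.com/file.txt")
```

The resulting string could then be handed to the existing path constructor. Note that this discards the signature, so on its own it would not cover the presigned upload scenario.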
- The as_url method described in that issue sounds good to me. I think in order to do either you'll need to be in an authed context, so just an explicit presign bool parameter would work well.
S3Path("s3://bucket/file.txt").as_url()
# > https://bucket.s3.amazonaws.com/file.txt
S3Path("s3://bucket/file.txt").as_url(presign=True, expire_seconds=100)
# > https://bucket.s3.amazonaws.com/file.txt?presign=accesstokenwithexpiry
- The issue with from_url is that S3, for example, doesn't give you back a single URL when you presign an upload; it returns a URL plus form fields for a multipart POST request:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#generating-a-presigned-url-to-upload-a-file
Also, I think it'd need to be an explicit upload_file, not upload_from, for S3 as well, since you need to pass a Bucket and Key as part of the signature.
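To illustrate why a single URL isn't enough for uploads: the boto3 guide linked above describes a presigned POST response as a dict with a url and a set of form fields that must accompany the file. The rough sketch below uses only the standard library and does not call boto3. The presigned dict holds invented placeholder values in that shape, and build_multipart_body is a hypothetical helper, not something boto3 or cloudpathlib provides.

```python
import uuid


def build_multipart_body(fields: dict, file_name: str, file_bytes: bytes):
    """Encode presigned POST form fields plus the file as multipart/form-data.

    Returns (body, content_type). The file part is written last, after all
    of the signed form fields.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    chunks.append(file_header.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


# Placeholder response; the field values are invented for illustration and
# the real set of fields depends on the signature version in use.
presigned = {
    "url": "https://bucket.s3.amazonaws.com/",
    "fields": {"key": "file.txt", "policy": "<base64 policy>", "x-amz-signature": "<signature>"},
}
body, content_type = build_multipart_body(presigned["fields"], "file.txt", b"hello")
```

The body and content type would then be sent as a POST to presigned["url"], which is quite different from simply writing bytes to a path built from one URL.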
I'm sure there's a way to design a good API for this, but maybe we take this in 2 steps and start with generating presigned urls in the as_url method?
Great, let's start with the as_url (maybe we just call it url?).
On the implementation, we like to isolate all communication with the provider to the *Client classes themselves, so let's implement the core logic on those objects and then just have a thin wrapper on the Path classes themselves.
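As a rough sketch of that split, assuming hypothetical names throughout (FakeS3Client, SketchS3Path, _get_public_url, and _generate_presigned_url are illustrative, not the library's actual API), the client would own the URL logic and the path method would only dispatch on the presign flag. The fake client below makes no network calls and returns a placeholder signature.

```python
class FakeS3Client:
    """Hypothetical stand-in for a *Client class; no provider calls are made."""

    def _get_public_url(self, bucket: str, key: str) -> str:
        return f"https://{bucket}.s3.amazonaws.com/{key}"

    def _generate_presigned_url(self, bucket: str, key: str, expire_seconds: int) -> str:
        # A real client would delegate to the provider SDK here; this fake
        # only appends a placeholder query string.
        base = self._get_public_url(bucket, key)
        return f"{base}?expires_in={expire_seconds}&signature=PLACEHOLDER"


class SketchS3Path:
    """Hypothetical path class: a thin wrapper that defers to its client."""

    def __init__(self, bucket: str, key: str, client: FakeS3Client):
        self.bucket = bucket
        self.key = key
        self.client = client

    def as_url(self, presign: bool = False, expire_seconds: int = 300) -> str:
        if presign:
            return self.client._generate_presigned_url(self.bucket, self.key, expire_seconds)
        return self.client._get_public_url(self.bucket, self.key)


path = SketchS3Path("bucket", "file.txt", FakeS3Client())
plain_url = path.as_url()
signed_url = path.as_url(presign=True, expire_seconds=100)
```

Keeping the provider-specific calls on the client means each cloud only has to implement those two methods, while as_url stays the same on every path class.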
One other small note is that instead of object_key = "/".join(list(self.parts)[2:]), you can get keys either with self.key which most cloud providers have or generically with self._no_prefix_no_drive (which, IIRC, may contain a leading slash depending on providers).
Nice! That object_key thing I was doing was obviously bad, I just hadn't looked through the source code too much.
Started a Draft PR with the majority of the implementation in the respective Client classes.