
Remote file templated url

Open · nmpowell opened this issue 3 years ago · 7 comments

See #2462. Allow remote_file rules to use a templated cache URL that can be prioritised over the URL passed to the remote_file rule itself.

Just as python_wheel rules can specify a custom wheel naming scheme, custom cache naming scheme(s) could be defined here. This should allow escrow backups of remote files in, e.g., a cloud storage bucket, or as local files with, e.g., cacheurl = file:///var/tmp/escrow/local_storage.

This would use a name scheme like this in .plzconfig:

[remote_file]
prioritisecache = true
cacheurl = https://some-cloud-provider.com/some-bucket-name
cachenamescheme = {url_base}/{cache_path}/{file_name}

Also add a base64url function.

By default, base64url is used to generate the cache_path in the above name scheme from the dirname of a URL (i.e. omitting the file name). Thus two remote files with the same dirname share the same cache_path:

http://google.com/file.a: {cache_url}/aHR0cDovL2dvb2dsZS5jb20/file.a
http://google.com/file.b: {cache_url}/aHR0cDovL2dvb2dsZS5jb20/file.b

nmpowell, May 26 '22
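The proposed `{url_base}/{cache_path}/{file_name}` scheme can be sketched in Python, assuming (as described above) that `cache_path` is the unpadded base64url encoding of the URL's dirname and that `url_base` is the configured cacheurl. The names `base64url` and `cache_url_for` here are illustrative only; they are not an existing Please API, and query strings or fragments in the URL are not handled.

```python
import base64
import posixpath


def base64url(s: str) -> str:
    """URL-safe base64 with the '=' padding stripped, so it is a clean path segment."""
    return base64.urlsafe_b64encode(s.encode("utf-8")).rstrip(b"=").decode("ascii")


def cache_url_for(url: str, url_base: str) -> str:
    """Expand the hypothetical {url_base}/{cache_path}/{file_name} name scheme."""
    # dirname drops the file name: "http://google.com/file.a" -> "http://google.com"
    dir_name = posixpath.dirname(url)
    file_name = posixpath.basename(url)
    return "{url_base}/{cache_path}/{file_name}".format(
        url_base=url_base.rstrip("/"),
        cache_path=base64url(dir_name),
        file_name=file_name,
    )


if __name__ == "__main__":
    bucket = "https://some-cloud-provider.com/some-bucket-name"
    for url in ("http://google.com/file.a", "http://google.com/file.b"):
        print(url, "->", cache_url_for(url, bucket))
```

Because the file name is kept verbatim and only the directory part is encoded, both files land under the same `aHR0cDovL2dvb2dsZS5jb20` prefix, and the same function works unchanged with a `file:///var/tmp/escrow/local_storage` base.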

I'm not sure whether I've done everything necessary to set up the [remote_file] section in .plzconfig.

nmpowell, May 26 '22

remote_file supports multiple URLs passed to it, which are tried in sequence. That's the underlying mechanism used to back python_wheel. To me that seems sufficient; you simply put your cache URL in as the first entry and it'll be tried before the others, but it won't fail if it doesn't have the thing in question.

peterebden, May 26 '22
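The try-in-sequence behaviour peterebden describes can be modelled in a few lines of Python. This is only an illustration of the semantics, not Please's actual implementation; `fetch_first` and the `fetch` callable are hypothetical stand-ins. It shows why a cache URL placed first is harmless on a miss: the error is recorded and the next URL is tried, and the whole fetch fails only if every URL fails.

```python
from typing import Callable, List, Sequence, Tuple


def fetch_first(urls: Sequence[str], fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Try each URL in order and return (url, content) from the first that succeeds."""
    errors: List[str] = []
    for url in urls:
        try:
            return url, fetch(url)
        except Exception as exc:  # a miss on one URL just moves on to the next
            errors.append(f"{url}: {exc!r}")
    # Only give up once every URL has been tried.
    raise RuntimeError("all URLs failed:\n" + "\n".join(errors))
```

With `urls = [cache_url, upstream_url]`, a populated cache is served first, and an empty cache silently falls through to the upstream URL.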

> To me that seems sufficient; you simply put your cache URL in as the first entry and it'll be tried before the others

We want to be able to download every remote file from our bucket: for instance, if the external resource gets moved or deleted, we'll fall back to the cache. It would be arduous to define the cache URL for every remote_file target.

nmpowell, May 26 '22

... I understand the concern, though; I'll have a think about how to make this a bit more elegant ...

nmpowell, May 26 '22

> remote_file supports multiple URLs passed to it, which are tried in sequence. That's the underlying mechanism used to back python_wheel. To me that seems sufficient; you simply put your cache URL in as the first entry and it'll be tried before the others, but it won't fail if it doesn't have the thing in question.

As Nick said, this is mostly about reducing the effort needed to add what is effectively an extra src to every remote_file. Is there a better way to achieve that than what Nick is proposing here?

chrisnovakovic, May 30 '22

You could define your own build_def that wraps remote_file and passes the cache URL as the first URL to the underlying remote_file?

RichardoC, Jun 07 '22
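RichardoC's wrapper suggestion can be sketched in Python to mirror Please's Python-like build language. Everything here is hypothetical: `cached_remote_file`, `CACHE_URL`, and the stub `remote_file`, which only records its arguments so the sketch runs standalone (in a real build_defs file the built-in `remote_file` would be called instead). It relies only on the behaviour described earlier in the thread, that `remote_file` accepts several URLs and tries them in order. The encoding step uses Python's `base64` module; inside Please's build language it would presumably need something like the base64url function proposed in the issue.

```python
import base64
import posixpath

# Hypothetical escrow location; a real repo might read this from .plzconfig.
CACHE_URL = "https://some-cloud-provider.com/some-bucket-name"


def remote_file(name, url, **kwargs):
    """Stub standing in for Please's built-in remote_file; just records the call."""
    return {"name": name, "url": list(url), **kwargs}


def _cache_url(url):
    # Same scheme as proposed in the issue: {url_base}/{base64url(dirname)}/{file_name}
    encoded = base64.urlsafe_b64encode(posixpath.dirname(url).encode()).rstrip(b"=").decode()
    return f"{CACHE_URL}/{encoded}/{posixpath.basename(url)}"


def cached_remote_file(name, url, **kwargs):
    """Hypothetical wrapper: try the escrow copy first, then the original URL(s)."""
    urls = [url] if isinstance(url, str) else list(url)
    # The cache location is derived from the first (canonical) URL only.
    return remote_file(name=name, url=[_cache_url(urls[0])] + urls, **kwargs)
```

Targets would then call `cached_remote_file(...)` in place of `remote_file(...)`, so the cache URL never has to be written out by hand, while `remote_file` itself stays unchanged.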

> You could define your own build_def that wraps remote file and adds the cache url first to the underlying remote file?

Yeah, I think I prefer this idea. This seems a little too magical for a low-level rule like remote_file.

Tatskaari, Jun 10 '22

This issue has been automatically marked as stale because it has not had any recent activity in the past 90 days. It will be closed if no further activity occurs. If you require additional support, please reply to this message. Thank you for your contributions.

stale[bot], Sep 08 '22