pyfilesystem
pyfilesystem copied to clipboard
Support extra parameters for s3fs / Boto / S3 Connection
boto.s3.connection.Connection() supports bunch of extra options, including:
* anon=True for read-only connection to public S3 shares
* proxy, proxy_port etc for HTTP(S) proxy options
* custom hosts, connection classes and whatnot
The attached patch simplifies fs.s3fs code to pass in all extra keyword
arguments to boto.s3.connection.Connection(). It also allows connecting to
public S3 shares with "anonymous" account (ie. no AWS credentials required).
This might need a bit more: tests, doc fixes etc, but the basic idea works and
is already in use in our internal project (using listdir, listdirinfo, open,
...).
Note: it _might_ be possible to also pass in all these options through a Boto
config file, but that is not feasible in our use-cases (many dynamic
end-points).
Original issue reported on code.google.com by [email protected] on 28 Oct 2014 at 2:12
Attachments:
My quick 2 cents... (take with a pinch of salt as I've never used s3fs or Boto)
By changing the prototype of the __init__ function, you've instantly broken
backwards-compatibility for people who were passing in positional arguments
rather than keyword arguments :-/
IMHO assuming *all* the kwargs are supposed to be passed to Boto feels a little
'greedy' and could cause problems with future extensions to the pyfs interface?
I wonder if you could add a prefix to all the boto-specific kwargs, and then
use this to filter out only the kwargs you need? (obviously removing the prefix
before passing them on to boto). Kinda in the same spirit as
http://docs.pyfilesystem.org/en/latest/implementersguide.html#meta-values ?
Original comment by [email protected] on 28 Oct 2014 at 2:45
- Changed title: Support extra parameters for s3fs / Boto / S3 Connection
- Added labels: Type-Enhancement
- Removed labels: Type-Defect
Yeah, changing the signature is a no go. I would suggest adding a 'boto_args'
keyword argument which should be a dictionary of these extra arguments.
Something like this:
def __init__(self, bucket, prefix="", aws_access_key=None, aws_secret_key=None, separator="/", thread_synchronize=True, key_sync_timeout=1, boto_args=None):
That should keep it backwards compatible.
There is also the possibility of alternative constructors. There could be a
S3FS.create_anonymous for example.
Original comment by willmcgugan on 28 Oct 2014 at 3:03
Also, one way would be to make it easier to override connection creation method
and maybe some other parts too. Now sub-classing the S3FS is a bit too hard.
Hmm, I first checked that boto/s3 connection keyword arguments for access key
and secret key were the same, but they actually are different names.
Anyway, this wasn't meant to be a full patch (I bet it also breaks unit tests),
just submitted what now is good enough for me.
Right now boto.s3.connection.Connection __init__ looks like this:
def __init__(self, aws_access_key_id=None, aws_secret_access_key=None,
is_secure=True, port=None, proxy=None, proxy_port=None,
proxy_user=None, proxy_pass=None,
host=NoHostProvided, debug=0, https_connection_factory=None,
calling_format=DefaultCallingFormat, path='/',
provider='aws', bucket_class=Bucket, security_token=None,
suppress_consec_slashes=True, anon=False,
validate_certs=None, profile_name=None):
https://github.com/boto/boto/blob/develop/boto/s3/connection.py#L167
Original comment by [email protected] on 28 Oct 2014 at 5:46
And here's a link to Boto config tutorial:
http://boto.readthedocs.org/en/latest/boto_config_tut.html
Maybe s3fs opener should support some of this stuff? At least passing in
"profile name", which would make it possible to add global or per-user boto
config file and have the settings there. So, profile name could be one part of
s3fs URL (username, parameter, anchor...?)
Original comment by [email protected] on 28 Oct 2014 at 5:55
Ended up creating AnonS3FS sub-class, looking something like this:
class AnonS3FS(S3FS):
def __init__(self, *args, **kwargs):
self.boto_kwargs = {'anon': True}
# need to fake these to be non-None, because S3FS checks them
kwargs['aws_access_key'] = 'bogus'
kwargs['aws_secret_key'] = 'bogus'
super(AnonS3FS, self).__init__(*args, **kwargs)
# copied from S3FS, changed to use **self.boto_kwargs, not *self._access_keys
def _s3conn(self):
try:
(c, ctime) = self._tlocal.s3conn
if time.time() - ctime > 60:
raise AttributeError
return c
except AttributeError:
c = boto.s3.connection.S3Connection(**self.boto_kwargs)
self._tlocal.s3conn = (c, time.time())
return c
_s3conn = property(_s3conn)
+ copied opener code to handle our local "anons3fs://" URLs. Works for me, and
this way it's also possible to set any extra options.
For the actual use-case, I ended up moving some variables to Boto config file
(required anyway for setting HTTP timeout to something sane).
I'm not 100% happy with this solution either...
Original comment by [email protected] on 6 Nov 2014 at 10:05