CloudFile download memory error on large files
Problem:
I'm trying to use pyrax to download a 250 MB file from Rackspace Cloud Files, and I'm hitting a MemoryError, as shown below:
Traceback (most recent call last):
File "test.py", line 11, in <module>
container.download_object('mycode.tar.gz',dest)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 542, in download_object structure=structure)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 534, in download
return self.object_manager.download(obj, directory, structure=structure)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 138, in wrapped
return fnc(self, obj, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 2064, in download
content = self.fetch(obj)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 138, in wrapped
return fnc(self, obj, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 1971, in fetch
resp, resp_body = self.api.method_get(uri, headers=headers)
File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 251, in method_get
return self._api_request(uri, "GET", **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 232, in _api_request
resp, body = self._time_request(safe_uri, method, **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 194, in _time_request
resp, body = self.request(uri, method, **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 185, in request
resp, body = pyrax.http.request(method, uri, *args, **kwargs)
File "/usr/lib/python2.6/site-packages/pyrax/http.py", line 66, in request
body = resp.json()
File "/usr/lib/python2.6/site-packages/requests/models.py", line 763, in json
return json.loads(self.text, **kwargs)
File "/usr/lib/python2.6/site-packages/requests/models.py", line 726, in text
encoding = self.apparent_encoding
File "/usr/lib/python2.6/site-packages/requests/models.py", line 611, in apparent_encoding
return chardet.detect(self.content)['encoding']
File "/usr/lib/python2.6/site-packages/requests/packages/chardet/__init__.py", line 30, in detect
u.feed(aBuf)
File "/usr/lib/python2.6/site-packages/requests/packages/chardet/universaldetector.py", line 128, in feed
if prober.feed(aBuf) == constants.eFoundIt:
File "/usr/lib/python2.6/site-packages/requests/packages/chardet/charsetgroupprober.py", line 64, in feed
st = prober.feed(aBuf)
File "/usr/lib/python2.6/site-packages/requests/packages/chardet/sjisprober.py", line 54, in feed
for i in range(0, aLen):
MemoryError
Specifications:
I'm running pyrax from a Vagrant box running CentOS 6.5.
The VM has 589 MB of RAM.
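For reference, test.py boils down to something like the following (a minimal reconstruction from the traceback; the credentials, container name, and destination path are placeholders):

import pyrax

# Placeholder setup -- the real script's authentication isn't shown in the traceback.
pyrax.set_setting("identity_type", "rackspace")
pyrax.set_credentials("username", "api_key")

container = pyrax.cloudfiles.get_container("my-container")
dest = "/tmp"

# Line 11 of test.py per the traceback -- the call that raises MemoryError.
container.download_object('mycode.tar.gz', dest)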
I can verify the issue:
Downloading file from rackspace...
Traceback (most recent call last):
File "(omitted)", line 425, in
This does not happen with small files (ones that fit in memory).
download_object appears to read the entire object into memory before writing it to disk (the traceback above shows the full response body being buffered and run through chardet's encoding detection). Cloud Files is useless to me if every downloaded file has to fit in main memory; it should write files to disk in chunks.
Using Python 2.7 and pyrax (1.9.2).
@benkantor @jynus It may be a little late for this suggestion, but I ran into a similar memory issue when directly calling requests.get() to download files from AWS S3. The solution was to allocate swap space to the instance.
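If adding swap isn't practical, requests can also stream the body straight to disk instead of buffering it. A minimal sketch, assuming you already have the object's URL and an auth token (object_url, auth_token, and the chunk size are placeholders, not anything from pyrax):

import requests

# Placeholders -- substitute the real object URL and token.
object_url = "https://storage.example.com/v1/account/container/mycode.tar.gz"
auth_token = "AUTH_TOKEN"

# stream=True keeps requests from reading the whole body up front;
# iter_content() then yields it in fixed-size chunks.
resp = requests.get(object_url, headers={"X-Auth-Token": auth_token}, stream=True)
resp.raise_for_status()
with open("mycode.tar.gz", "wb") as fp:
    for chunk in resp.iter_content(chunk_size=8192):
        if chunk:  # skip keep-alive chunks
            fp.write(chunk)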
Objects expose a fetch method you can use to avoid this situation; when called with chunk_size it returns a generator instead of the whole body, e.g.:
obj = container_primary.get_object(version.location['object'])
with open(path, 'wb') as fp:
    # Passing chunk_size makes fetch() return a generator of chunks
    # rather than the whole object at once.
    fetcher = obj.fetch(chunk_size=262144000)  # 250 MiB chunks
    for chunk in fetcher:
        fp.write(chunk)
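One caveat: chunk_size=262144000 means each chunk is 250 MiB, so a file around that size is still held in memory in one piece. A much smaller chunk_size (a few MiB) keeps peak memory usage bounded regardless of the object's size.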