
CloudFile download memory error on large files

Open benkantor opened this issue 11 years ago • 3 comments

Problem:

I'm using pyrax to download a 250 MB file from Rackspace Cloud Files, and I'm hitting a memory error, shown below:

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    container.download_object('mycode.tar.gz',dest)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 542, in download_object
    structure=structure)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 534, in download
    return self.object_manager.download(obj, directory, structure=structure)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 138, in wrapped
    return fnc(self, obj, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 2064, in download
    content = self.fetch(obj)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 138, in wrapped
    return fnc(self, obj, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/object_storage.py", line 1971, in fetch
    resp, resp_body = self.api.method_get(uri, headers=headers)
  File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 251, in method_get
    return self._api_request(uri, "GET", **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 232, in _api_request
    resp, body = self._time_request(safe_uri, method, **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 194, in _time_request
    resp, body = self.request(uri, method, **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/client.py", line 185, in request
    resp, body = pyrax.http.request(method, uri, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pyrax/http.py", line 66, in request
    body = resp.json()
  File "/usr/lib/python2.6/site-packages/requests/models.py", line 763, in json
    return json.loads(self.text, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/models.py", line 726, in text
    encoding = self.apparent_encoding
  File "/usr/lib/python2.6/site-packages/requests/models.py", line 611, in apparent_encoding
    return chardet.detect(self.content)['encoding']
  File "/usr/lib/python2.6/site-packages/requests/packages/chardet/__init__.py", line 30, in detect
    u.feed(aBuf)
  File "/usr/lib/python2.6/site-packages/requests/packages/chardet/universaldetector.py", line 128, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "/usr/lib/python2.6/site-packages/requests/packages/chardet/charsetgroupprober.py", line 64, in feed
    st = prober.feed(aBuf)
  File "/usr/lib/python2.6/site-packages/requests/packages/chardet/sjisprober.py", line 54, in feed
    for i in range(0, aLen):
MemoryError
Specifications:

I'm running pyrax on a Vagrant box running CentOS 6.5.

The VM has 589 MB of RAM.

benkantor avatar Aug 11 '14 19:08 benkantor

Can verify the issue:

Downloading file from rackspace...
Traceback (most recent call last):
  File "(omitted)", line 425, in <module>
    backup_restore(options)
  File "(omitted)", line 167, in backup_restore
    rackspace_download(target_dir, options)
  File "(omitted)", line 368, in rackspace_download
    container.download_object(last_backup.name, target_dir, structure=False)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 542, in download_object
    structure=structure)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 534, in download
    return self.object_manager.download(obj, directory, structure=structure)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 138, in wrapped
    return fnc(self, obj, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 2074, in download
    content = self.fetch(obj)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 138, in wrapped
    return fnc(self, obj, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/object_storage.py", line 1980, in fetch
    raw_content=True)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/client.py", line 251, in method_get
    return self._api_request(uri, "GET", **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/client.py", line 232, in _api_request
    resp, body = self._time_request(safe_uri, method, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/client.py", line 194, in _time_request
    resp, body = self.request(uri, method, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/client.py", line 185, in request
    resp, body = pyrax.http.request(method, uri, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyrax/http.py", line 65, in request
    resp = req_method(uri, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 59, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 48, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 451, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 594, in send
    r.content
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 707, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
MemoryError

This does not happen on small files (ones that fit in memory).

download_object seems to read the entire object into memory before writing it to disk. Cloud Files is useless to me if every downloaded file has to fit in main memory; it should write the data to disk in chunks.
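For reference, the chunked-to-disk behaviour being asked for can be sketched with a small helper. This is a sketch only: save_stream is a hypothetical name, and the requests usage in the comment assumes you have a plain HTTP URL for the object, which is not how pyrax exposes downloads.

```python
import shutil

def save_stream(body, dest_path, chunk_size=64 * 1024):
    """Copy a file-like HTTP response body to dest_path in fixed-size
    chunks, so peak memory use is bounded by chunk_size rather than by
    the size of the object being downloaded."""
    with open(dest_path, "wb") as fp:
        shutil.copyfileobj(body, fp, chunk_size)

# Typical use with requests (url is a placeholder):
#   resp = requests.get(url, stream=True)  # stream=True: don't buffer the body
#   save_stream(resp.raw, "/tmp/mycode.tar.gz")
```

The key point is stream=True: without it, requests eagerly joins all chunks into one bytes object, which is exactly the allocation that fails in the tracebacks above.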

Using Python 2.7 and pyrax (1.9.2).

jynus avatar Oct 21 '14 11:10 jynus

@benkantor @jynus It may be a little late for this suggestion, but I ran into a similar memory issue when calling requests.get() directly to download files from AWS S3. The solution was to allocate swap space to the instance.
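For anyone taking that route, adding swap on a Linux VM is only a few commands (the path and size here are illustrative; run as root):

```shell
# Create a 1 GiB swap file (adjust size/path for your VM)
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile      # swap files must not be world-readable
mkswap /swapfile         # format the file as swap
swapon /swapfile         # enable it immediately
free -m                  # confirm the extra swap is visible
```

Note that this only papers over the problem: the download still materialises the whole object in (now partly swapped) memory before it reaches disk.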

barsteadr avatar Nov 19 '14 18:11 barsteadr

Objects expose a fetch method you can use to avoid this situation, e.g.

obj = container_primary.get_object(version.location['object'])
with open(path, 'wb') as fp:
    # fetch() with chunk_size returns a generator of chunks
    for chunk in obj.fetch(chunk_size=262144000):  # 250 MiB chunks
        fp.write(chunk)

icereval avatar Apr 02 '16 16:04 icereval