
RequestTooLargeError when using a lot of shards

Open · waleedka opened this issue on Aug 30 '15 · 3 comments

I created a mapreduce job with 2048 shards (I needed it for a very large update job). I didn't get any warning or error that the number of shards was too high. The code tried to create the mapper, but it failed with the error below.
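For context, a job of this shape is typically kicked off through mapreduce.control.start_map. Below is a minimal sketch under that assumption; the handler path, module names, and entity kind are hypothetical, not the actual code from this report:

    from mapreduce import control
    from mapreduce import operation as op

    def touch_entity(entity):
        # Mapper callback: re-save each entity (e.g. to apply a schema change).
        yield op.db.Put(entity)

    control.start_map(
        name="very-large-update",
        handler_spec="myapp.jobs.touch_entity",  # hypothetical module path
        reader_spec="mapreduce.input_readers.DatastoreInputReader",
        mapper_parameters={"entity_kind": "myapp.models.MyEntity"},  # hypothetical kind
        shard_count=2048,  # the value that triggers the failure below
    )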

After this error, the mapreduce is stuck in an error state: it's listed on the /mapreduce/status page as "running", but I can't "Abort" it or clean it up.

E 2015-08-27 23:35:40.070  500      4 KB  1.06 s I 23:35:39.012 E 23:35:40.067 /mapreduce/kickoffjob_callback/1573912547002E1E3DD63
  0.1.0.2 - - [27/Aug/2015:23:35:40 -0700] "POST /mapreduce/kickoffjob_callback/1573912547002E1E3DD63 HTTP/1.1" 500 4094 "http://live.symphonytools.appspot.com/mapreduce/pipeline/run" "AppEngine-Google; (+http://code.google.com/appengine)" "live.symphonytools.appspot.com" ms=1062 cpu_ms=1063 cpm_usd=0.000458 queue_name=default task_name=59300224872921797641 instance=00c61b117cc0391b13d22845bf6ae422d8f6c9ca app_engine_release=1.9.25
    I 23:35:39.012 Processing kickoff for job 1573912547002E1E3DD63
    E 23:35:40.067 The request to API call datastore_v3.Put() was too large.
      Traceback (most recent call last):
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
          rv = self.handle_exception(request, response, e)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
          rv = self.router.dispatch(request, response)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
          return route.handler_adapter(request, response)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
          return handler.dispatch()
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
          return self.handle_exception(e, self.app.debug)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
          return method(*args, **kwargs)
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/base_handler.py", line 135, in post
          self.handle()
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1385, in handle
          result = self._save_states(state, serialized_readers_entity)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2732, in inner_wrapper
          return RunInTransactionOptions(options, func, *args, **kwds)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2630, in RunInTransactionOptions
          ok, result = _DoOneTry(function, args, kwargs)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2650, in _DoOneTry
          result = function(*args, **kwargs)
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1493, in _save_states
          db.put([state, serialized_readers_entity], config=config)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1576, in put
          return put_async(models, **kwargs).get_result()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 929, in get_result
          result = rpc.get_result()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
          return self.__get_result_hook(self)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1881, in __put_hook
          self.check_rpc_success(rpc)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
          rpc.check_success()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
          self.__rpc.CheckSuccess()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
          raise self.exception
      RequestTooLargeError: The request to API call datastore_v3.Put() was too large.

waleedka · Aug 30 '15

Short of clearing the task queue that was used for this MR job (or deleting the specific tasks), I am not sure I can give you better advice.
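For reference, a minimal sketch of both options against the taskqueue API, assuming the queue and task names shown in the log above; note that purging drops every task in the queue, not only this job's:

    from google.appengine.api import taskqueue

    # Option 1: purge the whole queue. This removes ALL pending tasks in
    # it, not just the stuck mapreduce's tasks.
    taskqueue.Queue('default').purge()

    # Option 2: delete only the specific stuck task by name; the name
    # appears in the request log (task_name=... above).
    taskqueue.Queue('default').delete_tasks_by_name('59300224872921797641')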

aozarov · Sep 15 '15

We could cap the number of shards to prevent this sort of error. I have run 1024 shards successfully. In truth, though, adding more shards once there are already that many ceases to provide a performance boost, because of the added overhead of managing them.
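Until the library enforces such a cap, callers can clamp the value themselves. A minimal sketch; the 1024 ceiling is just the largest count reported working in this thread, not a documented limit:

    # Clamp the requested shard count to a known-safe ceiling.
    MAX_SHARDS = 1024  # assumption: largest count reported working here

    def safe_shard_count(requested):
        return max(1, min(requested, MAX_SHARDS))

    safe_shard_count(2048)  # -> 1024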

tkaitchuck · Oct 02 '15

@tkaitchuck That statement seems entirely dependent on the amount of work being performed for each datum.

I can confirm that large 1,000+ shard jobs run, but @tkaitchuck is right that the added overhead will slow you down for most jobs.
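One way to make that concrete is to size the shard count from the expected work per shard instead of picking a large constant. A rough sketch; the target batch size and cap are illustrative, not tuned values:

    # Derive shards from estimated input size so each shard carries enough
    # work to amortize its management overhead. Numbers are illustrative.
    ENTITIES_PER_SHARD = 50000
    MAX_SHARDS = 1024

    def suggested_shard_count(estimated_entities):
        wanted = -(-estimated_entities // ENTITIES_PER_SHARD)  # ceiling division
        return max(1, min(wanted, MAX_SHARDS))

    suggested_shard_count(10**6)  # -> 20
    suggested_shard_count(10**9)  # -> 1024 (capped)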

soundofjw · Oct 05 '15