
ERROR | DB Error updating record '9472f58e-6f89-46a8-be69-10e09da7e2e1'

Open parashardhapola opened this issue 9 years ago • 11 comments

Hi,

I get the following error when I try to run a simple test job:

Traceback (most recent call last):
File "/home/parashar/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/hub.py", line 687, in save_queue_result
self.db.update_record(msg_id, result)
File "/home/parashar/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/dictdb.py", line 232, in update_record
raise KeyError("Record %r has been culled for size" % msg_id)
KeyError: "Record '9472f58e-6f89-46a8-be69-10e09da7e2e1' has been culled for size"

I ran the following code:

from ipyparallel import Client

def get_square(num):
    return num**2

_RC = Client()
_DVIEW = _RC[:]    
ar = _DVIEW.map_async(get_square, range(100000000))
ar.wait_interactive()

I have run 30 engines on two different hosts by running ipcluster engines -n 30, and ran ipcontroller --ip="*" on the host running the Jupyter notebook. The wait_interactive output hangs at 59/60.

Please check if this error can be reproduced.

parashardhapola avatar Oct 25 '16 12:10 parashardhapola

Thanks, I'll investigate.

minrk avatar Oct 31 '16 13:10 minrk

Similar issue here.

I am on version 5.2 with an Anaconda installation.

per
    return fn(*args, **kwargs)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 325, in <lambda>
    lambda : self.handle_stranded_tasks(uid),
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 335, in handle_stranded_tasks
    for msg_id in lost.keys():
RuntimeError: dictionary changed size during iteration
2017-02-25 17:08:58.400 [IPControllerApp] task::task 'e7647038-edff-4814-8939-84afced09336' finished on 7
2017-02-25 17:08:58.401 [IPControllerApp] ERROR | DB Error saving task request 'e7647038-edff-4814-8939-84afced09336'
Traceback (most recent call last):
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/hub.py", line 794, in save_task_result
    self.db.update_record(msg_id, result)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/dictdb.py", line 232, in update_record
    raise KeyError("Record %r has been culled for size" % msg_id)
KeyError: "Record 'e7647038-edff-4814-8939-84afced09336' has been culled for size"
2017-02-25 17:08:58.402 [IPControllerApp] task::task 'fce1ddc0-c360-43eb-902b-0477bd259dba' finished on 8
2017-02-25 17:08:58.402 [IPControllerApp] ERROR | DB Error saving task request 'fce1ddc0-c360-43eb-902b-0477bd259dba'
Traceback (most recent call last):
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/hub.py", line 794, in save_task_result
    self.db.update_record(msg_id, result)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/dictdb.py", line 232, in update_record

The controller runs on Linux, while the clients run on a variety of Linux/Windows machines.

littlegreenbean33 avatar Feb 25 '17 16:02 littlegreenbean33

Hi. Any updates on this issue? I'm having the same problem sometimes.

jayzed82 avatar May 30 '17 09:05 jayzed82

@jayzed82

My issues were my own fault: I was sending more than 1024 tasks in parallel. You need to change the limit manually if you want to go beyond it.

Have you checked whether you are trying to fill the queue with more than 1024 tasks?
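A minimal sketch of how to check, assuming a default Client connected to the running controller; Client.outstanding and Client.queue_status() are the two places I would look:

from ipyparallel import Client

rc = Client()

# Requests submitted by this client that have not returned yet
print(len(rc.outstanding))

# Per-engine queue/completed counts as reported by the Hub,
# including tasks not yet assigned to any engine
print(rc.queue_status())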

littlegreenbean33 avatar May 30 '17 16:05 littlegreenbean33

Thank you @littlegreenbean33. That is exactly my problem: I have a queue longer than 1024 tasks. I didn't know there was a limit. How do you increase it?

jayzed82 avatar May 30 '17 20:05 jayzed82

Look for 1024, or the text of the error message, in the source code; you will find informative comments there as well. There was a balance to strike with regard to memory usage, and 1024 probably sounded like a good number.
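The 1024 in question appears to be a limit of the in-memory DictDB that the tracebacks above point at (dictdb.py). A hedged sketch of raising it in ipcontroller_config.py, assuming the traits are named record_limit and size_limit:

# ipcontroller_config.py (e.g. created with: ipython profile create --parallel)
c = get_config()  # noqa

# Assumed DictDB traits -- raise the limits at the cost of Hub memory
c.DictDB.record_limit = 100000      # default is 1024 records
c.DictDB.size_limit = 2 * 1024**3   # default is 1 GB of stored buffers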

littlegreenbean33 avatar May 30 '17 20:05 littlegreenbean33

@littlegreenbean33 I'm not quite sure what you mean. Can you point us more specifically?

And what actually happens when we encounter these ERROR | DB Error saving task request messages? Are the computation results going to be faulty, and hence useless? If so, why isn't a warning or something more visible shown on the client side, i.e. in IPython/Jupyter? Or can that "error" simply be ignored because the hub handles it somehow?

kostrykin avatar Aug 02 '17 09:08 kostrykin

If your task queue grows above 1024, bad things happen. Don't ignore the error; it means tasks won't be performed.

littlegreenbean33 avatar Aug 03 '17 19:08 littlegreenbean33

So does this actually mean that IPyParallel cannot have more than 1024 tasks queued? Then why is there no error, or at least a warning? If you run ipcluster in --daemon mode, you won't even notice it! And how can we lift that limit?

I've just run a quick test to see what happens if I submit more than 1024 tasks. In this test, I only have a single engine, hence the task queue should be about 2047 tasks in size, before the first task is finished:

import ipyparallel as ipp
import numpy as np

ipp_client = ipp.Client()
ipp_client[:].use_dill().get()

def f(ms):
    def _f(x):
        if ms > 0:
            import time
            time.sleep(ms * 1e-3)
        return x * 2
    return _f

data   = range(2048)
result = ipp_client[:].map(f(100), data).get()
print(np.allclose(result, list(map(f(0), data))))  # compare against a serial reference

This works like a charm. How does that square with your statement that the task queue cannot grow beyond 1024 tasks? @littlegreenbean33

kostrykin avatar Aug 03 '17 20:08 kostrykin

It means tasks won't be performed.

It does not mean that. This error does not affect execution or results during normal execution. The only thing affected is the result cache in the Hub, which can be used for delayed retrieval by id. If you are not using delayed retrieval (client.get_result(msg_ids) instead of asyncresult.get()), there should be no user-visible effect.
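To make that distinction concrete, a minimal sketch (double is just a stand-in function): the first call collects results through the AsyncResult as usual, while the second asks the Hub's database for the same results by message id, which is the path that fails once records have been culled.

from ipyparallel import Client

def double(x):
    return x * 2

rc = Client()
dview = rc[:]

# Normal path: results are delivered to this client's AsyncResult directly
ar = dview.map_async(double, range(10))
print(ar.get())

# Delayed retrieval: fetch the same results from the Hub's DB by message id
print(rc.get_result(ar.msg_ids).get())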

The default cache of results in the Hub is an in-memory DictDB, with a few limits. You can increase those limits, or tell the controller to use sqlite or mongodb to store these things out of memory. If you aren't using delayed retrieval at all, you can use NoDB to disable result caching entirely.
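A hedged sketch of those options in ipcontroller_config.py; the dotted class paths match the modules shown in the tracebacks above, and there are also command-line equivalents such as --sqlitedb and --nodb, if I recall correctly:

# ipcontroller_config.py
c = get_config()  # noqa

# Store task results on disk instead of in Hub memory
c.HubFactory.db_class = "ipyparallel.controller.sqlitedb.SQLiteDB"

# ...or, if you never use delayed retrieval, disable the result cache entirely
# c.HubFactory.db_class = "ipyparallel.controller.dictdb.NoDB"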

minrk avatar Aug 04 '17 11:08 minrk

Thanks a lot for that clarification @minrk

kostrykin avatar Aug 04 '17 13:08 kostrykin