KILL-TASKS and nested tasks
When KILL-TASKS is used with nested pmap calls, it sometimes kills unrelated tasks. This is not unexpected, since KILL-TASKS destroys threads, but in some cases it is inconvenient. What are the reasons for using DESTROY-THREAD in KILL-TASKS? It would probably be possible to use INTERRUPT-THREAD to abort a task without killing its thread.

Here is an example that shows the problem. The following code does not signal any errors:
(handler-case
    (let ((*task-category* :my-tasks))
      (pmapcar (lambda (i)
                 (if (= i 0)
                     (progn
                       (sleep 0.1)
                       (kill-tasks :my-tasks))
                     (sleep 10)))
               :parts 2 '(0 1)))
  (task-killed-error ()
    nil))
but if it is wrapped in another pmapcar, it signals TASK-KILLED-ERROR:
(pmapcar (lambda (x)
           (handler-case
               (let ((*task-category* :my-tasks))
                 (pmapcar (lambda (i)
                            (if (= i 0)
                                (progn
                                  (sleep 0.1)
                                  (kill-tasks :my-tasks))
                                (sleep 10)))
                          :parts 2 '(0 1)))
             (task-killed-error ()
               nil)))
         '(0))
Interestingly, all tests pass after replacing, in kill.lisp,
(destroy-thread (thread worker))

with

(bt:interrupt-thread
 (thread worker)
 (lambda ()
   (invoke-restart 'transfer-error (make-condition 'task-killed-error))))
For each Lisp implementation I would have to check that interrupt-thread respects unwind-protect wherever destroy-thread does. I expect that's true, and if so, this looks like a definite improvement with no drawbacks that I can see. My concern, though, is that some people may be depending upon the destroy-thread behavior.
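That per-implementation check could be sketched roughly as follows, using only bordeaux-threads. The function name is illustrative (not part of lparallel), and it assumes the interrupt function runs in the target thread's dynamic context, so a THROW from it unwinds through the thread's own UNWIND-PROTECT:

```lisp
;; Illustrative sketch: does BT:INTERRUPT-THREAD run UNWIND-PROTECT
;; cleanup forms on this implementation?
(defun interrupt-runs-cleanup-p ()
  (let* ((cleaned-up nil)
         (thread (bt:make-thread
                  (lambda ()
                    (catch 'exit
                      (unwind-protect
                           (sleep 60)
                        (setf cleaned-up t)))))))
    (sleep 0.1)                 ; give the thread time to reach the SLEEP
    (bt:interrupt-thread thread (lambda () (throw 'exit nil)))
    (bt:join-thread thread)
    cleaned-up))
```

If this returns T, interrupting unwinds cleanly on that implementation; a DESTROY-THREAD variant of the same check would show whether the two behave alike there.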
I realize your example is just for demonstration purposes, but using pmapcar inside a worker thread is wasteful because it ties up that worker thread while it waits for other worker threads. One improvement I've been contemplating is making pmap et al aware of when they are being called inside a worker, as functions defined with defpun do.
I do not think pmapcar inside a worker wastes a thread; your nice work-stealing logic seems to prevent the problem.
Ha, you're right: there it is in kernel-util.lisp: steal-until-receive-result. I completely forgot that I fixed that. There's even lparallel-test::cognate-steal-test. Thanks for the info!
kill-tasks only kills active tasks; is there a good way to cancel queued tasks as well? For example, if a pmap produces more tasks than there are available workers, kill-tasks alone is not sufficient to abort the pmap cleanly. Is there a way to do better than the following?
(defun abortable-pmapcar (func &rest rest)
  (let ((*task-category* (cons nil nil))
        (aborted nil))
    (unwind-protect-case ()
        (apply #'pmapcar
               (lambda (&rest args)
                 (if aborted
                     (invoke-transfer-error (make-condition 'task-killed-error))
                     (apply func args)))
               rest)
      (:abort
       (setf aborted t)
       (kill-tasks *task-category*)))))
It's come up before, and I have a couple of private branches which define defmacro with-tasks-flushed-on-abort ((&key kill) &body body). The first method of doing this is more or less what you have above: wrapping submitted tasks with some logic.
The second method flushes the internal lockless queues (called spin queues in lparallel), which also lets us write defun flush-tasks (task-category &key kill). However since the moment-to-moment state of spin queues can be quite volatile, flushing tasks this way could unexpectedly end up reordering the non-flushed tasks. (Enqueue and dequeue are the only operations you have; you can't lock the queue, replace its contents, then unlock it. So you have to remove the tasks one by one, then enqueue the unflushed tasks, during which time the queue may have changed. Tasks aren't lost, just possibly reordered.)
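The reordering hazard can be sketched with a hypothetical queue API. None of these names are lparallel's actual internals; try-pop-queue and push-queue here just stand in for the only two operations a lock-free queue gives you:

```lisp
;; Illustrative sketch, not lparallel's real spin-queue code.
;; With only enqueue and dequeue available, "flushing" means draining
;; the queue and re-enqueuing the tasks we keep. Tasks submitted
;; concurrently can slip in between the drain and the re-enqueue, so
;; the kept tasks may come back in a different order (but none are lost).
(defun flush-tasks-sketch (queue flush-p)
  (let ((kept '()))
    (loop for task = (try-pop-queue queue)  ; hypothetical non-blocking pop
          while task
          unless (funcall flush-p task)
            do (push task kept))
    ;; Re-enqueue the survivors; the queue may have changed meanwhile.
    (dolist (task (nreverse kept))
      (push-queue task queue))))
```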
Thanks for the explanation. The second method depends heavily on the internals of lparallel, so I will use the first one for now. Did you consider adding an extension point that allows users to put arbitrary wrappers around tasks? E.g. each time a task lambda is created, it is passed to a function specified in a special variable (which users can set), and the function's result is used in place of the task lambda. A version of with-kill-on-abort that also cancels queued tasks would look like:
(defmacro with-kill-on-abort (&body body)
  `(let ((aborted nil)
         (*wrap-task-func*
           (lambda (thunk)
             (lambda ()
               (if aborted
                   (invoke-transfer-error (make-condition 'task-killed-error))
                   (funcall thunk))))))
     (unwind-protect-case ()
         (progn ,@body)
       (:abort
        (setf aborted t)
        ...))))
I am aware of three use cases where such an extension point would be useful: the one discussed above, passing the values of special variables from a caller of pmap to its tasks, and pmap/wait (which we discussed in issue 18).
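The specials-passing use case, for instance, could be sketched like this. Both *wrap-task-func* and the macro name are hypothetical, building on the hook proposed above; neither exists in lparallel today:

```lisp
;; Sketch only: *wrap-task-func* is the proposed (hypothetical) hook.
;; Capture the caller's values of VARS, then rebind them inside each task.
(defmacro with-specials-in-tasks ((&rest vars) &body body)
  `(let* ((values (list ,@vars))
          (*wrap-task-func*
            (lambda (thunk)
              (lambda ()
                (progv ',vars values
                  (funcall thunk))))))
     ,@body))

;; Illustrative usage: tasks created inside the body would see the
;; caller's binding of *my-setting* rather than the worker's global one.
;; (with-specials-in-tasks (*my-setting*)
;;   (pmapcar #'work-on '(1 2 3)))
```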
Yes, successively wrapping lambdas around tasks is the natural way to do this. There is already one such lambda wrapping in lparallel, namely the rebinding of the handlers established with task-handler-bind. In my kill-tasks-on-abort branch I do something similar to what you describe, but I hadn't considered adding the dynamic variable to the API. I think it's a good idea, thanks! Like the :context hook in make-kernel, it would be an "advanced use" feature -- the user is screwed if the parameter isn't funcalled. Though I suppose the hook could be used to ignore any submitted tasks as well, or (deepening the level of trickery) execute something else.