
Consume input lazily OR allow "querying" of TaskGroup

Open saurabhnanda opened this issue 6 years ago • 4 comments

While this library ensures that only a limited, pre-defined number of actions are evaluated in parallel, it still has one problem, especially with very large input data sets. If the input has N = 2,000,000 elements, the library creates 2,000,000 asyncs up front, even though 99% of them might not be running concurrently at any given moment. Memory use therefore still grows linearly with the size of the input.

Even the most "lazy" function I could find, i.e. scatterFoldMapM, is only lazy with respect to the output (i.e. it doesn't try to collect ALL the output). However, if I'm not mistaken, even this function creates all the asyncs immediately, even when it is not possible to run them concurrently.

Therefore, the title of this issue. I believe this can be handled in two possible ways:

  • Having a new function with the following type signature, which consumes the input lazily (is this another continuation? I'm not sure!):

    someFunc :: (MonadIO m, Monoid b) 
              => TaskGroup 
              -> m (IO a)                          -- ^ producer of monadic actions
              -> (Either SomeException a -> m b)   -- ^ consumer of results
              -> m b
    
  • Allowing one to query the TaskGroup to see how many slots are vacant. This allows one to write complex scheduling logic for when to push a task.

    vacantSlots :: TaskGroup -> Int
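
A minimal sketch of how the two ideas above could fit together, using only `base` and `stm` rather than async-pool itself. Everything here is an assumption made for illustration: `Pool` is a hypothetical stand-in for `TaskGroup` that only counts free slots, `runLazily` is a hypothetical name for `someFunc`, `m` is specialised to `IO`, the producer returns `Maybe (IO a)` so that it can signal exhaustion, and `vacantSlots` returns `STM Int` instead of a pure `Int` because the count changes over time.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.STM
import Control.Exception (SomeException)

-- | Hypothetical stand-in for a TaskGroup: it only tracks free slots.
newtype Pool = Pool (TVar Int)

newPool :: Int -> IO Pool
newPool = fmap Pool . newTVarIO

-- | How many slots are free right now (an STM action, not a pure value).
vacantSlots :: Pool -> STM Int
vacantSlots (Pool free) = readTVar free

-- | Pull one action at a time from the producer, and only after a slot
-- has been claimed, so at most "capacity" threads exist at any moment.
-- The producer returns Nothing once the input is exhausted.
runLazily
  :: Monoid b
  => Pool
  -> IO (Maybe (IO a))                 -- producer of actions
  -> (Either SomeException a -> IO b)  -- consumer of results
  -> IO b
runLazily (Pool free) next consume = do
  done <- newTQueueIO  -- finished results waiting to be consumed
  let -- Block until a slot is free, then take it.
      claim = atomically $ do
        n <- readTVar free
        check (n > 0)
        writeTVar free (n - 1)
      release = modifyTVar' free (+ 1)
      -- Fold in results that have already finished, without blocking.
      drain acc pending = do
        mr <- atomically (tryReadTQueue done)
        case mr of
          Nothing -> pure (acc, pending)
          Just r -> do
            b <- consume r
            drain (acc <> b) (pending - 1)
      -- Producer is exhausted: block for the results still in flight.
      finish acc 0 = pure acc
      finish acc pending = do
        r <- atomically (readTQueue done)
        b <- consume r
        finish (acc <> b) (pending - 1)
      loop acc pending = do
        claim
        mAct <- next
        case mAct of
          Nothing -> atomically release >> finish acc pending
          Just act -> do
            -- The worker publishes its result and frees its slot atomically.
            _ <- forkFinally act $ \r ->
                   atomically (writeTQueue done r >> release)
            (acc', pending') <- drain acc (pending + 1)
            loop acc' pending'
  loop mempty (0 :: Int)
```

Because a new action is requested only after a slot has been claimed, the number of live threads and queued results stays proportional to the pool capacity rather than to N. Results are combined in completion order, so this sketch suits monoids where order does not matter. A producer could be as simple as an action that pops the next element from a `TVar` holding the remaining input.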
    

saurabhnanda avatar Jan 21 '20 21:01 saurabhnanda

I like your thinking here. Eagerly evaluating the requested tasks into async jobs before they get scheduled does sound like an unfortunate choice.

jwiegley avatar Jan 21 '20 22:01 jwiegley

Should I attempt a PR?

saurabhnanda avatar Jan 22 '20 03:01 saurabhnanda

@saurabhnanda I'd be quite interested to see what you come up with, sure!

jwiegley avatar Jan 23 '20 19:01 jwiegley

Check out the proposed solution; it could also be used for scatterFoldMapM.

l29ah avatar Oct 03 '22 23:10 l29ah