
Parallelize `predict` subcommand

Open koute opened this issue 3 years ago • 2 comments

Predicting from the CLI is painfully slow since it runs only on a single thread, so here's a quick fix.

Caveat: this essentially preloads the whole CSV into memory. It's still a net improvement, though, since single-core prediction becomes a problem far earlier (a CSV under 100 MB is already unusably slow) than excessive memory consumption does.

koute avatar Mar 13 '22 10:03 koute

@koute thanks for the PR! We decided to make prediction single threaded for exactly the reason you identified: to avoid buffering the whole CSV. I agree that it is a good idea to provide the option to parallelize predictions, but I think we should put it behind a command line argument, --parallel, that explains the tradeoff. Would you be willing to update this PR with that change?

nitsky avatar Mar 13 '22 15:03 nitsky

Okay, now it doesn't read the whole CSV into memory but still runs in parallel. (:

Speed-wise it's about the same. The old version, which preloads the CSV:

real	0m11.739s
user	10m29.132s
sys	0m14.350s

The new version, without preloading the CSV:

real	0m11.425s
user	9m53.702s
sys	0m51.725s

Is this acceptable or do you still want me to add an argument to disable parallelization?

koute avatar Mar 13 '22 16:03 koute