Parallelize BB processing script
This patch parallelizes the BB processing script, which significantly speeds up the processing of BBs. Returns eventually diminish, especially on systems with a large number of threads, most likely because of blocking on I/O.
I'm seeing near-linear speedup up to about 16 threads, after which I start to hit diminishing returns.
Converting to a draft as it needs some more work. It does not yet handle the case where a partially filled batch is left over at the end of the input.
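For reference, one common way to deal with a leftover batch is to have the batching helper emit a final short batch instead of dropping it. The following is only a minimal sketch in Python, not the code from this patch: `batched`, `process_batch`, and `process_all` are hypothetical names, and the uppercase transform is a placeholder for the real per-BB work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of up to batch_size items, including a short final batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            # Input exhausted; any leftover items were already yielded.
            return
        yield batch


def process_batch(batch: List[str]) -> List[str]:
    # Hypothetical placeholder for the real per-BB processing.
    return [bb.upper() for bb in batch]


def process_all(
    items: Iterable[str],
    batch_size: int,
    num_threads: int,
    worker: Callable[[List[str]], List[str]] = process_batch,
) -> List[str]:
    """Process items in batches on a thread pool, preserving input order."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map returns results in the order the batches were submitted.
        for output in pool.map(worker, batched(items, batch_size)):
            results.extend(output)
    return results
```

With five items and a batch size of two, `batched` yields two full batches followed by a batch of one, so the trailing item is still processed.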
Closing this for now. I think I want to restructure this rather than just parallelizing in-process. Adding Python bindings and using something like Apache Beam would, I think, scale better for dataset processing.