Integrate Block Operators more neatly with the DAG

Open tomerk opened this issue 10 years ago • 2 comments

Currently, to stay easily chainable with the rest of the code, block operators (such as block solves and block transformers) take a single complete RDD and manually split it into multiple blocks in a way that is hidden from the DAG.

If we add some DAG rewriting rules to detect this and integrate block operators better with the DAG, we should be able to take advantage of optimizations like auto-caching more effectively, and we can allow the block operators to operate on blocks lazily.

tomerk Jan 26 '16 18:01

One thing that makes the block solves tricky is that the blocks are not independent. That is, we pass a Seq[RDD[T]] because the solution to the second block depends on the solution to the first block. It is not clear to me how to capture this in the DAG.
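
To make the dependency concrete, here is a minimal sketch in plain Scala collections, not the actual KeystoneML block solver. It assumes a toy residual-fitting scheme in which each block fits one weight against whatever the earlier blocks left unexplained; `Block`, `solveBlock`, and `solveAll` are hypothetical names, and a local Seq[Double] stands in for an RDD[T].

```scala
// Illustrative sketch only: plain Seq[Double] stands in for RDD[T],
// and every name here is hypothetical rather than KeystoneML API.
object BlockDependencySketch {
  type Block = Seq[Double]

  // Toy per-block "solve": fit a single least-squares weight for this block
  // against the current residual, then return the weight and the new residual.
  def solveBlock(block: Block, residual: Seq[Double]): (Double, Seq[Double]) = {
    val denom = block.map(x => x * x).sum
    val w =
      if (denom == 0.0) 0.0
      else block.zip(residual).map { case (x, r) => x * r }.sum / denom
    val newResidual = block.zip(residual).map { case (x, r) => r - w * x }
    (w, newResidual)
  }

  // The fold threads the residual from one block into the next, so block i
  // cannot be solved before blocks 0 until i-1 have been solved.
  def solveAll(blocks: Seq[Block], labels: Seq[Double]): Seq[Double] = {
    val (weights, _) =
      blocks.foldLeft((Vector.empty[Double], labels)) {
        case ((ws, residual), block) =>
          val (w, next) = solveBlock(block, residual)
          (ws :+ w, next)
      }
    weights
  }
}
```

With two identical blocks, the first one absorbs the whole signal and the second gets a zero weight, which is the ordering dependency a DAG representation would need to express.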

etrain Jan 26 '16 20:01

I think it should be able to work the same way the GatherTransformer works: a TransformerNode that takes multiple RDDs together as input.
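
A rough sketch of that shape, under loud assumptions: the `Node`, `Source`, and `MultiInput` types below are hypothetical stand-ins written for illustration, not KeystoneML's GatherTransformer or TransformerNode, and plain Seq[Double] again replaces RDDs. It only shows a single node that takes several upstream nodes as inputs and evaluates them lazily.

```scala
// Illustrative sketch only: a tiny hypothetical DAG, not KeystoneML's classes.
object GatherStyleNodeSketch {
  // A node produces a value of type A when (and only when) eval() is called.
  sealed trait Node[A] { def eval(): A }

  // A leaf node wrapping a deferred computation; the result is memoized.
  final case class Source[A](compute: () => A) extends Node[A] {
    lazy val value: A = compute()
    def eval(): A = value
  }

  // A node with several inputs visible to the DAG: all blocks are explicit
  // dependencies, and f sees them together, in order.
  final case class MultiInput[A, B](inputs: Seq[Node[A]], f: Seq[A] => B)
      extends Node[B] {
    def eval(): B = f(inputs.map(_.eval()))
  }
}
```

Because each block is its own node here, a scheduler could in principle cache or defer individual blocks, whereas a split hidden inside one operator gives it nothing to work with. Whether the real DAG rewriting rules could produce this structure automatically is left open in the thread.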

tomerk Jan 26 '16 20:01