
DRILL-4706: Fragment planning causes Drillbits to read remote chunks when local copies are available.

Open ppadma opened this issue 9 years ago • 9 comments

ppadma avatar Oct 31 '16 16:10 ppadma

Updated the JIRA with details on how the current algorithm works, why remote reads were happening, and how the new algorithm works. https://issues.apache.org/jira/browse/DRILL-4706

ppadma avatar Oct 31 '16 22:10 ppadma

@vkorukanti if you don't mind, can you review this?

sudheeshkatkam avatar Nov 01 '16 15:11 sudheeshkatkam

Updated with all review comments addressed. Added TestLocalAffinityFragmentParallelizer.java, which has a bunch of test cases with examples.

ppadma avatar Nov 04 '16 17:11 ppadma

Some initial comments.

The issue is regarding assigning fragments based on strict locality. So why is the parallelization logic affected, and not exclusively locality?

Parallelization logic is affected because it decides how many fragments to run on each node, and that depends on locality.

ppadma avatar Nov 04 '16 17:11 ppadma

Hmm, the answer seems like a rephrasing of the question. Sorry, I misspoke. Better asked:

The issue is regarding assigning work to fragments based on strict locality (decide which fragment does what). So why is the parallelization (decide how many fragments) logic affected?

sudheeshkatkam avatar Nov 04 '16 20:11 sudheeshkatkam

Parallelization logic is affected for the following reasons. Depending on how many rowGroups have to be scanned on a node (based on locality information), i.e. how much work the node has to do, we want to adjust the number of fragments on that node (subject to the usual global and per-node limits). We do not want to schedule fragments on a node that does not have data. Because we want pure locality, we may end up with fewer fragments doing more work.
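
To make the reasoning above concrete, here is a minimal sketch of locality-driven fragment counts. It is not the code in this PR: the class `LocalAffinitySketch`, the method `assignFragments`, and the parameters `maxPerNode` and `maxGlobal` are hypothetical names chosen for illustration, and the proportional split is only one plausible way to honor the limits. It assumes the global limit is at least the number of nodes that hold data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the parallelizer added by this PR.
class LocalAffinitySketch {

    // localRowGroups maps a node name to the number of rowGroups stored locally on it.
    // Returns the number of fragments to run on each node that has data.
    static Map<String, Integer> assignFragments(Map<String, Integer> localRowGroups,
                                                int maxPerNode, int maxGlobal) {
        long total = 0;
        int dataNodes = 0;
        for (int count : localRowGroups.values()) {
            total += count;
            if (count > 0) {
                dataNodes++;
            }
        }
        Map<String, Integer> fragments = new LinkedHashMap<>();
        if (total == 0) {
            return fragments; // nothing to scan anywhere
        }
        // Never plan more fragments than there are rowGroups to scan.
        long width = Math.min((long) maxGlobal, total);
        // Every node with data gets one fragment; the rest is shared by workload.
        // Assumption: width >= dataNodes, otherwise the global limit is exceeded.
        long remaining = Math.max(0, width - dataNodes);
        for (Map.Entry<String, Integer> entry : localRowGroups.entrySet()) {
            int count = entry.getValue();
            if (count == 0) {
                continue; // no local data, so no fragment is scheduled here
            }
            long share = 1 + (remaining * count) / total;
            // Respect the per-node limit and avoid idle fragments.
            long capped = Math.min(share, Math.min(maxPerNode, count));
            fragments.put(entry.getKey(), (int) capped);
        }
        return fragments;
    }

    public static void main(String[] args) {
        Map<String, Integer> rowGroups = new LinkedHashMap<>();
        rowGroups.put("node1", 6);
        rowGroups.put("node2", 2);
        rowGroups.put("node3", 0);
        System.out.println(assignFragments(rowGroups, 4, 6));
    }
}
```

In this example node3 holds no data and gets no fragment, while node1 gets more fragments than node2 because it has more rowGroups to scan. When the per-node cap bites, the total can fall below the global limit, which matches the "fewer fragments doing more work" trade-off.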

ppadma avatar Nov 04 '16 22:11 ppadma

Merged with the latest code. All review comments addressed. All tests pass with the option store.parquet.use_local_affinity set to true and to false.

ppadma avatar Nov 10 '16 02:11 ppadma

@ppadma was this merged? I don't see a plus one and the PR isn't closed.

kkhatua avatar May 21 '18 18:05 kkhatua

Even though it is old, this PR is still very much relevant and a useful feature to have in Drill for certain use cases and scenarios. I request a committer to work with me so we can get it in. Any volunteers?

ppadma avatar Jun 07 '18 00:06 ppadma