TEZ-4246[WIP]: Avoid uneven local disk usage for spills

Open okumin opened this issue 5 years ago • 0 comments

https://issues.apache.org/jira/browse/TEZ-4246

When there are just two disks, the current implementation is likely to use one of them for spill data and the other for the index files. As a result, every file.out, which is bigger than its file.out.index, is written to the same disk.

  1. write spill data on /data/0/..../file.out
  2. write a spill index file in the other directory, /data/1/.../file.out.index
  3. write spill data on /data/0/..../file.out
  4. ...
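
The sequence above can be reproduced with a small simulation. This is only a sketch under a stated assumption: every file, data or index, takes the next directory from a single shared round-robin cursor. The function name and the file sizes are hypothetical and are not taken from the Tez code base.

```python
from itertools import cycle


def simulate_shared_round_robin(dirs, num_spills, data_size, index_size):
    """Model (assumed): data and index files share one round-robin cursor."""
    next_dir = cycle(dirs)
    bytes_per_dir = {d: 0 for d in dirs}
    for _ in range(num_spills):
        # file.out takes the next directory in the rotation.
        bytes_per_dir[next(next_dir)] += data_size
        # file.out.index takes the one after it.
        bytes_per_dir[next(next_dir)] += index_size
    return bytes_per_dir


# Hypothetical sizes: 100 MB of spill data, 1 KB of index per spill.
shared_usage = simulate_shared_round_robin(
    ["/data/0", "/data/1"], num_spills=4, data_size=100_000_000, index_size=1_000
)
print(shared_usage)
```

With an even number of disks the cursor returns to the same directory before each data file, so in this model all spill data accumulates on `/data/0` and only the small index files reach `/data/1`.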

This PR would change the behavior so that both disks are used more evenly.

  1. write spill data on /data/0/..../file.out
  2. write the spill index file in the same directory, /data/0/.../file.out
  3. write spill data on /data/1/..../file.out
  4. ...

Index files are relatively small, so I think it's reasonable to put them in the same directory as file.out.
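
The proposed ordering can be sketched the same way. Again this is a simplified model and not the actual patch: it assumes one round-robin pick per spill, with the index file placed next to its data file. The function name and sizes are hypothetical.

```python
from itertools import cycle


def simulate_colocated_index(dirs, num_spills, data_size, index_size):
    """Model (assumed): one directory pick per spill, index stored beside the data."""
    next_dir = cycle(dirs)
    bytes_per_dir = {d: 0 for d in dirs}
    for _ in range(num_spills):
        spill_dir = next(next_dir)
        # file.out and file.out.index both land in the chosen directory.
        bytes_per_dir[spill_dir] += data_size + index_size
    return bytes_per_dir


# Hypothetical sizes: 100 MB of spill data, 1 KB of index per spill.
colocated_usage = simulate_colocated_index(
    ["/data/0", "/data/1"], num_spills=4, data_size=100_000_000, index_size=1_000
)
print(colocated_usage)
```

Because the cursor now advances once per spill, consecutive spills alternate between the two directories, and in this model each disk ends up holding the same number of bytes.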

okumin, Nov 10 '20 05:11