TEZ-4246[WIP]: Avoid uneven local disk usage for spills
https://issues.apache.org/jira/browse/TEZ-4246
When there are only two disks, the current implementation is likely to write spill data to one of them and the index files to the other. Because each file.out is larger than its file.out.index, all of the larger files end up on the same disk.
- write spill data on `/data/0/..../file.out`
- write a spill index file on the other directory, `/data/1/.../file.out.index`
- write spill data on `/data/0/..../file.out`
- ...
This PR changes the behavior so that both disks are used more evenly.
- write spill data on `/data0/..../file.out`
- write the spill index file on the same directory, `/data/0/.../file.out`
- write spill data on `/data1/..../file.out`
- ...
Index files are relatively small, so I think it's reasonable to put each one in the same directory as its file.out.
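To make the effect concrete, here is a minimal, self-contained simulation of the two placement strategies. It is not Tez code: `SpillPlacementSketch`, `RoundRobinDirs`, and `simulate` are hypothetical names, the file sizes are made up, and the allocator is assumed to be a strict round-robin over the local directories, which is a simplification of the real behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of spill placement; not part of Tez. */
public class SpillPlacementSketch {

    /** Strict round-robin picker, standing in for the local directory allocator (an assumption). */
    static final class RoundRobinDirs {
        private final List<String> dirs;
        private int next = 0;

        RoundRobinDirs(List<String> dirs) {
            this.dirs = dirs;
        }

        String nextDir() {
            String dir = dirs.get(next);
            next = (next + 1) % dirs.size();
            return dir;
        }
    }

    /**
     * Simulates a number of spills and returns the bytes written per directory.
     * If indexNextToData is false, the index file gets its own allocation
     * (the behavior described as current); if true, it reuses the data file's directory.
     */
    static Map<String, Long> simulate(List<String> dirs, int spills,
                                      long dataBytes, long indexBytes,
                                      boolean indexNextToData) {
        RoundRobinDirs allocator = new RoundRobinDirs(dirs);
        Map<String, Long> usage = new LinkedHashMap<>();
        for (String dir : dirs) {
            usage.put(dir, 0L);
        }
        for (int i = 0; i < spills; i++) {
            // The spill data file (file.out) always asks the allocator for a directory.
            String dataDir = allocator.nextDir();
            usage.merge(dataDir, dataBytes, Long::sum);
            // The index file (file.out.index) either follows the data file or is allocated separately.
            String indexDir = indexNextToData ? dataDir : allocator.nextDir();
            usage.merge(indexDir, indexBytes, Long::sum);
        }
        return usage;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("/data/0", "/data/1");
        System.out.println("separate allocation: " + simulate(dirs, 4, 1000L, 10L, false));
        System.out.println("index next to data:  " + simulate(dirs, 4, 1000L, 10L, true));
    }
}
```

With two directories, four spills, and the illustrative sizes above, separate allocation puts 4000 bytes on `/data/0` and only 40 on `/data/1`, while keeping the index next to its data file puts 2020 bytes on each.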