RemoteShuffleService

write amplification

Open cpd85 opened this issue 3 years ago • 2 comments

I'm noticing that some Spark apps that produce 11 TB of shuffle data on the external shuffle service produce closer to 18 TB of shuffle data on Remote Shuffle Service. Is some write amplification expected?

cpd85, May 23 '22 21:05

It may depend on how these metrics are calculated. Remote Shuffle Service does write some extra data for each shuffle record, such as the task attempt ID and partition ID, to track the record. The metrics may also sometimes be slightly off due to serialization and compression.
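To get a feel for how much per-record metadata it would take to explain a gap like this, here is a rough back-of-the-envelope sketch. It uses the 11 TB and 18 TB figures from the question; the 100-byte average record size is purely a hypothetical assumption, and the calculation pretends that all of the extra bytes come from per-record metadata, which may well not be the case.

```python
TB = 10**12  # decimal terabyte, used only for this estimate


def amplification(written_bytes: float, baseline_bytes: float) -> float:
    """Ratio of bytes written by one service to bytes written by the baseline."""
    return written_bytes / baseline_bytes


def implied_overhead_per_record(
    baseline_bytes: float, written_bytes: float, avg_record_bytes: float
) -> float:
    """Extra bytes per record, IF all extra data were per-record metadata.

    avg_record_bytes is a guess supplied by the caller, not a measured value.
    """
    num_records = baseline_bytes / avg_record_bytes
    return (written_bytes - baseline_bytes) / num_records


ratio = amplification(18 * TB, 11 * TB)
# Hypothetical 100-byte average record; replace with a value measured from the job.
overhead = implied_overhead_per_record(11 * TB, 18 * TB, avg_record_bytes=100)

print(f"amplification: {ratio:.2f}x")
print(f"implied extra bytes per record: {overhead:.1f}")
```

If the implied per-record overhead comes out much larger than a couple of integer IDs would need, that suggests the difference is more likely in how the two metrics are computed (for example, compressed versus uncompressed sizes) than in the extra fields alone.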

hiboyang, May 26 '22 04:05

Got it. It looks like compression isn't supported on the server side at the moment? My workloads tend to stress the SSD rather than the CPU, so I think they could benefit from compression. I see this class, https://github.com/uber/RemoteShuffleService/blob/7220c23694e0175e01719621707680a2718173cf/src/main/java/com/uber/rss/common/Compression.java, but as far as I can tell it isn't actually used or configurable.
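One way to check whether compression would be worth it for a given workload is to measure the ratio on a sample of the shuffle payload. The sketch below uses Python's standard `zlib` module only as a stand-in codec; it is not the codec or code path that Remote Shuffle Service uses, and the sample payload is made up for illustration.

```python
import zlib


def compression_ratio(payload: bytes, level: int = 1) -> float:
    """Uncompressed size divided by compressed size for one payload.

    level=1 favors speed over ratio, which is the usual trade-off
    when the goal is to relieve disk I/O without burning much CPU.
    """
    compressed = zlib.compress(payload, level)
    return len(payload) / len(compressed)


# Made-up, highly repetitive sample; real shuffle data will compress differently.
sample = b"user_id=12345,event=click,page=/home\n" * 10_000

ratio = compression_ratio(sample)
print(f"compression ratio on sample: {ratio:.1f}x")
```

Running something like this against a representative dump of real shuffle records would give a more honest estimate of how many SSD writes compression could save.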

cpd85, May 31 '22 15:05