YutingWang98
@mayurdb Hi mayurdb! We also have this server down/restart issue quite frequently. Do you mind sharing your progress on the stage retry and new server list picking, or how you...
@hiboyang Hi, I found the bug and fixed it in a pull request.
Thank you for the suggestions, @hiboyang! Does this mean the shuffle data written to the server will be doubled if I set 'spark.shuffle.rss.replicas' to 2? If so, this will...
Hi, @hiboyang. If 'spark.shuffle.rss.replicas' does write double the amount of data to the servers, we unfortunately won't be able to use this for large jobs with 400+ TB of shuffle data. So...
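For context, a minimal sketch of how the replica setting discussed above might be applied (the shuffle-manager class name below is an assumption; check the RSS documentation for the value your deployment uses):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enabling the remote shuffle service with two replicas.
// With replicas = 2, each shuffle block is written to two servers, which is
// why the total shuffle bytes written would roughly double.
val spark = SparkSession.builder()
  .appName("rss-replica-sketch")
  // Assumed class name; verify against your RSS build/docs.
  .config("spark.shuffle.manager", "org.apache.spark.shuffle.RssShuffleManager")
  .config("spark.shuffle.rss.replicas", "2")
  .getOrCreate()
```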
Thanks for the reply! Will see what I can do to improve this.
@hiboyang Hi! I attempted to contribute stage retry support, but ran into some difficulty due to the Rss implementation. Wondering if I can have some insights...
Hi @mayurdb, thank you for the reply and for sharing your implementation! I have a question here: if the Spark stages are cascading, then one stage may depend on the previous...
> @mayurdb Thank you for sharing it, will take a look!
Hi @mayurdb, we have also been experiencing memory and map stage latency issues using Rss. We plan to test and work on this implementation as well. Wondering if you have...