Firestorm
To support more tasks with Firestorm
The current blockId is designed as follows:
// BlockId is a long composed of an AtomicInteger, partitionId and taskAttemptId
// AtomicInteger is the first 19 bits, max value is 2^19 - 1
// partitionId is the next 24 bits, max value is 2^24 - 1
// taskAttemptId is the remaining 20 bits, max value is 2^20 - 1
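To make the layout concrete, here is a minimal sketch of how the three fields could be packed into and read back from one long. The class name `BlockIdLayout` and its methods are illustrative assumptions, not Firestorm's actual API; only the bit widths (19, 24, 20) come from the comment above.

```java
// Sketch only: names are hypothetical; bit widths follow the comment above.
final class BlockIdLayout {
    static final int ATOMIC_INT_BITS = 19;
    static final int PARTITION_ID_BITS = 24;
    static final int TASK_ATTEMPT_ID_BITS = 20;

    static final long MAX_ATOMIC_INT = (1L << ATOMIC_INT_BITS) - 1;
    static final long MAX_PARTITION_ID = (1L << PARTITION_ID_BITS) - 1;
    static final long MAX_TASK_ATTEMPT_ID = (1L << TASK_ATTEMPT_ID_BITS) - 1;

    // Packs the fields from high bits to low bits: atomicInt | partitionId | taskAttemptId.
    static long pack(long atomicInt, long partitionId, long taskAttemptId) {
        checkRange("atomicInt", atomicInt, MAX_ATOMIC_INT);
        checkRange("partitionId", partitionId, MAX_PARTITION_ID);
        checkRange("taskAttemptId", taskAttemptId, MAX_TASK_ATTEMPT_ID);
        return (atomicInt << (PARTITION_ID_BITS + TASK_ATTEMPT_ID_BITS))
            | (partitionId << TASK_ATTEMPT_ID_BITS)
            | taskAttemptId;
    }

    static long atomicInt(long blockId) {
        return (blockId >>> (PARTITION_ID_BITS + TASK_ATTEMPT_ID_BITS)) & MAX_ATOMIC_INT;
    }

    static long partitionId(long blockId) {
        return (blockId >>> TASK_ATTEMPT_ID_BITS) & MAX_PARTITION_ID;
    }

    static long taskAttemptId(long blockId) {
        return blockId & MAX_TASK_ATTEMPT_ID;
    }

    // Rejects values that do not fit into the field instead of silently overflowing.
    private static void checkRange(String name, long value, long max) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(name + " out of range [0, " + max + "]: " + value);
        }
    }
}
```

The three widths add up to 63 bits, so the sign bit stays unused and every blockId is a non-negative long.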
Why do we need blockId? It is used for data checks, filtering, reading in-memory data, etc.
Why is blockId designed as above?
BlockIds are stored in the shuffle server, and RoaringBitmap is used to cache them to reduce memory cost.
Given how RoaringBitmap is implemented, the blockId layout is intended to make it use BitmapContainer instead of ArrayContainer, which saves memory.
What is the problem with blockId? It can't support a taskId (taskAttemptId) greater than 2^20 - 1.
Proposal: I think 19 bits is too many for the atomic int, and we can reallocate some of them to the taskId.
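One way to explore the proposal is to make the bit widths parameters, so different splits can be compared. The sketch below is an assumption for illustration only: the class `ConfigurableBlockIdLayout` is hypothetical, and the 18/24/21 split in the usage note is just an example value, not a number proposed in this issue.

```java
// Sketch only: a hypothetical layout whose field widths are chosen at construction time.
final class ConfigurableBlockIdLayout {
    private final int partitionIdBits;
    private final int taskAttemptIdBits;
    private final long maxAtomicInt;
    private final long maxPartitionId;
    private final long maxTaskAttemptId;

    ConfigurableBlockIdLayout(int atomicIntBits, int partitionIdBits, int taskAttemptIdBits) {
        if (atomicIntBits <= 0 || partitionIdBits <= 0 || taskAttemptIdBits <= 0) {
            throw new IllegalArgumentException("every field needs at least one bit");
        }
        // Keep the sign bit free so blockIds stay non-negative.
        if (atomicIntBits + partitionIdBits + taskAttemptIdBits > 63) {
            throw new IllegalArgumentException("fields must fit into 63 bits");
        }
        this.partitionIdBits = partitionIdBits;
        this.taskAttemptIdBits = taskAttemptIdBits;
        this.maxAtomicInt = (1L << atomicIntBits) - 1;
        this.maxPartitionId = (1L << partitionIdBits) - 1;
        this.maxTaskAttemptId = (1L << taskAttemptIdBits) - 1;
    }

    long maxAtomicInt() { return maxAtomicInt; }

    long maxTaskAttemptId() { return maxTaskAttemptId; }

    // Same field order as the current layout: atomicInt | partitionId | taskAttemptId.
    long pack(long atomicInt, long partitionId, long taskAttemptId) {
        if (atomicInt < 0 || atomicInt > maxAtomicInt
                || partitionId < 0 || partitionId > maxPartitionId
                || taskAttemptId < 0 || taskAttemptId > maxTaskAttemptId) {
            throw new IllegalArgumentException("field value out of range");
        }
        return (atomicInt << (partitionIdBits + taskAttemptIdBits))
            | (partitionId << taskAttemptIdBits)
            | taskAttemptId;
    }

    long taskAttemptId(long blockId) {
        return blockId & maxTaskAttemptId;
    }
}
```

For example, `new ConfigurableBlockIdLayout(18, 24, 21)` would accept a taskAttemptId of 2^20, which the current 19/24/20 layout cannot represent, at the cost of halving the range of the atomic int. Whether such a split keeps the memory benefit described above would still need to be checked against real workloads.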