Shark

Resolves two issues in Shark.

Currently, the calculated Flink heap size is larger than the actual value because the conversion uses 1000 as the divisor. This PR uses 1024 as the divisor to fix the issue.
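To show why the divisor matters, here is a minimal sketch. It assumes the heap size starts in bytes and is converted to megabytes. The helpers `heap_size_mb_buggy` and `heap_size_mb` are hypothetical and are not Shark's actual code. With a 1 GiB heap, the decimal divisor reports 1073 MB, while the binary divisor reports the expected 1024.

```python
# Hypothetical helpers illustrating the fix; not Shark's actual implementation.
# Assumption: the input is a heap size in bytes and the output is in (binary) megabytes.
BYTES_PER_MIB = 1024 * 1024


def heap_size_mb_buggy(heap_bytes: int) -> int:
    """Old behaviour: dividing by 1000 twice overstates the size."""
    return heap_bytes // (1000 * 1000)


def heap_size_mb(heap_bytes: int) -> int:
    """Fixed behaviour: dividing by 1024 twice gives mebibytes."""
    return heap_bytes // BYTES_PER_MIB


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    print(heap_size_mb_buggy(one_gib))  # 1073
    print(heap_size_mb(one_gib))        # 1024
```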

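When running in distributed mode, several tasks can call `_save_first_checkpoint` at the same time, which causes a race condition. To avoid this, the code now checks whether the task's `task_type` is chief, so that only the chief saves the first checkpoint.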
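Here is a minimal sketch of a chief-only guard. It assumes the task type comes from the `TF_CONFIG` environment variable, which follows TensorFlow's `{"cluster": ..., "task": {"type": ..., "index": ...}}` convention. The functions `is_chief` and `maybe_save_first_checkpoint` are hypothetical. The save routine is passed in as a callable and stands in for the real `_save_first_checkpoint`. The sketch also makes its own assumption that a missing `TF_CONFIG` means a local, single-process run, and that such a run may save.

```python
import json
import os
from typing import Callable, Mapping, Optional


def is_chief(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if this process should act as chief (hypothetical helper)."""
    env = os.environ if env is None else env
    tf_config = json.loads(env.get("TF_CONFIG", "{}"))
    task = tf_config.get("task", {})
    if not task:
        # Assumption of this sketch: no task info means a local, single-process run.
        return True
    return task.get("type") == "chief"


def maybe_save_first_checkpoint(
    save_fn: Callable[[], None], env: Optional[Mapping[str, str]] = None
) -> bool:
    """Call save_fn only on the chief, so workers never race to write the same checkpoint."""
    if not is_chief(env):
        return False
    save_fn()
    return True
```

Because non-chief tasks skip the save entirely, only one process ever writes the first checkpoint. This avoids the race without needing any locking between tasks.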
