
Question Regarding Normalized Memory Usage

Open shinyehtsai opened this issue 6 years ago • 6 comments

My question is about the plan_mem column in batch_task.csv. I am still a little confused about the normalization standard. The schema says this column specifies the normalized memory requested for each instance of the task, and that it is normalized to the largest memory size of all machines.

A few tasks have values even larger than 15 in the plan_mem column (e.g., 15.45, 17.17):

  1. task_NDg2ODM2NDIyMDczNDQ4NzMzOA==,50,j_3003670,11,Terminated,233037,234282,700,17.17
  2. task_NDg2ODM2NDIyMDczNDQ4NzMzOA==,50,j_3685110,11,Terminated,471423,493080,700,15.45

Does it mean each instance in this task takes around 17.17 times the largest memory of a physical machine? If I assume the biggest memory on a physical machine in the cluster is 128 GB, will each instance take about 2 TB of memory, and the whole task about 100 TB eventually?

Thank you very much.

shinyehtsai avatar Mar 25 '19 16:03 shinyehtsai

I tried to figure out which machines are in the dataset and found only one type with 96 cores on the Alibaba Cloud website. According to this page, https://www.alibabacloud.com/help/doc-detail/25378.htm, these are ebmg5 or sccg5 type machines, which have 384 GB of memory.

dautovri avatar Mar 27 '19 03:03 dautovri

But if the normalization target is the largest memory size of all machines (assuming 384 GB is the largest), it means each instance in this task takes around 6.5 TB, and this task needs around 325 TB in total. Did I misunderstand anything?

shinyehtsai avatar Mar 27 '19 03:03 shinyehtsai

Hi, sorry for the confusion. All values are normalized to [0, 100], not [0, 1]. That is, a value of 100 means the memory is equal to the capacity of the machine.
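
To make the arithmetic concrete, here is a minimal sketch (not part of the official trace tooling) that de-normalizes plan_mem under two assumptions: the [0, 100] scaling above, and a hypothetical 384 GB largest machine as guessed earlier in this thread. The column names are inferred from the rows quoted above, and pandas is assumed to be available.

    # Minimal sketch: de-normalize plan_mem, assuming [0, 100] scaling and
    # a hypothetical 384 GB largest machine. batch_task.csv is assumed to
    # have no header row; column names are inferred from the quoted rows.
    import pandas as pd

    cols = ["task_name", "instance_num", "job_name", "task_type",
            "status", "start_time", "end_time", "plan_cpu", "plan_mem"]
    df = pd.read_csv("batch_task.csv", header=None, names=cols)

    MAX_MACHINE_MEM_GB = 384  # assumption, not confirmed by the trace authors
    df["mem_gb_per_instance"] = df["plan_mem"] / 100.0 * MAX_MACHINE_MEM_GB

    # e.g. plan_mem = 17.17 -> 0.1717 * 384 GB ~= 66 GB per instance,
    # not 17x the memory of the largest machine.
    print(df[["task_name", "plan_mem", "mem_gb_per_instance"]].head())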

HaiyangDING avatar Mar 27 '19 04:03 HaiyangDING

Thanks for the clarification on plan_mem. I also observed 50 in plan_cpu, and values such as 5, 12, or 60 appear in batch_task.csv as well:

  task_MzQwMDQzNjY0MzkwNzU2ODYyMg==,2511,j_424759,9,Terminated,366991,372325,60,0.39
  task_ODg3NTU4NTE3NDcxODA4NjA5Ng==,168,j_366800,3,Terminated,466234,467613,12,1.57

Does a plan_cpu of 50 mean half a core? If so, what is the physical meaning of half a core? I thought the smallest processing unit is one core.

shinyehtsai avatar Mar 28 '19 04:03 shinyehtsai

Sorry for the delayed response.

There is more than one way to achieve a 0.5-core allocation.

One is CPU shares (cpushare); see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu#sect-cfs.

The other is oversubscription: say we have two tasks bound to one core.

In both cases, what matters is how the CPU scheduler manages the actual amount of computation time for each task.
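
As a minimal illustration of how the kernel's CFS scheduler can enforce a fractional-core cap, here is a sketch using the cgroup v1 bandwidth controller. The cgroup name "halfcore" is a made-up example, the snippet requires root on a host with cgroup v1 mounted, and it is not a description of how the Alibaba scheduler itself is implemented.

    # Minimal sketch: throttle the current process to ~0.5 core with the
    # cgroup v1 CFS bandwidth controller. Requires root; "halfcore" is a
    # made-up cgroup name for illustration.
    import os

    cg = "/sys/fs/cgroup/cpu/halfcore"
    os.makedirs(cg, exist_ok=True)

    # A quota of 50000 us per 100000 us period grants at most 50 ms of CPU
    # time every 100 ms, i.e. about half of one core on average.
    with open(os.path.join(cg, "cpu.cfs_period_us"), "w") as f:
        f.write("100000")
    with open(os.path.join(cg, "cpu.cfs_quota_us"), "w") as f:
        f.write("50000")

    # Move this process into the cgroup; the kernel then enforces the cap
    # regardless of how many threads the process spawns.
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))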

HaiyangDING avatar Apr 07 '19 13:04 HaiyangDING

Hello, after normalization, when is a value assigned -1? And when is a value assigned 101?

WHLOrchid avatar Aug 02 '19 09:08 WHLOrchid