Kai Zhang
Besides submitting training jobs, users also want to create a development box that contains Jupyter, math libraries, and frameworks, so they can develop and debug algorithms before starting to train in...
It would help data scientists to have a command like "arena create data imagenet-full" to create, index, and manage different training datasets for different training jobs. Then when use...
CPU and memory resource info is useful for both non-GPU and GPU jobs, so it should be included in "arena top".
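A minimal sketch, in Go (the language arena is written in), of the per-job aggregation such a view could perform. The PodResources and JobUsage types and the SumJobResources function are hypothetical; real code would read the requests from the job's Pod specs instead of hard-coded values.

```go
package main

import "fmt"

// PodResources is a hypothetical, simplified view of one pod's resource
// requests. Real code would read these from the Kubernetes Pod spec.
type PodResources struct {
	Name        string
	CPUMilli    int64 // requested CPU in millicores
	MemoryBytes int64 // requested memory in bytes
	GPUs        int64 // requested GPUs (0 for CPU-only pods)
}

// JobUsage holds the summed requests of all pods belonging to one job.
type JobUsage struct {
	CPUMilli    int64
	MemoryBytes int64
	GPUs        int64
}

// SumJobResources adds up CPU, memory, and GPU requests across a job's pods,
// so CPU-only jobs still get meaningful numbers.
func SumJobResources(pods []PodResources) JobUsage {
	var u JobUsage
	for _, p := range pods {
		u.CPUMilli += p.CPUMilli
		u.MemoryBytes += p.MemoryBytes
		u.GPUs += p.GPUs
	}
	return u
}

func main() {
	pods := []PodResources{
		{Name: "job-worker-0", CPUMilli: 2000, MemoryBytes: 4 << 30, GPUs: 1},
		{Name: "job-worker-1", CPUMilli: 2000, MemoryBytes: 4 << 30, GPUs: 1},
		{Name: "job-ps-0", CPUMilli: 1000, MemoryBytes: 2 << 30},
	}
	u := SumJobResources(pods)
	// Print in units a "top"-style table might use.
	fmt.Printf("CPU: %.1f cores, Memory: %d GiB, GPU: %d\n",
		float64(u.CPUMilli)/1000, u.MemoryBytes>>30, u.GPUs)
	// prints "CPU: 5.0 cores, Memory: 10 GiB, GPU: 2"
}
```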
Arena should support submitting jobs with customized labels, tolerations, and securityContext in the podSpec.
More and more use cases require specifying customized labels, tolerations, securityContext, priorityClass, etc. on a job's underlying Pods. Arena should provide a unified mechanism to meet these customization requirements.
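One way to picture a "unified mechanism" is a single customization object applied identically to every replica's pod template. The Go sketch below assumes hypothetical, trimmed-down types (PodSpec, Toleration, PodCustomization) rather than the real Kubernetes API types, and uses RunAsUser as a stand-in for securityContext; it is an illustration, not arena's implementation.

```go
package main

import "fmt"

// Toleration mirrors the key fields of a Kubernetes toleration (simplified).
type Toleration struct {
	Key, Operator, Value, Effect string
}

// PodSpec is a hypothetical, trimmed-down pod template.
type PodSpec struct {
	Labels            map[string]string
	Tolerations       []Toleration
	PriorityClassName string
	RunAsUser         *int64 // stand-in for securityContext
}

// PodCustomization collects user-supplied overrides in one place, so every
// job type (tfjob, mpijob, ...) can apply them the same way.
type PodCustomization struct {
	Labels            map[string]string
	Tolerations       []Toleration
	PriorityClassName string
	RunAsUser         *int64
}

// Apply merges the customization into spec: labels are merged with user values
// winning, tolerations are appended, and scalar fields override only if set.
func (c PodCustomization) Apply(spec *PodSpec) {
	if spec.Labels == nil {
		spec.Labels = map[string]string{}
	}
	for k, v := range c.Labels {
		spec.Labels[k] = v
	}
	spec.Tolerations = append(spec.Tolerations, c.Tolerations...)
	if c.PriorityClassName != "" {
		spec.PriorityClassName = c.PriorityClassName
	}
	if c.RunAsUser != nil {
		spec.RunAsUser = c.RunAsUser
	}
}

func main() {
	uid := int64(1000)
	c := PodCustomization{
		Labels:            map[string]string{"team": "vision"},
		Tolerations:       []Toleration{{Key: "gpu", Operator: "Exists", Effect: "NoSchedule"}},
		PriorityClassName: "high-priority",
		RunAsUser:         &uid,
	}
	// The same customization is applied to every replica's pod template.
	worker := PodSpec{Labels: map[string]string{"role": "worker"}}
	ps := PodSpec{Labels: map[string]string{"role": "ps"}}
	for _, spec := range []*PodSpec{&worker, &ps} {
		c.Apply(spec)
	}
	fmt.Println(worker.Labels["role"], worker.Labels["team"], len(ps.Tolerations), *ps.RunAsUser)
	// prints "worker vision 1 1000"
}
```

Keeping the overrides in one struct means a new field (for example a node selector) would be added once and picked up by every job type, instead of being wired separately into each chart.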
"arena get jobname -e" only shows the events of the chief worker pod, while meaningful job-level events should also be shown. E.g., when ResourceQuotas are enabled, if job...
Supporting lookup of the commands used to submit previous jobs would help diagnose client problems and serve as a reference for re-running them.
The "arena version" command should include more info: the versions of the charts, the apiVersion of tfjob, mpijob, etc., and the release versions of the corresponding operators deployed in the cluster. This helps to quickly identify...