
Support GPUs on multiple machines (via docker-swarm or kubernetes)?

Open • Ledenel opened this issue 5 years ago • 1 comment

Feature description:

Support docker-swarm (with GPU support) out of the box.

Problem and motivation:

As described here, it is currently not possible to run ml-hub with GPU support across multiple machines (each of which may have one or more GPU cards). Since it is not easy to build a kubernetes cluster with GPU support and management (and I'm not familiar with kubernetes), a more lightweight solution such as docker-swarm might support this more seamlessly (via nvidia-docker).

Is this something you're interested in working on?

Yes

Ledenel, Oct 20 '20 10:10

By the way, kubernetes seems to support GPU management via device plugins: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins. So why is GPU mode not supported on kubernetes? Is it due to a lack of standards, historical reasons, or is it just waiting for someone to implement it?
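For context on the device-plugin route: once a GPU device plugin (for example NVIDIA's) is running on the nodes, a container asks for GPUs by listing the extended resource `nvidia.com/gpu` under `resources.limits`, and the scheduler only places the pod on a node with enough free GPUs. The sketch below builds such a Pod manifest in Python. It is only an illustration of the pattern, not how ml-hub itself spawns workspaces. It assumes the NVIDIA device plugin is installed, and the helper name `gpu_pod_manifest`, the pod name, and the image tag are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest requesting GPUs via the device-plugin resource.

    Hypothetical helper for illustration; assumes the NVIDIA device plugin
    advertises the extended resource "nvidia.com/gpu" on the cluster nodes.
    """
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through limits only; the scheduler
                    # picks a node that still has this many unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder pod name and image tag, chosen only for this example.
    manifest = gpu_pod_manifest("ml-hub-gpu-test", "nvidia/cuda:11.0-base", gpus=1)
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied directly. In a multi-machine cluster, this per-container GPU request is what would let each workspace land on whichever node has a free GPU.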

Ledenel, Oct 20 '20 10:10