[Feat]: Gradio and Runpod

Open AbstractEyes opened this issue 1 year ago • 8 comments

Describe your use-case.

A Gradio-based web interface, specifically for use with RunPod.

What would you like to see as a solution?

If there's already something like this other than the command line, let me know. I know there's command-line support, but it's not good for direct inline use and can't really be tinkered with unless you're already advanced or have a pipeline set up.

Gradio + RunPod would be really nice. Gradio lets you run a web interface from a local Python process and can expose it through a hosted share link, which is convenient for immediate and fairly streamlined GUI support.

I use RunPod for most of my training with Kohya. Kohya does a pretty good job supporting multiple GPUs, but I don't know what sort of challenges you face when working with a complex system like this. Kohya also has a few disadvantages compared to what you built here, so I'd like to have a supported setup at some point to play with.

Have you considered alternatives? List them here.

No response

AbstractEyes avatar Oct 01 '24 12:10 AbstractEyes

I remember asking for this, and it wasn't planned.

So I use the Massed Compute Ubuntu desktop image and run OneTrainer there.

FurkanGozukara avatar Oct 01 '24 13:10 FurkanGozukara

Thank you, professor, I'll have a look.

AbstractEyes avatar Oct 01 '24 15:10 AbstractEyes

I didn't read your entire message, but it seems you're asking about RunPod support. Someone in the #development Discord channel is working on integrated RunPod support, which will automatically connect, upload, and sync your dataset to RunPod and run the training there with one click in the local UI.

Arcitec avatar Oct 01 '24 15:10 Arcitec

As for Gradio, web interfaces have been discussed but decided against, because running a web browser while training will waste GIGABYTES of VRAM on the browser's GPU acceleration, which hurts training.

Also, someone would have to port all code to a web UI, which neither Nero nor any other big contributors want to do. If someone wants to try it, feel free though. If it's well-made, it could be accepted. But it would also require creating an ELECTRON application wrapper which DISABLES HARDWARE ACCELERATION, using entirely software-based rendering, to avoid the VRAM waste issue.

Here's an example of how much VRAM a web browser can steal (and it can be a lot more than this too):

image

Brave (a Chromium-based browser) is taking 1.5 GB of VRAM there, right now.

Such a big waste would completely break training large models.
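
For anyone who wants to check this on their own machine, here is a minimal sketch that reads total VRAM usage through `nvidia-smi`. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the function names are made up for illustration and are not part of OneTrainer. It reports usage for the whole GPU, not per application, so the browser's share has to be estimated by comparing readings.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' rows (in MiB) from nvidia-smi's CSV output."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total})
    return gpus


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every GPU (assumes it is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` once with the browser closed and once with it open, then comparing `used_mib`, gives a rough idea of how much VRAM the browser is holding.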

Arcitec avatar Oct 01 '24 15:10 Arcitec

Ideally, you wouldn't want to run the browser ON the RunPod device. You would run the web interface server there, most likely headless, so it only handles standard request/response/error information.

Running a whole Linux desktop GUI cuts into your VRAM substantially as well.

To PLAY with the tool it would be an option at least, but ideally you would want distributed training across multiple cards and devices, for larger-scale training on more than one or two devices.
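
To make the headless idea concrete, here is a rough sketch of a server that only speaks request/response and renders nothing on the training machine. It uses Python's standard library rather than Gradio, and the `/status` and `/config` endpoints, the `STATE` dictionary, and the function names are all hypothetical; none of this is an existing OneTrainer interface.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory state; a real setup would hand the config to the trainer.
STATE = {"status": "idle", "config": {}}


def handle(method, path, body=b""):
    """Route one request and return (http_status, json_payload)."""
    if method == "GET" and path == "/status":
        return 200, STATE
    if method == "POST" and path == "/config":
        try:
            config = json.loads(body or b"{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            return 400, {"error": "body is not valid JSON"}
        if not isinstance(config, dict):
            return 400, {"error": "config must be a JSON object"}
        STATE["config"] = config
        return 200, {"ok": True}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle(); no rendering happens on this machine."""

    def _respond(self, method):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        status, payload = handle(method, self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond("GET")

    def do_POST(self):
        self._respond("POST")


def serve(host="127.0.0.1", port=8000):
    """Run the server until interrupted."""
    ThreadingHTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` on the remote machine and pointing a browser on your own computer at it keeps the browser's GPU usage off the training card. The sketch has no authentication, so it binds to localhost and would need something like an SSH tunnel in front of it.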

AbstractEyes avatar Oct 01 '24 15:10 AbstractEyes

Yeah of course, if the training runs on another machine, then the web browser using up VRAM to render a website doesn't matter.

But 99% of people use local training. Which is why a browser UI is very harmful for local training. This problem affects Kohya's web UI for example. Lots of VRAM is being wasted by your web browser. Typically 1.5 GB but I've even seen 6 GB being used by my browser.

The OneTrainer Tk UI does not use any VRAM at all (zero megabytes). It's entirely software rendered.

Anyway, as mentioned, someone is working on RunPod support, so hop on and search in the #development Discord channel. :) I expect that it could take up to a few months before it's fully ready though, because a volunteer is doing it in their free time.

Arcitec avatar Oct 01 '24 15:10 Arcitec

I mean, getting it to start on RunPod only took about... 30 minutes or so, give or take. There just isn't a GUI that makes it convenient for tinkering and quick adjustment. I didn't benchmark it, but I did manage to get everything set up with minimal resistance.

The main problem is the lack of a GUI connected to OneTrainer on the RunPod instance. I could use runpodctl send to send a config over and over, send models, retrieve models, whatever. It's just not convenient having to debug using something like Vim because I'm a millennial who grew up on text editors with a mouse.

I suppose I'm spoiled by Kohya.
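
One stopgap for the config round trip, sketched under the assumption that the training config is a plain JSON file: tweak it locally with a small script, then send the result over with runpodctl send as described above. The `update_config` helper and the key names in the usage note are hypothetical and not taken from OneTrainer.

```python
import json
from pathlib import Path


def update_config(path, overrides):
    """Load a JSON config, apply top-level overrides, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(overrides)  # only top-level keys are replaced
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

For example, `update_config("config.json", {"learning_rate": 1e-4})` would rewrite a single value without opening an editor on the pod. Nested settings would need a deeper merge than this sketch does.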

AbstractEyes avatar Oct 01 '24 15:10 AbstractEyes

Yes. The volunteer is working on something that automatically syncs OneTrainer, the models, and the dataset to RunPod and executes training, all in a dedicated Cloud tab of the OneTrainer GUI. I don't know much more about it, so hop on the Discord, search the message history for "runpod", and join the discussion if you wanna ask them some stuff. :)

Arcitec avatar Oct 01 '24 15:10 Arcitec

The RunPod/Cloud feature has been progressing steadily. People who want to help test it can find the discussions in the developer channel on Discord.

Arcitec avatar Nov 15 '24 05:11 Arcitec

This use case is now supported in two ways:

  • run OneTrainer locally, train remotely: https://github.com/Nerogar/OneTrainer/wiki/Cloud-Training
  • run OneTrainer remotely, by starting a Desktop Runpod and installing OneTrainer there: image

I'd therefore suggest closing this as "not planned".

dxqb avatar Jan 10 '25 07:01 dxqb

Closed as not planned as we now support it directly in the tkinter UI. Additionally we will never transition to Gradio because of its VRAM overhead.

To see how to use the native functionality see here: https://github.com/Nerogar/OneTrainer/wiki/Cloud-Training

O-J1 avatar Feb 08 '25 12:02 O-J1