[Feat]: Gradio and Runpod
Describe your use-case.
Gradio that supports a web interface to use specifically with runpod.
What would you like to see as a solution?
If something like this already exists beyond the command line, let me know. I know there's command-line support, but it isn't great for direct, interactive use and can't really be tinkered with unless you're already advanced or have a pipeline set up.
Gradio + RunPod would be really nice. Gradio lets you build a web interface locally while it handles the web hosting elsewhere, which is convenient for immediate and fairly streamlined GUI support.
I use RunPod for most of my training with Kohya. Kohya does a pretty good job supporting multiple GPUs, but I don't know what sort of challenges you face when working with a complex system like this. Kohya also has a few disadvantages compared to what you built here, so I'd like to have a supported system to play with at some point.
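Just to illustrate what I mean, the Gradio side can be as small as something like this (a rough sketch only; the training function below is a made-up placeholder, not anything OneTrainer actually exposes):

```python
# Minimal sketch of the kind of interface Gradio gives you almost for free.
# start_training() is a hypothetical placeholder, not a real OneTrainer API.
import gradio as gr

def start_training(config_path: str, epochs: float) -> str:
    # Placeholder: a real implementation would kick off a training run here.
    return f"Would train with {config_path} for {int(epochs)} epochs"

demo = gr.Interface(
    fn=start_training,
    inputs=[gr.Textbox(label="Config path"), gr.Number(label="Epochs", value=10)],
    outputs=gr.Textbox(label="Status"),
)

# share=True asks Gradio to create a temporary public URL (tunneled through
# Gradio's servers), so the UI can run on a remote box and be opened locally.
demo.launch(share=True)
```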
Have you considered alternatives? List them here.
No response
I remember asking about this before and it wasn't planned.
Therefore I use the Massed Compute Ubuntu desktop image to run OneTrainer there.
Thank you professor, I'll have a look.
I didn't read your entire message, but it seems you're asking about RunPod support. Someone on the #development Discord channel is working on integrated RunPod support, which will automatically connect, upload, and sync your dataset to RunPod and run the training there with one click from the local UI.
As for Gradio, web interfaces have been discussed but decided against, because running a web browser while training will waste GIGABYTES of Graphics VRAM for the browser's GPU acceleration, which hurts training.
Also, someone would have to port all the code to a web UI, which neither Nero nor any of the other big contributors wants to do. If someone wants to try it, feel free though. If it's well-made, it could be accepted. But it would also require creating an ELECTRON application wrapper which DISABLES HARDWARE ACCELERATION, using entirely software-based rendering, to avoid the VRAM waste issue.
Here's an example of how much VRAM a web browser can steal (and it can be a lot more than this too):
The Brave (Chromium-based) browser is taking 1.5 GB of VRAM there, right now.
Such a big waste would completely break training large models.
Ideally, you wouldn't want to run the browser ON the RunPod device anyway. You would run the web interface server there, most likely in a headless state, so it only handles standard request/response/error traffic.
Having a whole Linux-based desktop GUI running cuts into your VRAM substantially as well.
For just playing with the tool it would be an option at least, but ideally you'd want distributed training across multiple cards and machines for larger-scale runs, rather than just one or two devices.
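Roughly what I'm picturing on the pod side (again just a sketch; the function and port setup here are placeholders I made up):

```python
# Sketch: serve a (hypothetical) UI headlessly on the pod itself.
# No browser runs on the pod; you open the exposed port from your own machine.
import gradio as gr

def run_job(prompt: str) -> str:
    # Hypothetical stand-in for whatever the pod would actually do.
    return f"Received: {prompt}"

demo = gr.Interface(fn=run_job, inputs="text", outputs="text")

# Bind to all interfaces so the pod's HTTP port forwarding can reach it;
# 7860 is Gradio's default port.
demo.launch(server_name="0.0.0.0", server_port=7860)
```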
Yeah of course, if the training runs on another machine, then the web browser using up VRAM to render a website doesn't matter.
But 99% of people train locally, which is why a browser UI is very harmful for local training. This problem affects Kohya's web UI, for example: lots of VRAM is wasted by your web browser, typically 1.5 GB, but I've even seen my browser use 6 GB.
The OneTrainer Tk UI does not use any VRAM at all (zero megabytes). It's entirely software rendered.
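If you want to check what's holding VRAM on your own machine, you can query NVML directly. Here's a rough sketch using the nvidia-ml-py (pynvml) package; this is just my own illustration (NVIDIA-only, not part of OneTrainer):

```python
# Sketch: list which processes currently hold VRAM on GPU 0, via NVML.
# Requires the nvidia-ml-py package (imported as pynvml).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Graphics processes are where a browser's GPU acceleration shows up;
# compute processes are where training jobs show up.
procs = (pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
         + pynvml.nvmlDeviceGetComputeRunningProcesses(handle))

for p in procs:
    name = pynvml.nvmlSystemGetProcessName(p.pid)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    mem_mib = (p.usedGpuMemory or 0) / (1024 ** 2)
    print(f"{name} (pid {p.pid}): {mem_mib:.0f} MiB")

pynvml.nvmlShutdown()
```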
Anyway, as mentioned, someone is working on RunPod support so hop on and search in the #development discord channel. :) I expect that it could take up to a few months before it's fully ready though, because a volunteer is doing it in their free time.
I mean, getting it to start on RunPod only took about... 30 minutes or so, give or take. There just isn't a GUI that makes it convenient for tinkering and quick adjustments. I didn't benchmark it, but I did manage to get everything set up with minimal resistance.
The main problem is the lack of a GUI connected to OneTrainer on the RunPod. I could use `runpodctl send` to push a config over and over, send models, retrieve models, whatever. It's just not convenient having to debug with something like Vim, because I'm a millennial who grew up on text editors with a mouse.
I suppose I'm spoiled by Kohya.
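For reference, the manual shuffle I'm describing looks roughly like this. `runpodctl send`/`runpodctl receive` are the real CLI commands, but the little wrapper is just my own sketch, and the exact output format may vary between versions:

```python
# Rough sketch of the manual config shuffle: push a file to the pod with
# runpodctl and print the one-time receive command it reports.
import subprocess

def send_to_pod(path: str) -> None:
    # `runpodctl send <file>` prints a one-time code; running the matching
    # `runpodctl receive <code>` command on the pod pulls the file down.
    result = subprocess.run(
        ["runpodctl", "send", path],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # copy the receive command from here, paste on the pod

send_to_pod("configs/my_training_config.json")
```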
Yes. The volunteer is working on something that automatically syncs OneTrainer, the models, and the dataset to RunPod and executes the training, all from a dedicated Cloud tab in the OneTrainer GUI. I don't know much more about it, so hop on the Discord, search the message history for "runpod", and join the discussion if you want to ask them some questions. :)
The RunPod/Cloud feature has been progressing steadily. People who want to help test it can find the discussions in the developer channel on Discord.
This use case is now supported in 2 ways:
- run OneTrainer locally, train remotely: https://github.com/Nerogar/OneTrainer/wiki/Cloud-Training
- run OneTrainer remotely, by starting a Desktop Runpod and installing OneTrainer there:
Therefore I'd suggest closing this as "not planned".
Closed as not planned, as we now support this directly in the Tkinter UI. Additionally, we will never transition to Gradio because of its VRAM overhead.
To see how to use the native functionality see here: https://github.com/Nerogar/OneTrainer/wiki/Cloud-Training
