[Feat]: Add Dockerfile for local Windows 11 + CUDA 13 + RTX 5090 setup
### Describe your use-case.

**Summary**

I created and tested a new Dockerfile that allows running OneTrainer locally on:
- Windows 11 host (via Docker Desktop / WSL2),
- CUDA 13,
- NVIDIA RTX 5090,
- with optional S3 storage mounting for training data.
**Motivation**

Current Docker resources seem focused on cloud providers (RunPod, Vast, etc.). This Dockerfile provides a tested setup for local Windows users with modern NVIDIA GPUs.
**File**

```dockerfile
FROM nvidia/cuda:13.0.0-cudnn-runtime-ubuntu24.04
LABEL authors="aleksander.marszalki"
LABEL name="trainer-s3"
# The GPU driver comes from the host via the NVIDIA Container Toolkit;
# installing nvidia-driver-* inside the image is unnecessary (and 525
# predates Blackwell support), so it is omitted here.
RUN apt-get update && apt-get install -y \
    software-properties-common \
    build-essential apt-utils \
    wget curl vim git ca-certificates kmod \
    python3 python3-pip python3.12-venv \
    libgl1 libglib2.0-0 s3fs \
    && rm -rf /var/lib/apt/lists/*
RUN ln -sf /usr/bin/python3 /usr/bin/python && \
    ln -sf /usr/bin/pip3 /usr/bin/pip
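# cu129 wheels include Blackwell (sm_120) kernels, which the RTX 5090 requires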
RUN pip install --no-cache-dir --break-system-packages torch torchvision --index-url https://download.pytorch.org/whl/cu129 && \
    pip install --no-cache-dir --break-system-packages tensorflow && \
    pip install --no-cache-dir --break-system-packages -U "huggingface_hub[cli]"
WORKDIR /opt/onetrainer
RUN git clone https://github.com/Nerogar/OneTrainer.git --single-branch --branch master --depth 1
RUN mkdir -p instance training_concepts training_samples training_data
WORKDIR /opt/onetrainer/OneTrainer
RUN ./install.sh
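# At runtime: log in to Hugging Face, pre-download FLUX.1-dev, mount the S3
# bucket, copy the training data locally, then start training from a preset.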
CMD hf auth login --token ${HF_TOKEN} && \
    hf download black-forest-labs/FLUX.1-dev --local-dir ~/.cache/huggingface/flux && \
    mkdir -p ${S3_MOUNT} && \
    s3fs ${S3_BUCKET} ${S3_MOUNT} -o allow_other && \
    mkdir -p /opt/onetrainer/local_training_data && \
    cp -r ${S3_MOUNT}/training_data* /opt/onetrainer/local_training_data/ && \
    nvidia-smi && \
    ulimit -a && \
    ./venv/bin/python scripts/train.py --config-path ${S3_MOUNT}/OneTrainer/training_presets/flux_lora_backup_af_epoch_sierpien_JDTAG_S3.json
```
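For reference, the build/run workflow I tested looks roughly like this (image tag and placeholder values are illustrative; S3 credentials are assumed to come from `~/.passwd-s3fs` or the s3fs environment variables):

```sh
docker build -t onetrainer-local .

# --gpus all requires the NVIDIA Container Toolkit on the host (WSL2 backend);
# SYS_ADMIN + /dev/fuse are needed for the s3fs FUSE mount.
docker run --rm --gpus all \
  --cap-add SYS_ADMIN --device /dev/fuse \
  -e HF_TOKEN=<hf_token> \
  -e S3_BUCKET=<bucket_name> \
  -e S3_MOUNT=/mnt/s3 \
  onetrainer-local
```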
### What would you like to see as a solution?

**Proposal**

- Generalize and contribute my Dockerfile as an additional resource (e.g. `resources/docker/Dockerfile.windows-cuda13`)
- Add short build/run instructions in a README
- Optionally provide a `compose.yaml` example for mounting S3 storage (see the sketch after the questions below)
- Workflows tested: `docker build`, `docker run`, training script startup

**Questions**

- Would you like me to open a PR with this Dockerfile (and make it more general-purpose)?
- Should this file live in `resources/docker/`, or do you prefer a different structure?
- Do you also want `compose.yaml` examples for S3 and MinIO mounting?
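For illustration, a minimal `compose.yaml` sketch for the S3 case (service name and credential handling are assumptions, not a final layout):

```yaml
services:
  onetrainer:
    build: .
    environment:
      - HF_TOKEN=${HF_TOKEN}
      - S3_BUCKET=${S3_BUCKET}
      - S3_MOUNT=/mnt/s3
    # s3fs mounts over FUSE, which needs these privileges in the container
    cap_add:
      - SYS_ADMIN
    devices:
      - /dev/fuse
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```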
### Have you considered alternatives? List them here.

_No response_
Please see here: https://github.com/Nerogar/OneTrainer/pull/963
Note it’s a draft.
Hi @Cybernetic-Ransomware, we're already working on improving and unifying Docker support for all platforms and some cloud providers. However, if you could test my PR as mentioned by O-J1, and port some of your changes to fix any issues you encounter, we'll get closer to a tested and ready local Docker stack.
Fwiw, I'll probably make a first merge for the local changes since 1. I believe that's how most users wish to use OneTrainer, 2. I feel a delivery is due and ready, and 3. it will get some user feedback to help me with the cloud stack revamp.
(cc @O-J1 @dxqb since you're watching #963)
@bbergeron0 I just ran your PR on Windows using VcXsrv with a few minor changes in the docker-compose file:
```yaml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
environment:
  # Allow X11 access
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=all
  - DISPLAY=host.docker.internal:0.0
  - UID=1000
  - GID=1000
```
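For anyone reproducing this, the launch boils down to the following (exact service names depend on the PR's compose file):

```sh
# On the Windows host: start VcXsrv (XLaunch) with "Disable access control"
# checked so the container's X11 clients are accepted, then:
docker compose up --build
```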
Surprisingly, it worked without the usual PyTorch exceptions, and the image is roughly a third the size of my previous Dockerfile based on NVIDIA's distribution.
I think a major improvement could be to properly map volumes for configuration presets and training data. I’ve spent quite some time exploring container paths and am still halfway through. One option I’m considering is mounting a separate filebrowser container via Compose to simplify local data access.
I’ll continue exploring your Dockerfile and will share any useful findings from my previous setup to help improve the local Docker stack.
@Cybernetic-Ransomware Thanks for the feedback! I'd like a bit more information to help me integrate your changes into my PR.
First, could you share the errors you encountered that led to these changes?
Second, what happens if you revert all your changes except `DISPLAY=host.docker.internal:0.0`?
> I think a major improvement could be to properly map volumes for configuration presets and training data
What do you mean? Creating a volume for each writable directory?
> simplify local data access
In my next commit, I'll mount each writable directory directly to its local counterpart. I hope that will make accessing data easier.
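Roughly what I have in mind, as a sketch (container paths and folder names here are illustrative, not the final layout):

```yaml
services:
  onetrainer:
    volumes:
      # Bind each writable directory to a local folder next to compose.yaml
      - ./workspace:/OneTrainer/workspace
      - ./models:/OneTrainer/models
      - ./training_presets:/OneTrainer/training_presets
```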
PS: You can reply in #963; it's easier for me to track a single thread '^^