terrificdm issues

Results: 5 issues of terrificdm

How to set "use_fsdp=True" with "SLURM_LOCALID" and "SLURM_PROCID" for multi-GPU training?

For single-GPU training, every time I run the script I have to run "export SLURM_LOCALID=0", "export SLURM_PROCID=0", and "export SLURM_NNODES=1" before training starts successfully. My question is for...
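The three exports in the single-GPU case could also be applied from Python before the training code runs. This is a minimal sketch, assuming the training script reads these variables from the process environment. The helper name `apply_slurm_defaults` is hypothetical and is not part of any library mentioned in the issue.

```python
import os

# Values quoted in the issue for a single-GPU, single-node run.
SINGLE_GPU_SLURM_DEFAULTS = {
    "SLURM_LOCALID": "0",
    "SLURM_PROCID": "0",
    "SLURM_NNODES": "1",
}


def apply_slurm_defaults(env=os.environ):
    """Fill in missing SLURM variables without overwriting ones already set.

    Hypothetical helper: `env` is any dict-like mapping, os.environ by default.
    """
    for name, value in SINGLE_GPU_SLURM_DEFAULTS.items():
        env.setdefault(name, value)
    return env


if __name__ == "__main__":
    apply_slurm_defaults()
    for name in SINGLE_GPU_SLURM_DEFAULTS:
        print(f"{name}={os.environ[name]}")
```

Because `setdefault` is used, values that a real Slurm launcher has already set are left alone, so the same entry point should work both inside and outside a Slurm job.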

How to change model parameters (temperature, top_k, top_p, etc.) dynamically when making an inference call?

By changing "generation_config.json", or is there a more flexible method? Thanks.

CDK deployment for CloudFront with L@E and CFF serving static and dynamic content

## :rocket: Feature Request

### General Information

* [X] :wave: I may be able to implement this feature request
* [ ] :warning: This feature might incur a breaking change...

feature-request

effort/medium

Flux LoRA training does not seem to converge with a large dataset (140 images)

I use one NVIDIA L40S (48 GB VRAM) to train a LoRA for Flux, and here is my training script: `./sd-scripts/flux_train_network.py --pretrained_model_name_or_path ./model/flux1-dev.safetensors --clip_l ./model/clip_l.safetensors --t5xxl ./model/t5xxl_fp16.safetensors --ae ./model/ae.safetensors --cache_latents_to_disk --save_model_as safetensors...

Networking requirements (latency, bandwidth, throughput) for EP vs. TP?

I'm curious about the differences in networking requirements between EP and TP, particularly regarding latency, bandwidth, and throughput. I know that both distribution strategies are highly demanding in terms of...