Jiatong (Julius) Han comments

Results 231 comments of Jiatong (Julius) Han

[DOC]: There is so little documentation that it is almost unusable

Which examples are you referring to? Would you please point them out and let us enhance the documentation accordingly?

train error with exitcode: -4

May I know your command for training? Have you changed any lines in your training code?

train error with exitcode: -4

May I know why you set `use_reentrant = True`? It may not be recommended when there are nested modules ([link](https://pytorch.org/docs/stable/checkpoint.html)).

train error with exitcode: -4

May I know the output of `nvidia-smi`? An exit code of `-4` is often associated with an Out-Of-Memory (OOM) issue.
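As a cross-check on the exit code, a negative exit code reported for a Python-launched worker generally means the process was ended by the signal with that number. The helper below is a minimal sketch for looking up the signal name; `describe_exit_code` is a hypothetical name used only for illustration, and signal numbering is platform-dependent.

```python
import signal


def describe_exit_code(exit_code):
    """Map a worker exit code to a short description.

    Hypothetical helper: a negative code -N is treated as
    "terminated by signal N", following the subprocess/multiprocessing
    convention.
    """
    if exit_code >= 0:
        return f"exited with status {exit_code}"
    try:
        # signal.Signals raises ValueError for numbers that are not signals here.
        return signal.Signals(-exit_code).name
    except ValueError:
        return f"unknown signal {-exit_code}"


if __name__ == "__main__":
    print(describe_exit_code(-4))
```

Comparing the signal name with the `nvidia-smi` output and the system logs may help narrow down whether memory pressure was really the cause.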

torch.distributed.elastic.multiprocessing.api

It could be due to a mismatch between the CUDA and PyTorch versions. Run `nvcc --version` and `python -c 'import torch; print(torch.version.cuda);'` to see if they match.
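The two commands can also be combined into one script. This is a rough sketch, not an official diagnostic: `parse_nvcc_release` and `versions_match` are hypothetical helpers, and it assumes the `nvcc --version` output contains a `release X.Y` fragment and that comparing `major.minor` is enough.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def versions_match(toolkit_version, torch_cuda_version):
    """Compare two 'major.minor' strings; a missing value counts as a mismatch."""
    if toolkit_version is None or torch_cuda_version is None:
        return False
    return toolkit_version.split(".")[:2] == torch_cuda_version.split(".")[:2]


if __name__ == "__main__":
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
        toolkit = parse_nvcc_release(result.stdout)
    except FileNotFoundError:
        toolkit = None  # nvcc is not on PATH

    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None

    print(f"nvcc toolkit: {toolkit}, torch built with CUDA: {torch_cuda}")
    print("match" if versions_match(toolkit, torch_cuda) else "mismatch or unknown")
```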

torch.distributed.elastic.multiprocessing.api

Thanks for sharing, @erichtho. Would this solve your issue as well, @MrD005?

You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.

May I know the contents of the local model path?

You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.

Can you please print the contents of `/home/zdw/Open-Sora/pre_training/llava-v1.6-34b`? It should hold the `.bin` file that contains the model weights. Otherwise, HF will download to some default caching space as set with...
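One way to print the contents and spot weight files at a glance is sketched below. `list_model_dir` is a hypothetical helper, and treating `.safetensors` as a weight format alongside `.bin` is an assumption added here, not something stated above.

```python
from pathlib import Path

# Assumption: weights are stored as .bin (as mentioned above) or .safetensors.
WEIGHT_SUFFIXES = {".bin", ".safetensors"}


def list_model_dir(model_dir):
    """Return (all entry names, weight file names) for a local model directory.

    Returns two empty lists if the path is not an existing directory.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [], []
    names = sorted(p.name for p in root.iterdir())
    weights = [name for name in names if Path(name).suffix in WEIGHT_SUFFIXES]
    return names, weights


if __name__ == "__main__":
    names, weights = list_model_dir("/home/zdw/Open-Sora/pre_training/llava-v1.6-34b")
    print("entries:", names)
    print("weight files:", weights if weights else "none found")
```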

When I enable sequence parallelism, it throws a bug.

What was the error? Would you please try changing it while also passing the tests in the `~/tests` folder?

python gradio/app.py --port 8001 --host 0.0.0.0 --enable-optimization --model-type v1-HQ-16x512x512 keeps reporting torch.cuda.OutOfMemoryError: CUDA out of memory

You may try replacing the Boolean values from [this line](https://github.com/hpcaitech/Open-Sora/blob/81982f60c42fa9e864dce669ea9d97552820125b/gradio/app.py#L104) and [this line](https://github.com/hpcaitech/Open-Sora/blob/81982f60c42fa9e864dce669ea9d97552820125b/gradio/app.py#L105) with `args.enable_optimization`.