Jiatong (Julius) Han

Results 231 comments of Jiatong (Julius) Han

Which examples are you referring to? Would you please point them out and let us enhance the documentation accordingly?

May I know your command for training? Have you changed any line in your training codes?

May I know why you set `use_reentrant = True`? It may not be recommended when there are nested modules ([link](https://pytorch.org/docs/stable/checkpoint.html)).
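For reference, here is a minimal sketch of the non-reentrant call, assuming your code goes through `torch.utils.checkpoint.checkpoint`. The wrapper name `run_block_with_checkpoint` is hypothetical and only for illustration; your training code may wire this differently.

```python
def run_block_with_checkpoint(block, *inputs):
    """Run `block(*inputs)` under activation checkpointing (non-reentrant).

    `block` is any callable, typically an nn.Module. This helper is a
    hypothetical wrapper for illustration, not part of any library.
    """
    # Imported lazily so the sketch can be read and imported
    # on a machine without PyTorch installed.
    from torch.utils.checkpoint import checkpoint

    # use_reentrant=False selects the non-reentrant implementation.
    return checkpoint(block, *inputs, use_reentrant=False)
```

If switching the flag changes the behaviour you see, that would help narrow down the cause.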

May I see the output of `nvidia-smi`? An error code of `-4` is often associated with an out-of-memory (OOM) issue.

It could be due to a mismatch between the CUDA and PyTorch versions. Run `nvcc --version` and `python -c 'import torch; print(torch.version.cuda);'` to see if they match.
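If you prefer to compare the two in a script, something like the sketch below may help. It assumes "match" means the same major and minor version, and that `nvcc --version` prints a line containing `release X.Y`. The helper names are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Return (major, minor) from a string such as "11.8".

    torch.version.cuda is None on CPU-only builds, so None is passed through.
    """
    if not version:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(nvcc_output, torch_cuda):
    """True only when both versions are known and equal (assumed criterion)."""
    toolkit = parse_nvcc_release(nvcc_output)
    built_with = parse_torch_cuda(torch_cuda)
    return toolkit is not None and toolkit == built_with


def check_local_install():
    """Compare the local toolkit with the installed PyTorch build."""
    import torch  # imported lazily; requires PyTorch to be installed

    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    )
    return versions_match(result.stdout, torch.version.cuda)
```

Calling `check_local_install()` in the same environment you train in should tell you whether the two agree.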

Thanks for sharing, @erichtho. Would this solve your issue as well, @MrD005?

Could you please print the contents of `/home/zdw/Open-Sora/pre_training/llava-v1.6-34b`? It should hold the `.bin` file that contains the model weights. Otherwise, HF will download to some default caching space as set with...
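A quick way to check the folder from Python is sketched below. It assumes the weights use the `.bin` suffix mentioned above. `list_weight_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def list_weight_files(model_dir, suffix=".bin"):
    """Return the sorted names of files in `model_dir` ending with `suffix`."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    # Only look at the top level of the folder; subfolders are ignored.
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.suffix == suffix
    )
```

An empty list for your model folder would suggest the weights are not where the loader expects them.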

What was the error? Would you please try changing it while making sure the tests in the `~/tests` folder still pass?

You may try replacing the Boolean values from [this line](https://github.com/hpcaitech/Open-Sora/blob/81982f60c42fa9e864dce669ea9d97552820125b/gradio/app.py#L104) and [this line](https://github.com/hpcaitech/Open-Sora/blob/81982f60c42fa9e864dce669ea9d97552820125b/gradio/app.py#L105) with `args.enable_optimization`.