
Can we use the t5-large text encoder model with the Open-Sora pretrained weights?

Open sandeshrajbhandari opened this issue 2 years ago • 6 comments

Edit: it seems we can't use the smaller models, so it would be handy to have a way to load the XXL model in 8-bit format for smaller-VRAM GPUs. It's doable for PixArt image-generation models using the diffusers library.

I tried to use the google/t5-v1_1-large model as the text encoder instead of DeepFloyd/t5-v1_1-xxl, but encountered the following error:

RuntimeError: Error(s) in loading state_dict for STDiT:
	size mismatch for y_embedder.y_embedding: copying a param with shape torch.Size([120, 4096]) from checkpoint, the shape in current model is torch.Size([120, 1024]).
	size mismatch for y_embedder.y_proj.fc1.weight: copying a param with shape torch.Size([1152, 4096]) from checkpoint, the shape in current model is torch.Size([1152, 1024]).

It seems the output embedding dimension is 1024 for the large model and 4096 for the XXL model, and the Open-Sora weights only accept embeddings from the XXL model, i.e. 4096-dim weights.
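To make the mismatch concrete, here's a tiny sanity check (hidden sizes taken from the shapes in the error above; the helper function itself is hypothetical, not part of Open-Sora):

```python
# Hidden sizes of the two T5 v1.1 text encoders, per the shapes in the
# size-mismatch error: 1024 for large, 4096 for xxl.
T5_HIDDEN_SIZES = {
    "google/t5-v1_1-large": 1024,
    "DeepFloyd/t5-v1_1-xxl": 4096,
}

# Width the Open-Sora checkpoint's y_embedder weights were trained against.
CHECKPOINT_TEXT_DIM = 4096

def is_compatible(encoder_name: str) -> bool:
    """State-dict loading requires exact shape matches, so only an encoder
    with the checkpoint's embedding width can be swapped in."""
    return T5_HIDDEN_SIZES[encoder_name] == CHECKPOINT_TEXT_DIM

print(is_compatible("google/t5-v1_1-large"))   # False -> the RuntimeError above
print(is_compatible("DeepFloyd/t5-v1_1-xxl"))  # True
```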

Is there any way we can use the t5-large model instead of the XXL model? I want to run inference on cloud GPUs, e.g. a T4 in Colab notebooks.

Here's my notebook as a gist that I used to run on Colab: https://gist.github.com/sandeshrajbhandari/ac3857cd2aaae5e3a9de0d7c219ac351
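On the 8-bit idea from the edit above: a sketch of what loading just the XXL text encoder quantized might look like with transformers + bitsandbytes. Untested here; it assumes your transformers/accelerate/bitsandbytes versions support `load_in_8bit`, and that you then feed the resulting embeddings into Open-Sora yourself:

```python
def encoder_kwargs_8bit():
    # kwargs for transformers' from_pretrained that request 8-bit weights;
    # needs the bitsandbytes and accelerate packages installed at runtime.
    return {"load_in_8bit": True, "device_map": "auto"}

def embed_prompt(prompt: str):
    # Heavy path: downloads the ~11B-parameter encoder, so this is only
    # a sketch and is not executed at import time.
    from transformers import AutoTokenizer, T5EncoderModel

    name = "DeepFloyd/t5-v1_1-xxl"
    tokenizer = AutoTokenizer.from_pretrained(name)
    encoder = T5EncoderModel.from_pretrained(name, **encoder_kwargs_8bit())
    tokens = tokenizer(prompt, return_tensors="pt").to(encoder.device)
    # shape (1, seq_len, 4096): the width the Open-Sora checkpoint expects
    return encoder(**tokens).last_hidden_state
```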

sandeshrajbhandari avatar Mar 18 '24 03:03 sandeshrajbhandari

I am creating a demo for the HuggingFace platform and I cannot install the apex library. Is there a way to skip it? If not, how can I add this command to the req.txt file?

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

I am getting errors due to parameters.


kadirnar avatar Mar 18 '24 03:03 kadirnar

Install the apex library, since you need it to use FusedLayerNorm even though the README.md says it's optional. Take a look at the notebook I'm using right now for reference.

I installed the apex package by cloning the original repo and modifying check_cuda_torch_binary_vs_bare_metal(cuda_dir) in setup.py to remove the runtime error so apex would compile. I replaced the raise RuntimeError call with a print:

# in apex's setup.py, with raise RuntimeError(...) replaced by print(...):
def check_cuda_torch_binary_vs_bare_metal(cuda_dir):
    raw_output, bare_metal_version = get_cuda_bare_metal_version(cuda_dir)
    torch_binary_version = parse(torch.version.cuda)

    print("\nCompiling cuda extensions with")
    print(raw_output + "from " + cuda_dir + "/bin\n")

    if (bare_metal_version != torch_binary_version):
        print(
            "Cuda extensions are being compiled with a version of Cuda that does "
            "not match the version used to compile Pytorch binaries.  "
            "Pytorch binaries were compiled with Cuda {}.\n".format(torch.version.cuda)
            + "In some cases, a minor-version mismatch will not cause later errors:  "
            "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
            "You can try commenting out this check (at your own risk)."
        )
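If you'd rather apply that edit programmatically than by hand, a minimal (hypothetical) patch helper; it just swaps the raise for a print, exactly as above:

```python
from pathlib import Path

def relax_cuda_version_check(setup_py_text: str) -> str:
    """Downgrade apex's hard CUDA-version-mismatch error to a warning by
    textually replacing the raise with a print. Review the result before
    installing."""
    return setup_py_text.replace("raise RuntimeError(", "print(")

if __name__ == "__main__":
    setup_py = Path("apex/setup.py")  # assumes apex was cloned into ./apex
    if setup_py.exists():
        setup_py.write_text(relax_cuda_version_check(setup_py.read_text()))
```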

Let me know if this helps; I'm trying to run this in Colab as well.

sandeshrajbhandari avatar Mar 18 '24 05:03 sandeshrajbhandari

@sandeshrajbhandari

I downloaded the Apex repo and updated the setup.py file. Then I added this line to the req.txt file:

-v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" ./"

Error Message:

ERROR: Invalid requirement: -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" ./"
Could not split options: -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" ./"

Can you help? Demo: https://huggingface.co/spaces/kadirnar/Open-Sora

kadirnar avatar Mar 18 '24 10:03 kadirnar

Try adding this one line to the requirements.txt; it points pip at the local folder to install apex from, which I guess is the issue. Let me know if it helps:

-e ./apex --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext"

The requirements.txt file would look like this after that:

xformers
git+https://github.com/hpcaitech/Open-Sora.git#egg=opensora
-e ./apex --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext"

I haven't tried installing the edited apex package using requirements.txt, so I'm not entirely sure it will work. This is the exact command I used to install it in my Colab environment:

## warning takes more than 28 minutes to run this. go take a break.
%cd /content/apex
!pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

If that didn't work, just clone the apex repository, make the setup.py changes, and then replace the git link in the following command with your own fork. This should work, in theory:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/yourUSERNAME/apex.git
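One caveat about the requirements.txt route: pip only honors a small set of per-requirement options inside requirements files (editable installs, --hash, --global-option, and --config-settings on recent pip versions), while global flags like --no-cache-dir, --no-build-isolation, and --disable-pip-version-check are rejected there with an "Invalid requirement" error. A stripped-down file might look like this (untested; assumes apex is cloned into ./apex and a pip new enough to accept per-line --config-settings):

```
xformers
git+https://github.com/hpcaitech/Open-Sora.git#egg=opensora
-e ./apex --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext"
```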

sandeshrajbhandari avatar Mar 18 '24 11:03 sandeshrajbhandari

@sandeshrajbhandari ,

I cannot use the notebook feature on the HuggingFace platform, so I can't use the cd command. I git cloned the Apex repo and then updated the setup.py file.

ERROR: Invalid requirement: -e ./apex --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext"
pip: error: no such option: --disable-pip-version-check

Should I export the files in the apex folder?

kadirnar avatar Mar 18 '24 11:03 kadirnar

The XXL weights might actually be better: as research on story understanding shows, the larger the hidden dimension, the better the model understands plot and the relationships between real-world objects. https://arxiv.org/abs/2305.07759

kabachuha avatar Mar 18 '24 12:03 kabachuha