Larry Law
Thanks @sgugger for the advice! I've added the `_no_split_modules` attribute in this [PR](https://huggingface.co/THUDM/glm-10b-chinese/discussions/2/files). However, when I tried using `device_map` with the following code...

```
from transformers import AutoModelForSeq2SeqLM
model_name_or_path =...
```
Nope, same error. Here are my dependencies:

```
accelerate==0.18.0
aiohttp==3.8.4
aiosignal==1.3.1
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==22.2.0
backcall==0.2.0
beautifulsoup4==4.12.0
bitsandbytes==0.37.2
bleach==6.0.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
cmake==3.26.1
comm==0.1.3
datasets==2.11.0
debugpy==1.6.6
decorator==5.1.1
defusedxml==0.7.1
...
```
When I killed the process, it gave this log:

```
^[[A^[[A^[[A^[[A^[[A^C[2023-06-16 08:32:00,972] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1947071
Traceback (most recent call last):
  File "/home/users/industry/dso/lannliat/.local/bin/deepspeed", line 6, in
[2023-06-16 08:32:01,072] [INFO]...
```
@Ricardokevins I hypothesise that it's a flash-attention issue. It works fine with DeepSpeed alone (in my case) and with FSDP alone (for @pacman100).
@Ricardokevins Oh nice, glad you fixed it! Can I ask for some advice, since I'm still facing the issue:

- What do you mean by "replaced the code with flash-attn...
Thanks @Ricardokevins! Btw, I've fixed the issue by setting the number of threads used for intra-op parallelism to 1:

```
torch.set_num_threads(1)
```

This [thread](https://discuss.pytorch.org/t/cpu-usage-far-too-high-and-training-inefficient/57228) explains why this works. Also,...
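For anyone landing here later, a minimal sketch of where that call goes. `torch.set_num_threads` caps the threads PyTorch uses for intra-op CPU parallelism; with several DataLoader workers or distributed ranks on one machine, each process otherwise spawns its own thread pool and they oversubscribe the CPU. (The placement at the top of the training script is my convention, not something prescribed by the thread.)

```python
import torch

# Cap intra-op CPU parallelism to one thread per process, before any
# heavy tensor work starts. This prevents N processes x M OpenMP threads
# from thrashing the CPU when running multi-worker / multi-rank training.
torch.set_num_threads(1)

# Sanity check: PyTorch now reports a single intra-op thread.
print(torch.get_num_threads())
```

Note this only limits *intra-op* parallelism (threads within a single op); inter-op parallelism is controlled separately via `torch.set_num_interop_threads`.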