周博洋
```
root@A100:/aml/conda/lib/python3.10/site-packages# python -m xformers.info
Traceback (most recent call last):
  File "/aml/unsloth_env/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/aml/unsloth_env/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/aml/conda/lib/python3.10/site-packages/xformers/__init__.py",...
```
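The traceback mixes paths from two installations (`/aml/conda` and `/aml/unsloth_env`), which usually points to the wrong interpreter or a stray `PYTHONPATH` rather than a broken xformers build. A minimal diagnostic sketch, assuming you only want to see which interpreter and which xformers copy are actually being picked up (this helper script is illustrative and not part of xformers):

```python
# check_xformers_env.py -- hypothetical helper, not part of xformers.
# Prints the running interpreter and where "xformers" would be imported from,
# so a conda-env / PYTHONPATH mix-up becomes visible.
import sys
import importlib.util

print("interpreter :", sys.executable)
print("sys.path[:3]:", sys.path[:3])

spec = importlib.util.find_spec("xformers")
print("xformers at :", spec.origin if spec else "not importable")
```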
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution...
### Describe the issue

Issue: the log says the gradients will be None.

Command:
```
PASTE THE COMMANDS HERE.
```
(just using the pretrain stage)

Log:
```
/data22/llava/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True...
```
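The UserWarning comes from `torch.utils.checkpoint`, which now wants `use_reentrant` passed explicitly. A minimal sketch of being explicit about it, using the non-reentrant variant that newer PyTorch recommends; the toy block and tensor shapes are assumptions for illustration, not taken from the report:

```python
# Hedged sketch: pass use_reentrant explicitly so torch.utils.checkpoint stops warning.
# The tiny "block" function and input shape below are illustrative assumptions.
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.relu(x @ x.T)

x = torch.randn(4, 8, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # explicit, non-reentrant checkpointing
y.sum().backward()
print(x.grad.shape)
```

If checkpointing is enabled through Hugging Face Transformers instead, recent versions accept `model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})`, but whether that kwarg is available depends on the installed Transformers version.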
### Question

I only found that there are some files like:

How can I merge them into the base model, or is there something else I should do? Any help is very...
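If the files in question are LoRA adapter weights (e.g. `adapter_config.json` plus the adapter weight file), one common way to fold them into the base checkpoint is PEFT's `merge_and_unload`. A minimal sketch under that assumption; the model id and paths are placeholders, not taken from the question:

```python
# Hedged sketch: merge LoRA adapter weights into their base model with PEFT.
# "base-model-id" and the two paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

merged = model.merge_and_unload()               # fold LoRA deltas into the base weights
merged.save_pretrained("path/to/merged-model")  # now loadable without peft installed
AutoTokenizer.from_pretrained("base-model-id").save_pretrained("path/to/merged-model")
```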
### Question

```
[2024-04-29 06:52:01,294] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 295, num_elems = 6.76B
Loading checkpoint shards: 100%|██████████| 2/2 [00:00
```
**Describe the bug**

Strange issue:

```
(/aml2/ds) root@A100:/aml2/Megatron-LM# from megatron.training.tokenizer import build_tokenizer
from: can't read /var/mail/megatron.training.tokenizer
(/aml2/ds) root@A100:/aml2/Megatron-LM# python tools/preprocess_data.py \...
```
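The `from: can't read /var/mail/...` line is bash interpreting the Python import statement as its own `from` mail utility; the import only works inside a Python interpreter started from the repo root. A quick hedged check (the snippet is illustrative, not taken from the issue):

```python
# Run this inside Python (e.g. `python` launched in the Megatron-LM repo root),
# not at the bash prompt. If the import succeeds, the tokenizer module is on
# the path and tools/preprocess_data.py should be able to build it.
from megatron.training.tokenizer import build_tokenizer
print(build_tokenizer)
```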
LoRA + base is working well. When serving the merged model:

```
(/data2/llava-phi) root@A100:/data2/LLaVA-pp# python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /data2/phi3-vlm3
2024-05-05 10:36:00 | INFO | model_worker |...
```
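As a hedged sanity check before handing the merged checkpoint to `llava.serve.model_worker`, it can help to confirm the directory is self-contained (config, tokenizer, weights all present). The sketch below assumes the `/data2/phi3-vlm3` path from the command above and uses plain Transformers loading; with a custom `model_type` it may additionally need the `llava` package importable:

```python
# Hedged sanity check: confirm the merged checkpoint directory loads on its own.
# "/data2/phi3-vlm3" is the path from the command above; the check itself is
# illustrative and not part of LLaVA-pp.
from transformers import AutoConfig, AutoTokenizer

path = "/data2/phi3-vlm3"
cfg = AutoConfig.from_pretrained(path, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
print(type(cfg).__name__, "| vocab size:", tok.vocab_size)
```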
Where exactly are the model files? I didn't see them, but in the log at the very beginning I saw that a file of nearly 7 GB had finished downloading. I'm using Hugging Face.
```
(TE) root@bjdb-h20-node-118:/aml/TransformerEngine/examples/pytorch/fsdp# torchrun --standalone --nnodes=1 --nproc-per-node=$(nvidia-smi -L | wc -l) fsdp.py
W0712 09:57:45.035000 139805827512128 torch/distributed/run.py:757]
W0712 09:57:45.035000 139805827512128 torch/distributed/run.py:757] *****************************************
W0712 09:57:45.035000 139805827512128 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each...
```
### 📦 Environment

- [ ] Official
- [ ] Official Preview
- [ ] Vercel / Zeabur / Sealos
- [X] Docker
- [ ] Other

### 📌 Version...