tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
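For context, everything below revolves around the library's single entry point. A minimal usage sketch, assuming the `tp.tensor_parallel(model, device_ids)` signature shown in the project README (the checkpoint name is illustrative):

```python
import transformers
import tensor_parallel as tp

# Load any HuggingFace model on CPU first...
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-13b")
model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-13b")

# ...then split its weights across two GPUs. forward(), backward()
# and .generate() afterwards run tensor-parallel on both devices.
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```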
Hi, if my model is multimodal and its `generate` call actually takes different arguments, like this:

```python
generation_output = model_tp.generate(
    pixel_values=pixel_values,
    input_ids=input_ids,
    attention_mask=attention_mask,
    **generation_config,
)
```

it doesn't work. How to...
```
[0] NCCL INFO cudaDriverVersion 11040
[0] NCCL INFO Bootstrap : Using eth0:10.84.253.70
[0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
  File "/usr/local/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context...
```
The inference speed of naive model parallel is much better than tensor parallel. Setup: Llama-30b on 2080Ti 22G x4.

- Naive: 31.64s
- 4-way TP, main branch: 177.78s
- 4-way TP, llama branch: ...
This is roughly a duplicate of the llama config; it adds support for Mixtral models.
It works fine on v1.3.2; however,

```
RuntimeError: Trying to shard a model containing 'meta' parameters. Please set `sharded=False` during model creation and call `.apply_sharding()` only after dispatch
```

occurs...
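The error message itself spells out a two-step workaround. A minimal sketch of that sequence, assuming `sharded` is a keyword argument of `tp.tensor_parallel` and that `.apply_sharding()` is the method the message refers to (the checkpoint name and the dispatch step in between are illustrative):

```python
import transformers
import tensor_parallel as tp

model = transformers.AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Build the wrapper without sharding, so 'meta' parameters are
# tolerated at construction time.
model_tp = tp.tensor_parallel(model, sharded=False)

# ...dispatch/materialize any 'meta' weights onto real devices here...

# Only then split the parameters across GPUs, as the message suggests.
model_tp.apply_sharding()
```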
I'm trying to use the huggingface Trainer after applying tensor_parallel to the Llama2 7b model, by calling

```python
model = tp.tensor_parallel(model)
```

but I'm getting the following error:

```
ValueError:...
```
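A minimal reproduction sketch of the setup described, assuming the standard `transformers` Trainer API (the checkpoint name and training arguments are illustrative):

```python
import tensor_parallel as tp
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = tp.tensor_parallel(model)  # shard across all visible GPUs

# Handing the wrapped model to Trainer is where the ValueError appears.
trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))
```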
What should I do if I want to use tensor_parallel with a GPTQ-quantized model ([Llama-2-7b-Chat-GPTQ](https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ), for example) for inference on 2 or more GPUs? Currently, I am using AutoGPTQ to...
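A sketch of the combination being asked about, assuming AutoGPTQ's `AutoGPTQForCausalLM.from_quantized` loader; whether tensor_parallel can actually shard GPTQ's packed quantized layers is exactly the open question here:

```python
import tensor_parallel as tp
from auto_gptq import AutoGPTQForCausalLM

# Load the quantized checkpoint with AutoGPTQ, keeping it on CPU so
# that the sharding step decides the final device placement.
model = AutoGPTQForCausalLM.from_quantized(
    "4bit/Llama-2-7b-Chat-GPTQ", device="cpu"
)

# Attempt to split it across two GPUs; the default sharding rules may
# not recognize GPTQ's packed int4 linear layers.
model_tp = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```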
What amazing work! However, when I tried to run inference with the kosmos model from Hugging Face, there was an error:

```
NotImplementedError: A model class needs to define a `prepare_inputs_for_generation` method in...
```
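The missing method named in the error is part of the `transformers` generation API. A sketch of one possible workaround, attaching a minimal implementation to a loaded `model` instance; the method body below is an assumption and would need to match the model's actual `forward()` signature, which for kosmos likely includes image inputs:

```python
import types

# Hypothetical patch for a loaded `model` that lacks the method.
def prepare_inputs_for_generation(self, input_ids, attention_mask=None, **kwargs):
    # Return the keyword arguments the generation loop should pass
    # to forward() at each decoding step.
    return {"input_ids": input_ids, "attention_mask": attention_mask}

model.prepare_inputs_for_generation = types.MethodType(
    prepare_inputs_for_generation, model
)
```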
I used tensor_parallel to finetune the qwen model with LoRA in a tensor-parallel way. However, it cannot save the model at the end. Can you provide any help? Thanks.
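One pattern from the project README that may apply here: the `save_tensor_parallel` context manager gathers the shards back into an ordinary state dict before saving (assuming that helper exists in the installed version; the file name is illustrative):

```python
import torch
import tensor_parallel as tp

# Inside the context, state_dict() returns non-sharded tensors, so
# the checkpoint can later be reloaded without tensor_parallel.
with tp.save_tensor_parallel(model_tp):
    torch.save(model_tp.state_dict(), "qwen_lora_finetuned.pt")
```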