WangXu
> I quantized Llama 3 70B on 3x A6000 48GB. Did you adjust the calibration dataset? No, I did not change the calibration dataset.
> Ahh I see the issue. This is a transformers issue where they have a memory leak in their cache. > > if you see the examples/quantize.py, we use in...
This still doesn't seem to work; my complete code is:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import os

# os.environ["CUDA_VISIBLE_DEVICES"] = "2,3,4,5,6,7"
model_path = '/data1/models/llms/llama3_8b_it'
quant_path = ...
```
> One last thing I noticed about your code that can cause OOM: you use `device_map='auto'`, which makes accelerate fill all GPUs with the model. It's better to set...
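For reference, here is a minimal sketch of the quantization flow without `device_map='auto'`, following the pattern in AutoAWQ's examples/quantize.py; the `quant_path` value and the exact `quant_config` contents here are assumptions, not taken from this thread:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/data1/models/llms/llama3_8b_it'  # path from the snippet above
quant_path = 'llama3_8b_it-awq'                 # hypothetical output path

# Standard 4-bit AWQ config from the AutoAWQ examples
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# No device_map='auto': low_cpu_mem_usage keeps the fp16 weights in CPU RAM,
# and use_cache=False avoids the growing transformers KV cache mentioned above.
model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AutoAWQ moves layers to the GPU one at a time while quantizing,
# so the whole model never has to fit in GPU memory at once.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

With this setup the full-precision model stays in CPU RAM during calibration, which is usually enough to avoid the OOM described above.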
> Hello, deployment and inference work fine on a single A800, but multi-GPU inference throws an error: `Task exception was never retrieved future: Traceback (most recent call last): File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 268, in read buf = self.sock.recv(min(self.MAX_IO_CHUNK, count)) ConnectionResetError: [Errno 104] Connection reset by...
> @hiworldwzj @wx971025 This doesn't look like an shm-size problem. llama 70b was started with a 3T shm-size and the error still occurs. > > 10-12 22:27:39: Task exception was never retrieved future: Traceback (most recent call last): File "/app/lightllm-main/lightllm/server/router/manager.py", line 91, in loop_for_fwd await self._step()...
> Could you share your train_args?

```shell
{
    "output_dir": "output/firefly-llama3-8b-sft-qlora",
    "model_name_or_path": "/data1/models/llms/llama3_8b_it",
    "train_file": "./data/llama3/dummy_data.jsonl",
    "template_name": "llama3",
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 2e-4,
    "max_seq_length": 1024,
    "logging_steps": 100,
    "save_steps": 100,
    "save_total_limit": 1,
    "lr_scheduler_type": "constant_with_warmup",
    ...
```
> > ```
> > if __name__ == '__main__':
> >
> >     # prevent child processes from restarting
> >     import multiprocessing
> >     multiprocessing.freeze_support()
> >     # ------------------------------------
> >
> >     ...
> > ```
> (GPT-SoVITS) E:\bak>git clone https://github.com/RVC-Boss/GPT-SoVITS.git Cloning into 'GPT-SoVITS'... error: RPC failed; curl 28 Recv failure: Connection was reset fatal: expected flush after ref listing > > Cloning the fast_inference_ branch keeps failing. First, to clone a specific branch you need to add the -b parameter; second,...
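For reference, a sketch of the suggested command with the `-b` flag (the repository URL and the fast_inference_ branch name are taken from the message above):

```shell
# Clone only the fast_inference_ branch instead of the default branch
git clone -b fast_inference_ https://github.com/RVC-Boss/GPT-SoVITS.git
```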