Alfredo Ortega
I think it's likely because of this code:
```python
params = {}
params['device_map'] = {'': 0}
#params['dtype'] = shared.model.dtype
shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
```
See how it resets the 'device_map' that...
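For reference, a minimal sketch of what keeping the existing split could look like (an assumption on my part, not the actual fix: it presumes `shared.model` was loaded through Accelerate and therefore exposes `hf_device_map`, and that `shared` and `lora_name` come from the surrounding webui code):

```python
from pathlib import Path
from peft import PeftModel

# Sketch only: reuse the device map the base model was loaded with
# instead of forcing the whole model onto GPU 0.
params = {}
if hasattr(shared.model, 'hf_device_map'):
    # Keep the existing multi-GPU split produced by Accelerate.
    params['device_map'] = shared.model.hf_device_map
else:
    # Fall back to the current single-GPU behavior.
    params['device_map'] = {'': 0}

shared.model = PeftModel.from_pretrained(
    shared.model, Path(f"loras/{lora_name}"), **params
)
```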
I confirm that @sgsdxzy's patch now successfully loads alpaca-lora-30b on 2x3090 GPUs using int8 quantization.
I'm on 8747c74339cf1e7f1d45f4aa1dcc090e9eba94a3; it now loads the LoRA and the 30B model on 2x3090, no problem.
Set up at least a 40 GB swap; it needs about 130 GB of memory for merging the 30B model.
Try adding --auto-devices
Maybe increase the swap file? I have the same setup but with 96 GB RAM, and it uses swap.
OK, found the problem. This PR fixes it: https://github.com/artidoro/qlora/pull/44. But it is not yet merged (even though a comment says it is).
Yes, here's an HTML example. I'm using Lazarus 3.0.0, FPC 3.2.2 and Ubuntu (but the bug happens on Windows too):
```
body{background-color:white;}
table{width:100%;margin:0 auto;}
td{width:100%;word-wrap:break-word;}
pre{}
write a list of 10 words Abundance...
I can use it and it works, but it's slightly slower: 9 tok/s activated vs. 11.5 tok/s deactivated, inference on Llama3-70B-8bpw across 4x3090 GPUs.
I hit a similar bug. Environment: 4x3090, CUDA 12.4, Aphrodite 0.53, 96 GB of VRAM total, tensor parallel = 4. When I try to load elinas_Meta-Llama-3-120B-Instruct-4.0bpw-exl2 (61 GB), it runs out of VRAM instantly,...