Andcircle comments

Results 30 comments of Andcircle

(Python) OSError: protocol not found

Here's the completed code, thanks

(Python) OSError: protocol not found

@temoto I use Windows 7. Can I do something like reinstalling netbase, as on Ubuntu? Thanks for the help...

(Python) OSError: protocol not found

@temoto Thanks, that's what I'm thinking about. I've never used Linux before; it may be time to take the shot...

Model Parallelism and accelerate's usage of DDP aren't compatible

@sgugger I need your guidance. I want to use

```
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={'': torch.cuda.current_device()},
    trust_remote_code=True,
)
```

to train a 40B model, but I also want DDP. How should I achieve that?...

Model Parallelism and accelerate's usage of DDP aren't compatible

@sgugger Thanks for your fast help. But what if the model is too big for one GPU device?

Model Parallelism and accelerate's usage of DDP aren't compatible

> I feel like you are not listening. You cannot use `DDP + device_map="auto"` and thus not `DDP + device_map="auto" + DeepSpeed` either. You need to just use DeepSpeed...

Model Parallelism and accelerate's usage of DDP aren't compatible

@muellerzr Sorry to bring this up again. Is it possible to add this functionality as a feature? The background is that we want to tune a 70B or 8x7B model as a teacher,...

Model Parallelism and accelerate's usage of DDP aren't compatible

@muellerzr @maxidl, because I loaded the model in 4-bit, I also commented out this line: https://github.com/maxidl/accelerate/blob/332d960d625deda76090c32a6e67dee70be76761/src/accelerate/accelerator.py#L1342 I don't know whether there is any bad effect; it starts to train at...

[Usage] AttributeError: 'Parameter' object has no attribute 'quant_state'

Any updates here?

NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@kushalj001 I'm also facing a similar error. As you said, it can be reproduced, but it is very tricky to write a small code snippet that does so. My project is a...