Leon Song comments

Results: 14 comments by Leon Song

Nan loss for quantization

> Same issue, using PyTorch 1.1.0

Changing the PyTorch version to 1.0.1 may solve the NaN loss issue.

Reading a multi-frame PDB file created by VMD

Same issue, need help (MDAnalysis == 2.3.0)

Single-line Infilling Results reproduction

Same issue. I cannot reproduce the infilling results reported in the paper; mine are a bit lower. Any ideas?

Single-line Infilling Results reproduction

> Dear @shivamag125, @timxx and @stgzr, thanks for reporting!
>
> @timxx: The instruction models are not intended to be used for infilling, please use the pretrained models....

[BUG] Deepspeed Training with Stage 3 job hang and fail

Same issue.

Norm 1 of tensor - bfloat16 - rounding errors?

Any update on this issue? It still happens in `pytorch==1.14.0a0+410ce96`.

Loss does not converge with FSDP cpu offloading

I suppose there is a bug in the gradient accumulation implementation. If the model is wrapped by a `DistributedDataParallel` module, when calling `backward`, the gradient should be averaged across GPUs....

aborted (core dumped)

Same issue,

> F tensorflow/stream_executor/cuda/cuda_driver.cc:316] current context was not created by the StreamExecutor cuda_driver API: 0x42aa310; a CUDA runtime call was likely performed without using a StreamExecutor context
> Aborted (core...

The provided qkv memory layout is not supported! When using RoPE

Same issue. Any solutions?