thies1006
Great, thank you for the quick reply! Unfortunately, with your script I get the same results as before: BS=2: Hi! How are you? I just got back from a long...
Let me add: - Switching off the GPU seems to solve the problem. By setting `CUDA_VISIBLE_DEVICES=-1` I get (nearly) the same scores across different batch sizes. The generated text is always "Hi!...
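For reference, here is a minimal Python sketch of the CPU-only switch, assuming a plain PyTorch setup. The variable has to be set before CUDA is initialized, so setting it at the very top of the script, before importing `torch`, is the safest option. The shell equivalent is `CUDA_VISIBLE_DEVICES=-1 python script.py`, where `script.py` is just a placeholder.

```python
import os

# Hide all GPUs from CUDA. This must run before CUDA is initialized,
# so do it before importing torch (or any other CUDA-using library).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import torch
    # With no visible devices, torch should report that CUDA is unavailable.
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed; CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```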
First, to your questions (sorry if it wasn't 100% clear): - Yes, those are the results I got from your script: copied, pasted, and ran. - ParlAI was freshly installed from scratch,...
I was printing tensors to find where the differences occur, and the first one I found is here: [https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331](https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331) (line `x = self.lin2(x)`). I was looking only at the very...
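The same layer-by-layer search can be sketched in plain Python, assuming each layer's output has already been recorded for the same sample, once with batch size 1 and once with batch size 2 (in PyTorch, `nn.Module.register_forward_hook` is one way to collect them). The helper `first_divergence` and the layer names and values below are hypothetical, made up for illustration; only the search logic matters.

```python
from typing import Dict, List, Optional


def first_divergence(
    trace_a: Dict[str, List[float]],
    trace_b: Dict[str, List[float]],
    atol: float = 1e-5,
) -> Optional[str]:
    """Return the first layer (in recording order) whose outputs differ by more
    than `atol` in max absolute difference, or None if every layer matches.

    Both traces are assumed to hold one flattened output per layer, recorded
    in forward-pass order for the same input sample.
    """
    for name, values_a in trace_a.items():
        values_b = trace_b[name]
        if len(values_a) != len(values_b):
            return name  # a shape mismatch also counts as a divergence
        max_diff = max((abs(x - y) for x, y in zip(values_a, values_b)), default=0.0)
        if max_diff > atol:
            return name
    return None


# Illustrative traces for one sample run at batch size 1 and batch size 2.
bs1 = {"embeddings": [0.1, 0.2], "ffn.lin1": [0.5, 0.6], "ffn.lin2": [1.0, 2.0]}
bs2 = {"embeddings": [0.1, 0.2], "ffn.lin1": [0.5, 0.6], "ffn.lin2": [1.0, 2.001]}
print(first_divergence(bs1, bs2))  # ffn.lin2
```

A nonzero tolerance matters here: for different batch sizes, GPU matrix-multiply kernels may use a different reduction order, so tiny floating-point differences can show up even when nothing is actually wrong.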
The second error also appears, very sporadically (after thousands of cycles), when using only `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=True)`, but (I think) only when the input changes between cycles,...
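A minimal repro-loop sketch for this kind of sporadic failure might look like the following. `generate_fn` is a hypothetical wrapper around whatever calls `model.generate(**generate_kwargs)`. Here it is replaced by a fake that fails on its third call so the loop can be exercised without a GPU. The `vary_input` flag switches between a fixed prompt and a changing one, and the loop stops at the first failure.

```python
import random
from typing import Callable, List, Optional


def first_failing_cycle(
    generate_fn: Callable[[str], str],
    prompts: List[str],
    num_cycles: int,
    vary_input: bool = True,
    seed: int = 0,
) -> Optional[int]:
    """Call `generate_fn` repeatedly and return the index of the first cycle
    that raises RuntimeError, or None if every cycle succeeds."""
    rng = random.Random(seed)
    for cycle in range(num_cycles):
        # Either pick a different prompt each cycle or always reuse the first one.
        prompt = rng.choice(prompts) if vary_input else prompts[0]
        try:
            generate_fn(prompt)
        except RuntimeError as err:
            print(f"cycle {cycle}: {err}")
            return cycle
    return None


# Hypothetical stand-in for the real generation call: fails on its third call.
calls = {"n": 0}


def flaky_generate(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] == 3:
        raise RuntimeError("CUDA error: an illegal memory access was encountered")
    return prompt + " ..."


print(first_failing_cycle(flaky_generate, ["Hi!", "Hello"], num_cycles=10))
```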
Thanks @pai4451. To follow up: my impression is that the script works fine for short texts, but with longer ones (>200 tokens) it is more likely to crash....
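To put a number on "more likely to crash", one option is to measure the failure rate per input length and run each trial in a fresh process. After an illegal memory access the CUDA context is generally unusable, so in-process retries would not be independent. The command builder is hypothetical. In practice it might return something like `[sys.executable, "repro.py", str(length)]`, where `repro.py` is a placeholder. Below it is replaced by a stand-in that "fails" above 200 tokens.

```python
import subprocess
import sys
from typing import Callable, Dict, List


def failure_rate_by_length(
    make_cmd: Callable[[int], List[str]],
    lengths: List[int],
    trials: int,
) -> Dict[int, float]:
    """Run `make_cmd(length)` as a fresh subprocess `trials` times per length
    and return the fraction of runs that exited with a non-zero status."""
    rates: Dict[int, float] = {}
    for length in lengths:
        failures = 0
        for _ in range(trials):
            result = subprocess.run(make_cmd(length), capture_output=True)
            if result.returncode != 0:
                failures += 1
        rates[length] = failures / trials
    return rates


# Hypothetical stand-in for the real repro script: exits with status 1
# ("crashes") for inputs longer than 200 tokens, otherwise exits cleanly.
def fake_cmd(length: int) -> List[str]:
    return [sys.executable, "-c", f"import sys; sys.exit(1 if {length} > 200 else 0)"]


print(failure_rate_by_length(fake_cmd, [64, 128, 256], trials=2))  # {64: 0.0, 128: 0.0, 256: 1.0}
```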
On my side, I still get the error `RuntimeError: CUDA error: an illegal memory access was encountered` (with 128 input tokens, CUDA 11.3, and batch size 1). However, my impression...
Probably related: https://github.com/microsoft/DeepSpeed/issues/2062
I tried to create a universal checkpoint of the 3B model (following the instructions in the link), and I get `KeyError: 'param_slice_mappings'`. It seems that this key is created by...
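A quick way to check whether a checkpoint shard actually contains `param_slice_mappings` before converting it is to load the shard (for example with `torch.load(path, map_location="cpu")`) and search the nested dict for the key. The helper `find_key_paths` below is a hypothetical sketch. The shard layout in the example is illustrative only, not DeepSpeed's actual format.

```python
from typing import Any, List, Tuple


def find_key_paths(obj: Any, key: str, path: Tuple[Any, ...] = ()) -> List[Tuple[Any, ...]]:
    """Return every path (tuple of dict keys / list indices) at which `key`
    appears as a dict key inside a nested structure of dicts, lists, and tuples."""
    found: List[Tuple[Any, ...]] = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(path + (k,))
            found.extend(find_key_paths(v, key, path + (k,)))
    elif isinstance(obj, (list, tuple)):
        for i, v in enumerate(obj):
            found.extend(find_key_paths(v, key, path + (i,)))
    return found


# Illustrative shard contents (hypothetical layout) without the key.
shard = {"module": {}, "optimizer_state_dict": {"state": {}, "param_groups": []}}
print(find_key_paths(shard, "param_slice_mappings"))  # [] means the key is missing
```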
I tried with those PRs and it seems that the problem stays the same. ``` Traceback (most recent call last): File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker result = (True, func(*args,...