thies1006 comments




Question: Running generation with batches

Great, thank you for the quick reply! Unfortunately, with your script I get the same results as before: BS=2: Hi! How are you? I just got back from a long...

Question: Running generation with batches

Let me add: - Switching off the GPU seems to solve the problem. By setting `CUDA_VISIBLE_DEVICES=-1` I get (nearly) the same scores across different batch sizes. The generated text is always "Hi!...
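For readers who want to reproduce the CPU-only comparison, here is a minimal sketch of hiding the GPUs from within Python. The helper name `hide_gpus` is hypothetical and not part of ParlAI; the only thing taken from the comment is the `CUDA_VISIBLE_DEVICES=-1` setting. The variable has to be set before any CUDA-using library initializes the driver in the process, so the safest place is the very top of the script.

```python
import os

def hide_gpus(env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    Hypothetical helper for illustration; it only sets the variable
    mentioned in the comment above.
    """
    env = dict(os.environ if env is None else env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env

# Apply to the current process. This should run before importing
# torch (or any other library that initializes CUDA).
os.environ.update(hide_gpus())
```

The same effect can be had from the shell by prefixing the command with `CUDA_VISIBLE_DEVICES=-1`, which avoids any import-order concerns.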

Question: Running generation with batches

First, to your questions, sorry if it wasn't 100% clear. - yes, the results I got from your script. Copied, pasted and run. - ParlAI was freshly installed from scratch,...

Question: Running generation with batches

I was printing tensors to find where the differences occur and the first one I found is here: [https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331](https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules.py#L1331) (line: x = self.lin2(x)) I was looking only at the very...
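The tensor-printing approach described above can be made a bit more systematic. The following is only a sketch under stated assumptions: it assumes the intermediate values from two runs (for example batch size 1 and batch size 2) have already been dumped as flat lists of floats, keyed by a layer name. The function `first_divergence`, the layer names, and the numbers are all hypothetical and do not come from ParlAI.

```python
import math

def first_divergence(trace_a, trace_b, rel_tol=1e-6, abs_tol=1e-8):
    """Return the name of the first recorded step whose values differ.

    Each trace is a list of (step_name, list_of_floats) pairs recorded
    in execution order. Returns None if every step matches within
    the given tolerances.
    """
    for (name_a, vals_a), (name_b, vals_b) in zip(trace_a, trace_b):
        if name_a != name_b:
            raise ValueError(f"traces are misaligned: {name_a!r} vs {name_b!r}")
        if len(vals_a) != len(vals_b):
            return name_a
        for a, b in zip(vals_a, vals_b):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return name_a
    return None

# Made-up values standing in for dumped activations from two runs.
run_bs1 = [("lin1", [0.5, 1.25]), ("lin2", [2.0, 3.0])]
run_bs2 = [("lin1", [0.5, 1.25]), ("lin2", [2.0, 3.001])]

print(first_divergence(run_bs1, run_bs2))
```

Comparing with a tolerance rather than exact equality helps separate ordinary floating-point noise from a real divergence; the right tolerance depends on the precision in use and is a judgment call.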

Errors in generation (Bloom) when changing options sampling/use_cache

The second error appears very sporadically (after thousands of cycles) as well when only using `generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=True)`, but only (I think) when changing the input between each cycle,...

Errors in generation (Bloom) when changing options sampling/use_cache

Thanks @pai4451. To follow up, my impression is that the script works fine for short texts, but when it comes to longer ones (>200 tokens) it is more likely to crash....

Errors in generation (Bloom) when changing options sampling/use_cache

On my side I still get the error `RuntimeError: CUDA error: an illegal memory access was encountered` (with 128 input tokens, CUDA 11.3 and batch size 1). However, my impression...

Errors in generation (Bloom) when changing options sampling/use_cache

Probably related: https://github.com/microsoft/DeepSpeed/issues/2062

About reshape deepspeed checkpoint

I tried to create a universal checkpoint of the 3B model (following the instructions in the link) and I get `KeyError: 'param_slice_mappings'`. It seems that this key is created by...

About reshape deepspeed checkpoint

I tried with those PRs and it seems that the problem stays the same. ``` Traceback (most recent call last): File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker result = (True, func(*args,...