Jerry Mannil

10 comments by Jerry Mannil

@nivibilla Do you have time to work on this soon?

> Hi, thanks for this PR and this codebase!
>
> I tested this and the model works well for short context lengths but fails on longer (>500 token) context....

@Chillee @yanboliang Can one of you approve?

PR merged. This issue can be closed now.

Seeing similar issues with AMD GPUs as well, except that on AMD we get a memory fault rather than device-side assertions. It looks like the kernels generated for AMD don't have these...

Observations:

1. Running with `--compile_prefill` alone, without `--compile`, works fine (i.e., I had to move the prefill compile outside of the [if compile](https://github.com/pytorch-labs/gpt-fast/blob/f6973170327003c6b1ce7edb5c015b4fa0097e6d/generate.py#L306) check; see the sketch below).
2. The error happens during the first...
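
A minimal sketch of the change in observation 1. The flag and function names below are stand-ins for gpt-fast's `generate.py` (where the prefill compile currently sits inside the `if compile:` block); the exact surrounding code is assumed:

```python
import torch

# Assumed stand-ins for gpt-fast's CLI flags and functions:
compile_decode = True      # corresponds to --compile
compile_prefill = True     # corresponds to --compile_prefill

def prefill(model, x, input_pos):            # stand-in for gpt-fast's prefill
    return model(x, input_pos)

def decode_one_token(model, x, input_pos):   # stand-in for the decode step
    return model(x, input_pos)

# Before: prefill was only compiled when --compile was also set:
#   if compile_decode:
#       decode_one_token = torch.compile(decode_one_token, mode="reduce-overhead", fullgraph=True)
#       if compile_prefill:
#           prefill = torch.compile(prefill, fullgraph=True, dynamic=True)

# After: hoist the prefill compile out of the `if compile:` block, so that
# --compile_prefill takes effect on its own.
if compile_decode:
    decode_one_token = torch.compile(decode_one_token, mode="reduce-overhead", fullgraph=True)
if compile_prefill:
    prefill = torch.compile(prefill, fullgraph=True, dynamic=True)
```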

Looks like prefill compile can work if I change `next_token.view(1, -1)` to `next_token.clone().view(1, -1)` [here](https://github.com/pytorch-labs/gpt-fast/blob/f6973170327003c6b1ce7edb5c015b4fa0097e6d/generate.py#L202C54-L202C76).
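
My reading of why the `.clone()` matters (a simplified sketch, not the gpt-fast code itself): with `torch.compile(mode="reduce-overhead")`, the compiled prefill's output can live in a CUDA-graph-owned buffer that later replays reuse; `.view()` merely aliases that storage, while `.clone()` copies the token out into regular allocator memory first. The aliasing difference in isolation:

```python
import torch

t = torch.arange(4)
alias = t.view(1, -1)          # shares t's storage: no copy is made
copy = t.clone().view(1, -1)   # owns fresh storage, independent of t

t += 100                       # simulate the producer reusing its buffer
print(alias)  # reflects the overwrite: tensor([[100, 101, 102, 103]])
print(copy)   # unaffected:             tensor([[0, 1, 2, 3]])
```

If the prefill output behaves like `t` here, the `.view()` alias handed to the decode loop can be clobbered by the next graph replay (an assertion on NVIDIA, a memory fault on AMD), whereas the `.clone()` detaches the token from that pool.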

The prefill issues, i.e., the assertion (NVIDIA) and the memory fault (AMD), should be fixed by https://github.com/pytorch-labs/gpt-fast/commit/2c339141640155b8e7e38c252b7601c07305685b, so we can close this issue.

I am looking at this today, so please hold off on merging.

@doru1004 Please run the PyTorch UTs as well. UTs that reference softmax: https://github.com/search?q=repo%3Apytorch%2Fpytorch+path%3A%2F%5Etest%5C%2F%2F+softmax&type=code After that, you can also try running the full PyTorch UT suite: `CONTINUE_THROUGH_ERROR=1 .ci/pytorch/test.sh`