YooSungHyun
### System Info ubuntu 18.04 python 3.6, 3.9 transformers 1.18.0 ### Who can help? @patrickvonplaten, @anton-l ### Information - [X] The official example scripts - [ ] My own modified...
### System Info Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points. - `transformers` version: 4.21.1 - Platform: Linux-4.15.0-177-generic-x86_64-with-glibc2.27 - Python version: 3.9.13 -...
UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP...
## Title Does RNN-T not backpropagate with BPTT? ## Description Looking at the Trainer code and the model's loss output, it seems RNN-T doesn't use BPTT. Is that right, and if so, why? - resolved #
## Context * Pytorch version: 3.10 * Operating System and version: ubuntu 22.04 ## Your Environment * Installed using source? [yes/no]: no * Are you planning to deploy it using...
### System Info transformers==4.38.2 torch==2.2.1 rtx 3090 * 4 and cuda 12.1 ### Who can help? @muellerz @pacman100 @stevhliu ### Information - [X] The official example scripts - [ ]...
I looked at pred_grad and trans_grad. So does warp-transducer's backprop work like BPTT, or do I have to implement BPTT myself? That seems very hard, because the rnnt_loss output is a batch-mean loss, not a per-time-step sequence.
In the example, LoRA is only added to Linear layers. I was thinking of general QLoRA training, and I want to add LoRA to q,k,v,o. So if I set `auto_find_all_linears=False` and enter...
# What does this PR do? Fix typo `from_pretained` to `from_pretrained` Fixes # (issue) https://github.com/huggingface/optimum/issues/1088 ## Before submitting - [x] This PR fixes a typo or improves the docs (you...
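I think I read that the paper used BPTT, but this source doesn't seem to have a BPTT implementation? If there were only one kind of layer it would be easy to implement, but since the transcription and prediction networks exist as separate networks, I'm having a lot of trouble figuring out how to approach it. By any chance...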
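As a rough illustration of the BPTT question, here is a toy sketch in plain Python. It is not the warp-transducer code: the scalar tanh recurrences, the additive "joint", and all function names (`run`, `bptt`, `grads`) are assumptions made up for this example. It only shows that a single batch-mean scalar loss can still be backpropagated through time in two separate recurrent networks, by applying the chain rule backwards from the shared loss into each one.

```python
import math

def run(w, xs):
    """Scalar tanh recurrence h_t = tanh(w * h_{t-1} + x_t); returns all states."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs

def loss_fn(w_enc, w_pred, batch):
    """Batch-mean scalar loss over a toy joint of two separate recurrences."""
    total = 0.0
    for xs, ys in batch:
        joint = run(w_enc, xs)[-1] + run(w_pred, ys)[-1]
        total += joint ** 2
    return total / len(batch)

def bptt(w, xs, upstream):
    """Gradient w.r.t. w, walking one recurrence backwards in time."""
    hs = run(w, xs)
    grad_w, dh = 0.0, upstream
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        grad_w += da * hs[t - 1]      # direct dependence on w at step t
        dh = da * w                   # hand the gradient to h_{t-1}
    return grad_w

def grads(w_enc, w_pred, batch):
    """Split the gradient of the scalar loss between the two networks."""
    g_enc = g_pred = 0.0
    for xs, ys in batch:
        joint = run(w_enc, xs)[-1] + run(w_pred, ys)[-1]
        upstream = 2.0 * joint / len(batch)  # d(loss)/d(joint)
        g_enc += bptt(w_enc, xs, upstream)
        g_pred += bptt(w_pred, ys, upstream)
    return g_enc, g_pred

# Hypothetical toy data: (encoder inputs, predictor inputs) per batch item.
batch = [([0.5, -0.2, 0.1], [0.3, 0.4]), ([0.1, 0.2, -0.3], [-0.5, 0.2])]
w_enc, w_pred = 0.7, -0.4
g_enc, g_pred = grads(w_enc, w_pred, batch)

# Central finite differences as an independent check of the BPTT gradients.
eps = 1e-6
fd_enc = (loss_fn(w_enc + eps, w_pred, batch)
          - loss_fn(w_enc - eps, w_pred, batch)) / (2 * eps)
fd_pred = (loss_fn(w_enc, w_pred + eps, batch)
           - loss_fn(w_enc, w_pred - eps, batch)) / (2 * eps)
```

In a framework with automatic differentiation, calling backward on the scalar loss performs the same backward walk over the unrolled graph, so a separate hand-written BPTT step is usually not needed. Whether that holds for a specific loss binding depends on how it hands gradients back to the framework.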