Raj Dabre

14 issues by Raj Dabre

Hi, your attention mechanism is quite slow. Because the linear projections (aw and bw) are recomputed every time even though they do not change, the running time is almost quadratic. I have...
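The snippet above is truncated, so the exact attention code is not visible here. As a minimal sketch of the idea it describes, assuming an additive (Bahdanau-style) attention where the encoder-side projection does not depend on the decoder step, the projection can be computed once and reused. The names `Wa`, `Wb`, `v`, `matvec`, and `CachedAttention` are hypothetical and are not taken from the issue.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_scores(enc_proj, dec_proj, v):
    """Score each source position as v . tanh(enc_proj[i] + dec_proj)."""
    return [
        sum(vk * math.tanh(e + d) for vk, e, d in zip(v, proj, dec_proj))
        for proj in enc_proj
    ]

def scores_recomputed(enc_states, dec_state, Wa, Wb, v):
    # Slow variant (assumed): every encoder state is projected again
    # at each decoder step, although the result never changes.
    enc_proj = [matvec(Wa, h) for h in enc_states]
    return additive_scores(enc_proj, matvec(Wb, dec_state), v)

class CachedAttention:
    """Hypothetical helper that projects the encoder states only once."""

    def __init__(self, enc_states, Wa):
        # Computed once per source sentence, reused at every decoder step.
        self.enc_proj = [matvec(Wa, h) for h in enc_states]

    def scores(self, dec_state, Wb, v):
        # Only the decoder-side projection is recomputed per step.
        return additive_scores(self.enc_proj, matvec(Wb, dec_state), v)
```

With the cache, each decoder step costs one decoder-side projection plus the scoring pass, instead of also re-projecting every source position.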

What is your question? The README [here](https://github.com/facebookresearch/fairseq/blob/main/examples/fully_sharded_data_parallel/README.md) shows how to run fairseq in...

question
needs triage

When I use 8-bit Adam with FSDP, I get the following error: `RuntimeError: output tensor must have the same type as input tensor` If my understanding is correct,...

Duplicate
Contributions Welcome
FSDP
To Discuss Internally
Optimizers

Currently I only support Adam, but it would be nice to have all the others, such as SGD, Adagrad, etc. Ditto for schedulers.

good first issue

Currently, I have implemented the mBART (span denoising) and mT5 (span prediction) pre-training approaches, but according to the UL2 paper (https://arxiv.org/pdf/2205.05131.pdf) a more comprehensive mixture of denoisers would help a...

Currently, the mBART backbone code I use has pre-norm, which is layer(norm(input)) + input, whereas some people seem to say that post-norm, which is norm(layer(input) + input), might be better for zero-shot. Lord...

good first issue
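To make the two residual orderings in the issue above concrete, here is a minimal sketch in plain Python. It assumes a layer norm without learned scale or bias and uses a hypothetical `toy_layer` as a stand-in for the attention or feed-forward sublayer; it is not the backbone code the issue refers to.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_norm_block(x, layer):
    # Pre-norm: layer(norm(input)) + input
    return [a + b for a, b in zip(layer(layer_norm(x)), x)]

def post_norm_block(x, layer):
    # Post-norm: norm(layer(input) + input)
    return layer_norm([a + b for a, b in zip(layer(x), x)])

def toy_layer(x):
    # Hypothetical sublayer used only for illustration.
    return [2.0 * v for v in x]

x = [1.0, 2.0, 3.0]
print(pre_norm_block(x, toy_layer))   # residual path is left unnormalized
print(post_norm_block(x, toy_layer))  # block output is normalized
```

In the pre-norm variant the input passes through the residual connection untouched, while in the post-norm variant the sum itself is normalized, so the block output always has zero mean.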

Like: WPS (padding/non-padding), average sentence lengths in batch, etc. This one is totally easy.

good first issue

Currently I have provided my own modded fork of transformers, but if someone doesn't care about the features I have added and only wants to work with the code mbart...

help wanted

Currently, YANMTT assumes that only GPUs are to be used, but some people might want to use a massive CPU cluster. Add a flag and then modify the relevant parts...

good first issue

Currently, if you want to run a command it has to be "python [script] [arguments]". Someone told me that it would be [cooler](https://user-images.githubusercontent.com/8413449/192447747-ce3de819-173c-4cc7-b793-a7749421cf35.jpeg) if people could do the same via...

documentation
enhancement