Irhum Shafkat
In the Flux [docs](https://fluxml.ai/Flux.jl/stable/models/basics/), one of the ways in which a model can be constructed is shown as:

```julia
function linear(in, out)
  W = randn(out, in)
  b = randn(out)
  x -> W*x .+ b
end
```
The Stacks structure introduced in this package (https://chengchingwen.github.io/Transformers.jl/dev/stacks/) is versatile enough that any multi-input, multi-output model in the Julia ecosystem could potentially benefit from it. Opening this issue to suggest that it...
The paper mentions that for VGG-like training, a pretrained model was used. Could a link be provided for the checkpoint file of the pretrained model so the `vgglike-sbp.py` experiment can...
When using `torch.compile`, we observe the following graph breaks at all TransformerEngine components. This appears to lead to a large number of lookups by TorchDynamo for each subgraph, resulting in...
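For reference, a minimal sketch of how the breaks can be counted and inspected, assuming PyTorch >= 2.1's `torch._dynamo.explain(fn)(*args)` convention; the plain `nn.Linear` here is only a stand-in for the TransformerEngine layer that actually triggers the breaks:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in layer; replace with the TransformerEngine component
        # (e.g. transformer_engine.pytorch.Linear) to reproduce the report.
        self.proj = nn.Linear(64, 64)

    def forward(self, x):
        return torch.relu(self.proj(x))

model = Block()
x = torch.randn(8, 64)

# explain() runs Dynamo tracing and reports how many subgraphs were produced
# and the reason for each graph break.
explanation = torch._dynamo.explain(model)(x)
print(explanation)  # summary with graph count, break count, and break reasons
```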
The original splits over UniRef50 can be found in the [repo](https://github.com/facebookresearch/esm). Using a subset of them, we need to compute either:
* the randomized masking perplexity
* the pseudo-perplexity with...
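For reference, a rough sketch of the pseudo-perplexity computation (mask one position at a time, score the true residue, exponentiate the mean negative log-likelihood), assuming the fair-esm API from the repo above; the `esm2_t6_8M_UR50D` checkpoint and the example sequence are only placeholders:

```python
import math
import torch
import esm  # fair-esm package from the linked repo

# Small checkpoint used purely for illustration; swap in the model under evaluation.
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

def pseudo_perplexity(seq: str) -> float:
    """Mask one residue at a time, score the true token, exponentiate the mean NLL."""
    _, _, tokens = batch_converter([("seq", seq)])  # shape (1, L+2), with BOS/EOS
    nll, n = 0.0, 0
    with torch.no_grad():
        for i in range(1, tokens.shape[1] - 1):  # skip BOS/EOS positions
            masked = tokens.clone()
            true_idx = masked[0, i].item()
            masked[0, i] = alphabet.mask_idx
            logits = model(masked)["logits"]  # (1, L+2, vocab)
            log_probs = torch.log_softmax(logits[0, i], dim=-1)
            nll -= log_probs[true_idx].item()
            n += 1
    return math.exp(nll / n)

print(pseudo_perplexity("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```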