vladyorsh

Results: 10 comments by vladyorsh

@yinzhangyue > I can't reproduce the Performer's result on the pathfinder32_hard task either; I get just 50.47% as the best eval result. My training shell script is as follows: `PYTHONPATH="$(pwd)":"$PYTHON_PATH" python lra_benchmarks/image/train.py \ --config=lra_benchmarks/image/configs/pathfinder32/performer_base.py...`

@MostafaDehghani Thanks, these make sense, although I'm still struggling to reproduce the results after re-implementation.

Hi @redna11, can you tell which MLP dim you used when calculating the size of the text classification model? It seems it was 512, while I see 1024 in...

I'd also like to second @La-SilverLand's question. I'm currently trying to fit the Pathfinder model code onto a V100 GPU, and you have provided all the tools for that except the...

Thanks for the response. It seems that in this case the Transformer implementations in the repo should be fine (at least most of them) -- LayerNorms won't use batch-wise statistics.
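To illustrate the point (a minimal NumPy sketch, not the repo's actual code): LayerNorm normalizes each sample over its own feature dimension, so its statistics never depend on which other samples happen to be in the batch, unlike BatchNorm.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample over its last (feature) axis only;
    # no statistics are shared across the batch dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))   # 4 samples, 8 features each

full = layer_norm(batch)          # normalize the whole batch
single = layer_norm(batch[:1])    # normalize the first sample alone

# The first sample's output is identical regardless of batch contents.
assert np.allclose(full[0], single[0])
```

This is why per-sample normalization behaves the same at train and eval time and is insensitive to batch composition.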

Can you also take a look at https://github.com/lucidrains/h-transformer-1d/issues/22? It may be relevant.

Thank you for fixing the "-2 -> -1" number-of-levels issue; I was about to open a pull request for it myself.

Hi @Hprairie, thanks for clarifying. I was brought to this issue by the discrepancy between the paper and the code. Have you or any of your colleagues run experiments...

Hi @Hprairie, hi @albertfgu, thank you for the very comprehensive answers, I appreciate that!

Hi @Hprairie, hi @albertfgu, sorry to bother you again, but you mentioned that you experimented extensively with some variants of discretization and potentially other parameters. From the whole variety...