vladyorsh

Results: 10 comments by vladyorsh

@yinzhangyue > I can't reproduce the Performer's result on the pathfinder32_hard task either; I get just 50.47% as the best eval result. My training shell script is as follows: `PYTHONPATH="$(pwd)":"$PYTHON_PATH" python lra_benchmarks/image/train.py \ --config=lra_benchmarks/image/configs/pathfinder32/performer_base.py...`

@MostafaDehghani Thanks, these make sense, although I'm still struggling to reproduce the results after re-implementation.

Hi @redna11, can you tell which MLP dim you used when calculating the size of the text classification model? It seems it was 512, while I see 1024 in...

I'd also like to second @La-SilverLand's question. I'm currently trying to fit the Pathfinder model code onto a V100 GPU, and you have provided all the tools for that except the...

Thanks for the response. It seems that in this case the Transformer implementations in the repo should be fine (at least most of them) -- LayerNorms won't use batch-wise statistics.
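To illustrate the point (a minimal NumPy sketch, not the repo's actual code): LayerNorm normalizes each sample over its own feature dimension, so its statistics never depend on which other samples happen to be in the batch, unlike BatchNorm.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample over its last (feature) axis only;
    # no statistics are shared across the batch dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))   # 4 samples, 8 features each

full = layer_norm(batch)          # normalize the whole batch
single = layer_norm(batch[:1])    # normalize the first sample alone

# The first sample's output is identical regardless of batch contents.
assert np.allclose(full[0], single[0])
```

This is why per-sample normalization behaves the same at train and eval time and is insensitive to batch composition.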

Can you also take a look at https://github.com/lucidrains/h-transformer-1d/issues/22? It may be relevant.

Thank you for fixing the "-2 -> -1" number-of-levels issue; I was about to open a pull request for it myself.

Hi @Hprairie, thanks for clarifying. I was brought to this issue by the discrepancy between the paper and the code. Have you or any of your colleagues run experiments...

Hi @Hprairie, hi @albertfgu, thank you for the very comprehensive answers, I appreciate that!

Hi @Hprairie, hi @albertfgu, sorry to bother you again, but you mentioned that you experimented extensively with some variants of discretization and potentially other parameters. From the whole variety...