Yaroslav Bulatov
1. Use pre-warming, since the first call is slower due to one-time initialization (i.e., PTX compilation). 2. Use `sess.run(a.op)` instead of `sess.run(a)` to avoid transferring any data back from the TF runtime. 3. Make...
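Something like this for points 1 and 2 (a minimal sketch assuming the TF 1.x Session API; the matmul benchmark graph is just an example I made up):

```python
import time
import tensorflow as tf  # TF 1.x-style Session API, matching the sess.run usage above

# Hypothetical benchmark op: a single large matmul.
a = tf.matmul(tf.random_normal([4000, 4000]), tf.random_normal([4000, 4000]))

with tf.Session() as sess:
    # 1. Pre-warm: the first call pays one-time costs (graph setup, PTX compilation).
    sess.run(a.op)

    # 2. Time the op itself; running `a.op` instead of `a` skips fetching the
    #    result back from the TF runtime, so transfer cost is excluded.
    start = time.time()
    for _ in range(10):
        sess.run(a.op)
    print("avg step time: %.3f ms" % ((time.time() - start) / 10 * 1000))
```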
AMD GPUs are not supported: for neural nets in general you need cuDNN, which is NVIDIA-only.
oops, my bad... I guess the standard `-march=native` flag should work
If you run things in parallel, AVX2 could reduce your performance: each process will use more cores, so parallel processes will compete with each other.
Letting square lattice labels stay as "row,col" would be useful. I was playing around with IGSquareLattice for Slitherlink puzzles (https://mathematica.stackexchange.com/questions/212388/importing-a-grid-of-numbers-from-an-image), and row,col would make things easier.
Some more thoughts. - Having one run entry per process and using grouping is the wrong abstraction for distributed training. Even though there are several processes, there's still one set...
I eventually settled on the following logic. 1. Use a global "globals.py" module to keep track of the global step and global settings. This simplifies things when training/initialization logic is split across...
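Roughly this kind of module (a sketch only; the names inside are hypothetical, not from my actual setup):

```python
# globals.py -- shared state imported by the training/initialization modules.

global_step = 0          # incremented once per training step
args = None              # parsed command-line settings, assigned once at startup
run_name = "default"     # identifier used for logging / checkpoints

def increment_step():
    """Advance the shared global step counter."""
    global global_step
    global_step += 1
```

Other files then just `import globals` and read or update `globals.global_step`, so code split across modules sees one consistent set of settings.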
It's the complete binary tree of depth 5. The behavior is still present in master; this colab is an easy way to reproduce it: https://colab.research.google.com/drive/14JwffnpbyxlQUv93js7JzZ2j38VIE2FD Search time jumps 150x when going...
A work-around is to use the `size` option when creating the optimizer; the slowdown is not observed in that case, i.e. `opt = DynamicProgramming(minimize='size')`. I'm curious why this algorithm gives such a jump in...
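For concreteness, the work-around looks roughly like this (a sketch assuming opt_einsum's `DynamicProgramming` path optimizer; the tiny chain contraction just stands in for the depth-5 binary tree network):

```python
import numpy as np
from opt_einsum import contract
from opt_einsum.paths import DynamicProgramming

# Small stand-in contraction; the real case is the depth-5 binary tree network.
a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
c = np.random.rand(8, 8)

# The default objective is 'flops'; switching to 'size' avoids the observed slowdown.
opt = DynamicProgramming(minimize='size')
result = contract('ij,jk,kl->il', a, b, c, optimize=opt)
```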
Perhaps random subgraphs of k-trees? Any einsum graph computable in O(n^{k+1}) time is a subgraph of a k-tree, so if you make it work with probability 1 for some k,...
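A rough sketch of what I mean by random subgraphs of k-trees (names and helpers are mine, just for illustration): build a k-tree by the standard construction, start from a (k+1)-clique and repeatedly attach a new vertex to a randomly chosen existing k-clique, then sample a subgraph from it.

```python
import itertools
import random
import networkx as nx

def random_k_tree(n, k, seed=None):
    """Random k-tree on n >= k+1 vertices: start from a (k+1)-clique, then
    attach each new vertex to all members of a randomly chosen k-clique."""
    rng = random.Random(seed)
    g = nx.complete_graph(k + 1)
    # All k-subsets of the initial clique are k-cliques we can attach to.
    k_cliques = [frozenset(c) for c in itertools.combinations(range(k + 1), k)]
    for v in range(k + 1, n):
        base = rng.choice(k_cliques)
        g.add_edges_from((v, u) for u in base)
        # Swapping v in for one member of `base` gives the new k-cliques.
        k_cliques.extend((base - {u}) | {v} for u in base)
    return g

def random_subgraph(g, p, seed=None):
    """Keep each edge independently with probability p (one simple way to
    sample a subgraph; all vertices are kept)."""
    rng = random.Random(seed)
    h = nx.Graph()
    h.add_nodes_from(g.nodes)
    h.add_edges_from(e for e in g.edges if rng.random() < p)
    return h
```

Since any einsum graph contractible in O(n^{k+1}) time is a subgraph of a k-tree, sampling this way stays within exactly that family.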