Seth Troisi
It looks like this is covered in your paper (http://www.acsel-lab.com/arithmetic/arith23/data/1616a047.pdf). I'm still working on grokking the relevant section. The new question is: was it determined to be possible but not implemented, or determined...
I'm currently looking at [core_mont_xmad.cu](https://github.com/NVlabs/CGBN/blob/master/include/cgbn/core/core_mont_xmad.cu) because that's the code that executes on my 1080ti (`CUDA_ARCH` = 610). I see the doubly nested for loop handling each limb of `a` and...
I cloned `mont_mul` from [core_mont_imad.cu](https://github.com/NVlabs/CGBN/blob/master/include/cgbn/core/core_mont_imad.cu) and specialized it for squaring. I used an approach similar to your xmad two-stage 16-bit alignment trick. [My code](https://github.com/NVlabs/CGBN/compare/master...sethtroisi:square?expand=1) only works when each...
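For readers unfamiliar with why squaring can be specialized at all, here is a minimal Python sketch of plain (non-Montgomery) limb-wise squaring. It is not the code from the linked branch and does not use the xmad alignment trick. It only illustrates the general idea that each cross product `a[i] * a[j]` with `i < j` is computed once and doubled. The 32-bit limb size and all helper names are assumptions made for illustration.

```python
LIMB_BITS = 32                      # assumed limb width, for illustration only
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, count):
    """Split a non-negative integer into `count` little-endian limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble little-endian limbs into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def square_limbs(a):
    """Square a limb vector, computing each off-diagonal product once."""
    n = len(a)
    acc = [0] * (2 * n)             # column sums; carries are deferred

    # Off-diagonal terms: only the i < j half of the product matrix.
    for i in range(n):
        for j in range(i + 1, n):
            acc[i + j] += a[i] * a[j]

    # Double the cross terms, then add the diagonal squares a[i]^2.
    acc = [2 * v for v in acc]
    for i in range(n):
        acc[2 * i] += a[i] * a[i]

    # Propagate carries so every output limb fits in LIMB_BITS bits.
    out, carry = [], 0
    for v in acc:
        v += carry
        out.append(v & MASK)
        carry = v >> LIMB_BITS
    return out
```

The inner loop runs roughly half as many multiplications as a full schoolbook product, which is the saving a squaring specialization is after.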
@darbysauter, very late to this party, but `pfgw` would be a better starting place for large prime searching. While I'm sure the limit could be increased to 64K digits, it's...
If you are looking for an integer log, you can approximate it with `clz`: `truncate(log2(n)) = BITS - clz(n) - 1`
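A small Python sketch of that identity, assuming a fixed word width `BITS` and a software `clz` (both helper names are made up for illustration; on a GPU you would use the hardware count-leading-zeros instruction instead):

```python
def clz(n, bits=32):
    """Count leading zero bits of n in a `bits`-wide unsigned word."""
    if not 0 <= n < (1 << bits):
        raise ValueError("n does not fit in the given width")
    return bits - n.bit_length()


def ilog2(n, bits=32):
    """truncate(log2(n)) computed as BITS - clz(n) - 1; n must be positive."""
    if n <= 0:
        raise ValueError("log2 is undefined for n <= 0")
    return bits - clz(n, bits) - 1
```

For example, `ilog2(1024)` gives 10 and `ilog2(1023)` gives 9, so the result is the floor of the base-2 logarithm.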
I increased the reproducibility by setting `a = n - bn2mont(1)`. With `BITS=1024 TPI={8,16,32}`, this fails for all 2^n-1, n >= 683. With `BITS=512 TPI={8,16}`, this fails for all 2^n-1,...
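To cross-check values like these on the host, a reference computation in Python can help. The sketch below assumes the Montgomery radix is `R = 2**BITS`, which is an assumption about CGBN rather than something stated in this comment. The helper names are hypothetical, and `k` stands for the exponent in the modulus 2^k-1. It only builds the reproduction input and a reference product; it does not reproduce the failure itself.

```python
def bn2mont_ref(x, n, bits):
    """Montgomery form of x modulo n, assuming R = 2**bits."""
    return (x << bits) % n


def mont_mul_ref(a, b, n, bits):
    """Reference Montgomery product a * b * R^-1 mod n (n must be odd)."""
    r_inv = pow(1 << bits, -1, n)   # modular inverse, Python 3.8+
    return (a * b * r_inv) % n


def repro_input(k, bits):
    """Return (n, a) with n = 2**k - 1 and a = n - bn2mont(1)."""
    n = (1 << k) - 1
    a = n - bn2mont_ref(1, n, bits)
    return n, a
```

Under these assumptions `a` is the Montgomery form of -1, so `mont_mul_ref(a, a, n, bits)` should equal `bn2mont_ref(1, n, bits)`, which gives a simple expected value to compare GPU output against.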
This error is present whether I use `core_mul_xmad.cu` or `core_mul_imad.cu`.
I looked at testing this with unit_test but I found that experience hard. Is it expected that make takes ~90 seconds for tester? Is there a faster way to write...
This requires an index on filename, which increases the database size by ~30%, which will be >10 GB, so I'm not going to implement it at this time. If there are more use cases...
start = 52.1 GB
sqlite> delete from games where model_id not between 16000000 and 16999999;
vacuum => 6.2 GB
sqlite> delete from eval_games where model_id_1 not between 16000000 and 16999999;...
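The delete-then-vacuum pattern above can be tried on a throwaway database first. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `games` table containing only a `model_id` column and a filler `payload` column. The real schema is not shown in this comment, so treat the table layout and row counts as assumptions.

```python
import os
import sqlite3
import tempfile


def shrink_demo(rows=20000, keep_low=16000000, keep_high=16999999):
    """Return file sizes in bytes: (before, after_delete, after_vacuum)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE games (model_id INTEGER, payload TEXT)")

        # Hypothetical data: only every tenth row is inside the kept range.
        data = [
            ((keep_low + i) if i % 10 == 0 else i, "x" * 200)
            for i in range(rows)
        ]
        conn.executemany("INSERT INTO games VALUES (?, ?)", data)
        conn.commit()
        before = os.path.getsize(path)

        conn.execute(
            "DELETE FROM games WHERE model_id NOT BETWEEN ? AND ?",
            (keep_low, keep_high),
        )
        conn.commit()               # VACUUM cannot run inside a transaction
        after_delete = os.path.getsize(path)

        conn.execute("VACUUM")      # rebuilds the file, dropping free pages
        conn.close()
        after_vacuum = os.path.getsize(path)
        return before, after_delete, after_vacuum
```

With default settings, the delete alone usually leaves the file size unchanged because freed pages stay on the free list. The space is returned to the filesystem only after `VACUUM`.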