Lianke Qin
I have the same question here.
> Ah, it is a little bit hard to rewrite this function in a parallel way (somehow linear).
>
> One potential temporary solution is to avoid adding many FpVars...
> AFAIU, the correctness guarantees of that function as written require it to be executed sequentially.
>
> There's a couple of things I'd like to understand:
>
> 1. How...
Yeah. And the performance suffers a lot from such a huge number of symbolic LCs. I thought constant * FpVar was almost free, but it turns out to...
Some microbenchmarking results: constant vector * witness vector, length 10000: CRS generation time ~60 seconds, proving time ~60 seconds, which is far from "almost free". witness vector *...
This is the microbenchmark I wrote: https://github.com/brucechin/vector-dot-product-microbench/tree/master
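To illustrate where the cost comes from, here is a minimal toy model of symbolic linear combinations (this is not the arkworks API; the `Lc` type and `const_dot_witness` function are made up for illustration). A constant * witness multiplication adds no constraint, but a constant-vector dot witness-vector still produces one LC whose term count equals the vector length, and every downstream use of that sum has to walk all of those terms:

```rust
// Toy model: a variable is an index, and an LC is a list of
// (coefficient, variable) terms. (Hypothetical types, not arkworks.)
type Var = usize;
type Lc = Vec<(u64, Var)>;

// Dot product of a constant vector with a witness vector: purely
// symbolic, yet the accumulated LC grows linearly with the input length.
fn const_dot_witness(consts: &[u64], vars: &[Var]) -> Lc {
    consts.iter().zip(vars).map(|(&c, &v)| (c, v)).collect()
}

fn main() {
    let n = 10_000;
    let consts: Vec<u64> = (0..n as u64).collect();
    let vars: Vec<Var> = (0..n).collect();
    let lc = const_dot_witness(&consts, &vars);
    // The "free" constant multiplications still leave a 10000-term LC
    // for constraint generation to process.
    println!("{}", lc.len()); // 10000
}
```

This is consistent with the benchmark numbers above: the per-term work is cheap, but with length-10000 vectors the symbolic bookkeeping dominates CRS generation and proving time.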
yeah, we're planning to implement it in the backend using shared memory, which I think is slower than NCCL
In mx.symbol.batchnorm_v1, the operator is a class, so I can add the NCCL communicator/cudaStream as class private variables; they only need to be initialized once. In mx.symbol.batchnorm, NNVM is introduced and...
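The "initialized once, reused on every call" pattern described above can be sketched as follows (a hedged toy in Rust, not MXNet code; `Comm` is a hypothetical stand-in for the NCCL communicator/stream handles):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for a per-operator communicator handle
// (in real code this would wrap ncclComm_t / cudaStream_t).
struct Comm {
    id: u32,
}

// One-time initialization: the expensive setup runs on first use only,
// mirroring a stateful operator keeping the communicator in a member
// variable instead of re-creating it on every forward pass.
static COMM: OnceLock<Comm> = OnceLock::new();

fn get_comm() -> &'static Comm {
    COMM.get_or_init(|| {
        // Expensive setup (e.g. communicator init) would happen here, once.
        Comm { id: 42 }
    })
}

fn main() {
    let first = get_comm().id;
    let second = get_comm().id; // reuses the already-initialized handle
    assert_eq!(first, second);
}
```

With a stateless NNVM-style compute function there is no such member variable to hang the handle on, which is the difficulty the comment is pointing at; a stateful compute interface restores a place to keep it.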
btw, a deep neural net consists of more than one BN layer most of the time. I'm wondering how to ensure that the same layer in the DNN across different GPUs can...
I'll try the FStatefulCompute later, thanks for your suggestion.