hiyijian

Results 21 comments of hiyijian

I had the same problem. We need a way to exclude the SW layer from O2, just like BN, but I have not found a proper way.

Thanks. Do you think the sparsity will be affected if the BN layers on the main branch are not penalized by the L1 norm? If yes, how? Thanks
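To make the question concrete, here is a minimal sketch of what "penalized by the L1 norm" could mean for BN scale factors: a term `lam * sign(gamma)` is added to the gradient of each scale, but only for layers in a chosen set. The layer names (`branch.bn`, `main.bn`), the helper `add_l1_subgradient`, and the value of `lam` are hypothetical and not taken from any repo; this is plain Python, not the project's training code.

```python
def sign(x):
    # Returns -1, 0, or 1; the subgradient of |x| at 0 is taken to be 0 here.
    return (x > 0) - (x < 0)


def add_l1_subgradient(bn_scales, bn_grads, penalized, lam=1e-4):
    """Return new gradients with lam * sign(gamma) added for penalized layers only.

    bn_scales / bn_grads: dict mapping a (hypothetical) layer name to a list of floats.
    penalized: set of layer names that receive the L1 sparsity push.
    """
    out = {}
    for name, grads in bn_grads.items():
        if name in penalized:
            out[name] = [g + lam * sign(s) for g, s in zip(grads, bn_scales[name])]
        else:
            # Layers outside the set keep their gradients unchanged,
            # so nothing drives their scale factors toward zero.
            out[name] = list(grads)
    return out


scales = {"branch.bn": [0.5, -0.2, 0.0], "main.bn": [1.0, -1.0]}
grads = {"branch.bn": [0.0, 0.0, 0.0], "main.bn": [0.0, 0.0]}
new_grads = add_l1_subgradient(scales, grads, penalized={"branch.bn"}, lam=0.1)
```

In this toy setup only `branch.bn` receives the sparsity push; the `main.bn` scales see no extra gradient, which is the situation the question asks about.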

How about the final ROC performance on FDDB, please? Is it also the same as the original one?

Yes. Does the plugin only support RoCE for now?

@paravmellanox is there any update now? Thanks

@addcloud I am not an expert at network stuff at all. I was stuck on enabling SRIOV for quite a long time. The reason for failing to enable it...

There is no network initialization in this repo. This is probably the reason why we get totally different results with CUDA 10.2 and CUDA 9.2.

@danthe3rd I also need alibi support. For now, I pass ```bias = LowerTriangularMaskWithTensorBias(alibi_bias)``` to ```xops.memory_efficient_attention(..., attn_bias=bias)```. The forward pass alone is OK, but it fails at backward in training mode. Is...
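For readers unfamiliar with ALiBi, the following is a rough pure-Python sketch of the additive bias that a causal mask plus an ALiBi tensor amounts to for a single head. It does not call xformers; `alibi_slopes` and `alibi_bias` are hypothetical helper names, and the slope formula assumes the number of heads is a power of two, as in the ALiBi paper.

```python
def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes; assumes n_heads is a power of two.
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    # Effective additive bias for one head:
    #   -slope * (i - j) for keys at or before the query position (j <= i),
    #   -inf for future keys (the causal, lower-triangular mask).
    neg_inf = float("-inf")
    return [
        [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(3, slopes[0])
```

The bias is a constant with respect to the model parameters, so no gradient needs to flow into it during training.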

@borisfom Maybe another mismatch: wgrad_norm in your code is computed from "g + beta * w" (that is, after regularization), which is not exactly the same as the paper's "g".
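A small numeric sketch of the difference, under assumptions: the first function follows the local learning rate as I read it in the LARS paper, `eta * ||w|| / (||g|| + beta * ||w||)`, and the second takes the norm of `g + beta * w` instead. The function names and the toy values are hypothetical; the exact denominator used in the code under discussion is not shown in the comment, so the second variant is only a guess at its shape.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def trust_ratio_paper(w, g, beta, eta=0.001):
    # Norm of the raw gradient g; weight decay enters as a separate term.
    return eta * l2_norm(w) / (l2_norm(g) + beta * l2_norm(w))


def trust_ratio_after_decay(w, g, beta, eta=0.001):
    # Assumed variant: weight decay is folded into the gradient first,
    # then the norm of the regularized gradient is used.
    g_reg = [gi + beta * wi for gi, wi in zip(g, w)]
    return eta * l2_norm(w) / l2_norm(g_reg)


w = [3.0, 4.0]   # ||w|| = 5
g = [0.6, -0.8]  # ||g|| = 1
beta = 0.1

paper = trust_ratio_paper(w, g, beta)
variant = trust_ratio_after_decay(w, g, beta)
```

By the triangle inequality, `||g + beta * w|| <= ||g|| + beta * ||w||`, so the second variant never yields a smaller ratio than the first, and the two coincide only when `g` and `w` point in the same direction.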