
Porting to Cython

Open mritunjaymusale opened this issue 5 years ago • 6 comments

Is it possible for you to port this to Cython and release it as a PyPI package? It would be easy for existing DL users (TF and PyTorch users) to use it natively in their code.

mritunjaymusale avatar Mar 03 '20 19:03 mritunjaymusale

Thanks for your suggestion. Let's make this a priority! We'll @ you when it is done.

keroro824 avatar Mar 04 '20 22:03 keroro824

Thank you @keroro824 !

rahulunair avatar Mar 04 '20 22:03 rahulunair

Sort of related, but I've been building R bindings.

wrathematics avatar Mar 05 '20 18:03 wrathematics

@wrathematics Thanks for contributing 👍

keroro824 avatar Mar 06 '20 00:03 keroro824

Hi, are there any updates on this?

its-sandy avatar Jul 28 '20 03:07 its-sandy

> Is it possible for you to port this to Cython and release it as a PyPI package? It would be easy for existing DL users (TF and PyTorch users) to use it natively in their code.

I'm also interested in implementing something like this. But it seems to me the way to do it would be to implement custom layers rather than rely on built-in ones. This could be added to the main codebase once it is tested, rather than released as a separate package.

For example, in PyTorch you would first subclass 'torch.autograd.Function' to implement forward and backward operations that perform the hashing and take the selected neurons into account during forward and back propagation. Cython might not even be needed: you might be able to use Numba and get better performance more easily.
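To make the idea concrete, here is a minimal sketch (not SLIDE's actual code) of what such a custom autograd Function could look like. It assumes the LSH lookup happens elsewhere and passes in a hypothetical `active_idx` tensor of selected output neurons; gradients are restricted to those neurons, as in the paper:

```python
import torch

class SparseLinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias, active_idx):
        # x: (batch, in_features); weight: (out_features, in_features)
        # active_idx: LongTensor of output-neuron indices from the (hypothetical) LSH lookup
        w_active = weight[active_idx]                       # rows of W for the active neurons
        y = torch.zeros(x.shape[0], weight.shape[0], dtype=x.dtype, device=x.device)
        y[:, active_idx] = x @ w_active.t() + bias[active_idx]
        ctx.save_for_backward(x, weight, active_idx)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        x, weight, active_idx = ctx.saved_tensors
        grad_y_active = grad_y[:, active_idx]
        # Gradients flow only through the active neurons, mirroring the paper's scheme
        grad_x = grad_y_active @ weight[active_idx]         # restricted form of dx_k = dy_i * W_{ik}
        grad_w = torch.zeros_like(weight)
        grad_w[active_idx] = grad_y_active.t() @ x
        grad_b = torch.zeros(weight.shape[0], dtype=x.dtype, device=x.device)
        grad_b[active_idx] = grad_y_active.sum(dim=0)
        return grad_x, grad_w, grad_b, None
```

You would call it as `y = SparseLinearFunction.apply(x, weight, bias, active_idx)` and wrap it in a `torch.nn.Module` that owns the hash tables and produces `active_idx`.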

@keroro824 I've actually started doing what I described. I have a question: do you have some justification for only propagating the gradient to the active neurons? It's not obvious to me why this would be a good approximation of the true gradient.

There is another method the math would suggest. The gradient with respect to the input of a linear layer is (repeated indices indicate a sum):

y_i = W_{ij} * x_j + b_i
dx_k = dy_i * W_{ik}

So we could use LSH for the backprop as well, but we would need more hash tables than the paper suggests: the backprop multiplications are against columns of the weight matrix, whereas the forward pass multiplies against its rows. Did you try something like this?
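As a quick illustration of that identity (this is just a sanity check, not part of SLIDE), the manual formula dx_k = dy_i * W_{ik} matches what autograd computes for a plain linear layer:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)
W = torch.randn(16, 8)
b = torch.randn(16)

y = x @ W.t() + b                  # forward pass uses the rows of W
dy = torch.randn_like(y)           # some upstream gradient
y.backward(dy)

manual_dx = dy @ W                 # dx_k = dy_i * W_{ik}, i.e. columns of W
print(torch.allclose(x.grad, manual_dx))  # True
```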

I'd be very interested in implementing this.

nomadbl avatar Nov 06 '21 17:11 nomadbl