
Batched Computation

Open allenanie opened this issue 8 years ago • 4 comments

Hi Ryan,

The LSH-based forward and backward computation is a great idea, and your paper makes a nice contribution!

However, you have to sample from the LSH buckets (retrieving different weight vectors w_i) for each input, because different inputs may fall into different buckets. Is the implementation here batched? Do you have any suggestions on how to implement your algorithm in a batched setting?

Allen

allenanie avatar Aug 29 '17 17:08 allenanie

In a batched setting, I would combine the weight vectors for all of the examples.

Say you have a very large hidden layer with 10K neurons and a batch size of 32. If you had 100 neurons per example (about 1%), you would have at most 3200 active neurons.
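A minimal numpy sketch of this union-of-active-neurons idea (the names and the random stand-in for the LSH lookup are hypothetical, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 10_000       # very large hidden layer
batch_size = 32
active_per_example = 100   # ~1% of neurons retrieved per example

# Hypothetical LSH lookup: each example retrieves its own set of neuron ids.
# (Here a random sample stands in for the actual bucket lookup.)
active_sets = [rng.choice(hidden_size, size=active_per_example, replace=False)
               for _ in range(batch_size)]

# Combine (union) the active neurons across the whole batch.
union_ids = np.unique(np.concatenate(active_sets))
assert len(union_ids) <= batch_size * active_per_example   # at most 3200

# One dense matmul restricted to the union, instead of all 10K neurons.
input_dim = 1_000
W = rng.standard_normal((hidden_size, input_dim)).astype(np.float32)
x = rng.standard_normal((batch_size, input_dim)).astype(np.float32)
h_union = x @ W[union_ids].T            # shape (batch, |union|)

# Mask out neurons that a given example did not itself select.
col_of = {nid: j for j, nid in enumerate(union_ids)}
mask = np.zeros((batch_size, len(union_ids)), dtype=np.float32)
for i, ids in enumerate(active_sets):
    mask[i, [col_of[n] for n in ids]] = 1.0

h = np.maximum(h_union * mask, 0.0)     # ReLU over each example's active set
```

The trade-off is that the batch pays for the union of all selected neurons, but that is still far smaller than the full layer when the per-example selections are sparse.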

We envision that our idea is most effective for very large datasets that require large parameter spaces. Obviously, if you have a small dataset, you can go ahead and use a smaller network.

We plan to provide a PyTorch/TensorFlow implementation of this approach.

Cheers, Ryan

rdspring1 avatar Aug 29 '17 17:08 rdspring1

Hi Ryan, I'm trying to implement the algorithm in PyTorch :)

But the hashing is where I'm stuck :( Does your Java implementation have GPU-based hashing? Since you mentioned that a PyTorch/TF implementation is one of your objectives, do you have any pointers on how to do GPU-based hashing in PyTorch?

allenanie avatar Aug 31 '17 04:08 allenanie

SimHash, or Signed Random Projection (SRP), uses a standard matrix multiplication. You will then need a GPU kernel to turn the signed bits into integers.
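A minimal numpy sketch of the two steps (the K/L values and shapes are illustrative assumptions, not from the repo): one matrix multiply produces the sign bits, then each group of K bits is packed into one integer bucket id, which is the small custom kernel mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 1_000   # weight-vector dimensionality
K = 8         # bits per hash table
L = 4         # number of hash tables

# SRP: one matrix multiply yields K*L signed values per vector.
projections = rng.standard_normal((dim, K * L)).astype(np.float32)
weights = rng.standard_normal((100, dim)).astype(np.float32)  # e.g. 100 neurons
signs = (weights @ projections) >= 0                          # boolean sign bits

# Pack each group of K sign bits into one integer bucket id;
# on the GPU this packing is the part needing a custom kernel.
bits = signs.reshape(-1, L, K).astype(np.uint32)
powers = (1 << np.arange(K)).astype(np.uint32)
bucket_ids = (bits * powers).sum(axis=2)   # shape (100, L): one id per table
```

Each row of `bucket_ids` gives the L bucket indices where that weight vector is stored, one per hash table.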

The tricky part depends on the size of your dataset. Computing the hashes is expensive, e.g. multiplying a (100K nodes × 1K dimensions) weight matrix by a (1K dimensions × kL bits) random projection.

There are advances in the hashing literature that make this step more efficient, e.g. the Walsh-Hadamard Transform and Very Sparse SRP.

rdspring1 avatar Aug 31 '17 19:08 rdspring1

If, for now, I ignore the cost of hashing my weight matrices (which happens once every several batches), I should write a CUDA C++ file that does SRP (which is simple), but I'll still need a GPU kernel that casts the signed bits into integers, right? Can this be accomplished with just C++ CUDA?

allenanie avatar Sep 01 '17 23:09 allenanie