htm.core

SimHash Document Encoder: real-world validation example of the encoder (~MNIST)

Open brev opened this issue 6 years ago • 2 comments

From https://github.com/htm-community/htm.core/pull/603#issuecomment-520743858

It would be nice to try some real-world validation of the encoder. @crimsoncress is working on classification in the vision domain (using the MNIST examples); you could do something similar with a text-classification dataset.

  • spam/ham is the classic text-classification task
  • we might also look for some interesting NLP datasets, such as "topic classification" or "document summary"

@breznak
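For reference, here is a rough idea of what a spam/ham check could look like. This is only a minimal sketch of plain SimHash with a nearest-fingerprint lookup. It does not use the htm.core encoder or its API, and the function names (`simhash`, `hamming`, `classify`) and the two training sentences are made up for illustration. A real validation would use an actual labeled dataset and the encoder's sparse output.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a SimHash fingerprint (as an int) for an iterable of string tokens."""
    counts = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def classify(text, labeled):
    """Return the label of the training fingerprint closest to `text`."""
    fp = simhash(text.lower().split())
    return min(labeled, key=lambda item: hamming(fp, item[0]))[1]


# Hypothetical toy training set, invented for this sketch only.
training = [
    (simhash("win free money now".split()), "spam"),
    (simhash("meeting notes attached for review".split()), "ham"),
]

label = classify("win free money now", training)
```

Note that this bag-of-tokens variant is insensitive to word order, since the per-bit votes are simply summed.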

brev, Aug 13 '19 22:08

@brev up for it? We should revisit the SimHash and vision domain.

breznak, Jun 02 '20 12:06

@breznak Sorry for the slow reply! I've been off, deeply focused on trying to get a little business off the ground, and 2020 has not been helping much. Once I can take a little break from that, I'll be very excited to work on another htm.core project. Hopefully around New Year's; I'll let you know.

In the meantime, I can definitely get my brain slowly mulling over a future topic until I'm able to work on it. Let me know what you think. You mentioned the vision domain? I'm still pondering simhash word ordering, too.

Hope all is well!

brev, Sep 23 '20 23:09