SimHash Document Encoder: real-world validation example of the encoder (~MNIST)
From https://github.com/htm-community/htm.core/pull/603#issuecomment-520743858
It would be nice to try some real-world validation of the encoder. @crimsoncress is working on classification in the vision domain (using the MNIST examples); you could do something similar with a text-classification dataset.
- spam/ham is the classic text-classification task
- we might look for some interesting NLP datasets, such as "topic classification" or "document summarization"
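
As a rough illustration of what such a validation could look like, here is a minimal sketch of spam/ham classification by nearest SimHash. It assumes a plain-Python SimHash over whitespace tokens, not the htm.core encoder or its API. The toy sentences and the helper names (`simhash`, `hamming`, `classify`) are made up for illustration, and a real test would use an actual labelled dataset.

```python
import hashlib
from collections import Counter


def simhash(text, bits=64):
    """Plain SimHash over lowercased whitespace tokens (illustrative only)."""
    weights = [0] * bits
    for token, count in Counter(text.lower().split()).items():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            weights[i] += count if (token_hash >> i) & 1 else -count
    # A bit is set in the result when its votes sum to a positive value.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def classify(text, labelled):
    """1-nearest-neighbour label by Hamming distance to labelled hashes."""
    h = simhash(text)
    return min(labelled, key=lambda item: hamming(h, item[0]))[1]


# Hypothetical toy data; replace with a real spam/ham corpus.
train = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("please review the pull request", "ham"),
]
labelled = [(simhash(text), label) for text, label in train]

print(classify("free prize money", labelled))
```

Because this sketch treats a document as a bag of tokens, word order does not affect the hash, so any validation of word-ordering behaviour would need a different tokenization (for example n-grams).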
@breznak
@brev up for a game? We should revisit SimHash and the vision domain.
@breznak Sorry for the slow reply! I've been off, deeply focused on trying to get a little business off the ground, and 2020 has not been helping much. Once I can take a little break from that, I'll be very excited to work on another htm.core project. Hopefully around New Year's; I'll let you know.
In the meantime, I can definitely get my brain slowly mulling over a future topic until I'm able to work on it. Let me know what you think. You mentioned the vision domain? I'm still pondering SimHash word ordering, too.
Hope all is well!