artitw

Results: 90 comments of artitw

Can we store the model somewhere like Google Drive and only download it when the Identifier is used? This approach would follow the existing convention to keep the core...
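The lazy-download idea above can be sketched as a small cache check: fetch the model file the first time the Identifier is used, then reuse the local copy. The URL, cache path, and helper name below are hypothetical placeholders, not the project's actual values.

```python
import os
import urllib.request

# Hypothetical model location and cache path (not the project's real values).
MODEL_URL = "https://drive.google.com/uc?id=PLACEHOLDER"
MODEL_PATH = os.path.expanduser("~/.cache/text2text/identifier_model.bin")

def ensure_model(url=MODEL_URL, path=MODEL_PATH, download=urllib.request.urlretrieve):
    """Download the model only on first use; reuse the cached file afterwards."""
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        download(url, path)  # network fetch happens only once
    return path
```

This keeps the core package lightweight: importing the library costs nothing, and the download cost is paid only when the Identifier is actually invoked.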

@Mofetoluwa thanks for the updates and the pull request. I added some comments there. Regarding the third point you raised, when I tested the model, it returned "hy"...

I think training with shorter texts and approach 2 would address the issue. Another approach is to use 2D embeddings. Currently we are using 1D embeddings, which are calculated by...
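The 1D-vs-2D distinction above can be illustrated with a toy example: a 1D embedding collapses per-token vectors into a single vector (e.g. by mean pooling, where long texts wash out detail), while a 2D embedding keeps the full token-by-dimension matrix. The numbers below are illustrative; this is not text2text's actual implementation.

```python
import numpy as np

# Toy token embeddings: 4 tokens, 3 dimensions each (values are made up).
token_vectors = np.array([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
])

def embed_1d(tokens):
    """Collapse per-token vectors into one vector via mean pooling.
    On long texts this averages away detail, which can hurt downstream tasks."""
    return tokens.mean(axis=0)

def embed_2d(tokens):
    """Keep the full (num_tokens, dim) matrix so per-token information survives."""
    return tokens

print(embed_1d(token_vectors).shape)  # (3,)
print(embed_2d(token_vectors).shape)  # (4, 3)
```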

@Mofetoluwa yes, we can do `vectorize(output_dimension=2)` as specified in the [latest version](https://github.com/artitw/text2text/blob/77b548d5f9855088db149a94bbbfa310b7c0e3e1/text2text/vectorizer.py#L7). Also note that the default 1D output should be improved now compared to the version you used most...

Yes, a comparison of both would be useful. Thanks so much for checking the shorter texts. It will help to confirm the fix for the way 1D embeddings are calculated.

Hi Mofe, 1. Are we sampling the data so that each class is balanced when training? 2. Could we update the README so that users have some documentation to...
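Point 1 above (class-balanced sampling for training) can be sketched as follows; the helper name and signature are hypothetical, not from the project's code.

```python
import random
from collections import defaultdict

def balanced_sample(examples, per_class, seed=0):
    """Draw an equal number of examples per class.

    `examples` is a list of (text, label) pairs. Classes with fewer than
    `per_class` items are oversampled with replacement so every class
    contributes the same count to the training set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    sample = []
    for label, items in by_label.items():
        if len(items) >= per_class:
            sample.extend(rng.sample(items, per_class))  # without replacement
        else:
            sample.extend(rng.choices(items, k=per_class))  # oversample
    rng.shuffle(sample)
    return sample
```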

Could we also add the `Identifier` in the README's [class diagram](https://github.com/artitw/text2text#class-diagram)?

@Mofetoluwa, what do you think about using the TFIDF embeddings to perform the language prediction? I think that might be better than the neural embeddings currently used, as it won't...
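The TF-IDF idea above can be sketched with character-trigram profiles: build a TF-IDF vector per language and classify new text by cosine similarity. Everything here (the tiny training corpus, function names) is a hypothetical toy, not text2text's implementation.

```python
import math
from collections import Counter

def trigrams(text):
    """Character trigrams with padding so word boundaries are captured."""
    text = f"  {text.lower()}  "
    return [text[i:i + 3] for i in range(len(text) - 2)]

def tfidf_profiles(corpus):
    """Build one TF-IDF vector per language from char-trigram counts."""
    tf = {lang: Counter(trigrams(t)) for lang, t in corpus.items()}
    df = Counter(g for counts in tf.values() for g in counts)  # doc frequency
    n = len(corpus)
    return {
        lang: {g: cnt * math.log((1 + n) / (1 + df[g])) for g, cnt in counts.items()}
        for lang, counts in tf.items()
    }

def cosine(a, b):
    dot = sum(a[g] * b.get(g, 0.0) for g in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_language(text, profiles):
    """Score the query's trigram counts against each language profile."""
    query = Counter(trigrams(text))
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))

# Toy training snippets (illustrative only).
corpus = {
    "en": "the quick brown fox jumps over the lazy dog and then the cat",
    "fr": "le renard brun rapide saute par dessus le chien paresseux",
    "es": "el rapido zorro marron salta sobre el perro perezoso",
}
profiles = tfidf_profiles(corpus)
print(predict_language("the dog and the cat", profiles))
```

Because trigram counts are deterministic, this kind of predictor gives the same answer every run, which addresses the non-determinism concern with neural embeddings.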

No, it should work on just CPU. Give it a try and let us know if you have any issues.

This is possible. You would want to control the answer using the [SEP] token. We could also consider implementing the functionality directly into the code base if that’s what you...
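The `[SEP]` technique above amounts to appending the desired answer after a separator token so generation is conditioned on it. The helper below is a hypothetical sketch of that concatenation; consult the library's documentation for its exact input convention.

```python
def with_answer(context, answer, sep="[SEP]"):
    """Join a context and a desired answer with a separator token.
    Hypothetical helper illustrating the [SEP] conditioning pattern."""
    return f"{context} {sep} {answer}"

# Conditioning question generation on the answer "Madrid":
print(with_answer("Madrid is the capital of Spain.", "Madrid"))
```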