Anurag Singh
@avinsit123 How about using word-level iNLTK embeddings and then XGBoost to classify the tokens?
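A minimal sketch of that suggestion, assuming iNLTK's public `get_embedding_vectors(text, language)` returns one vector per token and that the stacked vectors are fed to `xgboost.XGBClassifier`. The embedder is stubbed here so the data-preparation step is runnable on its own; `fake_embed` and the toy labels are purely illustrative.

```python
def build_token_dataset(sentences, labels, embed):
    """Flatten per-token embeddings and labels into X, y for a classifier.

    sentences: list of token lists; labels: parallel list of label lists;
    embed: callable mapping a token list to a list of vectors
    (e.g. a wrapper around inltk's get_embedding_vectors).
    """
    X, y = [], []
    for tokens, tags in zip(sentences, labels):
        vectors = embed(tokens)
        assert len(vectors) == len(tags), "one label per token"
        X.extend(vectors)
        y.extend(tags)
    return X, y


# Stub standing in for the real iNLTK embedder (an assumption, not its API):
def fake_embed(tokens):
    return [[float(len(t)), float(ord(t[0]) % 7)] for t in tokens]


X, y = build_token_dataset([["मेरा", "नाम"]], [[0, 1]], fake_embed)
# X and y could now go straight into xgboost.XGBClassifier().fit(X, y)
```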
I am running out of memory while creating a TextLMDataBunch with only 100K articles and a 32K vocabulary. How much memory is required to create the data for the language model?
Thank you for the information. The issue was that a single file had over 350K characters, which could not be tokenized, numericalized, and loaded into main memory all at once...
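A minimal sketch of the workaround described above: split an oversized document into bounded pieces before handing it to the data-bunch step, so tokenization never sees 350K characters at once. The 50K chunk size is an arbitrary assumption; each chunk would become one row of the DataFrame passed to fastai's `TextLMDataBunch.from_df`.

```python
def split_into_chunks(text, max_chars=50_000):
    """Return consecutive slices of text, each at most max_chars long."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# A 350K-character document becomes seven 50K-character rows.
chunks = split_into_chunks("a" * 350_000)
```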
I have completed it for Urdu, and here is the [link](https://github.com/anuragshas/nlp-for-urdu). Resources for the Kashmiri language are very scarce, some of them are paid, and there are e-paper websites that only have images. I...
You are welcome. I am really happy that I will be able to raise my first PR on GitHub. After going through the code, I guess I will have...
@goru001 Here is the link to [MaithiliWikiArticles](https://drive.google.com/open?id=15-Yy5Zfr7GIKEN0-d7kWMRGRhYamd6vH). I have been busy searching for a job; I will create a PR for the Urdu LM as soon as I get free.
@goru001 I have uploaded the Urdu model using the instructions mentioned [here](https://github.com/goru001/inltk/issues/2#issuecomment-478350926). Please let me know what changes I should make to create the PR.
The __pycache__ and .idea folders are already present in the repo; shouldn't those be removed?
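If those folders were committed by mistake, the usual fix is `git rm -r --cached __pycache__ .idea` to untrack them, plus entries in `.gitignore` so they stay out of future commits. A minimal `.gitignore` fragment for that (a sketch, assuming a standard Python/PyCharm setup):

```
__pycache__/
.idea/
```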
@ankur220693 I am actually short of data for working on Maithili. The model that I had created was overfitting, so I had to put it on hold. If you can...
For Kashmiri there is not enough publicly available data to work with; check the OSCAR corpus or the Wikipedia dump to see if any data is available. The last time I had scraped it was...
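One quick way to do the Wikipedia check suggested above is to look at the dump for the language's wiki. A minimal sketch, assuming the standard Wikimedia dumps URL layout (`<lang>wiki-latest-pages-articles.xml.bz2`); a HEAD request on the resulting URL reveals the dump's size before downloading anything. Checking OSCAR would instead go through its own per-language listings.

```python
def wikipedia_dump_url(lang):
    """Build the standard Wikimedia dumps path for a wiki's article dump.

    lang is the wiki's language code, e.g. "ks" for Kashmiri or "ur" for Urdu.
    """
    return (f"https://dumps.wikimedia.org/{lang}wiki/latest/"
            f"{lang}wiki-latest-pages-articles.xml.bz2")


ks_dump = wikipedia_dump_url("ks")
```

A tiny Kashmiri dump would confirm there is little usable article text there.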