
How to train a new model?

Open snakecy opened this issue 6 years ago • 5 comments

The tool provides an "NSFC Subject Classifier" function whose input is Chinese. How can I train the same kind of model for English? My questions: (A) What is the training data format? (B) What is the definition of the levels? (C) What is the training tool?

Thanks in advance for your reply.

snakecy avatar Sep 02 '19 11:09 snakecy

Hi, let me answer your questions.
(A) We use paper abstracts to train the model. The labels follow the three-level discipline classification of the National Natural Science Foundation of China (NSFC). Each training example is a label followed by the abstract text, e.g. "A04 abstract_paper" when predicting the level-1 subject. To predict the level-2 subject, the example should look like "A040412 abstract_paper". We trained three models in order to predict the three discipline levels.
(B) The three levels are the levels of the NSFC classification.
(C) In this case we use a fastText model as the classifier. You can also replace it with another model that suits your work.

In fact, the answers to your second and third questions (B and C) are in classifier.py, located at "prediction_api/src/classifier.py". See line 4 and line 25.
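
As a rough illustration of the data format described above, here is a minimal sketch that turns labelled abstracts into one "label abstract" line per example, grouped by level so that each level gets its own training file. The function names, the `codes_by_level` record layout, and the optional `label_prefix` argument are assumptions made for illustration, not code from this repository. Note that fastText's supervised mode looks for labels starting with `__label__` by default, so the prefix may be needed even though the examples in this thread omit it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def format_example(label: str, abstract: str, label_prefix: str = "") -> str:
    """Return one training line: the label, a space, then the abstract.

    Whitespace (including newlines) is collapsed so each example stays on
    a single line. label_prefix is an assumption: pass "__label__" if the
    classifier expects fastText's default label marker.
    """
    text = " ".join(abstract.split())
    return f"{label_prefix}{label} {text}"


def split_by_level(
    records: Iterable[Tuple[Dict[int, str], str]],
    label_prefix: str = "",
) -> Dict[int, List[str]]:
    """Group training lines by level.

    Each record is (codes_by_level, abstract), where codes_by_level maps a
    level number to the NSFC code for that level (hypothetical layout).
    """
    per_level: Dict[int, List[str]] = defaultdict(list)
    for codes_by_level, abstract in records:
        for level, code in codes_by_level.items():
            per_level[level].append(format_example(code, abstract, label_prefix))
    return dict(per_level)


def write_training_file(path: str, lines: Iterable[str]) -> None:
    """Write one example per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Each file written this way could then be used to train one model per level, for example with `fasttext.train_supervised(input=path)` if fastText is the chosen classifier.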

wengenihaoshuai avatar Sep 03 '19 12:09 wengenihaoshuai

@wengenihaoshuai Do you mean that a separate model is trained for each level?

snakecy avatar Sep 06 '19 03:09 snakecy

Not one model, but three. We trained three separate models, one for each level. The code at lines 25-27 of prediction_api/src/classifier.py loads the three models.
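
To make the three-model setup concrete, here is a minimal sketch of running one abstract through a separate model per level. It is not the code from classifier.py. It only assumes that each model exposes a fastText-style `predict(text, k)` method returning a `(labels, probabilities)` pair. The `LevelModel` protocol, the `predict_all_levels` name, and the level-to-model dictionary are hypothetical.

```python
from typing import Dict, Protocol, Sequence, Tuple


class LevelModel(Protocol):
    """Anything with a fastText-style predict method (assumed interface)."""

    def predict(self, text: str, k: int = 1) -> Tuple[Sequence[str], Sequence[float]]:
        ...


def predict_all_levels(
    models: Dict[int, LevelModel],
    abstract: str,
    label_prefix: str = "__label__",
) -> Dict[int, Tuple[str, float]]:
    """Return {level: (label, probability)} using one model per level."""
    # Collapse newlines: fastText's predict handles one line at a time.
    text = " ".join(abstract.split())
    results: Dict[int, Tuple[str, float]] = {}
    for level in sorted(models):
        labels, probs = models[level].predict(text, k=1)
        label = labels[0]
        # Strip the label marker if the model was trained with one.
        if label_prefix and label.startswith(label_prefix):
            label = label[len(label_prefix):]
        results[level] = (label, float(probs[0]))
    return results
```

With fastText, the dictionary could be built from three saved models loaded via `fasttext.load_model(path)`, one path per level. The actual file names are not given in this thread.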

wengenihaoshuai avatar Sep 07 '19 05:09 wengenihaoshuai

@wengenihaoshuai Okay, Thanks a lot!

snakecy avatar Sep 09 '19 06:09 snakecy

@wengenihaoshuai

We use paper abstracts to train the model. The labels follow the three-level discipline classification of the National Natural Science Foundation of China (NSFC). Each training example is a label followed by the abstract text, e.g. "A04 abstract_paper" when predicting the level-1 subject. To predict the level-2 subject, the example should look like "A040412 abstract_paper". We trained three models in order to predict the three discipline levels.

I downloaded the AMiner paper data, which does not include subject category labels. How do I map the papers to NSFC labels?

snakecy avatar Oct 25 '19 08:10 snakecy