
Training a new model with MITIE takes too long

Open autopost-get opened this issue 8 years ago • 7 comments

How can I improve this situation? Please give me some suggestions.

autopost-get · Dec 27 '17 06:12

As far as I know, it will depend on the size of your datasets. Also, check the memory usage (maybe you are out of memory and the system has started using swap).
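If you want to rule out swapping, one quick check on Linux (a generic sketch, nothing MITIE-specific) is to watch `/proc/meminfo` while training runs:

```shell
# Snapshot available RAM and free swap while the trainer is running (Linux).
# If SwapFree drops well below SwapTotal during Part II, you are swapping.
grep -E 'MemAvailable|SwapTotal|SwapFree' /proc/meminfo
```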

grafael · Jan 03 '18 01:01

You should also make sure that your labels are consistent. Datasets that are harder to label take longer to train. So if, for example, you have a huge number of labeling mistakes, training will take a long time.

davisking · Jan 03 '18 02:01

@davisking @grafael could you please elaborate or give a few more examples of what inconsistent labeling means? Does it mean the number of labels varies from sentence to sentence, e.g., one sentence has 3 labels and another sentence has 5? How do labeling mistakes affect application performance? Also, can MITIE NER run on a distributed architecture?

munaAchyuta · Feb 28 '18 06:02

It just means you are labeling your data with incorrect labels. Like maybe sometimes you label references to the city of Boston as "city" and other times as "place". Or maybe other times you don't label it at all.
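A quick way to catch that kind of inconsistency before training is a plain-Python pass over the annotated data (a hypothetical helper, not part of MITIE; it assumes annotations are stored as `(entity_text, label)` pairs per sentence):

```python
from collections import defaultdict

def find_inconsistent_labels(annotated_sentences):
    """Return entity texts that were given more than one label."""
    labels_seen = defaultdict(set)
    for sentence in annotated_sentences:
        for text, label in sentence:
            labels_seen[text.lower()].add(label)
    # Keep only texts labeled in more than one way.
    return {text: sorted(labels)
            for text, labels in labels_seen.items() if len(labels) > 1}

data = [
    [("Boston", "city")],
    [("Boston", "place")],   # same entity text, different label
    [("Paris", "city")],
]
print(find_inconsistent_labels(data))  # {'boston': ['city', 'place']}
```

Anything this flags is worth re-checking by hand before you retrain.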

davisking · Feb 28 '18 12:02

Thanks @davisking for your quick reply. This time I made sure that my annotated training data has consistent labels, but I still don't see any progress in performance.

From the log, I found that MITIE runs two trainings:

  1. Part I (train segmenter): working fine; it uses all cores and a reasonable amount of memory.
  2. Part II (train segment classifier): not using all cores, and using huge memory compared to the data size. Critically, it takes a huge amount of time.

Here is some of the log:

```
Training example annotated sentences count: 373
Machine details: RAM - 8GB, cores - 4
Training to recognize 3 labels: 'B-act', 'B-sub', 'B-org'

Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C:              20
epsilon:        0.01
num threads:    4
cache size:     5
max iterations: 2000
loss per missed segment: 3
C: 20        loss: 3        0.431569
C: 35        loss: 3        0.437956
C: 20        loss: 4.5      0.443431
C: 5         loss: 3        0.440693
C: 20        loss: 1.5      0.415146
C: 6.5555    loss: 5.16517  0.457117
C: 0.1       loss: 8.09489  0.327555
C: 0.1       loss: 3.81119  0.30292
C: 0.549491  loss: 5.61437  0.430657
C: 7.8466    loss: 5.43597  0.457117
C: 10.2883   loss: 5.1293   0.452555
C: 6.0607    loss: 5.02357  0.456204
C: 7.41861   loss: 5.28618  0.457117
C: 7.13806   loss: 5.20935  0.458029
C: 6.98092   loss: 5.16756  0.455292
best C: 7.13806
best loss: 5.20935
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.703608 0.747263 0.724779
Part I: elapsed time: 1027 seconds.
```

```
Part II: train segment classifier
now do training
num training samples: 1441
C: 200      f-score: 0.734335
C: 400      f-score: 0.735081
C: 300      f-score: 0.731994
C: 500      f-score: 0.735241
C: 700      f-score: 0.734709
C: 520      f-score: 0.733273
C: 450.957  f-score: 0.733804
C: 483.4    f-score: 0.736308
C: 480.156  f-score: 0.735241
C: 490.078  f-score: 0.734653
C: 484.607  f-score: 0.735241
C: 482.381  f-score: 0.732305
C: 483.799  f-score: 0.734653
C: 483.236  f-score: 0.732149
best C: 483.4
test on train:
286    2    0    3
  0  759    0    3
  0    0   43    0
  4    6    0  335
overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.
```

Total training time: about 5 hours.

I'm not sure what's going on. And yes, memory is always available.

Thanks in Advance.

munaAchyuta · Mar 07 '18 10:03

Sometimes it takes a while. Be patient.

What's happening is that MITIE repeatedly trains a classifier and does hyperparameter selection to find the best one. So MITIE training is always going to take longer than other systems, since it does a whole lot of internal validation and retraining so that you never have to fiddle with any parameters.
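Conceptually, that selection loop looks something like the following: a coarse pass over a few spread-out C values, then repeated refinement around the best one. This is only a toy sketch of the search pattern visible in the logs (spread-out C values first, then many nearby ones), not MITIE's actual optimizer:

```python
def tune_C(score_fn, candidates=(200, 300, 400, 500, 700), refine_steps=8):
    """Pick the C that maximizes score_fn: coarse grid, then local refinement.

    score_fn maps a C value to a validation f-score. In real training each
    call would be a full train-and-validate cycle, which is why this stage
    dominates the total time.
    """
    best_C = max(candidates, key=score_fn)
    step = best_C * 0.1
    for _ in range(refine_steps):
        # Compare the current best against its two neighbours,
        # then halve the step so the search keeps narrowing in.
        best_C = max((best_C - step, best_C, best_C + step), key=score_fn)
        step *= 0.5
    return best_C

# Toy score with a single peak at C = 480 (purely illustrative).
best = tune_C(lambda C: -(C - 480.0) ** 2)
```

Each candidate C requires retraining the classifier from scratch, so a dozen-plus trials multiplies the single-fit cost accordingly.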

davisking · Mar 07 '18 12:03

Thanks @davisking.

Could you please help me understand why a large value of C takes more time compared to a small value of C, when accuracy and F1 score are mostly the same across different values of C?

From my understanding, C is just a regularisation parameter that helps reduce/avoid misclassification, so it doesn't have any effect on accuracy and F1 score. If my understanding is correct, can I use a small value of C? If so, what is the minimum value of C I can use (i.e., a minimum threshold for C), especially for this problem?
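For context, assuming the segment classifier is a linear SVM-style trainer (which the C / epsilon / max-iterations parameters in the log suggest, though that is my reading, not something the log states), C is the weight on the slack penalty in the usual objective:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.}\;\; y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0
```

So C does influence accuracy and F1 (it trades margin size against training errors); on this dataset the effect just happens to be small. Larger C also generally makes the problem harder to solve to a given epsilon, which is one plausible reason the large-C runs take longer.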

For the above problem, please find the logs:

C=300:

```
num training samples: 1441
C: 200      f-score: 0.734335
C: 400      f-score: 0.735081
C: 300      f-score: 0.731994
C: 500      f-score: 0.735241
C: 700      f-score: 0.734709
C: 520      f-score: 0.733273
C: 450.957  f-score: 0.733804
C: 483.4    f-score: 0.736308
C: 480.156  f-score: 0.735241
C: 490.078  f-score: 0.734653
C: 484.607  f-score: 0.735241
C: 482.381  f-score: 0.732305
C: 483.799  f-score: 0.734653
C: 483.236  f-score: 0.732149
best C: 483.4
test on train:
286    2    0    3
  0  759    0    3
  0    0   43    0
  4    6    0  335
overall accuracy: 0.987509
Part II: elapsed time: 19417 seconds.
```

C=100:

```
num training samples: 1420
C: 0.01     f-score: 0.673219
C: 200      f-score: 0.75807
C: 100      f-score: 0.758977
C: 148.954  f-score: 0.758783
C: 124.134  f-score: 0.759333
C: 121.721  f-score: 0.757521
C: 136.154  f-score: 0.760752
C: 134.952  f-score: 0.756639
C: 142.253  f-score: 0.757164
C: 138.668  f-score: 0.758945
C: 137.088  f-score: 0.756806
C: 136.031  f-score: 0.759333
C: 136.479  f-score: 0.759459
best C: 136.154
test on train:
286    2    0    3
  0  761    0    1
  0    0   43    0
  4    9    0  311
overall accuracy: 0.98662
Part II: elapsed time: 6148 seconds.
```

C=50:

```
num training samples: 1432
C: 0.01     f-score: 0.670678
C: 200      f-score: 0.754349
C: 100      f-score: 0.755016
C: 149.215  f-score: 0.753461
C: 121.914  f-score: 0.755938
C: 118.753  f-score: 0.753097
C: 134.168  f-score: 0.75631
C: 129.929  f-score: 0.756474
C: 129.128  f-score: 0.755917
C: 131.916  f-score: 0.754349
C: 130.128  f-score: 0.755402
C: 129.586  f-score: 0.755938
best C: 129.929
test on train:
286    2    0    3
  0  761    0    1
  0    0   43    0
  5   10    0  321
overall accuracy: 0.985335
Part II: elapsed time: 5562 seconds.
df.number_of_classes(): 4
```

C=300 (second run):

```
num training samples: 1455
C: 200      f-score: 0.73822
C: 400      f-score: 0.736475
C: 300      f-score: 0.738895
C: 271.805  f-score: 0.737705
C: 326.638  f-score: 0.735243
C: 292.355  f-score: 0.738378
C: 302.664  f-score: 0.733705
C: 296.35   f-score: 0.736475
C: 298.977  f-score: 0.737146
C: 300.35   f-score: 0.736944
C: 299.649  f-score: 0.738933
C: 299.804  f-score: 0.735961
best C: 299.649
test on train:
288    2    0    1
  0  760    0    2
  0    0   43    0
  5    8    0  346
overall accuracy: 0.987629
Part II: elapsed time: 11576 seconds.
df.number_of_classes(): 4
```

C=500:

```
Part II: train segment classifier
now do training
num training samples: 1358
PART-II C: 500
PART-II epsilon: 0.0001
PART-II num threads: 4
PART-II max iterations: 2000
C: 400      f-score: 0.774171
C: 600      f-score: 0.778615
C: 500      f-score: 0.779291
C: 538.343  f-score: 0.774471
C: 470.021  f-score: 0.779522
C: 480.425  f-score: 0.776386
C: 443.145  f-score: 0.774217
C: 463.96   f-score: 0.775954
C: 472.435  f-score: 0.775831
C: 468.168  f-score: 0.770751
C: 470.707  f-score: 0.772416
C: 469.493  f-score: 0.770333
C: 470.138  f-score: 0.779291
best C: 470.021
test on train:
287    2    0    2
  0  761    0    1
  0    0   43    0
  6    9    0  247
overall accuracy: 0.985272
Part II: elapsed time: 18762 seconds.
df.number_of_classes(): 4
```

From the above logs: why does the best C always come out near whatever C value I give, no matter what C I choose? My point is: what is a minimum best C, or a threshold value of C, that can be used as a starting point?

Also, what is "num features", and why is it always 271?

Correct me if my interpretation is wrong: "num training samples" is the sum of the number of labels in each sentence, e.g., if 2 sentences each have 3 labels, then the number of samples is 6. Right?
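To make the question concrete, here is my interpretation as a tiny sketch (a hypothetical data layout and helper, not MITIE's API):

```python
def num_training_samples(annotated_sentences):
    """My reading of the log: total labeled entities across all sentences."""
    return sum(len(entities) for entities in annotated_sentences)

# 2 sentences with 3 labeled entities each -> 6 samples
sentences = [
    [("foo", "B-act"), ("bar", "B-sub"), ("baz", "B-org")],
    [("x", "B-act"), ("y", "B-sub"), ("z", "B-org")],
]
print(num_training_samples(sentences))  # 6
```

Is that what the "num training samples" line in the Part II log is counting?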

Thanks in advance. @grafael @lopuhin @baali @davisking @autopost-get

munaAchyuta · Mar 08 '18 10:03