
How does MITIE deal with the segmentation of OOV (out-of-vocabulary) words?

Open • rookiebird opened this issue 6 years ago • 1 comment

Expected Behavior

Hi, I want to know how MITIE deals with the segmentation of OOV words. My training samples are in Chinese and contain many entities that are game names. Some game names contain digits and some do not, for example "古墓丽影3" (Tomb Raider 3) and "英雄联盟" (League of Legends). Two of my training examples look like this:

1. [英雄联盟](name)11.10的日活 (English: daily active users of [League of Legends](name) on November 10)
2. [古墓丽影3](name)11.10的日活 (English: daily active users of [Tomb Raider 3](name) on November 10)

In these examples I want MITIE to recognize "英雄联盟" and "古墓丽影3" as the entities. "11.10" is a short form of the date (November 10) and should not be included in the entity.
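MITIE labels entities as ranges of tokens, so unsegmented Chinese has to be split into tokens before training, and an entity boundary can only fall between two tokens. The sketch below is a minimal illustration under stated assumptions: `TOKEN_RE`, `tokenize`, and `parse_example` are hypothetical helpers (not part of MITIE), and the tokenizer is a simple character-level one that keeps dotted numbers such as `11.10` as a single token. It tokenizes the entity text and the surrounding text separately, because tokenizing the raw string "古墓丽影311.10" with the same regex would merge "3" and "11.10" into the single token "311.10", leaving no token range that separates the game name from the date.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer (an assumption, not MITIE's own): dotted numbers such
# as "11.10" stay one token, runs of ASCII letters/digits stay together, and
# every other non-space character (e.g. each CJK character) is its own token.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)+|[A-Za-z0-9]+|\S")

# Markdown-style annotation used in the examples: [entity text](label)
ANNOTATION_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def parse_example(annotated: str) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Return the token list and (start, end, label) entity ranges, end exclusive."""
    tokens: List[str] = []
    entities: List[Tuple[int, int, str]] = []
    pos = 0
    for match in ANNOTATION_RE.finditer(annotated):
        # Plain text before the entity is tokenized on its own ...
        tokens.extend(tokenize(annotated[pos:match.start()]))
        # ... and so is the entity text, so no token straddles the boundary.
        start = len(tokens)
        tokens.extend(tokenize(match.group(1)))
        entities.append((start, len(tokens), match.group(2)))
        pos = match.end()
    tokens.extend(tokenize(annotated[pos:]))  # trailing plain text
    return tokens, entities


tokens, entities = parse_example("[英雄联盟](name)11.10的日活")
print(tokens)    # ['英', '雄', '联', '盟', '11.10', '的', '日', '活']
print(entities)  # [(0, 4, 'name')]
```

The `(start, end, label)` triples use exclusive ends, so each one maps onto MITIE's Python training API as `sample.add_entity(range(start, end), label)` on a `ner_training_instance(tokens)`. Whatever tokenizer is chosen, the same one has to be applied to new text at prediction time.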

Current Behavior

I labeled the entities correctly. However, the first sample is often recognized as "英雄联盟11" rather than "英雄联盟". How can I deal with this problem? I tried adding some more samples, but it didn't help. Should I add more data?

  • Version: 0.7.0
  • Where did you get MITIE: pip install
  • Platform: windows64 and linux64

rookiebird · Dec 23 '19 07:12

This means you need more training data. Be sure you also generate your own word model, rather than using the English model file.
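The second suggestion refers to training a word feature extractor on Chinese text instead of using the English `total_word_feature_extractor.dat`. MITIE ships a `wordrep` tool for this which, per the MITIE README, is run on a folder containing only text files (roughly `wordrep -e corpus_dir`); confirm the exact invocation there. Since Chinese has no spaces, the corpus should be segmented first, and segmented the same way as the NER training data. The sketch below writes such a folder; `segment`, `write_corpus`, and the tokenizer regex are hypothetical helpers, not MITIE functions, and MITIE may apply its own tokenization on top of the spaces, so it is worth checking how tokens like `11.10` come out.

```python
import os
import re
from typing import Iterable, List

# Same hypothetical tokenizer as used for the NER training data; the word
# feature extractor should see the same segmentation as the NER model.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)+|[A-Za-z0-9]+|\S")


def segment(line: str) -> str:
    """Join tokens with single spaces so whitespace marks token boundaries."""
    return " ".join(TOKEN_RE.findall(line))


def write_corpus(lines: Iterable[str], out_dir: str, lines_per_file: int = 10000) -> List[str]:
    """Write segmented, non-blank lines into out_dir as UTF-8 text files."""
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    handle = None
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if count % lines_per_file == 0:
            # Start a new file every lines_per_file lines.
            if handle is not None:
                handle.close()
            path = os.path.join(out_dir, f"part-{len(paths):05d}.txt")
            handle = open(path, "w", encoding="utf-8")
            paths.append(path)
        handle.write(segment(line) + "\n")
        count += 1
    if handle is not None:
        handle.close()
    return paths
```

For example, `write_corpus(open("raw.txt", encoding="utf-8"), "corpus_dir")` (with `raw.txt` a hypothetical file of unsegmented Chinese text) produces a folder that can then be passed to `wordrep`. A larger in-domain corpus mentioning game names and dates gives the extractor more context for words it would otherwise treat as OOV.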

davisking · Dec 23 '19 22:12