Prediction of multi-word expression

Open gifdog97 opened this issue 4 years ago • 0 comments

Is it possible to predict multi-word expressions (MWEs) from raw text? I ran predict.py with the --raw_text option and found that MWEs are not predicted.

For example, in Italian, "della" is a contraction of "di la", and UD annotates such a token as follows:

31-32	della	_	_	_	_	_	_	_	_
31	di	di	ADP	E	_	35	case	35:case	_
32	la	il	DET	RD	Definite=Def|Gender=Fem|Number=Sing|PronType=Art	35	det	35:det	_
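To make the format concrete: the first line is a range line whose ID column is "31-32", and the two lines after it are the syntactic words it covers. Below is a minimal Python sketch for picking out such range lines, assuming tab-separated CoNLL-U input. The function name multiword_tokens is my own and is not part of UDify.

```python
import re

# A range ID such as "31-32" marks a multiword token line in CoNLL-U.
RANGE_ID = re.compile(r"^(\d+)-(\d+)$")

def multiword_tokens(conllu_lines):
    """Return (start, end, surface_form) for every range line found."""
    spans = []
    for line in conllu_lines:
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        match = RANGE_ID.match(cols[0])
        if match:
            spans.append((int(match.group(1)), int(match.group(2)), cols[1]))
    return spans

gold = [
    "31-32\tdella\t_\t_\t_\t_\t_\t_\t_\t_",
    "31\tdi\tdi\tADP\tE\t_\t35\tcase\t35:case\t_",
    "32\tla\til\tDET\tRD\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t35\tdet\t35:det\t_",
]
predicted = ["31\tdella\tdella\tADP\t_\t_\t3\tcase\t_\t_"]

print(multiword_tokens(gold))       # [(31, 32, 'della')]
print(multiword_tokens(predicted))  # []
```

Running this on the gold annotation finds one range, while a single-line prediction for "della" contains none.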

However, the output of UDify is something like this:

31	della	della	ADP	_	_	3	case	_	_

I would like to obtain CoNLL-U output with proper MWE annotation. Is there any way to achieve this?
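One workaround I can imagine is splitting contractions myself before parsing and re-inserting the range lines afterwards. The Python sketch below is only an illustration under assumptions: CONTRACTIONS is a hypothetical, hand-written lookup table with one entry, expand_contractions and range_line are names I made up, and I have not confirmed that UDify can consume input that has been split this way. A plain lookup also cannot handle ambiguous forms.

```python
# Hypothetical, incomplete lookup table: surface form -> syntactic words.
CONTRACTIONS = {"della": ["di", "la"]}

def expand_contractions(tokens, table=CONTRACTIONS):
    """Split known contractions and record the 1-based range each one covers."""
    words, ranges = [], []
    for tok in tokens:
        parts = table.get(tok.lower())
        if parts is None:
            words.append(tok)
            continue
        start = len(words) + 1           # ID of the first syntactic word
        words.extend(parts)
        ranges.append((start, len(words), tok))
    return words, ranges

def range_line(start, end, form):
    """Build a 10-column CoNLL-U range line with empty annotation fields."""
    return "\t".join([f"{start}-{end}", form] + ["_"] * 8)

words, ranges = expand_contractions(["il", "libro", "della", "ragazza"])
print(words)                    # ['il', 'libro', 'di', 'la', 'ragazza']
print(ranges)                   # [(3, 4, 'della')]
print(range_line(*ranges[0]))
```

The range lines would then have to be merged back into the parser output in front of the words they cover, which this sketch does not do.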

Oct 31 '21 07:10 gifdog97