tdparse

How can I use the election data in the same way as the lidong data?

Open gpf951101 opened this issue 6 years ago • 3 comments

Hi, I have downloaded the election data you provided. How can I use it in the same way as the lidong data, given that the format is different? Can you tell me how to preprocess the election data? Thanks!

gpf951101, Dec 10 '19 10:12

Hi there, sorry for the slow reply. First of all you need to parse the data; I'd recommend using the Stanford parser. Then you can use the streamtwElec function in https://github.com/bluemonk482/tdparse/blob/master/data/dataprocessing.py for preprocessing. Hope this helps.

bwang482, Dec 17 '19 14:12

Hey, I have a bunch of data formatting questions for the election dataset. What version of CoNLL should the data be formatted in? Is the Stanford parser preferable to a tweet-specific one? Should streamtwElec be fed two folders, one for train and one for test? How does the SemEval format work for multiple entities? Is there a nice package/function for writing CoNLL and SemEval output in the specific versions that the code relies on?

miloKnell, Mar 06 '20 01:03

Hi @miloKnell, the CoNLL version is the same as in the parsed data at https://github.com/bluemonk482/tdparse/tree/master/data/lidong/parses. I think I used the old Stanford parser, which gives more parse type information than the CMU tweet-specific one. Yes to the two folders for streamtwElec. For multiple entities, the simplest way is to duplicate a tweet that has more than one entity, once per entity. However, there has been work on utilising the relationships between all the entities when assigning sentiments.
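As a rough illustration of the duplication idea, here is a minimal Python sketch that turns one tweet with several annotated entities into one record per entity. The function name `expand_by_entity`, the dictionary layout and the tab-separated printout are all assumptions made for this example; they are not taken from dataprocessing.py and do not reproduce the exact file format the code expects.

```python
from typing import Dict, List, Tuple


def expand_by_entity(tweet: str, entity_labels: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return one (tweet, target, label) record per annotated entity.

    Hypothetical helper: the tweet text is repeated unchanged for every
    entity, so each record carries exactly one target and one sentiment.
    """
    return [(tweet, entity, label) for entity, label in entity_labels.items()]


# Made-up example tweet with two targets that have different sentiments.
tweet = "Candidate A beat Candidate B in the debate"
labels = {"Candidate A": "positive", "Candidate B": "negative"}

records = expand_by_entity(tweet, labels)
for text, target, label in records:
    # The column order here is illustrative only.
    print(f"{target}\t{label}\t{text}")
```

Each resulting record can then be treated as an independent single-target example, at the cost of ignoring any relationship between the entities in the same tweet.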

Let me know if you have any other problems.

bwang482, Mar 07 '20 17:03