tdparse How can I use the election data in the same way as the lidong data?

Hi, I have downloaded the election data you provided. How can I use it in the same way as the lidong data? The format is different. Can you tell me how to preprocess the election data? Thanks!

Dec 10 '19 10:12 gpf951101

Hi there, sorry for the slow reply. First you need to parse the data; I'd recommend using the Stanford parser. Then you can use the streamtwElec function in https://github.com/bluemonk482/tdparse/blob/master/data/dataprocessing.py for preprocessing. Hope this helps.

Dec 17 '19 14:12 bwang482

Hey, I have a bunch of data formatting questions for the election dataset. What version of CoNLL should the data be formatted in? Is the Stanford parser preferable to a tweet-specific one? Should streamtwElec be fed the two folders, one for train and one for test? How does the SemEval format work for multiple entities? Is there a nice package or function for writing CoNLL and SemEval files in the specific versions that the code relies on?

Mar 06 '20 01:03 miloKnell

Hi @miloKnell. The CoNLL version is the same as that of the parsed data in https://github.com/bluemonk482/tdparse/tree/master/data/lidong/parses. I think I used the old Stanford parser, which gives more parse type information than the CMU tweet-specific one. Yes to the train and test folders. For multiple entities, the simplest way is to duplicate a tweet that has more than one entity, once per entity. However, there has been work on utilising the relationships between all the entities when assigning sentiments.
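To illustrate the duplication idea, here is a minimal Python sketch, not the code from this repo. The helper names (`expand_per_entity`, `to_three_line_block`), the example tweet, and its labels are all made up for illustration. The three-line layout (sentence with the target replaced by `$T$`, then the target, then the label) is an assumption about how a lidong-style file is laid out, so please check it against the files in data/lidong before relying on it.

```python
from typing import Dict, List, Tuple


def expand_per_entity(text: str, entities: List[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Duplicate one tweet into one instance per (entity, label) pair."""
    instances = []
    for target, label in entities:
        # Same tweet text every time; only the target and its label change.
        instances.append({"text": text, "target": target, "label": label})
    return instances


def to_three_line_block(instance: Dict[str, object], placeholder: str = "$T$") -> str:
    """Render one instance as three lines: masked sentence, target, label.

    ASSUMPTION: this layout and the "$T$" placeholder are a guess at the
    lidong-style format, not something confirmed in this thread.
    """
    text = str(instance["text"])
    target = str(instance["target"])
    # Only the first occurrence of the target string is masked.
    masked = text.replace(target, placeholder, 1)
    return "\n".join([masked, target, str(instance["label"])])


# Made-up example: one tweet mentioning two entities with different sentiments.
tweet = "Cameron and Miliband clash over the NHS"
annotations = [("Cameron", -1), ("Miliband", 1)]

instances = expand_per_entity(tweet, annotations)
blocks = [to_three_line_block(inst) for inst in instances]
print("\n\n".join(blocks))
```

Each entity gets its own copy of the tweet, so a single-target model sees them as independent examples. Note that this sketch masks only the first string match of the target; if the annotations carry character offsets, masking by offset would be safer.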

Let me know if you have any other problems.

Mar 07 '20 17:03 bwang482