How can I use the election data in the same way as the lidong data?
Hi, I have downloaded the election data you provided. How can I use it in the same way as the lidong data? The format is different. Can you tell me how to preprocess the election data? Thanks!
Hi there, sorry for the slow reply. First of all you need to parse the data; I'd recommend using the Stanford parser. Then you can use the streamtwElec function in https://github.com/bluemonk482/tdparse/blob/master/data/dataprocessing.py for preprocessing. Hope this helps.
Hey, I have a bunch of data formatting questions for the election dataset. What version of CoNLL should the data be formatted in? Is the Stanford parser preferable to a tweet-specific one? Should streamtwElec be fed the two folders, one for train and one for test? How does the SemEval format work for multiple entities? Is there a nice package/function for writing CoNLL and SemEval output in the specific versions that the code relies on?
Hi @miloKnell, the CoNLL version is the same as that of the parsed data in https://github.com/bluemonk482/tdparse/tree/master/data/lidong/parses. I think I used the old Stanford parser, which gives more parse type information than the CMU tweet-specific one. Yes, feed streamtwElec the two folders (train and test). For multiple entities, the simplest way is to duplicate a tweet that has more than one entity, once per entity. However, there has been work on utilising the relationships between all the entities when assigning sentiments.
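To illustrate the duplication idea, here is a rough sketch. It assumes each tweet has already been loaded into a dict with an `entities` mapping from entity name to sentiment label. The field names (`tweet_id`, `text`, `entities`, `target`, `sentiment`), the `expand_by_entity` helper and the example tweet are all made up for illustration; this is not the format that streamtwElec reads or writes, so adapt it to your own loader.

```python
from typing import Dict, List


def expand_by_entity(tweet: Dict) -> List[Dict]:
    """Return one single-target instance per (entity, sentiment) pair.

    The input layout is hypothetical: {"tweet_id", "text", "entities"},
    where "entities" maps an entity name to its sentiment label.
    """
    instances = []
    for entity, sentiment in tweet["entities"].items():
        # The tweet text is repeated unchanged; only the target differs.
        instances.append({
            "tweet_id": tweet["tweet_id"],
            "text": tweet["text"],
            "target": entity,
            "sentiment": sentiment,
        })
    return instances


# Invented example tweet mentioning two entities with different sentiments.
example = {
    "tweet_id": "t1",
    "text": "Party A did well tonight but Party B struggled",
    "entities": {"Party A": "positive", "Party B": "negative"},
}

for instance in expand_by_entity(example):
    print(instance["target"], instance["sentiment"])
```

Each resulting instance can then be treated like a single-target example, at the cost of ignoring any relationship between the entities in the same tweet.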
Let me know if you have any other problems.