Bug in load_twitter function
Hi, I ran your code and I think there are some issues that corrupt the loaded data:
with open(test_file, 'r', encoding='latin1') as f:
    lines = f.readlines()
    for i in range(0, len(lines), 3):
        if(lines[i+2][:-1] == '-1'):
            lines[i+2] = '2'
        curind = lines[i].find('$T$')
        asp = lines[i+1][1]
        sen = lines[i][0: curind] + " " + asp + " " + lines[i][curind + 3: -1]
        test_sentence.append(sen)
        test_aspect.append(asp)
        test_sentiment.append(literal_eval(lines[i+2]))
First, some sentences contain '$T$' more than once, but you only replace the first occurrence. Second, test_aspect is not a list of aspects but a list of single characters: lines[i+1][1] takes only the second character of the aspect line. Given this, I am confused about what exactly your code is training on.
This is part of test_sentence when I print it:
[' h to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .', "Dear u , Gray Hoodies turned into Leather Jackets , '' Ay ! '' turned into '' Swag '' , u grew up , and we 've been here all the way . RT i u love Bieber â\x99¥", 'received my o account today ! sorry have no invites , but i will spread the love if i receive any , thanks twitter community !', 'epascarello I know ! Man I get pissed when I try to copy a link from o search results and paste it in a forum or whatever .', 'Is it just me , or does o sound like a newsman ? Sounds like he belongs on CBS Nightly News .']
As you can see, the first sentence, for example, still contains a second $T$.
These are the corresponding entries in test_aspect:
['h', 'u', 'o', 'o', 'o']
This is the actual data:
['$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .',
"shaquille o'neal",
'2',
"Dear $T$ , Gray Hoodies turned into Leather Jackets , '' Ay ! '' turned into '' Swag '' , u grew up , and we 've been here all the way . RT i u love Bieber â\x99¥",
'justin',
'1',
'received my $T$ account today ! sorry have no invites , but i will spread the love if i receive any , thanks twitter community !',
'google wave',
'0\n',
'epascarello I know ! Man I get pissed when I try to copy a link from $T$ search results and paste it in a forum or whatever .',
'google',
'0',
'Is it just me , or does $T$ sound like a newsman ? Sounds like he belongs on CBS Nightly News .',
'john boehner',
'2']
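For reference, here is a minimal sketch of how the loop could avoid both problems. It assumes the intended behavior is to substitute the full aspect phrase for every '$T$' in the sentence, which is my guess and not something I can confirm from the repo. The function name parse_twitter_lines is hypothetical; it takes the list returned by f.readlines() so it can be tried without the data file.

```python
def parse_twitter_lines(lines):
    """Parse (sentence, aspect, label) records stored three lines at a time."""
    sentences, aspects, sentiments = [], [], []
    # Step in threes and stop before any trailing incomplete record.
    for i in range(0, len(lines) - 2, 3):
        sentence = lines[i].strip()
        aspect = lines[i + 1].strip()      # the whole aspect line, not lines[i+1][1]
        label = int(lines[i + 2].strip())  # strip() also copes with a missing final newline
        if label == -1:
            label = 2                      # same -1 -> 2 remapping as the original code
        # str.replace substitutes every '$T$', not only the first one.
        sentences.append(sentence.replace('$T$', aspect))
        aspects.append(aspect)
        sentiments.append(label)
    return sentences, aspects, sentiments


# Small check on two records shaped like the data above (labels are illustrative).
sample = [
    "$T$ to miss 3rd straight playoff game | The ... : $T$ will miss his third straight play ... .\n",
    "shaquille o'neal\n",
    "-1\n",
    "Dear $T$ , Gray Hoodies turned into Leather Jackets\n",
    "justin\n",
    "1",
]
sens, asps, labels = parse_twitter_lines(sample)
print(asps)
print(labels)
```

With this version the aspect list holds the full phrases, and no '$T$' placeholder is left in the sentences. If the model needs the target position, it may be better to keep '$T$' in the sentence and pass the aspect separately, but that depends on what the training code expects.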