problem with retweet files
Hi, the contents of almost all the retweet files are {"retweets": []}, without any data. What is the problem? Is this normal?
Look into your data_collection.out log file. This might be an issue with your Twitter keys. At least that was the cause when it happened to me. I had to restart the collection process, as all my data was empty as well.
@SaschaStenger Hi! I just want to check it with my problem and see if you are having the same. So what is happening in my case is that due to errors most retweets aren't being collected but a few are (as described in this issue). Are you also having the same troubles with retweets or have you solved them?
Hi
The issue you are describing is not unusual. I asked more or less the same question on the Twitter developer forum. The reason is that many tweets decay over time, especially in a context like fake news (decay meaning that they are either deleted or hidden by the user). As a result, they are no longer available.
The same applies if the original tweet has been deleted or hidden. This can also cause the Twitter API to throw errors.
Lastly, only a fraction of all tweets are ever retweeted, so this may be why your resulting .json files are empty. Check the corresponding tweet .json file and look at the key retweet_count. This will tell you whether the download code missed any retweets due to errors or something similar.
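The retweet_count check described above could be scripted roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, and it assumes that each tweet file and its retweet file share the same file name in two separate directories, which may not match your layout.

```python
import json
from pathlib import Path


def classify_retweet_file(tweet: dict, retweet_data: dict) -> str:
    """Compare the tweet's retweet_count with what was actually collected."""
    expected = tweet.get("retweet_count", 0)
    collected = len(retweet_data.get("retweets", []))
    if expected == 0:
        # The tweet was never retweeted, so an empty file is correct.
        return "no_retweets_expected"
    if collected == 0:
        # Retweets exist according to the tweet, but none were stored.
        return "missing"
    return "collected"


def load_json(path: Path) -> dict:
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def audit(tweets_dir: Path, retweets_dir: Path) -> dict:
    """Count outcomes for every tweet file that has a matching retweet file.

    Assumption: both directories use the same file name per tweet.
    """
    counts = {"no_retweets_expected": 0, "missing": 0, "collected": 0}
    for tweet_path in tweets_dir.glob("*.json"):
        retweet_path = retweets_dir / tweet_path.name
        if not retweet_path.exists():
            continue
        status = classify_retweet_file(
            load_json(tweet_path), load_json(retweet_path)
        )
        counts[status] += 1
    return counts
```

If most files land in "no_retweets_expected", the empty retweet files are normal. A large "missing" count points to errors during collection, such as key problems or decayed tweets.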
Thanks a lot! I will take a look into it.
@rlleshi Have you checked whether it worked? I'm having the same problem as in your last issue. Thanks.
@Dahabium as @SaschaStenger mentioned, this happens because the code also tries to crawl tweets that have decayed (been deleted or hidden). So this is normal. However, she seems to have created a version of the repository that skips these tweets and crawls the whole dataset much faster.
@SaschaStenger, what num_processes do you set? (Does it depend on the number of tokens you are using?) By the way, thanks for the mods in your repo!
I'm using one process fewer than I have keys. I don't know if that makes any difference, but my thinking was that if every process is using one key, there is still a backup key when the others are on timeout. But that is really your own preference. Personally, I wouldn't go with more processes than keys, since the keys are the bottleneck and not the number of processes you are running, except maybe when your machine is really slow.
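As a rough illustration of that rule of thumb (one process fewer than keys, but never fewer than one), here is a small helper. The function name is hypothetical and not part of the repository; you would still put the resulting value wherever your setup configures num_processes.

```python
def suggested_num_processes(num_keys: int) -> int:
    """Return one process fewer than the number of keys, with a minimum of one.

    This encodes a personal preference, not a hard requirement:
    the spare key acts as a backup while the others are rate limited.
    """
    if num_keys < 1:
        raise ValueError("at least one API key is required")
    return max(1, num_keys - 1)


if __name__ == "__main__":
    for keys in (1, 2, 5):
        print(keys, "keys:", suggested_num_processes(keys), "process(es)")
```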