get_comment_threads does not return max results
When I manually set max_results to the known number of comments for a video_id, the function returns fewer results than specified. For example, I recently tried to fetch 4362 comments for a video, but the function returned only 1900.
Apologies for the delay in responding.
Two things:
- Can you share a reproducible example so that I can try debugging at my end?
- A brief tour of Stack Overflow suggests some similar issues: https://stackoverflow.com/questions/31546995/youtube-data-api-v3-commentthread-call-doesnt-give-replies-for-some-comment-th Not sure if you have investigated. If yes, do share what you find. Thanks, and looking forward.
I am also seeing this. Does the get_comment_threads function only get top-level posts and not the comments that are replies to them? Thank you!
Hey @jcmundy: can you send a reproducible example. Would def. look into it.
I used the following query and only got 200 results even though there are 365 comments:
results <- get_comment_threads(c(video_id="ZjZBZxPk4Pc"), max_results = 500)
ok --- I think I have a diagnosis of the problem. We are not getting the replies. There is a way to get them by adding 'id,replies,snippet' to the part parameter. I will create support for this in get_all_comments and post here when done.
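For reference, a minimal sketch of what such a request URL looks like, assuming the YouTube Data API v3 commentThreads endpoint (the API key is a placeholder; the video id is the one from this thread):

```r
# Build a commentThreads request that asks for replies as well as snippets.
# YOUR_API_KEY is a placeholder; any real call needs valid credentials.
base_url <- "https://www.googleapis.com/youtube/v3/commentThreads"
params <- c(
  part       = "id,replies,snippet",  # 'replies' is the key addition
  videoId    = "ZjZBZxPk4Pc",
  maxResults = "100"                  # the API caps this at 100 per page
)
query <- paste(names(params), params, sep = "=", collapse = "&")
# (a real call should URL-encode the query, e.g. with utils::URLencode)
request_url <- paste0(base_url, "?", query, "&key=YOUR_API_KEY")
request_url
```

The 'replies' value in part is what makes the API include reply objects in each returned thread; since maxResults is capped at 100 per page, pagination via pageToken is still required for longer comment lists.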
Hi, i'm using get_all_comments and still not getting all results. Example:
Video has 969 comments, function returning 746 records
get_all_comments(c(video_id = 'gr0XWmEbiMQ'))
Hey @rodik, I did some due diligence around whether we are getting the latest and the oldest comment. We are.
a[747,]  # row label: 1208
  authorDisplayName: wìld wingõ
  authorProfileImageUrl: https://yt3.ggpht.com/-aaHd2yRLPtw/AAAAAAAAAAI/AAAAAAAAAAA/AA-MPoywE0Y/s28-c-k-no-mo-rj-c0xffffff/photo.jpg
  authorChannelUrl: http://www.youtube.com/channel/UCO6xnwYAHgJT6rAJwQG2ZCg
  authorChannelId.value: UCO6xnwYAHgJT6rAJwQG2ZCg
  videoId: gr0XWmEbiMQ
  textDisplay: I found this at a book/record store in Texas for $30. Except someone had put it on hold before me, so they didn't let me get it. I found another copy though! Bless people who share amazing things such as this!!
  textOriginal: I found this at a book/record store in Texas for $30. Except someone had put it on hold before me, so they didn't let me get it. I found another copy though! Bless people who share amazing things such as this!!
  canRate: TRUE
  viewerRating: none
  likeCount: 0
  publishedAt: 2017-12-18T01:04:52.000Z
  updatedAt: 2017-12-18T01:04:52.000Z
  id: UgzcFu3qnVT8S5QjJAp4AaABAg
  parentId: <NA>
  moderationStatus: <NA>
a[1,]  # row label: 61
  authorDisplayName: Ismael M
  authorProfileImageUrl: https://yt3.ggpht.com/-zbvP1JLp-S0/AAAAAAAAAAI/AAAAAAAAAAA/TtrCvlT0zuo/s28-c-k-no-mo-rj-c0xffffff/photo.jpg
  authorChannelUrl: http://www.youtube.com/channel/UC3SA9eWOqgb-YnoVLY4gEOQ
  authorChannelId.value: UC3SA9eWOqgb-YnoVLY4gEOQ
  videoId: gr0XWmEbiMQ
  textDisplay: Really many pleasure listening to this so good music. Thanks more; I didn't seen You had these artits. I am very glad to find it.
  textOriginal: Really many pleasure listening to this so good music. Thanks more; I didn't seen You had these artits. I am very glad to find it.
  canRate: TRUE
  viewerRating: none
  likeCount: 4
  publishedAt: 2015-11-19T21:09:45.000Z
  updatedAt: 2015-11-19T21:09:45.000Z
  id: UghJyBzi3KN0_3gCoAEC
  parentId: <NA>
  moderationStatus: <NA>
I wish I knew which comments we aren't seeing. If you have an idea, let me know.
It is possible that some people left comments and later deleted them and that is why the count is a bit higher.
@soodoku I am still getting 200 results for the 368 comments on the video I posted above. When I go through my file, I get the top-level comments but not the replies to them. While I'm sure some posts have been deleted, is there a way to get all comments, including replies? (For example, user vmware posted a comment 10 months ago and there are 5 replies to it, but the next posts I get are Guillermo Torres (earlier) and Mikkel Andersen (later); the 5 users who replied do not show up.) Thank you!
@jcmundy
I can see that there are replies.
a <- get_all_comments(video_id = "ZjZBZxPk4Pc")
sum(!is.na(a$parentId))
# [1] 37
Thank you for your help. I do not fully understand your answer. I can see in my csv output that some replies are visible on the youtube video and are not in my csv.
@jcmundy --- apologies for being cryptic!
My point was that the function does pull in replies; your point is that it isn't pulling in all of them.
Investigating the vmware thing.
The problem is that when you pull things programmatically, errors don't tend to be ad hoc, so I am trying to find a pattern to see how I can troubleshoot this.
@soodoku Thank you for the help! I will let you know if I see anything with more of a pattern.
I'm testing on a video with a smaller number of comments, this is what i have found so far:
The video has 10 comments but only 5 are returned (no replies are returned):
get_all_comments(c(video_id = 'mvpacG68TFg'))
Some replies are lost here:
replies <- lapply(res$items[n_replies > 0], function(x) { unlist(x$replies$comments) })
What happens is that only the first reply from a comment thread was returned. So one bug is here.
The other one is on line 32:
agg_res <- simpler_res
After adding replies to agg_res, you overwrite it with the data.frame containing only the top-level comments.
I didn't dive in deeper, but it looks to me like line 32 shouldn't be there. Hope this helps.
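A toy reproduction of that second bug, with made-up rows (the variable names mirror the ones above):

```r
# Two top-level comments and one reply, as minimal stand-in data.frames.
simpler_res <- data.frame(id = c("c1", "c2"), parentId = NA)
replies_df  <- data.frame(id = "r1", parentId = "c1")

agg_res <- rbind(simpler_res, replies_df)  # comments + replies: 3 rows
agg_res <- simpler_res                     # the stray line: replies discarded
nrow(agg_res)
# [1] 2
```

The second assignment throws away the merged result, which is why only top-level comments survive.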
Hey, @soodoku , this was bothering me so I investigated some more :)
What happens with your first bug is that unlist(x$replies$comments) creates one long vector instead of a data.frame with all the replies. So this is what you need here (ldply is from the plyr package):
replies <- lapply(res$items[n_replies > 0], function(x) {
  ldply(x$replies$comments, unlist)
})
The second problem is simple; I think you just included the line by accident. The reason why some replies do get returned at all is paging (imho).
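A minimal illustration of the difference, using a mocked-up comment structure (plyr::ldply does the same row-binding as the base-R do.call(rbind, ...) shown here):

```r
# Two mock replies, each a nested list shaped like an API comment resource.
comments <- list(
  list(snippet = list(textDisplay = "first reply",  authorDisplayName = "A")),
  list(snippet = list(textDisplay = "second reply", authorDisplayName = "B"))
)

# Bug: unlist() over the whole list flattens everything into ONE long vector.
flat <- unlist(comments)
length(flat)
# [1] 4

# Fix: unlist each comment separately, then bind the rows into a data.frame
# (this is what plyr::ldply(comments, unlist) does).
replies <- do.call(rbind,
                   lapply(comments, function(x) as.data.frame(t(unlist(x)))))
nrow(replies)
# [1] 2
```

With the flattened vector, only the values that happen to come first survive downstream, which is why a single reply per thread was showing up.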
By the way, thank you for your work. You might find the Rfacebook repo to be useful, especially the utils for parsing json to "classes".
thanks @rodik!
the function is super kludgy. will rewrite this. will finalize and release soon.
so @rodik,
I have broken the function down into two to make it cleaner. The replies stuff was just fine; the problem was with the ifelse. The upshot is that with the example you gave, I get all 9 comments. Will be testing more, but there is progress:
a <- get_all_comments(c(video_id = 'mvpacG68TFg'))
nrow(a)
# [1] 9
Hey @soodoku,
you still need the ldply when parsing replies (take a look at my last comment in this thread). This is the reason you are getting 9 comments instead of 10 in my example.
Even with that fixed, the function still returns only a fraction of all the replies. But this time it is the API returning only 5 out of 12 replies for comment number 3 in this video: C1rvIRtb1AM
You can see the same results on the api explorer: https://www.googleapis.com/youtube/v3/commentThreads?videoId=C1rvIRtb1AM&part=id%2Creplies%2Csnippet&maxResults=100
The function should probably iterate over the commentThreads and request the replies for each one of them, rather than relying on "id,replies,snippet" in the part parameter.
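A sketch of that per-thread approach, assuming the YouTube Data API v3 comments endpoint with its parentId parameter (the thread ids and key below are placeholders, and the actual fetch is left as a comment since it needs credentials):

```r
# For each comment-thread id, build a comments.list request that pulls the
# replies for that thread via parentId, instead of trusting the (possibly
# truncated) 'replies' object embedded in the thread itself.
thread_ids <- c("thread1", "thread2")  # placeholder ids from commentThreads

reply_urls <- vapply(thread_ids, function(tid) {
  paste0("https://www.googleapis.com/youtube/v3/comments",
         "?part=snippet",
         "&parentId=", tid,
         "&maxResults=100",
         "&key=YOUR_API_KEY")  # placeholder key
}, character(1))

# Each URL would then be fetched (and paginated via pageToken), e.g. with
# httr::GET(reply_urls[1]), and the items row-bound into one data.frame.
reply_urls[1]
```

This trades one request per video for one request per thread, but it is the only way to get complete reply lists when the embedded replies object is capped.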
thanks so much @rodik !
Was looking into that precise point right now. really appreciate it.
10 out of 10!
will push a cleaner version of the function later
a <- get_all_comments(c(video_id = 'mvpacG68TFg'))
nrow(a)
# [1] 10
Just an update for @jcmundy
Wrote a cleaner function and now up to 304:
results <- get_all_comments(video_id = "ZjZBZxPk4Pc")
nrow(results)
# [1] 304