Can't get an entire channel's videos by last_published
I've adapted the approach used in the Yt::Collections::Videos#add_offset_to method to grab video information for an entire channel by date instead of by page, but it missed almost half of the videos in the channel I polled (it retrieved only 1362 of 2460).
Here's a snippet of the relevant code I used in a local Channel model:
```ruby
def scrape
  yt_channel = Yt::Channel.new id: uid
  yt_videos = yt_channel.videos
    .where(published_before: last_published)
    .first(50)
  scrape_page(yt_videos)
  save
  scrape if yt_videos.count == 50
end

def last_published
  time = videos.any? ? videos.last.published_at : Time.zone.now
  # Convert to UTC before formatting so the trailing "Z" is accurate
  time.getutc.strftime '%FT%T.999Z'
end
```
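The date-cursor approach in the snippet above can be sketched in plain Ruby so it runs without the API. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical lambda standing in for `yt_channel.videos.where(published_before: cursor).first(50)`, each video is a hash with `:id` and `:published_at`, and the helper names are illustrative. It deduplicates by ID (the `.999Z` cursor deliberately re-fetches videos from the same second), uses the oldest timestamp on the page rather than assuming the last element is the oldest, and stops when a page adds nothing new.

```ruby
require 'time'

PAGE_SIZE = 50

# Formats a published_before cursor like last_published does, converting to
# UTC first. ".999" rounds up to the end of the second, so videos published
# in the same second as the cursor come back again and must be deduplicated.
def cursor_for(time)
  time.getutc.strftime('%FT%T.999Z')
end

# Walks a channel backwards in time. `fetch_page` is a hypothetical stand-in
# for the Yt call; it must return hashes with :id and :published_at (a Time).
def scrape_all(fetch_page, start_time)
  seen = {} # id => video, in insertion order
  cursor = cursor_for(start_time)
  loop do
    page = fetch_page.call(cursor)
    new_items = page.reject { |v| seen.key?(v[:id]) }
    new_items.each { |v| seen[v[:id]] = v }
    # Stop on a short page, or when the page added nothing new (prevents an
    # endless loop when PAGE_SIZE or more videos share one timestamp).
    break if page.size < PAGE_SIZE || new_items.empty?
    # Use the oldest timestamp on the page as the next cursor.
    cursor = cursor_for(page.map { |v| v[:published_at] }.min)
  end
  seen.values
end
```

A limitation of any `published_before` cursor: if 50 or more videos share the same second, the guard ends the loop instead of spinning forever, so videos older than that second are not reached.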
If this fails because of inconsistencies in how the YouTube API sorts results (most recent first), it seems the gem's own method for accessing more than 500 videos will also miss some results.
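One way to test the sorting hypothesis is to check each fetched page. A small sketch, assuming each video has already been mapped to a hash with `:id` and `:published_at` (with Yt, for example, `{ id: video.id, published_at: video.published_at }`); both helper names are hypothetical:

```ruby
# True if the page is sorted newest-first by :published_at (ties allowed).
def newest_first?(page)
  page.each_cons(2).all? { |newer, older| newer[:published_at] >= older[:published_at] }
end

# IDs of items that are newer than the item right before them,
# i.e. the places where newest-first ordering breaks.
def out_of_order_ids(page)
  page.each_cons(2)
      .select { |prev, curr| curr[:published_at] > prev[:published_at] }
      .map { |_, curr| curr[:id] }
end
```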
I'd need to look into this, but I think there are a few places where the gem uses the search endpoint (which has the limitations you described) instead of the list endpoint. I'd like to make that more predictable and less surprising; IMO we should follow the YouTube API's behavior more closely.
Did you ever find a solution to this issue? I'm currently working on a project where I'd like to obtain all the video IDs from a list of channels.
Ok, I (finally) had some time to look into this, and I think I know what's going on.
With the YouTube Data API there's a 500-video limit on the search endpoint (which Yt uses for most of these operations). The limit also applies to a channel's videos unless the request is properly authorized and uses one of the forContentOwner, forDeveloper, or forMine filters (see the note on "channelId" here: https://developers.google.com/youtube/v3/docs/search/list). In other words, you won't be able to get all of an arbitrary channel's videos via the Data API.
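For reference, the kind of search.list request this limit applies to looks roughly like the sketch below. It only builds the URL (no network call, no authentication), the method name is hypothetical, and it is not necessarily the exact request Yt sends; the parameters themselves are documented search.list parameters.

```ruby
require 'uri'

SEARCH_URL = 'https://www.googleapis.com/youtube/v3/search'
MAX_RESULTS_PER_PAGE = 50 # largest value search.list accepts for maxResults

# Builds a search.list URL for one page of a channel's videos, newest first.
# Authentication (API key or OAuth bearer token) is deliberately omitted.
# Without forContentOwner/forDeveloper/forMine, results for a query like
# this are capped at 500 overall, however the pages are requested.
def channel_search_url(channel_id, published_before: nil, page_token: nil)
  params = {
    part: 'snippet',
    channelId: channel_id,
    type: 'video',
    order: 'date',
    maxResults: MAX_RESULTS_PER_PAGE
  }
  params[:publishedBefore] = published_before if published_before
  params[:pageToken] = page_token if page_token
  "#{SEARCH_URL}?#{URI.encode_www_form(params)}"
end
```

With maxResults at its maximum of 50, a 500-result cap works out to at most 10 pages per query.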
Furthermore, I think there's a bug when using a content owner to fetch its channels' videos: the expected params (on_behalf_of_content_owner / for_content_owner) aren't passed through automatically, though they can be set explicitly.
In summary, both of the following should work for fetching all videos from a channel (I tested both).
Account auth:
```ruby
token = 'ya29.abcdef'
account = Yt::Account.new(access_token: token)
puts account.videos.take(1000).size # => 1000
```
Content Owner auth:
```ruby
content_owner_id = 'CBA4321'
channel_id = 'UCFOO'
token = 'ya29.123456'

co = Yt::ContentOwner.new(owner_name: content_owner_id, access_token: token)
channel = Yt::Channel.new(id: channel_id, auth: co)

# Note the explicit parameter setting: these aren't passed through automatically
videos = channel.videos.where(
  on_behalf_of_content_owner: content_owner_id,
  for_content_owner: true
)
fetched = videos.take(1000)
puts fetched.size # => 1000
```
Let me know if you have any other questions.