Can't get an entire channel's videos by last_published

Open blerg-rush opened this issue 6 years ago • 3 comments

I adapted the approach used in Yt::Collections::Videos#add_offset_to to fetch video information for an entire channel by publication date instead of by page, but it missed almost half of the videos in the channel I polled (1362 retrieved out of 2460).

Here's a snippet of the relevant code I used in a local Channel model:

  def scrape
    yt_channel = Yt::Channel.new id: uid
    yt_videos = yt_channel.videos
                          .where(published_before: last_published)
                          .first(50)
    scrape_page(yt_videos)
    save
    scrape if yt_videos.count == 50
  end

  def last_published
    time = videos.any? ? videos.last.published_at : Time.zone.now
    time.strftime '%FT%T.999Z'
  end

If this fails due to some inconsistencies in the YouTube API's sorting of results by most recent first, it seems the gem's own method of accessing more than 500 videos will also miss some results.
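For reference, here is the date-window idea reduced to a minimal sketch that runs without the gem. Everything in it is a stand-in: `Video`, `collect_by_date`, `FAKE_FETCH`, and the in-memory list are hypothetical names, and the fake fetcher is assumed to return complete, correctly sorted results, which is exactly what the real search endpoint may not do.

```ruby
# Hypothetical stand-in for a video record; not part of the Yt gem.
Video = Struct.new(:id, :published_at)

PAGE_SIZE = 50

# Walks backwards through time in pages of PAGE_SIZE.
# `fetch_page` is an assumed callable: given a cutoff Time, it returns up to
# PAGE_SIZE videos published strictly before the cutoff, newest first.
def collect_by_date(fetch_page, start_time)
  seen = {}
  cutoff = start_time
  loop do
    page = fetch_page.call(cutoff)
    fresh = page.reject { |v| seen.key?(v.id) }
    fresh.each { |v| seen[v.id] = v }
    # Stop on a short page, or when a full page brings nothing new.
    break if page.size < PAGE_SIZE || fresh.empty?
    # Pad the cutoff by one second so videos sharing the oldest timestamp
    # are fetched again rather than skipped; duplicates are dropped by id.
    cutoff = page.map(&:published_at).min + 1
  end
  seen.values
end

# In-memory fake of the remote API: 120 videos, one per minute.
base = Time.utc(2019, 10, 1)
ALL_VIDEOS = (0...120).map { |i| Video.new("vid#{i}", base + i * 60) }

FAKE_FETCH = lambda do |cutoff|
  ALL_VIDEOS.select { |v| v.published_at < cutoff }
            .sort_by(&:published_at).reverse
            .first(PAGE_SIZE)
end

collected = collect_by_date(FAKE_FETCH, base + 86_400)
puts collected.size # => 120
```

Against a well-behaved source the loop recovers every video, so any shortfall against the real API would point at the endpoint's results rather than at the paging logic.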

blerg-rush avatar Oct 10 '19 04:10 blerg-rush

I'd need to look into this but I think there are a few places where the gem uses the search endpoint instead of the list endpoint, which has the limitations you described. I'd like to make that more predictable and less surprising. IMO we should more closely follow the YouTube API's behavior.

dgb avatar Jun 06 '20 00:06 dgb

Did you ever find a solution to this issue? I'm currently working on a project where I would like to obtain all the video IDs from a list of channels.

justinallenmarsh avatar Dec 16 '20 19:12 justinallenmarsh

Ok, I (finally) had some time to look into this, and I think I know what's going on.

The YouTube Data API caps results from the search endpoint (which Yt uses most of the time for these operations) at 500 videos. The cap also applies to a channel's videos unless you are properly authorized (see the docs for "channelId" here: https://developers.google.com/youtube/v3/docs/search/list). In other words, you won't be able to get all of an arbitrary channel's videos this way.

Furthermore, I think there's a bug when using a content owner to fetch its channels: the expected params (on_behalf_of_content_owner / for_content_owner) aren't passed through automatically, though they can be set explicitly.

In summary, both of the following should work for fetching all videos from a channel (I tested them both).

Account auth:

token = 'ya29.abcdef'
a = Yt::Account.new(access_token: token)
puts a.videos.take(1000).size # => 1000

Content Owner auth:

content_owner_id = 'CBA4321'
channel_id = 'UCFOO'
token = 'ya29.123456'

co = Yt::ContentOwner.new(owner_name: content_owner_id, access_token: token)
channel = Yt::Channel.new(id: channel_id, auth: co)

# note the explicit parameter setting
videos = channel.videos.where(on_behalf_of_content_owner: content_owner_id, for_content_owner: true)

fetched = videos.take(1000)

puts fetched.size # => 1000

Let me know if you have any other questions.

dgb avatar Dec 17 '20 00:12 dgb