Serverless function for video archiving

Open infojunkie opened this issue 5 years ago • 6 comments

Tell us about your request

Create a serverless function that uses youtube-dl to perform video archiving. The function should emulate Pender's current video archiver, which stores the output of youtube-dl in an S3 bucket. Eventually (in a separate issue), this function will be integrated into Check: Pender will call it as a new archiving provider that replaces the current MediaVideoArchiver.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Archiving video media is an essential part of fact-checking, investigative, and human-rights workflows. It is hard because video hosting platforms routinely take down videos, especially those related to sensitive or controversial issues. In some cases, preserving this media is essential for building investigation reports or court cases.

Are you currently working around the issue?

Using youtube-dl manually and uploading the downloaded video to Check.

Implementation hints

  • Use Serverless as an implementation framework
  • Use Meedan's narcissus as a template for the function
  • Design the endpoint so that it accepts a URL and returns either a success or a failure result
  • Make S3 bucket access configurable; youtube-dl options should also be configurable
  • The success result should include the location of the video archive in the specified S3 bucket
  • The error result should include separate error codes and messages for different error conditions
  • Ensure that the function can be run locally and is testable (e.g. on Travis)
  • Use this prototype code as an implementation example
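A minimal sketch of the success/failure contract described in the hints, written in Python (an assumption; the issue does not fix a language). The error-code names, the `ArchiveResult` shape, and the `download`/`upload` callables are all hypothetical. The download and upload steps are injected as plain functions so the logic can be exercised locally or in CI with fakes, without youtube-dl or a real S3 bucket.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Hypothetical error codes: the issue only requires a distinct code per failure condition.
ERROR_INVALID_URL = "INVALID_URL"
ERROR_DOWNLOAD_FAILED = "DOWNLOAD_FAILED"
ERROR_UPLOAD_FAILED = "UPLOAD_FAILED"


class DownloadError(Exception):
    """Raised by the injected downloader when the video cannot be fetched."""


class UploadError(Exception):
    """Raised by the injected uploader when the S3 upload fails."""


@dataclass
class ArchiveResult:
    success: bool
    location: Optional[str] = None  # S3 location of the archive on success
    error_code: Optional[str] = None  # set only on failure
    error_message: Optional[str] = None


def archive_video(
    url: str,
    download: Callable[[str], str],  # URL -> local file path (e.g. a youtube-dl wrapper)
    upload: Callable[[str], str],  # local file path -> S3 location
) -> ArchiveResult:
    """Download `url`, upload the result, and report success or a coded failure."""
    if not url.startswith(("http://", "https://")):
        return ArchiveResult(False, error_code=ERROR_INVALID_URL,
                             error_message=f"Not an HTTP(S) URL: {url}")
    try:
        local_path = download(url)
    except DownloadError as exc:
        return ArchiveResult(False, error_code=ERROR_DOWNLOAD_FAILED, error_message=str(exc))
    try:
        location = upload(local_path)
    except UploadError as exc:
        return ArchiveResult(False, error_code=ERROR_UPLOAD_FAILED, error_message=str(exc))
    return ArchiveResult(True, location=location)


if __name__ == "__main__":
    # Fakes stand in for youtube-dl and S3 so the flow runs without network access.
    def fake_download(url: str) -> str:
        return "/tmp/video.mp4"

    def fake_upload(path: str) -> str:
        return "s3://example-bucket/video.mp4"

    print(asdict(archive_video("https://example.com/watch?v=abc", fake_download, fake_upload)))
```

Keeping each failure mode mapped to its own code lets the caller tell a video that could not be fetched apart from a storage problem, which may call for different retry behavior.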

infojunkie, Sep 25 '20 20:09

Hello @infojunkie, I would love to work on this issue.

I've built a similar function on GitHub Actions that archives videos from a YouTube playlist using youtube-dl.

parshnt, Oct 03 '20 16:10

@parshnt Sounds great! Feel free to submit a PR against the repo https://github.com/meedan/varcissus

infojunkie, Oct 03 '20 17:10

Hello @infojunkie,

Due to personal commitments and lack of time, I won't be able to work on this issue. Apologies; please go ahead and assign it to someone else who's up for it.

parshnt, Oct 12 '20 14:10

Hey, I would like to work on this. I just wanted to know: how are we supposed to trigger the function and tell it which video to download?

Pradyumn, Oct 15 '20 16:10

Can I work on this?

Pradyumn, Oct 17 '20 14:10

Hi @Pradyumn! Are you still interested in working on this issue?

"Just wanted to know how are we supposed to trigger the function and tell which video to download?"

The serverless function should expose an API endpoint that receives requests. Calling this endpoint triggers the function, and each request must carry the URL of the video to archive.
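One way such an endpoint could look, sketched in Python under the assumption that the function is deployed with Serverless behind API Gateway using the Lambda proxy integration: the request body arrives as a JSON string in `event["body"]`, and the response is a dict with `statusCode`, `headers`, and `body`. The `{"url": ...}` request shape, the error codes, and `make_handler` are hypothetical; the actual download-and-upload step is passed in as `archive`.

```python
import json
from typing import Any, Callable, Dict
from urllib.parse import urlparse


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the shape API Gateway's Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(archive: Callable[[str], str]) -> Callable[..., Dict[str, Any]]:
    """Return a Lambda-style handler; `archive` (hypothetical) archives a URL and returns its S3 location."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        # The request body arrives as a JSON string, e.g. '{"url": "https://..."}'.
        try:
            payload = json.loads(event.get("body") or "{}")
        except (TypeError, ValueError):
            return _response(400, {"error_code": "BAD_REQUEST", "error_message": "Body is not valid JSON"})
        url = payload.get("url") if isinstance(payload, dict) else None
        parsed = urlparse(url if isinstance(url, str) else "")
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return _response(400, {"error_code": "INVALID_URL", "error_message": "Missing or invalid 'url'"})
        try:
            location = archive(url)
        except Exception as exc:  # A fuller version would map each failure to its own code.
            return _response(500, {"error_code": "ARCHIVE_FAILED", "error_message": str(exc)})
        return _response(200, {"url": url, "location": location})

    return handler
```

In a deployment, the handler would be bound to a module-level name (e.g. `handler = make_handler(real_archive)`, where `real_archive` is a hypothetical youtube-dl plus S3 function) and wired in `serverless.yml` to an `http` event with `method: post`. Injecting `archive` keeps the endpoint testable locally without network access.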

Use the prototype code as an example. Starting from Varcissus, you need to:

  • Design the endpoint so that it accepts a URL and returns either a success or a failure result
    • The success result should include the location of the video archive in the specified S3 bucket
    • The error result should include separate error codes and messages for different error conditions (couldn't download, S3 upload failure...)
  • Design the function so that S3 bucket access and upload are configurable. Examples: S3 access and S3 upload
  • youtube-dl options should be configurable
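A sketch of how the bucket settings and youtube-dl options could be made configurable through environment variables, which Serverless can inject via the `environment` setting in `serverless.yml`. The `ARCHIVER_*` variable names, the defaults, and the key layout are assumptions. `format`, `noplaylist`, and `outtmpl` are genuine youtube-dl option keys, and `/tmp` is used because it is the writable scratch directory in AWS Lambda.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional

# Defaults use real youtube-dl option keys; the values themselves are illustrative.
DEFAULT_YTDL_OPTIONS: Dict[str, Any] = {
    "format": "best",
    "noplaylist": True,
    "outtmpl": "/tmp/%(id)s.%(ext)s",
}


@dataclass
class ArchiverConfig:
    bucket: str
    prefix: str = ""
    region: Optional[str] = None
    ytdl_options: Dict[str, Any] = field(default_factory=dict)

    def s3_key(self, filename: str) -> str:
        """Place the file under the configured prefix, if any."""
        prefix = self.prefix.strip("/")
        return f"{prefix}/{filename}" if prefix else filename

    def location(self, filename: str) -> str:
        """Location string to report in a success result."""
        return f"s3://{self.bucket}/{self.s3_key(filename)}"


def load_config(env: Mapping[str, str] = os.environ) -> ArchiverConfig:
    """Read hypothetical ARCHIVER_* variables; raise ValueError if they are missing or malformed."""
    bucket = env.get("ARCHIVER_S3_BUCKET")
    if not bucket:
        raise ValueError("ARCHIVER_S3_BUCKET must be set")
    # Per-deployment youtube-dl overrides, given as a JSON object.
    overrides = json.loads(env.get("ARCHIVER_YTDL_OPTIONS", "{}"))
    if not isinstance(overrides, dict):
        raise ValueError("ARCHIVER_YTDL_OPTIONS must be a JSON object")
    return ArchiverConfig(
        bucket=bucket,
        prefix=env.get("ARCHIVER_S3_PREFIX", ""),
        region=env.get("ARCHIVER_S3_REGION"),
        ytdl_options={**DEFAULT_YTDL_OPTIONS, **overrides},
    )
```

The merged `ytdl_options` dict is the kind of parameter dict `youtube_dl.YoutubeDL(...)` accepts, and `region` could be passed to `boto3.client("s3", region_name=...)`. Taking `env` as a parameter lets tests supply a plain dict instead of touching the real environment.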

danielafeitosa, Oct 30 '20 20:10