Serverless function for video archiving
Tell us about your request
Create a serverless function that uses youtube-dl to perform video archiving. The function should emulate Pender's current video archiver, which stores the output of youtube-dl in an S3 bucket. Eventually (in another issue) this function will be integrated into Check (specifically, called by Pender as a new archiving provider that replaces the current MediaVideoArchiver).
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Archiving video media is an essential part of fact-checking, investigative and human-rights workflows. It is hard because video hosting platforms routinely take down videos, especially those related to sensitive or controversial issues. In some cases, preserving this media is essential for building investigation reports or court cases.
Are you currently working around the issue?
Using youtube-dl manually, and uploading the downloaded video to Check.
Implementation hints
- Use Serverless as an implementation framework
- Use Meedan's narcissus as a template for the function
- Design the endpoint such that it accepts a URL and returns either success or failure results
- Design the function such that the S3 bucket access is configurable; youtube-dl options should also be configurable
- The success result should include the location of the video archive in the specified S3 bucket
- The error result should include separate error codes and messages for different error conditions
- Ensure that the function is runnable locally, as well as testable (e.g. on Travis)
- Use this prototype code as an implementation example
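The hints above can be sketched roughly as follows. This is a minimal, hypothetical Python outline, not the Varcissus or prototype code: the error codes, the result shapes and the function names (`archive`, `youtube_dl_download`, `s3_upload`) are all assumptions. The download and upload steps are passed in as callables so the core logic can run locally and on a CI runner such as Travis without network access; the two adapters at the bottom use the real `youtube_dl.YoutubeDL` and boto3 `upload_file` APIs and import them lazily.

```python
"""Hypothetical sketch of the archiver's request/response contract."""
import os
import tempfile
from typing import Callable, Dict, Optional

# Hypothetical error codes: the issue only asks for a distinct code per failure.
ERR_INVALID_URL = "INVALID_URL"
ERR_DOWNLOAD_FAILED = "DOWNLOAD_FAILED"
ERR_UPLOAD_FAILED = "UPLOAD_FAILED"


def success(bucket: str, key: str) -> Dict:
    # The success result carries the archive's location in the bucket.
    return {"status": "success", "location": f"s3://{bucket}/{key}"}


def failure(code: str, message: str) -> Dict:
    return {"status": "error", "error": {"code": code, "message": message}}


def archive(url: str, bucket: str,
            download: Callable[[str, str], str],
            upload: Callable[[str, str, str], None]) -> Dict:
    """Download `url` into a temp dir, upload it to `bucket`, report the outcome.

    `download(url, workdir)` returns the local file path and
    `upload(path, bucket, key)` stores it; both are injected for testability.
    """
    if not url.startswith(("http://", "https://")):
        return failure(ERR_INVALID_URL, f"Not an http(s) URL: {url!r}")
    with tempfile.TemporaryDirectory() as workdir:
        try:
            path = download(url, workdir)
        except Exception as exc:  # youtube-dl raises DownloadError, among others
            return failure(ERR_DOWNLOAD_FAILED, str(exc))
        key = os.path.basename(path)
        try:
            upload(path, bucket, key)
        except Exception as exc:  # e.g. botocore ClientError on access denied
            return failure(ERR_UPLOAD_FAILED, str(exc))
    return success(bucket, key)


def youtube_dl_download(url: str, workdir: str,
                        options: Optional[Dict] = None) -> str:
    # Real youtube-dl embedding API, imported lazily so this module loads without it.
    import youtube_dl
    opts = dict(options or {})
    opts["outtmpl"] = os.path.join(workdir, "%(id)s.%(ext)s")
    with youtube_dl.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        # Note: if youtube-dl merges formats, the final extension may differ.
        return ydl.prepare_filename(info)


def s3_upload(path: str, bucket: str, key: str) -> None:
    import boto3
    boto3.client("s3").upload_file(path, bucket, key)
```

Run locally, `archive(url, "my-bucket", youtube_dl_download, s3_upload)` would perform a real archive, while unit tests can pass fakes for the two callables and check each error code in isolation.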
Hello @infojunkie, I would love to work on this issue.
I've built a similar function on GitHub Actions to archive videos from a YouTube playlist using youtube-dl.
@parshnt Sounds great, feel free to submit a PR against the repo https://github.com/meedan/varcissus
Hello @infojunkie,
Due to personal commitments and lack of time, I won't be able to work on this issue. Apologies; you can go ahead and assign it to someone else who is able to work on it.
Hey, I would like to work on this. I just wanted to know how we are supposed to trigger the function and tell it which video to download.
Can I work on this?
Hi @Pradyumn! Are you still interested in working on this issue?
I just wanted to know how we are supposed to trigger the function and tell it which video to download.
The serverless function should have an API endpoint to receive requests. The endpoint's URL will be used to trigger the function, and each request must include the URL of the video to archive.
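As a sketch of the trigger side, assuming the function is deployed on AWS Lambda behind API Gateway through Serverless's `http` event (Lambda proxy integration), the handler could read the target URL either from a JSON body or from a query string parameter. The `url` field name and the helper names `extract_url` and `http_response` are hypothetical.

```python
import base64
import json
from typing import Dict, Optional


def extract_url(event: Dict) -> Optional[str]:
    """Pull the video URL from an API Gateway (Lambda proxy) event.

    Accepts ?url=... (GET) or a JSON body {"url": "..."} (POST).
    Returns None when no usable URL is present.
    """
    # queryStringParameters is null in real events when no query string is sent.
    params = event.get("queryStringParameters") or {}
    if params.get("url"):
        return params["url"]
    body = event.get("body")
    if not body:
        return None
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    return payload.get("url") if isinstance(payload, dict) else None


def http_response(status: int, result: Dict) -> Dict:
    # Lambda proxy integrations expect a statusCode and a string body.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

With this shape, the function could be triggered by POSTing a body such as `{"url": "<video URL>"}` to the deployed endpoint; a missing or malformed URL would map naturally to a 400 response carrying its own error code.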
You should use the prototype code as an example. Starting from Varcissus, you need to:
- Design the endpoint such that it accepts a URL and returns either success or failure results
- The success result should include the location of the video archive in the specified S3 bucket
- The error result should include separate error codes and messages for different error conditions (download failure, S3 upload failure, etc.)
- Design the function such that the S3 bucket access and upload are configurable (see the examples for S3 access and S3 upload)
- youtube-dl options should be configurable
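One way to make the bucket and the youtube-dl options configurable is to read them from environment variables, which Serverless can set per stage through the `environment` section of `serverless.yml`. The variable names (`ARCHIVE_BUCKET`, `ARCHIVE_KEY_PREFIX`, `S3_ENDPOINT_URL`, `YOUTUBE_DL_OPTIONS`), the `ArchiverConfig` class and the default options below are assumptions for illustration; credentials are left to boto3's usual lookup (environment variables or the Lambda execution role).

```python
import json
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

# Assumed defaults; user-supplied youtube-dl options override them key by key.
DEFAULT_YDL_OPTIONS = {"format": "best", "noplaylist": True, "quiet": True}


@dataclass
class ArchiverConfig:
    bucket: str
    key_prefix: str = ""
    endpoint_url: Optional[str] = None  # e.g. a local S3-compatible server for tests
    ydl_options: Dict = field(default_factory=dict)

    def object_key(self, filename: str) -> str:
        # Join the optional prefix and the file name into an S3 object key.
        prefix = self.key_prefix.strip("/")
        return f"{prefix}/{filename}" if prefix else filename


def load_config(env: Mapping[str, str] = os.environ) -> ArchiverConfig:
    bucket = env.get("ARCHIVE_BUCKET")
    if not bucket:
        raise ValueError("ARCHIVE_BUCKET must be set")
    # youtube-dl options arrive as a JSON object, e.g. '{"format": "worst"}'.
    user_opts = json.loads(env.get("YOUTUBE_DL_OPTIONS", "{}"))
    if not isinstance(user_opts, dict):
        raise ValueError("YOUTUBE_DL_OPTIONS must be a JSON object")
    return ArchiverConfig(
        bucket=bucket,
        key_prefix=env.get("ARCHIVE_KEY_PREFIX", ""),
        endpoint_url=env.get("S3_ENDPOINT_URL") or None,
        ydl_options={**DEFAULT_YDL_OPTIONS, **user_opts},
    )
```

The resulting `ydl_options` dict can be passed directly to `youtube_dl.YoutubeDL(...)`, and `endpoint_url` to `boto3.client("s3", endpoint_url=...)`, where `None` selects the default AWS endpoint. Pointing it at a local S3-compatible server is one way to keep the function testable offline.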