scripts/install.sh is not installing
Hi, I'm trying to get everything in place. I downloaded the files using Option 1, but now I am getting errors when reproducing the pipeline.
When I run
bash scripts/install.sh
I get the following error:
########
# 21 videos were not downloaded for some reason.
# Please try to update your youtube-dl to a recent version.
# If you still have the problem, please open a ticket on github and
# provide the following output:
########
Missing 20 subtitles/videos for train:
1l6MC-9BQa0
1mWIOvyqfUI
2Qn2pn6ReWE
_aB_yyHjlRs
1tIYY7yjdxI
ESXuXDOmvDM
cyemyhihk2s
BtgP4ZHwmdw
bsLj1m4k020
CXbQug3h_wc
ElqzSc-zW2M
3D2JV01YRcE
CNlh1WKGyWo
EBcv0a2jznE
5yZGdmZ9Hrk
dMb2kWTL1zE
DNA_UjrbSyM
aPJ6ILkkvc0
2N20JDU14CQ
5T6dxlSxTEc
Missing 1 subtitles/videos for dev5:
G23G21G49dk
My youtube-dl is already up to date. When I check the data directory, the script has not changed anything in it.
Hello,
The install.sh-based reproduction pipeline is failing because some of the videos were removed from YouTube after the dataset was prepared. That leaves Option 1, downloading the tarballs, as the only way to get all the necessary data files.
We should maybe add a note saying that it is expected that not all of the videos can be downloaded. This is not a problem as long as the data is "reasonably" complete, i.e. most of the videos could be downloaded. Would it be worth providing a download of the complete eval set?
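To make "reasonably" complete a bit more concrete, something like the following check could help. This is only a sketch and not part of install.sh: the function name check_completeness, the ID list file, and the threshold are hypothetical, and it assumes each downloaded file is named after its video ID followed by a dot and an extension.

```shell
# Hypothetical helper, not part of scripts/install.sh.
# Usage: check_completeness ID_LIST DATA_DIR MIN_PERCENT
# ID_LIST holds one video ID per line. A video counts as present when
# DATA_DIR contains at least one file named "<id>.<anything>" (assumed layout).
check_completeness() {
    local id_list=$1 data_dir=$2 min_percent=$3
    local total=0 missing=0 id

    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue          # skip blank lines
        total=$((total + 1))
        # Expand the glob into the positional parameters. If nothing
        # matches, the pattern stays literal and the -e test fails.
        set -- "$data_dir/$id".*
        if [ ! -e "$1" ]; then
            missing=$((missing + 1))
            echo "missing: $id"
        fi
    done < "$id_list"

    if [ "$total" -eq 0 ]; then
        echo "no IDs listed in $id_list" >&2
        return 2
    fi

    local percent=$(( (total - missing) * 100 / total ))
    echo "${percent}% complete ($((total - missing))/${total})"
    # Succeed only if the share of present videos reaches the threshold.
    [ "$percent" -ge "$min_percent" ]
}
```

With a placeholder ID list and directory, it could be called as `check_completeness train_ids.txt data/train 95 || echo "too many videos missing"`, so a handful of removed videos would pass while a broken download would still be flagged.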
If the license allows us to provide the evaluation set videos, then yes, of course we can do that.