how2-dataset

scripts/install.sh is not installing

Open mattiadg opened this issue 6 years ago • 3 comments

Hi, I'm trying to get everything in place. I downloaded the files with Option 1, but now I am getting errors when reproducing the pipeline.

When I run

bash scripts/install.sh

I get the following error:

########
# 21 videos were not downloaded for some reason.
# Please try to update your youtube-dl to a recent version.
# If you still have the problem, please open a ticket on github and
# provide the following output:
########

Missing 20 subtitles/videos for train:
  1l6MC-9BQa0
  1mWIOvyqfUI
  2Qn2pn6ReWE
  _aB_yyHjlRs
  1tIYY7yjdxI
  ESXuXDOmvDM
  cyemyhihk2s
  BtgP4ZHwmdw
  bsLj1m4k020
  CXbQug3h_wc
  ElqzSc-zW2M
  3D2JV01YRcE
  CNlh1WKGyWo
  EBcv0a2jznE
  5yZGdmZ9Hrk
  dMb2kWTL1zE
  DNA_UjrbSyM
  aPJ6ILkkvc0
  2N20JDU14CQ
  5T6dxlSxTEc
Missing 1 subtitles/videos for dev5:
  G23G21G49dk

My youtube-dl is already up to date. When I check the data directory, the script has not changed anything in it.

mattiadg, Sep 09 '19 15:09

Hello,

The install.sh reproduction pipeline is failing because some of the videos were removed from YouTube after the dataset was prepared. That leaves Option 1 as the only way to get all the necessary data files, which are downloaded as tarballs.

ozancaglayan, Sep 09 '19 15:09

We should maybe add a note saying that it is expected that not all of the videos can be downloaded. This is not a problem as long as the data is "reasonably" complete, i.e. most of the videos could be downloaded. Would it be worth providing a download of the complete eval set?
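
To put a number on "reasonably" complete, something like the following might help. It is only a rough sketch, not part of the repository: it assumes the report keeps the `Missing N subtitles/videos for SPLIT:` format quoted in this issue, and `parse_missing` and `missing_ratio` are hypothetical helper names. The sample text is a shortened excerpt in that format, using IDs from the report above.

```python
import re

# Header line format, copied from the report quoted in this issue.
HEADER = re.compile(r"^Missing (\d+) subtitles/videos for (\S+):$")


def parse_missing(report):
    """Return {split: [video_id, ...]} parsed from the report text."""
    missing = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            # A new split section starts here.
            current = match.group(2)
            missing[current] = []
        elif current is not None and stripped and not stripped.startswith("#"):
            # Any other non-empty, non-comment line is treated as a video ID.
            missing[current].append(stripped)
    return missing


def missing_ratio(ids, total):
    """Fraction of a split that is missing; `total` must be supplied by the caller."""
    return len(ids) / total


# Shortened, illustrative excerpt (not the full report).
sample = """Missing 2 subtitles/videos for train:
  1l6MC-9BQa0
  1mWIOvyqfUI
Missing 1 subtitles/videos for dev5:
  G23G21G49dk
"""

for split, ids in parse_missing(sample).items():
    print(split, len(ids))
```

The size of each split is not stated in this thread, so `missing_ratio` takes it as an argument instead of hard-coding a value. What ratio still counts as acceptable is a judgment call for the maintainers.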

fmetze, Sep 09 '19 15:09

If the license allows us to provide the evaluation set videos, then yes, we can do that of course.

ozancaglayan, Oct 04 '19 10:10