suppdata (or unzip?) error: zip file is corrupt
Hi, While using MADtraits::MADtraits to download datasets, I ran into a suppdata::suppdata error:
> unzip(suppdata::suppdata("10.1002/ece3.1456", 1))
Warning message:
In unzip(suppdata::suppdata("10.1002/ece3.1456", 1)) : zip file is corrupt
- I tried opening the archive from the suppdata cache, but 7zip confirms it can't be opened.
- I tried opening other supplementary files in the same format that suppdata had just downloaded, and could open them.
- I went to the journal website (https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.1456), downloaded the supplementary file, and opened it without problems.
- If the corrupt archive is replaced by the good one manually downloaded from the site, unzip() does not throw an error.
I don't know what causes the error.
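One way to confirm that the cached archive and the manually downloaded one really differ is to compare their sizes and checksums. The following is a minimal sketch in base R: `same_file` is a hypothetical helper (not part of suppdata), and the two temporary files only stand in for the cached and the manually downloaded archives.

```r
# Compare two files by size and MD5 checksum.
# `same_file` is a hypothetical helper for illustration, not a suppdata function.
same_file <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  sizes <- file.size(c(path_a, path_b))
  sums <- unname(tools::md5sum(c(path_a, path_b)))
  sizes[1] == sizes[2] && sums[1] == sums[2]
}

# Stand-in files: `good` plays the manually downloaded archive,
# `bad` plays a copy that picked up one extra byte along the way.
good <- tempfile()
bad <- tempfile()
payload <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x0a, 0x00))
writeBin(payload, good)
writeBin(c(payload[1:4], as.raw(0x0d), payload[5:6]), bad)

same_file(good, good)
same_file(good, bad)
file.size(c(good, bad))
```

If the real cached file and the manual download differ in size, that points at the download step rather than at unzip().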
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] drake_7.12.5 MADtraits_1.0-0
loaded via a namespace (and not attached):
[1] nlme_3.1-149 fs_1.5.0 bold_1.1.0 usethis_1.6.3
[5] lubridate_1.7.9 devtools_2.3.2 progress_1.2.2 filelock_1.0.2
[9] httr_1.4.2 rprojroot_1.3-2 tools_4.0.2 backports_1.1.10
[13] R6_2.4.1 DT_0.15 withr_2.3.0 tidyselect_1.1.0
[17] prettyunits_1.1.1 processx_3.4.4 curl_4.3 compiler_4.0.2
[21] cli_2.0.2 xml2_1.3.2 desc_1.2.0 triebeard_0.3.0
[25] mvtnorm_1.1-1 callr_3.4.4 handlr_0.2.0 convertr_0.1
[29] stringr_1.4.0 digest_0.6.25 txtq_0.2.3 rmarkdown_2.4
[33] pkgconfig_2.0.3 htmltools_0.5.0 bibtex_0.4.2.3 sessioninfo_1.1.1
[37] fastmap_1.0.1 htmlwidgets_1.5.2 rlang_0.4.7 readxl_1.3.1
[41] rstudioapi_0.11 httpcode_0.3.0 shiny_1.5.0 generics_0.0.2
[45] zoo_1.8-8 jsonlite_1.7.1 gtools_3.8.2 dplyr_1.0.2
[49] magrittr_1.5 Rcpp_1.0.5 fansi_0.4.1 ape_5.4-1
[53] RefManageR_1.2.12 lifecycle_0.2.0 stringi_1.5.3 yaml_2.2.1
[57] storr_1.2.1 MASS_7.3-53 pkgbuild_1.1.0 plyr_1.8.6
[61] grid_4.0.2 parallel_4.0.2 gdata_2.18.0 promises_1.1.1
[65] crayon_1.3.4 miniUI_0.1.1.1 lattice_0.20-41 conditionz_0.1.0
[69] hms_0.5.3 knitr_1.30 ps_1.3.4 pillar_1.4.6
[73] uuid_0.1-4 taxize_0.9.98 igraph_1.2.5 caper_1.0.1
[77] base64url_1.4 codetools_0.2-16 reshape2_1.4.4 pkgload_1.1.0
[81] crul_1.0.0 glue_1.4.2 rcrossref_1.1.0 evaluate_0.14
[85] data.table_1.13.0 remotes_2.2.0 renv_0.12.0 foreach_1.5.0
[89] vctrs_0.3.4 httpuv_1.5.4 urltools_1.7.3 testthat_2.3.2
[93] cellranger_1.1.0 purrr_0.3.4 tidyr_1.1.2 reshape_0.8.8
[97] assertthat_0.2.1 xfun_0.18 mime_0.9 xtable_1.8-4
[101] later_1.1.0.1 tibble_3.0.3 iterators_1.0.12 suppdata_1.1-4
[105] tinytex_0.26 memoise_1.1.0 ellipsis_0.3.1
Thanks for posting this, Alban, and also for transferring this issue over
to suppdata. I can't reproduce this behaviour on my machine, and so I
think the problem is with your unzip call on your machine:
> list.files()
character(0)
> unzip(suppdata("10.1002/ece3.1456", 1, dir="~/Desktop/demo/"))
> list.files()
[1] "10.1002_ece3.1456_1" "ece31456-sup-0001-suppl_data.zip"
...I know that unzip can function a bit differently on Windows machines
(like yours) and Linux machines (like mine), so I wonder if this is perhaps
what is going on.
Hopefully the above helps; if not let me know.
Will
Thanks for the answer Will.
So I tried unzip(suppdata("10.1002/ece3.1456", 1)) on a Linux virtual machine and it works as expected, so the operating system does indeed seem to be a critical aspect of the issue.
But if I download the supplementary from my Windows machine
suppdata("10.1002/ece3.1456", 1, dir = '~/VirtualBox Shared folders/temp')
and try to unzip it from the Linux machine
> unzip('/media/sf_VirtualBox_Shared_folders/temp/10.1002_ece3.1456_1')
Warning message:
In unzip("/media/sf_VirtualBox_Shared_folders/temp/10.1002_ece3.1456_1") :
zip file is corrupt
it fails again.
So I would say the issue arises before unzip() is called. Either Windows is doing something odd to that specific file (maybe because of its extension, or lack of one), or suppdata's behaviour is affected by Windows.
To check the name.extension idea, I changed the name of the downloaded archive
suppdata::suppdata("10.1002/ece3.1456", 1, save.name = 'test.zip', dir = '~/VirtualBox Shared folders/temp')
but unzip still fails under Windows and under Linux.
Alban
Thanks for this. It sounds like we're in agreement this is a problem related to (potentially your) setup of Windows and unzipping, because the code works fine on Linux (on the same computer) and we agree that the file is being downloaded.
Would you mind humouring me and trying one last thing? The URL for the file you're downloading is ( https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fece3.1456&file=ECE31456-sup-0001-suppl_data.zip). Would you mind running:
# Download the supplement directly, bypassing suppdata
temp.file <- tempfile()
download.file("https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fece3.1456&file=ECE31456-sup-0001-suppl_data.zip",
  temp.file)
unzip(temp.file)
...and seeing if that works? This bypasses suppdata entirely.
Well, it works with download.file().
> temp.file <- tempfile()
> download.file("https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fece3.1456&file=ECE31456-sup-0001-suppl_data.zip",
+ temp.file)
trying URL 'https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fece3.1456&file=ECE31456-sup-0001-suppl_data.zip'
Content type 'application/zip; charset=UTF-8' length 391767 bytes (382 KB)
downloaded 382 KB
> is.data.frame(readxl::read_xls(unzip(temp.file)[2], 2, skip = 6))
[1] TRUE
>
>
> suppdata::suppdata("10.1002/ece3.1456", 1, dir = tempdir(), save.name = 'tst')
[1] "C:\\Users\\as80fywe\\AppData\\Local\\Temp\\Rtmp00xEbx/tst"
attr(,"suffix")
[1] "suppl"
> unzip(paste0(tempdir(), '/tst'))
Warning message:
In unzip(paste0(tempdir(), "/tst")) : zip file is corrupt
> is.data.frame(readxl::read_xls(unzip(paste0(tempdir(), '/tst'))[2], 2, skip = 6))
Error: `path` does not exist: ‘NA’
In addition: Warning message:
In unzip(paste0(tempdir(), "/tst")) : zip file is corrupt
I tried in R, outside of any R project or renv library, with both CRAN and GitHub versions of suppdata and the error remains.
I'll look into suppdata.
Alban
On a side note, the .paquette.2015 function in MADtraits has 2 unzip() calls.
data <- as.data.frame(read_xls(unzip(unzip(suppdata("10.1002/ece3.1456", 1)))[2], sheet=2, na=c("","NA")))
Thanks for this; this has really helped me. I think this is an edge case
where the publisher hasn't named the file with a .zip extension, which
means that we're not detecting it as a zip file and so aren't switching
to binary mode when downloading on Windows.
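To illustrate why the transfer mode matters: on Windows, a text-mode write turns every LF byte into CR LF, which changes the contents of a binary file such as a zip archive. The sketch below simulates that translation on a raw vector so it can be run on any platform, and shows a download wrapper that always requests binary mode. Both `simulate_text_mode` and `download_binary` are hypothetical names used for illustration; this is not necessarily how suppdata handles it internally.

```r
# Simulate a Windows text-mode write: each LF (0x0a) becomes CR LF (0x0d 0x0a).
# Hypothetical helper, for illustration only.
simulate_text_mode <- function(bytes) {
  stopifnot(is.raw(bytes))
  is_lf <- bytes == as.raw(0x0a)
  reps <- 1L + is_lf                    # LF bytes take two slots, others one
  out <- rep(bytes, times = reps)
  ends <- cumsum(reps)                  # last slot of each original byte
  out[ends[is_lf] - 1L] <- as.raw(0x0d) # first slot of each LF pair becomes CR
  out
}

# Download wrapper that always requests a binary transfer via mode = "wb",
# regardless of the file name or extension. Hypothetical helper.
download_binary <- function(url, destfile) {
  utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}

original <- as.raw(c(0x50, 0x4b, 0x0a, 0x04))
translated <- simulate_text_mode(original)
length(original)
length(translated)
```

A single inserted byte is enough to shift every later offset in the archive, which would be consistent with the "zip file is corrupt" warning. Assuming the direct link is reachable, something like `unzip(download_binary(url, tempfile()))` would be the binary-safe equivalent of a plain download.file() call.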
I don't have a Windows box to test this on right now, but I hope I have pushed something up now that can force this through. Would you mind trying the following in a fresh session:
library(devtools)
install_github("ropensci/suppdata", ref="winzip")
library(suppdata)
unzip(suppdata("10.1002/ece3.1456", 1))
...if that works for you then I'll clean up and merge it into the master branch. If it doesn't then I'll leave you alone, figure out a better fix, and then merge it anyway.
Thanks for flagging this.
The fix did not solve the issue for me, but I'll keep it in mind for future uses of suppdata.
Thanks for putting effort into trying to solve it.
Alban
Oh wait...
> unzip(suppdata::suppdata("10.1002/ece3.1456", 1, zip = TRUE))
works!
This is on a Windows machine, using the fix from your winzip branch.
Ah, that's wonderful news, thank you very much! Thanks for bearing with me while we got this fixed.