
archive_write_files fails when compressing GeoPackage files into 7zip

Status: Open · jldupouey opened this issue 2 years ago · 5 comments

The code below shows that archive_write_files() from the archive package does not correctly compress GeoPackage files into 7zip: the file extracted from the resulting archive is corrupt.

I've tried this with several different GeoPackage files and get the same error each time. The problem is with compression, not decompression.

Is there an option I should pass to archive_write_files() for this type of file, or is this a bug in archive?

    # R version 4.3.2 (2023-10-31 ucrt)
    # Platform: x86_64-w64-mingw32/x64 (64-bit)
    # Running under: Windows 10 x64 (build 19045)
        
    # archive_1.1.7    
    library(archive)
        
    # sf_1.0-15 
    library(sf)
        
    nc <- st_read(system.file("shape/nc.shp", package="sf"))
        
    # Reading layer `nc' from data source `C:\Users\jldupouey\AppData\Local\R\win-library\4.3\sf\shape\nc.shp' using driver `ESRI Shapefile'
    # Simple feature collection with 100 features and 14 fields
    # Geometry type: MULTIPOLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    # Geodetic CRS:  NAD27
        
    # writing the geopackage file:
    
    st_write(nc,"nc.gpkg",append=FALSE)
    
    # Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
    # Writing 100 features with 14 fields and geometry type Multi Polygon.
        
    # the file has been correctly written:
    
    st_read("nc.gpkg")
    
    # Reading layer `nc' from data source `D:\a_jeter\nc.gpkg' using driver `GPKG'
    # Simple feature collection with 100 features and 14 fields
    # Geometry type: MULTIPOLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    # Geodetic CRS:  NAD27
        
    # creating a 7z archive:
    
    archive_write_files(archive="nc.7z",files="nc.gpkg",format="7zip")
        
    # extracting the geopackage file from the 7z archive:
    
    archive_extract(archive="nc.7z",files="nc.gpkg")
        
    # the extracted file is not correct:
    
    st_read("nc.gpkg")
    
    # Error: Cannot open "D:\a_jeter\nc.gpkg"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
    # In addition: Warning messages:
    # 1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
    #   GDAL Error 1: database disk image is malformed
    # 2: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
    #   GDAL Error 1: sqlite3_prepare_v2(SELECT COUNT(*) FROM sqlite_master WHERE name IN ('gpkg_metadata', 'gpkg_metadata_reference') AND type IN ('table', 'view')) failed: database disk image is malformed

jldupouey, Jan 03 '24 14:01
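A quick way to confirm this kind of corruption for any file type, not only GeoPackage, is to compare checksums before and after the round trip. The following is a minimal sketch, not part of the archive package: roundtrip_ok() is a hypothetical helper name, and it assumes only the archive_write_files() and archive_extract() calls already used in this thread plus base R's tools::md5sum().

```r
library(archive)

# Hypothetical helper (not part of archive): returns TRUE if the file at
# `path` comes back byte-for-byte identical after an archive/extract round trip.
roundtrip_ok <- function(path, format = "7zip") {
  path <- normalizePath(path, mustWork = TRUE)

  # Work in a fresh temporary directory so the original file is never touched
  work <- tempfile("roundtrip-")
  dir.create(work)
  old <- setwd(work)
  on.exit(setwd(old), add = TRUE)

  # Archive a copy of the file under its bare name, as in the report above
  name <- basename(path)
  file.copy(path, name)
  archive_write_files("roundtrip.archive", name, format = format)

  # Extract into a separate sub-directory so the copy is not overwritten
  dir.create("out")
  archive_extract("roundtrip.archive", dir = "out")

  # Compare MD5 checksums of the archived copy and the extracted file
  identical(
    unname(tools::md5sum(name)),
    unname(tools::md5sum(file.path("out", name)))
  )
}

# Example with a small .rds file
saveRDS(cars, "cars.rds")
roundtrip_ok("cars.rds")
```

Comparing checksums separates archive corruption from format-specific read errors (GDAL, readRDS()). Going by the reports in this thread, the check would be expected to fail on affected Windows installations and pass on non-Windows platforms, including WSL.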

Hi @jldupouey, could you check whether https://github.com/r-lib/archive/pull/80 (or #99) fixes the issue?

cielavenir, Mar 18 '24 00:03

Alternatively, you can use a non-Windows platform, including WSL.

cielavenir, Mar 18 '24 02:03

Hi @jldupouey and @cielavenir,

I can easily reproduce this with the reprex below, and both #80 and #99 seem to fix it for me. If I understand correctly, the failing "R-CMD-check / ubuntu-latest (release) (pull_request)" job in the GitHub Actions CI for #99 is what is currently preventing it from being merged?

    library(archive)
    saveRDS(cars, "cars.rds")
    archive_write_files("cars.7z", "cars.rds")
    archive_extract("cars.7z")
    readRDS("cars.rds") |> head()
    #> Error in readRDS("cars.rds"): ReadItem: unknown type 0, perhaps written by later version of R

Created on 2024-07-03 with reprex v2.1.0

Session info:

    sessioninfo::session_info()
    #> ─ Session info ───────────────────────────────────────────────────────────────
    #>  setting  value
    #>  version  R version 4.4.1 (2024-06-14 ucrt)
    #>  os       Windows 10 x64 (build 19045)
    #>  system   x86_64, mingw32
    #>  ui       RTerm
    #>  language (EN)
    #>  collate  Dutch_Belgium.utf8
    #>  ctype    Dutch_Belgium.utf8
    #>  tz       Europe/Brussels
    #>  date     2024-07-03
    #>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
    #> 
    #> ─ Packages ───────────────────────────────────────────────────────────────────
    #>  ! package     * version date (UTC) lib source
    #>  D archive     * 1.1.8   2024-04-28 [1] RSPM
    #>    cli           3.6.3   2024-06-21 [1] RSPM
    #>    digest        0.6.36  2024-06-23 [1] RSPM
    #>    evaluate      0.24.0  2024-06-10 [1] RSPM
    #>    fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
    #>    fastmap       1.2.0   2024-05-15 [1] RSPM
    #>    fs            1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
    #>    glue          1.7.0   2024-01-09 [1] RSPM
    #>    htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
    #>    knitr         1.47    2024-05-29 [1] RSPM
    #>    lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
    #>    magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
    #>    pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
    #>    pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
    #>    purrr         1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
    #>    R.cache       0.16.0  2022-07-21 [1] RSPM
    #>    R.methodsS3   1.8.2   2022-06-13 [1] RSPM
    #>    R.oo          1.26.0  2024-01-24 [1] RSPM
    #>    R.utils       2.12.3  2023-11-18 [1] RSPM
    #>    reprex        2.1.0   2024-01-11 [1] RSPM (R 4.4.0)
    #>    rlang         1.1.4   2024-06-04 [1] RSPM
    #>    rmarkdown     2.27    2024-05-17 [1] RSPM
    #>    rstudioapi    0.16.0  2024-03-24 [1] RSPM (R 4.4.0)
    #>    sessioninfo   1.2.2   2021-12-06 [1] RSPM
    #>    styler        1.10.3  2024-04-07 [1] RSPM
    #>    tibble        3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
    #>    utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
    #>    vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
    #>    withr         3.0.0   2024-01-16 [1] RSPM (R 4.4.0)
    #>    xfun          0.45    2024-06-16 [1] RSPM
    #>    yaml          2.3.8   2023-12-11 [1] RSPM (R 4.4.0)
    #> 
    #>  [1] C:/Users/brogiers/AppData/Local/R/win-library/4.4
    #>  [2] C:/Program Files/R/R-4.4.1/library
    #> 
    #>  D ── DLL MD5 mismatch, broken installation.
    #> 
    #> ──────────────────────────────────────────────────────────────────────────────

rogiersbart, Jul 03 '24 08:07

@rogiersbart Sorry, I have not worked on the O_BINARY issue since I posted https://github.com/r-lib/archive/pull/73#issuecomment-1875688520, because my attempt to unify the coding style across multiple files was not accepted.

cielavenir, Jul 03 '24 09:07

OK, I see. Thanks for the feedback.

rogiersbart, Jul 03 '24 10:07