tiledb_put_metadata only saving first element of character vector

Open PedroMilanezAlmeida opened this issue 2 years ago • 5 comments

On my machine, tiledb_put_metadata will only save one (the first) element of a character vector, but all elements of a numeric or integer vector. I am not sure whether that is the expected behavior.

library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
tiledb_array_open(arr, "WRITE")
tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
tiledb_put_metadata(arr, "character_key", c("value_1", "value_2"))
tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)

Result

character_key:	value_1
integer_key:	1, 2
numeric_key:	0.5, 1.5

sessionInfo

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppSpdlog_0.0.14 tiledb_0.21.1    

loaded via a namespace (and not attached):
 [1] zoo_1.8-12        bit_4.0.5         compiler_4.3.1    tools_4.3.1       RcppCCTZ_0.2.12  
 [6] rstudioapi_0.15.0 spdl_0.0.5        Rcpp_1.0.11       bit64_4.0.5       nanotime_0.3.7   
[11] grid_4.3.1        lattice_0.21-8   

PedroMilanezAlmeida • Nov 30 '23 21:11

I believe this to be a documented constraint: essentially a 'string' is already a vector of char, so you would have to do something like paste(c("value1", "value2"), collapse=";") to create a single string. That single string then becomes a (single-column) char array on disk.
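A minimal base-R sketch of that round trip, independent of tiledb itself. The helper names `collapse_values` and `split_values` are hypothetical (not part of the tiledb package), and the sketch assumes the separator `;` never occurs inside a value:

```r
# Hypothetical helpers (not part of tiledb): pack a character vector into
# one string before storing it, and unpack it again after reading it back.
collapse_values <- function(x, sep = ";") {
  # Assumption: no element of x contains the separator itself
  stopifnot(is.character(x), !any(grepl(sep, x, fixed = TRUE)))
  paste(x, collapse = sep)
}

split_values <- function(s, sep = ";") {
  # fixed = TRUE treats sep literally rather than as a regular expression
  strsplit(s, sep, fixed = TRUE)[[1]]
}

packed <- collapse_values(c("value_1", "value_2"))  # single string to store
unpacked <- split_values(packed)                    # original vector again
```

Here `packed` is what would be handed to tiledb_put_metadata, and `split_values` would be applied to the string read back. Note that strsplit does not preserve a trailing empty element, so this scheme is only safe when values are non-empty.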

While not ideal, you could also combine it with a JSON writer / parser to store more complex structures.
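As a sketch of the JSON route, assuming the jsonlite package is installed (it is not mentioned in this thread; any JSON writer / parser would serve), the vector can be serialised to one string before writing and parsed after reading:

```r
library(jsonlite)  # assumption: jsonlite is available; not a tiledb dependency

values <- c("value_1", "value_2")

# Serialise the whole vector into one plain string; this single string is
# what would be passed to tiledb_put_metadata() as the metadata value
encoded <- as.character(toJSON(values))

# After reading the string back, parse it to recover the character vector
decoded <- fromJSON(encoded)
```

Unlike a plain separator, JSON escapes special characters, so values may themselves contain `;` or quotes.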

eddelbuettel • Nov 30 '23 21:11

Yeah, I (kind of) see what you mean. Just found this as well: https://github.com/TileDB-Inc/TileDB-R/pull/168#issuecomment-689226600. paste with collapse seems a better solution rn.

PedroMilanezAlmeida • Nov 30 '23 22:11

I will leave this open because this could do with added documentation.

eddelbuettel • Nov 30 '23 22:11

Kudos by the way for sending a perfect demonstration. Here is a slightly mod'ed version:

#!/usr/bin/env Rscript

library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
ign <- tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
ign <- tiledb_array_open(arr, "WRITE")
ign <- tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
ign <- tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
ign <- tiledb_put_metadata(arr, "character_key", paste(c("value_1", "value_2"), collapse=";"))
ign <- tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)

and it shows:

$ ./gh_issue_626.R 
character_key:  value_1;value_2
integer_key:    1, 2
numeric_key:    0.5, 1.5
$ 

eddelbuettel • Nov 30 '23 22:11

I will leave this open because this could do with added documentation.

saw this too late, re-opened now

PedroMilanezAlmeida • Nov 30 '23 22:11