tiledb_put_metadata only saving first element of character vector
On my machine, tiledb_put_metadata will only save one (the first) element of a character vector, but all elements of a numeric or integer vector. I am not sure whether that is the expected behavior.
library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
tiledb_array_open(arr, "WRITE")
tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
tiledb_put_metadata(arr, "character_key", c("value_1", "value_2"))
tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)
Result
character_key: value_1
integer_key: 1, 2
numeric_key: 0.5, 1.5
sessionInfo
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppSpdlog_0.0.14 tiledb_0.21.1
loaded via a namespace (and not attached):
[1] zoo_1.8-12 bit_4.0.5 compiler_4.3.1 tools_4.3.1 RcppCCTZ_0.2.12
[6] rstudioapi_0.15.0 spdl_0.0.5 Rcpp_1.0.11 bit64_4.0.5 nanotime_0.3.7
[11] grid_4.3.1 lattice_0.21-8
I believe this is a documented constraint: a 'string' is essentially already a vector of char, so you would have to do something like paste(c("value1", "value2"), collapse = ";") to create a single string. That single string then becomes a (single-column) char array on disk.
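As a rough sketch of the paste-with-collapse workaround, the two helpers below join a character vector into one delimited string before writing and split it back after reading. The names encode_chr and decode_chr are hypothetical and are not part of the tiledb package. The separator ";" is an assumption, and the encoder refuses elements that already contain it. Also note that strsplit() drops a trailing empty string, so a vector ending in "" will not survive the round trip.

```r
# Hypothetical helpers (not part of the tiledb package): store a character
# vector as one delimited string, and split it back after reading.
encode_chr <- function(x, sep = ";") {
  stopifnot(is.character(x))
  # A separator inside an element would produce an extra element on decode
  if (any(grepl(sep, x, fixed = TRUE))) {
    stop("an element contains the separator '", sep, "'")
  }
  paste(x, collapse = sep)
}

decode_chr <- function(s, sep = ";") {
  # strsplit() returns a list with one entry per input string; take the first
  strsplit(s, sep, fixed = TRUE)[[1]]
}

stored <- encode_chr(c("value_1", "value_2"))
print(stored)               # [1] "value_1;value_2"
print(decode_chr(stored))   # [1] "value_1" "value_2"
```

In the script above, the write would then be tiledb_put_metadata(arr, "character_key", encode_chr(c("value_1", "value_2"))), and the string read back with tiledb_get_metadata(arr, "character_key") would be passed to decode_chr().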
While not ideal, you could also combine this with a JSON writer/parser to store more complex structures.
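A minimal sketch of the JSON route, assuming the jsonlite package is installed (it is a separate package, not something tiledb provides): the whole structure is serialized to one string, which could then be stored under a single metadata key and parsed back after reading. The structure and the name info are made up for illustration.

```r
# Assumes the jsonlite package is installed; it is not part of tiledb.
library(jsonlite)

# A made-up nested structure that does not fit in one metadata value
info <- list(labels = c("value_1", "value_2"), scale = 0.5)

# Serialize to a single JSON string; as.character() drops the "json" class
stored <- as.character(toJSON(info, auto_unbox = TRUE))
cat(stored, "\n")   # {"labels":["value_1","value_2"],"scale":0.5}

# Parse it back into an R list after reading the metadata
restored <- fromJSON(stored)
str(restored)
```

One design caveat: with auto_unbox = TRUE, a length-one vector is written as a JSON scalar, so vector lengths are only preserved for lengths other than one.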
Yeah, I (kind of) see what you mean. I just found this as well: https://github.com/TileDB-Inc/TileDB-R/pull/168#issuecomment-689226600. paste with collapse seems like the better solution for now.
I will leave this open because this could do with added documentation.
Kudos, by the way, for sending a perfect demonstration. Here is a slightly modified version:
#!/usr/bin/env Rscript
library(tiledb)
pth <- tempfile()
dir.create(pth)
dm <- tiledb_domain(dims = c(tiledb_dim("d1", c(1L, 10L), type = "INT32")))
sch <- tiledb_array_schema(dm, attrs = c(tiledb_attr("a1", type = "INT32")), sparse = TRUE)
ign <- tiledb_array_create(pth, sch)
arr <- tiledb_array(pth, "WRITE")
ign <- tiledb_array_open(arr, "WRITE")
ign <- tiledb_put_metadata(arr, "numeric_key", c(0.5, 1.5))
ign <- tiledb_put_metadata(arr, "integer_key", c(1L, 2L))
ign <- tiledb_put_metadata(arr, "character_key", paste(c("value_1", "value_2"), collapse=";"))
ign <- tiledb_array_close(arr)
arr <- tiledb_array_open(arr, "READ")
allmd <- tiledb_get_all_metadata(arr)
print(x = allmd)
and it shows:
$ ./gh_issue_626.R
character_key: value_1;value_2
integer_key: 1, 2
numeric_key: 0.5, 1.5
$
Saw this too late; re-opened now.