Parse query condition gives error when using a variable
I am creating a R Shiny App where I want users to be able to define the query ranges for the TileDB array that I have. In the code snippet below I have a slider that the user can adjust to change the query. This query is successful if I used the first option, but if I use the hslider variable I get an error.
q1 <- parse_query_condition("A-Count"> 6, use_int64 = TRUE)
hslider = 6
q1 <- parse_query_condition("A-Count"> hslider, use_int64 = TRUE)
(Error: [TileDB::QueryCondition] Error: Value node condition value size mismatch: 8 != 7).
I was able to successfully use the tiledb_query_condition_init function with a variable like the one below, but I want to include queries that have more than two conditions in it and I don't see any way to chain more than 2, using the tiledb_query_condition_combine) function.
qc <- tiledb_query_condition_init("total_count", 10, "INT64", "GT")
Is there any other way to combine multiple queries (more than 2) using the query_condition function or to use a variable in the parse_query_condition function?
Thanks for opening the issue. Your best option is to supplying the actual array to the parse_query_condition helper so it can look up the variable type from the name you supplied. An example from the unit tests:
arr <- tiledb_array(uri)
qc1 <- parse_query_condition(island %in% c("Dream", "Biscoe"), arr)
res1 <- tiledb_array(uri, query_condition = qc1[]
This works with all types of column, and it most helpful when the the simple guess based on the comparison value goes wrong: seeing an int does not lead us to an int64 column, and also gets an error with double.
Combining mulitple queries works by nesting. Say you have queries A, B, C. You can combine A and B into a query D, and thereafter combine C and D into a query E. When you submit E, it's combined effect is evluated in the query context. The combine queries you can use the corresponding function, it supports boolean 'AND' and 'OR".
You can also write it out. An example from the unit tests:
qc <- parse_query_condition(species == "Adelie" || species == "Chinstrap" && year >= 2009)
(This query gets by without array information because text values imply ASCII, and an int value implies INT. Supplying the array is more precise and sometimes needed.)
Thanks for the super quick response, it is much appreciated. I just have a question about the forum that you mentioned. This is the first thing I have reported to my knowledge unless creating the issue here posts to the forum as well? The only forum I am aware of is https://forum.tiledb.com/.
I am able to successfully use the nesting of multiple queries and the tiledb_query_condition_init function, but I still experience an issue with the parse_query condition function, even when including the array as suggested.
Following the example you posted above, I have
hslider = as.integer(60000)
arr <- tiledb_array(uri)
qc1 <- parse_query_condition("A-Count" > hslider, use_int64 = TRUE,arr)
This creates a tiledb_query_condition object, but it shares the warning seen below. Upon the next line of code it does not take the query condition into account at all and returns the entire array without taking into consideration the query condition supplied in qc1.
res1 <- tiledb_array(uri, query_condition = qc1)[]
Warning message:
In stopifnot(Argument 'qc' with query condition object required = is(qc, :
NAs introduced by coercion
I had myself confused -- I saw your post first in our internal slack where we echo github issues. So there was no earlier post and I edited that out, my bad for any confusion. That said, our community slack is good and has a channel for tiledb-r. You may like it, and this link from the bottom of our homepage should get you started.
You should be able to get both muliplle conditions combined, and conditons created on the fly based on variable names. The parse_query_conditions() functions expects an R object of type expression so you may want to create a standard textstring first and then convert it and then use a function such as str2lang() to convert it. If you get lost I can probably mock something based on the popular penguins example.
I just joined the slack and really appreciate you taking your time to assist me with this. I would be grateful if you could make an example like you suggest as I am indeed lost and not having any success using parse or str2lang()with parse_query_condition().
When using those functions and passing along the expression into parse_query_condition(), I get the following errors, even without using a variable for my value. Likely I am missing something.
AB = str2lang("A-Count > 60000")
qc <- parse_query_condition(AB, use_int64 = TRUE,arr)
Error in .makeExpr(e, debug) : Unexpected symbol in expression: AB
qc <- parse_query_condition(str2lang("A-Count > 60000"), use_int64 = TRUE,arr)
Error in .makeExpr(e, debug) :
Unexpected token in expression: str2lang("A-Count > 60000")
Below is a full example running on my standard example data set from the palmerpenguins package (written with one call to fromDataFrame(), no extra options). It contains the critically important step of calling the str2lang which I had omitted. I will add something like this to the docs.
library(tiledb)
uri <- "/tmp/tiledb/penguins"
# as needed: fromDataFrame(palmerpenguins::penguins, uri)
arr <- tiledb_array(uri)
cols <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")
set.seed(42) # reproduce
col <- sample(cols, 1)
txt <- paste(col, "> 50") # arbitrary, replaced by your shiny app
cat("Setting '", txt, "'\n", sep="")
parsed <- do.call(what = parse_query_condition, args = list(expr = str2lang(txt), ta = arr))
arr2 <- tiledb_array(uri, return_as="data.table", query_condition=parsed)[]
summary(arr2[,4:7])
When I run this I get
> summary(arr2[,4:7])
bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
Min. :50.1 Min. :14.2 Min. :181 Min. :3250
1st Qu.:50.6 1st Qu.:16.0 1st Qu.:198 1st Qu.:3775
Median :51.3 Median :18.1 Median :206 Median :4375
Mean :51.9 Mean :17.8 Mean :210 Mean :4579
3rd Qu.:52.2 3rd Qu.:19.2 3rd Qu.:223 3rd Qu.:5500
Max. :59.6 Max. :20.8 Max. :231 Max. :6050
> cat("Setting '", txt, "'\n", sep="")
Setting 'bill_length_mm > 50'
>
showing the 'larger than 50' condition held on the (randomly) picked column 'bill_length_mm'.
Thank you so much. I just used your example and was able to successfully get it to work.
Perfect! I will leave the issue open to ensure I add the conversion at the package level -- may as well make it work from a character variable.