est_result_size
Hi,
I have a 3D sparse array with the dimensions time [0, 17600], latitude [-89.75, 89.75], and longitude [-179.75, 179.75], all of type double, and one attribute of type float. Until now I have hardcoded the buffer sizes for querying that array, but I now need to make my solution more flexible.
To do so I'm trying to use:
uint64_t est_result_size(const std::string &attr_name) const
When I set a subarray like [1, 1, -89.75, 89.75, -179.75, 179.75] and call est_result_size() for my attribute, I get 1 as a result, while it should be 259200. But when I set the subarray to [1, 2, -89.75, 89.75, -179.75, 179.75], I get 259200, while it should be 518400.
It feels like est_result_size() is using exclusive ranges while the query itself uses inclusive ranges.
Any hints on how to handle this, or is this a bug?
Hi @aosterthun, apologies for the late reply.
This function returns an estimate, with the constraint that it should not execute the whole query (otherwise it would return the correct answer, but you would have to run the query twice). The way it works is as follows.
- It calculates which tile MBRs from the R-tree your subarray intersects.
- It calculates the ratio of the intersection between your subarray and each intersecting MBR.
- It multiplies each ratio by the actual size of the corresponding tile (stored in the fragment metadata, which is fast to retrieve) and sums these values up.
For your unary subarray, the ratio is probably so small that the value is close to zero, and TileDB returns the ceiled value (1 in this case). For the larger subarray, the estimate is wrong because TileDB assumes a uniform distribution of points within each tile MBR. If you have multiple fragments, the estimate may become even more inaccurate.
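To make the steps above concrete, here is a minimal sketch of that kind of estimate. It is not TileDB's actual implementation: the types `Range`, `Box`, and `Tile` and the functions `overlap_ratio` and `estimate_result_size` are hypothetical names made up for illustration, and returning at least 1 whenever a tile intersects is an assumption based on the behaviour described above.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not TileDB's internal structures.
using Range = std::array<double, 2>;  // inclusive [low, high]
using Box = std::array<Range, 3>;     // time, latitude, longitude

struct Tile {
  Box mbr;             // minimum bounding rectangle of the cells in the tile
  std::uint64_t size;  // stored size of the tile's attribute data
};

// Returns false if the subarray does not intersect the MBR. Otherwise stores
// in `ratio` the fraction of the MBR volume covered by the subarray, which is
// only meaningful if cells are uniformly distributed inside the MBR.
bool overlap_ratio(const Box& subarray, const Box& mbr, double& ratio) {
  ratio = 1.0;
  for (std::size_t d = 0; d < mbr.size(); ++d) {
    const double lo = std::max(subarray[d][0], mbr[d][0]);
    const double hi = std::min(subarray[d][1], mbr[d][1]);
    if (lo > hi) return false;  // disjoint on this dimension
    const double extent = mbr[d][1] - mbr[d][0];
    // A zero-width MBR dimension is fully covered by any intersecting range.
    ratio *= extent > 0.0 ? (hi - lo) / extent : 1.0;
  }
  return true;
}

// Sums ratio * tile size over all intersecting tiles and rounds up.
std::uint64_t estimate_result_size(const Box& subarray,
                                   const std::vector<Tile>& tiles) {
  double total = 0.0;
  bool any_overlap = false;
  for (const Tile& tile : tiles) {
    double ratio = 0.0;
    if (!overlap_ratio(subarray, tile.mbr, ratio)) continue;
    any_overlap = true;
    total += ratio * static_cast<double>(tile.size);
  }
  if (!any_overlap) return 0;
  // Assumption: an intersecting subarray never yields an estimate below 1.
  return std::max<std::uint64_t>(
      1, static_cast<std::uint64_t>(std::ceil(total)));
}
```

With a single tile of size 1000 whose MBR spans time [0, 4] and the full latitude and longitude domains, a time range of [1, 1] has zero width, so the ratio is 0 and the sketch returns 1. A time range of [1, 2] covers a quarter of the MBR and returns 250, regardless of where the cells actually lie.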
Estimating result sizes in the sparse case is a notoriously difficult problem (as you can only have a hint of the data distribution based on the R-tree MBRs). To work around it, you can use any reasonably sized buffer and then take advantage of TileDB's incomplete query functionality: process the partial results and resubmit the query until its status is no longer incomplete.
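The control flow of the incomplete-query pattern looks roughly like the sketch below. To keep it self-contained it does not call TileDB at all: `MockQuery`, `Status`, and `read_all` are hypothetical stand-ins, with `MockQuery::submit` playing the role of submitting a read query and the returned `Status` playing the role of the query status you would check after each submit.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a read query's status; not a TileDB type.
enum class Status { Incomplete, Complete };

// Hypothetical stand-in for a read query over a float attribute.
class MockQuery {
 public:
  explicit MockQuery(std::vector<float> results)
      : results_(std::move(results)) {}

  // Copies as many pending results as fit into `buffer` and stores the
  // number of elements written in `written`.
  Status submit(std::vector<float>& buffer, std::size_t& written) {
    written = std::min(buffer.size(), results_.size() - pos_);
    std::copy_n(results_.begin() + static_cast<std::ptrdiff_t>(pos_), written,
                buffer.begin());
    pos_ += written;
    return pos_ < results_.size() ? Status::Incomplete : Status::Complete;
  }

 private:
  std::vector<float> results_;
  std::size_t pos_ = 0;
};

// Drains the query with one fixed-size buffer, resubmitting while incomplete.
std::vector<float> read_all(MockQuery& query, std::size_t buffer_elems) {
  if (buffer_elems == 0) {
    throw std::invalid_argument("buffer must hold at least one element");
  }
  std::vector<float> buffer(buffer_elems);
  std::vector<float> out;
  Status status = Status::Incomplete;
  do {
    std::size_t written = 0;
    status = query.submit(buffer, written);
    // Consume only the part of the buffer that was actually filled.
    out.insert(out.end(), buffer.begin(),
               buffer.begin() + static_cast<std::ptrdiff_t>(written));
  } while (status == Status::Incomplete);
  return out;
}
```

The point of the design is that the buffer size becomes a tuning knob rather than something that has to be correct: a smaller buffer only means more submit rounds, so an inaccurate size estimate no longer matters.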
I hope this helps.