Order of results when using data.table interface
Hi,
When using the data.table interface, does disk.frame guarantee that the results will always be returned in the same order (when using multiple workers)?
e.g.
library(disk.frame)
library(data.table)
dt = data.table(mtcars)
dt[, row := .I]
df = as.disk.frame(dt, nchunks = 32)
df[, row]
returns chunks in the order 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 4 5 6 7 8 9
Will these always be returned in the same order?
Also, perhaps by default it might be better to return results in numeric chunk order if this is the case?
Will these always be returned in the same order?
The order of return is currently always the same, but users shouldn't rely on that, as it may change in the future. At the moment the chunks are returned in string sort order, which is why chunk 10 comes before chunk 2; this could be improved.
If your dataset is truly large, then df[, row, keep="row"] will be much faster, since keep limits which columns are loaded from each chunk.
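To illustrate the string sort order described above, here is a minimal base R sketch. It does not call disk.frame at all: `chunk_ids` and `row_values` are hypothetical stand-ins for the chunk names and for the vector returned by df[, row], assuming one row per chunk as in the example. It shows why "10" lands before "2", and that sorting by an explicit index column such as row recovers the original order without depending on how chunks are returned.

```r
# Hypothetical chunk ids 1..32 held as strings (stand-in for chunk names).
chunk_ids <- as.character(1:32)

# String sort compares character by character, so "10" precedes "2".
# method = "radix" uses C-locale ordering, independent of the session locale.
string_order <- sort(chunk_ids, method = "radix")

# Numeric sort: convert to integer before ordering.
numeric_order <- chunk_ids[order(as.integer(chunk_ids))]

print(head(string_order, 12))
print(head(numeric_order, 12))

# Stand-in for the values returned by df[, row] with one row per chunk:
# they arrive in chunk string order.
row_values <- as.integer(string_order)

# Sorting on the explicit index restores the original row order.
restored <- sort(row_values)
print(restored)
```

If the result is a data.table with several columns rather than a single vector, the same idea applies: keep the row column in the output and order by it after collecting.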