Order of results when using data.table interface

Open • lukeReilly opened this issue 5 years ago • 1 comment

Hi,

When using the data.table interface, does disk.frame guarantee that the results will always be returned in the same order (when using multiple workers)?

e.g.

    library(data.table)
    library(disk.frame)

    dt = data.table(mtcars)
    dt[, row := .I]                       # record each row's original position
    df = as.disk.frame(dt, nchunks = 32)  # split the 32 rows into 32 chunks
    df[, row]

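returns chunks in the order 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 4 5 6 7 8 9 (with 32 rows in 32 chunks, each chunk holds a single row, so the `row` values are also the chunk numbers).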

Will these always be returned in the same order?

Also, if so, would it be better to return results in numeric chunk order by default?
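The observed sequence is what you get by sorting the chunk IDs as strings rather than as numbers. A minimal base-R sketch of the difference, assuming the IDs are compared as character labels (`chunk_ids` is a hypothetical stand-in, not part of the disk.frame API):

```r
# Hypothetical stand-in for disk.frame's chunk IDs: character labels "1".."32"
chunk_ids <- as.character(1:32)

# String (lexicographic) order. method = "radix" sorts in the C locale,
# so the result does not depend on the system's collation settings.
string_order <- sort(chunk_ids, method = "radix")
print(string_order)
# "1" "10" "11" ... "19" "2" "20" ... "29" "3" "30" "31" "32" "4" ... "9"

# Numeric order: convert to integer before ordering
numeric_order <- chunk_ids[order(as.integer(chunk_ids))]
print(numeric_order)
# "1" "2" "3" ... "32"
```

The string order reproduces the sequence in the question exactly; numeric order is what returning "in numeric chunk order" would give.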

lukeReilly, Jul 16 '20 18:07

> Will these always be returned in the same order?

The order of return is always the same, but in general users shouldn't rely on that, as it may change in the future. Currently the chunks are returned in string sort order of their IDs (hence 1, 10, 11, ..., 19, 2, 20, ...), which could be improved.
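Since users shouldn't rely on chunk order, a robust approach is to carry an explicit ordering column (the `row := .I` column in the question already does this) and sort on it after collecting the results. The sketch below simulates this in base R: `chunks` is a hypothetical list of one-row data frames standing in for the disk.frame chunks, and the string-ordered return is simulated, not produced by disk.frame itself.

```r
# Hypothetical stand-in for disk.frame chunks (NOT the disk.frame API):
# mtcars split into 32 one-row chunks, each tagged with its original row index
tbl <- mtcars
tbl$row <- seq_len(nrow(tbl))
chunks <- split(tbl, tbl$row)          # list named "1", "2", ..., "32"

# Simulate chunks coming back in string sort order of their IDs
returned <- chunks[sort(names(chunks), method = "radix")]
combined <- do.call(rbind, unname(returned))
head(combined$row)                     # 1 10 11 12 13 14

# Don't rely on chunk order: restore the original order explicitly
restored <- combined[order(combined$row), ]
head(restored$row)                     # 1 2 3 4 5 6
```

With a data.table result, `data.table::setorder()` performs the same re-sort in place.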

If your dataset is truly large, then `df[, row, keep = "row"]` will be much faster, because only the `row` column is read from each chunk.

xiaodaigh, Jul 16 '20 22:07