tabplot data.table implementation

directory sandbox currently contains a bin_data implementation using data.table.

For discussion, my suggestion is that this would be the standard implementation and that change the ffbase dependency into a Suggest, so that it is still possible to use extremely large data files without resorting to ff/ffbase. @mtennekes any thoughts?

Feb 27 '21 13:02 edwindj

Thanks @edwindj ! Make sense.

I image three variants on how to set the engine:

Use an argument in tableplot, e.g. use_ff.
Use a global option.
Use data.table if the data frame is smaller than some size-treshhold, and otherwise use ff.

Which do you prefer? (ping @cfholbert, @RobertSellers, @sfd99)

Just started the data.table implementation in the datatable branch, using option 1 (use_ff argument) for now. However, it doesn't work yet. bin_data expects a prepared object, but bin_data_dt (as I renamed it) expects a dataframe/table. Even with p <- p$data I got a strange looking tableplot:

If we are going to use data.table and move ff, bit and ffbase into suggests, then we'll have to re-implement (or skip) tableprepare, and re-implenent bin_hcc_data as well.

Unfortunately, I have almost no time to do this myself this year (even though it shouldn't take much time).

Mar 22 '21 15:03 mtennekes

Yeah, we would need to skip tablePrepare for the data.table implementation.

Alternative approach:

when the tableplot function is called with a data.frame/data.table we use the data.table implementation (because appearantly the data fits into memory...
when the tableplot function is called with a ff (or csv file), then we use the ff implementation.

Mar 22 '21 16:03 edwindj

Good idea!

Mar 22 '21 16:03 mtennekes

I suspect it might be less work than you think because the ff code is mostly separated from the rest. I will give it a try next week in branch data.table?

Mar 23 '21 07:03 edwindj

That would be great.

Mar 23 '21 07:03 mtennekes