Mark Klik
And add a section on benchmarking on https://fstpackage.github.io
The conversion needs very little memory, as we can use the `rbind` functionality of `fst` to append chunks from the `csv` file. The resulting `fst` file would have random row...
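As a rough illustration of why chunked appending keeps memory low, here is a minimal C++ sketch. It is not the `fst` format or API: `convert_in_chunks`, `parse_line` and `append_chunk` are hypothetical names, the input is assumed to be a purely numeric `csv` with one header line, and the output is just raw `double` bytes appended chunk by chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Parse one line of comma-separated numbers and append the values to `buffer`.
static void parse_line(const std::string& line, std::vector<double>& buffer) {
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ',')) {
        buffer.push_back(std::stod(field));
    }
}

// Append the buffered values to the output as raw bytes, then empty the buffer.
static void append_chunk(std::ostream& out, std::vector<double>& buffer) {
    out.write(reinterpret_cast<const char*>(buffer.data()),
              static_cast<std::streamsize>(buffer.size() * sizeof(double)));
    buffer.clear();
}

// Convert a numeric csv stream to a binary stream, holding at most
// `chunk_rows` parsed rows in memory at any time. Returns the row count.
std::size_t convert_in_chunks(std::istream& csv, std::ostream& out,
                              std::size_t chunk_rows) {
    assert(chunk_rows > 0);
    std::vector<double> buffer;
    std::string line;
    std::size_t rows_in_chunk = 0;
    std::size_t total_rows = 0;

    std::getline(csv, line);  // skip the header line (assumed to be present)
    while (std::getline(csv, line)) {
        if (line.empty()) continue;
        parse_line(line, buffer);
        ++total_rows;
        if (++rows_in_chunk == chunk_rows) {
            append_chunk(out, buffer);  // append and release the chunk
            rows_in_chunk = 0;
        }
    }
    if (!buffer.empty()) append_chunk(out, buffer);  // final partial chunk
    return total_rows;
}
```

The output bytes do not depend on the chunk size, so `chunk_rows` only trades peak memory against the number of append operations.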
This would reduce the memory footprint of writing to a `fst` file. It is also possible to use a `parLapply` approach, where data is generated in parallel, but serialized...
By setting the first parameter to a vector of file names.
That would significantly reduce the overhead when these columns are selected, especially when they are selected in the order in which they were stored.
Library [asmlib](http://www.agner.org/optimize/#asmlib) contains optimized C++ code for common string methods (`strlen`, `strcpy`, `strcmp`) which are also used in `fst` (mainly for serialization of `character` columns).
The VCL vector class library is a tool that allows for much faster C++ code by handling multiple data in parallel using SIMD instructions. The highest available SIMD instruction set...
The `integer64` type uses a `double` vector at its base. Currently, this vector is compressed using the same compression schemes as for `double` type columns. But knowing the type is...
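To make the storage layout concrete, here is a small C++ sketch of how a 64-bit integer can live in the 8 bytes of a `double` element (this is how `integer64` from the `bit64` package stores its values). The delta-encoding step is only an assumption about what a type-aware scheme could look like, not something `fst` is known to do, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "a double slot must hold exactly one 64-bit integer");

// Read the 64-bit integer stored in the bytes of slots[index].
// This copies bits; it is not a numeric double-to-integer conversion.
std::int64_t payload_at(const std::vector<double>& slots, std::size_t index) {
    assert(index < slots.size());
    std::int64_t value;
    std::memcpy(&value, &slots[index], sizeof value);
    return value;
}

// Build the double-typed storage vector from integer payloads, bit for bit.
std::vector<double> to_slots(const std::vector<std::int64_t>& values) {
    std::vector<double> slots(values.size());
    if (!values.empty()) {
        std::memcpy(slots.data(), values.data(), values.size() * sizeof(double));
    }
    return slots;
}

// Hypothetical type-aware preprocessing: store each payload as the difference
// to its predecessor. Unsigned arithmetic keeps wraparound well defined.
std::vector<std::int64_t> delta_encode(const std::vector<double>& slots) {
    std::vector<std::int64_t> deltas(slots.size());
    std::uint64_t previous = 0;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        const std::uint64_t current = static_cast<std::uint64_t>(payload_at(slots, i));
        deltas[i] = static_cast<std::int64_t>(current - previous);
        previous = current;
    }
    return deltas;
}

// Inverse of delta_encode: rebuild the double-typed storage vector.
std::vector<double> delta_decode(const std::vector<std::int64_t>& deltas) {
    std::vector<std::int64_t> values(deltas.size());
    std::uint64_t running = 0;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        running += static_cast<std::uint64_t>(deltas[i]);
        values[i] = static_cast<std::int64_t>(running);
    }
    return to_slots(values);
}
```

For sorted or slowly changing identifiers the deltas are small integers with many zero bytes, whereas the same bytes interpreted as `double` values carry no numeric structure a floating-point scheme could exploit.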
The built-in dictionary could speed up compression of character columns significantly. Alternatively, we could use the ZSTD compressor with a pre-trained dictionary.
By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read combined with a selection of...
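The memory argument can be sketched in C++ as follows. This is only an illustration under simplifying assumptions: a stored column is modelled as a stream of raw `int32` values rather than the real `fst` layout, and `serialize_column`, `select_rows` and `read_rows` are hypothetical helpers. The filter column is scanned one chunk at a time, and only the matching rows of the other column are ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a stored column: the raw bytes of its int32 values, back to back.
std::string serialize_column(const std::vector<std::int32_t>& values) {
    return std::string(reinterpret_cast<const char*>(values.data()),
                       values.size() * sizeof(std::int32_t));
}

// Scan the filter column chunk by chunk and keep only the indices of the rows
// that satisfy the condition. At most `chunk_rows` values are held at once.
std::vector<std::size_t> select_rows(std::istream& column, std::size_t chunk_rows,
                                     const std::function<bool(std::int32_t)>& condition) {
    assert(chunk_rows > 0);
    std::vector<std::int32_t> chunk(chunk_rows);
    std::vector<std::size_t> rows;
    std::size_t first_row = 0;
    while (column) {
        column.read(reinterpret_cast<char*>(chunk.data()),
                    static_cast<std::streamsize>(chunk_rows * sizeof(std::int32_t)));
        const std::size_t n =
            static_cast<std::size_t>(column.gcount()) / sizeof(std::int32_t);
        for (std::size_t i = 0; i < n; ++i) {
            if (condition(chunk[i])) rows.push_back(first_row + i);
        }
        first_row += n;
    }
    return rows;
}

// Read only the selected rows from another stored column by seeking to each one.
std::vector<std::int32_t> read_rows(std::istream& column,
                                    const std::vector<std::size_t>& rows) {
    std::vector<std::int32_t> values;
    values.reserve(rows.size());
    column.clear();  // reset any end-of-stream state from earlier reads
    for (std::size_t row : rows) {
        std::int32_t value = 0;
        column.seekg(static_cast<std::streamoff>(row * sizeof(std::int32_t)),
                     std::ios::beg);
        column.read(reinterpret_cast<char*>(&value), sizeof value);
        values.push_back(value);
    }
    return values;
}
```

Peak memory is one chunk of the filter column plus the selected rows, instead of the full table followed by a subset operation.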