Mark Klik

66 issues by Mark Klik

And add a section on benchmarking to https://fstpackage.github.io.

The conversion needs very little memory, as we can use the `rbind` functionality of `fst` to append chunks from the `csv` file. The resulting `fst` file would have random row...

feature request
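A minimal C++ sketch of the chunked idea, assuming the CSV is consumed line by line and each chunk is handed to a callback that stands in for appending rows to the `fst` file. `convert_in_chunks` and `ChunkSink` are hypothetical names, not part of the `fst` API. Only one chunk of rows is held in memory at any time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink: stands in for "append these rows to the fst file".
using ChunkSink = std::function<void(const std::vector<std::string>&)>;

// Reads a CSV stream (header line first) and passes it to `sink` in chunks of
// at most `chunk_rows` data lines. Returns the number of data rows processed.
std::size_t convert_in_chunks(std::istream& in, std::size_t chunk_rows,
                              const ChunkSink& sink) {
    assert(chunk_rows > 0);
    std::string line;
    std::getline(in, line);  // skip the header line
    std::vector<std::string> chunk;
    chunk.reserve(chunk_rows);
    std::size_t total = 0;
    while (std::getline(in, line)) {
        chunk.push_back(line);
        ++total;
        if (chunk.size() == chunk_rows) {
            sink(chunk);    // "append" this chunk
            chunk.clear();  // drop its rows before reading the next chunk
        }
    }
    if (!chunk.empty()) sink(chunk);  // final, possibly smaller chunk
    return total;
}
```

Peak memory is governed by `chunk_rows` rather than by the size of the `csv` file, which is the property the issue relies on.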

This would reduce the memory footprint of writing to a `fst` file. It is also possible to use a `parLapply` approach, where data is generated in parallel, but serialized...

feature request
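The `parLapply`-style idea can be illustrated in C++ under one assumption: chunks are generated concurrently but written strictly in chunk order (the issue text is truncated, so the exact serialization strategy is a guess). `generate_chunk` and `generate_parallel_write_serial` are hypothetical, and appending to a vector stands in for writing to the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical generator: chunk i holds i*chunk_size .. i*chunk_size + chunk_size - 1.
std::vector<int> generate_chunk(std::size_t i, std::size_t chunk_size) {
    std::vector<int> chunk(chunk_size);
    std::iota(chunk.begin(), chunk.end(), static_cast<int>(i * chunk_size));
    return chunk;
}

// Generates chunks concurrently, at most `workers` at a time, but appends them
// to `out` in chunk order, so the result matches a sequential run.
void generate_parallel_write_serial(std::size_t n_chunks, std::size_t chunk_size,
                                    std::size_t workers, std::vector<int>& out) {
    assert(workers > 0);
    for (std::size_t start = 0; start < n_chunks; start += workers) {
        std::size_t end = std::min(start + workers, n_chunks);
        std::vector<std::future<std::vector<int>>> batch;
        for (std::size_t i = start; i < end; ++i)
            batch.push_back(std::async(std::launch::async, generate_chunk, i, chunk_size));
        // Serialize in order: get() blocks until that particular chunk is ready.
        for (auto& f : batch) {
            std::vector<int> chunk = f.get();
            out.insert(out.end(), chunk.begin(), chunk.end());  // stand-in for a file append
        }
    }
}
```

Bounding the number of in-flight chunks by `workers` keeps memory use limited while still using several cores for generation.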

By setting the first parameter to a vector of file names.

feature request

That would significantly reduce the overhead when these columns are selected, especially when they are selected in the order in which they were stored.

enhancement

Library [asmlib](http://www.agner.org/optimize/#asmlib) contains optimized implementations of common string functions (`strlen`, `strcpy`, `strcmp`) that are also used in `fst` (mainly for serialization of `character` columns).

enhancement
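To show where such string routines sit on the hot path, here is a minimal sketch of serializing a `character` column: strings are concatenated into one byte buffer plus an array of end offsets. This layout is an illustrative assumption, not `fst`'s actual on-disk format, and `SerializedStrings`, `serialize` and `element` are hypothetical names. The `std::strlen` and `std::memcpy` calls are the kind of routines an optimized library could replace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Illustrative layout (assumed, not fst's format): concatenated bytes plus
// end offsets, where ends[i] is one past the last byte of string i.
struct SerializedStrings {
    std::vector<char> bytes;
    std::vector<std::size_t> ends;
};

SerializedStrings serialize(const std::vector<const char*>& column) {
    SerializedStrings out;
    out.ends.reserve(column.size());
    for (const char* s : column) {
        std::size_t len = std::strlen(s);  // scan for the terminating '\0'
        std::size_t pos = out.bytes.size();
        out.bytes.resize(pos + len);
        if (len > 0) std::memcpy(out.bytes.data() + pos, s, len);  // copy payload only
        out.ends.push_back(pos + len);
    }
    return out;
}

// Reconstructs string i from the buffer and offsets.
std::string element(const SerializedStrings& ser, std::size_t i) {
    assert(i < ser.ends.size());
    std::size_t begin = (i == 0) ? 0 : ser.ends[i - 1];
    if (ser.ends[i] == begin) return std::string();
    return std::string(ser.bytes.data() + begin, ser.ends[i] - begin);
}
```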

The VCL vector class library is a tool that allows for much faster C++ code by processing multiple data elements in parallel using SIMD instructions. The highest available SIMD instruction set...

enhancement

The `integer64` type uses a `double` vector at its base. Currently, this vector is compressed using the same compression schemes as for `double` type columns. But knowing the type is...
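Because the `double` slots of an `integer64` vector really hold 64-bit integer bit patterns, a type-aware scheme can reinterpret the bits and apply an integer transform first. The sketch below assumes delta encoding as one such transform; the truncated issue does not say which scheme was intended, and `as_int64`, `delta_encode` and `delta_decode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::int64_t), "integer64 relies on 8-byte doubles");

// Recover the int64 values stored in the double slots by copying the raw bits
// (memcpy avoids strict-aliasing problems).
std::vector<std::int64_t> as_int64(const std::vector<double>& storage) {
    std::vector<std::int64_t> values(storage.size());
    if (!storage.empty())
        std::memcpy(values.data(), storage.data(), storage.size() * sizeof(double));
    return values;
}

// Hypothetical integer-aware transform: delta encoding with unsigned
// wrap-around, so extreme values cannot overflow. For sorted data the deltas
// are small non-negative numbers whose high-order bytes are zero.
std::vector<std::uint64_t> delta_encode(const std::vector<std::int64_t>& v) {
    std::vector<std::uint64_t> d(v.size());
    std::uint64_t prev = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        auto cur = static_cast<std::uint64_t>(v[i]);
        d[i] = cur - prev;
        prev = cur;
    }
    return d;
}

// Inverse of delta_encode: running sum, cast back to signed.
std::vector<std::int64_t> delta_decode(const std::vector<std::uint64_t>& d) {
    std::vector<std::int64_t> v(d.size());
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        acc += d[i];
        v[i] = static_cast<std::int64_t>(acc);
    }
    return v;
}
```

Treated as doubles, the same bit patterns look like unrelated floating-point numbers, which is why reusing the `double` compression path may leave gains on the table.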

The built-in dictionary could speed up compression of character columns significantly. Alternatively, we could use the ZSTD compressor with a pre-trained dictionary.

feature request
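The issue does not specify how the built-in dictionary works. As one possible reading only, the sketch below assumes dictionary encoding: each distinct string is stored once and the column becomes a vector of integer codes, which is typically much smaller and easier to compress when values repeat. It does not use ZSTD's dictionary feature, and `DictEncoded`, `dict_encode` and `dict_decode` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct DictEncoded {
    std::vector<std::string> dictionary;  // distinct strings, in order of first appearance
    std::vector<std::uint32_t> codes;     // codes[i] indexes into dictionary
};

DictEncoded dict_encode(const std::vector<std::string>& column) {
    DictEncoded out;
    std::unordered_map<std::string, std::uint32_t> index;  // string -> code
    out.codes.reserve(column.size());
    for (const std::string& s : column) {
        auto it = index.find(s);
        if (it == index.end()) {  // first occurrence: assign the next code
            auto code = static_cast<std::uint32_t>(out.dictionary.size());
            it = index.emplace(s, code).first;
            out.dictionary.push_back(s);
        }
        out.codes.push_back(it->second);
    }
    return out;
}

std::vector<std::string> dict_decode(const DictEncoded& enc) {
    std::vector<std::string> column;
    column.reserve(enc.codes.size());
    for (std::uint32_t c : enc.codes) column.push_back(enc.dictionary[c]);
    return column;
}
```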

By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read combined with a selection of...

enhancement
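Reading with a condition can be sketched as a two-step process in a column-oriented layout: scan only the condition column to find matching row indices, then materialize just those rows from the other requested columns. Here in-memory vectors stand in for columns on disk; in a real file reader the second step would load only the blocks that contain selected rows. `Column`, `matching_rows` and `gather` are hypothetical names, not `fst` functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one stored column of a column-oriented table.
using Column = std::vector<double>;

// Step 1: scan only the condition column and record indices of matching rows.
std::vector<std::size_t> matching_rows(const Column& condition_col,
                                       const std::function<bool(double)>& pred) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < condition_col.size(); ++i)
        if (pred(condition_col[i])) rows.push_back(i);
    return rows;
}

// Step 2: for each other requested column, materialize only the selected rows.
Column gather(const Column& col, const std::vector<std::size_t>& rows) {
    Column out;
    out.reserve(rows.size());
    for (std::size_t r : rows) out.push_back(col[r]);
    return out;
}
```

Memory then scales with the condition column plus the selected rows, rather than with the full table that a read followed by a selection would require.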