Mark Klik issues

Results: 66 issues of Mark Klik

Benchmark on well-known data sets

Also add a benchmarking section to https://fstpackage.github.io

Convert a csv file directly to a fst file

The conversion needs very little memory, as we can use the `rbind` functionality of `fst` to append chunks from the `csv` file. The resulting `fst` file would have random row...
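The chunk-wise idea can be sketched in C++ as follows. This is only an illustration of why memory stays bounded, not the `fst` implementation: `for_each_csv_chunk`, `ChunkSink` and `chunk_sizes_for` are hypothetical names, the sink stands in for whatever would append a chunk to the `fst` file, and the sketch assumes a header line and no embedded newlines in quoted fields.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink: receives one chunk of raw csv rows (e.g. to append them to a file).
using ChunkSink = std::function<void(const std::vector<std::string>&)>;

// Reads a csv stream in chunks of at most `chunk_rows` data rows and hands each
// chunk to `sink`. Only one chunk is held in memory at a time.
// Returns the total number of data rows processed (the header is skipped).
std::size_t for_each_csv_chunk(std::istream& in, std::size_t chunk_rows, const ChunkSink& sink) {
    assert(chunk_rows > 0);

    std::string header;
    if (!std::getline(in, header)) return 0;  // empty input

    std::vector<std::string> chunk;
    chunk.reserve(chunk_rows);
    std::size_t total = 0;
    std::string line;

    while (std::getline(in, line)) {
        chunk.push_back(line);
        if (chunk.size() == chunk_rows) {
            sink(chunk);            // full chunk: pass it on and reuse the buffer
            total += chunk.size();
            chunk.clear();
        }
    }
    if (!chunk.empty()) {           // final, partially filled chunk
        sink(chunk);
        total += chunk.size();
    }
    return total;
}

// Small demo helper: reports the size of every chunk produced for a csv text.
std::vector<std::size_t> chunk_sizes_for(const std::string& csv_text, std::size_t chunk_rows) {
    std::istringstream in(csv_text);
    std::vector<std::size_t> sizes;
    for_each_csv_chunk(in, chunk_rows,
                       [&sizes](const std::vector<std::string>& rows) { sizes.push_back(rows.size()); });
    return sizes;
}
```

Peak memory here depends on `chunk_rows` rather than on the size of the csv file, which is the property the request relies on.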

feature request

Write to fst binary file with an apply-like method

This would reduce the memory footprint of writing to a `fst` file. It is also possible to use a `parLapply` approach, where data is generated in parallel, but serialized...

feature request

read.fst reads multiple fst files into a single data set

By setting the first parameter to a vector of file names.

feature request

Adjacent columns with identical types are stored as a matrix internally

That would significantly reduce the overhead when these columns are selected, especially when they are selected in the order in which they were stored.

enhancement

Use asmlib library for common C++ string methods

The [asmlib](http://www.agner.org/optimize/#asmlib) library contains optimized C++ code for common string methods (`strlen`, `strcpy`, `strcmp`) which are also used in `fst` (mainly for serialization of `character` columns).

enhancement

Use vectorclass library for an interface to the highest available SIMD instruction set

The VCL vector class library is a tool that allows for much faster C++ code by handling multiple data in parallel using SIMD instructions. The highest available SIMD instruction set...

enhancement

More effective compression for integer64 type

The `integer64` type uses a `double` vector at its base. Currently, this vector is compressed using the same compression schemes as for `double` type columns. But knowing the type is...
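As background, the C++ sketch below shows what it means for 64-bit integers to live in `double` storage: the bytes are copied unchanged, so the buffer holds integer bit patterns rather than floating-point values. It is an illustration only and does not reproduce the scheme proposed in this issue. The function names are hypothetical, and `zero_bytes` is just one simple way to see that small integers contain many zero bytes, a kind of regularity a type-aware compressor could in principle exploit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "sketch assumes 8-byte doubles");

// Copies the raw bytes of 64-bit integers into double storage (no numeric conversion).
void store_int64_in_doubles(const std::int64_t* src, double* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::memcpy(dst, src, n * sizeof(std::int64_t));
}

// Recovers the 64-bit integers from double storage, again byte for byte.
void load_int64_from_doubles(const double* src, std::int64_t* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::memcpy(dst, src, n * sizeof(double));
}

// Counts how many of the 8 bytes of a 64-bit integer are zero.
int zero_bytes(std::int64_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    int count = 0;
    for (unsigned char b : bytes) {
        if (b == 0) ++count;
    }
    return count;
}
```

The round trip is lossless because no floating-point interpretation ever takes place; only the declared element type of the buffer differs.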

Increase compression performance for text with brotli

The built-in dictionary could speed up compression of character columns significantly. Alternatively, we could use the ZSTD compressor with a pre-trained dictionary.

feature request

Conditional read on a fst file

By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read combined with a selection of...
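A minimal C++ sketch of the idea, under stated assumptions: the condition column is scanned chunk by chunk and only the indices of matching rows are kept, so the full table never has to be in memory. `ChunkReader` and `matching_rows` are hypothetical names, the reader callback stands in for reading a row range of one column from disk, and none of this is the `fst` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for on-disk access: returns the values of one integer
// column for rows [start, start + count).
using ChunkReader = std::function<std::vector<int>(std::size_t start, std::size_t count)>;

// Scans a column in chunks and returns the (0-based) indices of rows that
// satisfy `pred`. At most `chunk_size` column values are held at any time.
std::vector<std::size_t> matching_rows(const ChunkReader& read_chunk,
                                       std::size_t n_rows,
                                       std::size_t chunk_size,
                                       const std::function<bool(int)>& pred) {
    assert(chunk_size > 0);

    std::vector<std::size_t> hits;
    for (std::size_t start = 0; start < n_rows; start += chunk_size) {
        const std::size_t count = std::min(chunk_size, n_rows - start);
        const std::vector<int> chunk = read_chunk(start, count);
        for (std::size_t i = 0; i < chunk.size(); ++i) {
            if (pred(chunk[i])) hits.push_back(start + i);
        }
    }
    return hits;
}
```

The returned indices could then be used to read only the matching rows of the remaining columns, which is where the memory saving over a full read followed by a selection would come from.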

enhancement