ExcelFiles.jl
Medium term plan: Use XLSX.jl and LibXLS.jl instead of ExcelReaders.jl
Main benefit would be that we can get rid of the Python dependency that ExcelReaders.jl brings along and still support both old and new Excel file formats. The Python dependency has regularly caused problems with deployment.
Stuff still todo:
- [ ] Get LibXLS.jl in shape. I got the cross building sorted out, and I have some local code that can read metadata from old-school Excel files, but there is still a lot to finish before this is ready.
- [ ] Do some performance comparisons of ExcelReaders.jl and XLSX.jl, just to be sure (I don't really expect any real issues there)
- [ ] Code things up here :)
CC @felipenoris
I compared XLSX.jl vs. CSV.jl on a data set of approx. 160 MB and saw a huge performance difference: roughly 0.26 seconds for CSV.jl versus about 100 seconds for XLSX.jl.
julia> @time d1 = DataFrame(CSV.File("demo.csv"));
0.263584 seconds (2.01 M allocations: 531.367 MiB)
julia> @time d2 = DataFrame(XLSX.readtable("demo.xlsx", 1)...)
100.617655 seconds (489.23 M allocations: 17.850 GiB, 28.54% gc time)
The memory and compile-time footprints also differ by an impressive amount. Maybe anonymous functions are used in a loop? It would definitely be good to have a performant reader for xlsx files ...
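One way to tell compile time apart from actual read time is to time the same call twice: a single `@time` on a first call typically includes JIT compilation, while a second call is closer to steady state. The snippet below is only a minimal sketch of that idea. `workload` is a hypothetical stand-in, not the real reader call, and the field access on the result of `@timed` assumes Julia 1.5 or later.

```julia
# Hypothetical stand-in for the reader call being measured (an assumption for
# illustration); replace it with the actual XLSX.jl or CSV.jl call.
workload() = sum(parse(Float64, string(i)) for i in 1:100_000)

# First call: typically includes JIT compilation of `workload`.
cold = @timed workload()

# Second call: compilation is already done, so this is closer to pure run time.
warm = @timed workload()

println("cold call: ", cold.time, " s, ", cold.bytes, " bytes allocated")
println("warm call: ", warm.time, " s, ", warm.bytes, " bytes allocated, ",
        warm.gctime, " s in GC")
```

If the warm call is still slow and allocation-heavy, compilation is not the main cost and the time is being spent in the reader itself.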