ExcelFiles.jl

Medium-term plan: Use XLSX.jl and LibXLS.jl instead of ExcelReaders.jl

Open davidanthoff opened this issue 6 years ago • 1 comment

The main benefit would be that we can get rid of the Python dependency that ExcelReaders.jl brings along and still support both old and new Excel file formats. The Python dependency has regularly caused problems in deployment.

Stuff still todo:

  • [ ] Get LibXLS.jl in shape. I got the cross building sorted out, and I have some local code that can read metadata from old-school Excel files, but there is still a lot to finish before this is ready.
  • [ ] Do some performance comparisons of ExcelReaders.jl and XLSX.jl, just to be sure (I don't really expect any real issues there).
  • [ ] Code things up here :)

CC @felipenoris

davidanthoff avatar Feb 20 '19 03:02 davidanthoff

I compared XLSX.jl vs. CSV.jl for a data set of approx 160 MB and saw a huge performance difference.

julia> @time d1 = DataFrame(CSV.File("demo.csv"));
  0.263584 seconds (2.01 M allocations: 531.367 MiB)

julia> @time d2 = DataFrame(XLSX.readtable("demo.xlsx", 1)...)
100.617655 seconds (489.23 M allocations: 17.850 GiB, 28.54% gc time)

The memory and compile-time footprints also differ by an impressive amount. Maybe anonymous functions are used in a loop? It would definitely be good to have a performant reader for xlsx files ...
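
Since a single `@time` call in a fresh session also counts JIT compilation, one way to separate compile time from actual read time is to run each reader twice and compare the second measurements. The following is only a rough sketch: it reuses the file names from the transcript above, and it assumes a recent XLSX.jl release in which `XLSX.readtable` returns a Tables.jl-compatible object (on older releases the result is a tuple and has to be splatted, as in the call above).

```julia
using CSV, DataFrames, XLSX

# File names reused from the timings above; adjust to your own data.
csv_path  = "demo.csv"
xlsx_path = "demo.xlsx"

# First call of each reader: includes JIT compilation of its code paths.
@time DataFrame(CSV.File(csv_path));
# Second call: compilation is already done, so this is mostly parsing cost.
@time DataFrame(CSV.File(csv_path));

# Assumption: recent XLSX.jl, where readtable returns a Tables.jl table.
# On older releases use DataFrame(XLSX.readtable(xlsx_path, 1)...) instead.
@time DataFrame(XLSX.readtable(xlsx_path, 1));
@time DataFrame(XLSX.readtable(xlsx_path, 1));
```

If the second XLSX.jl timing is still far above the CSV.jl one, the gap is in the reader itself rather than in compilation.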

hhaensel avatar Mar 30 '21 08:03 hhaensel