ExcelFiles.jl

xlsx files not loading

Open pstaabp opened this issue 5 years ago • 3 comments

I just tried to load an xlsx file using the load function.

Evidently this no longer works: under the hood it depends on the Python xlrd package, which dropped xlsx support in version 2.0.

There's a disclaimer about this on the xlrd website.

pstaabp, Feb 26 '21 15:02

I did notice that #26 will obviously fix this.

pstaabp, Feb 26 '21 17:02

As a workaround, you can downgrade to the last 1.x release of xlrd:

using Conda
Conda.add("xlrd==1.2.0")

This could probably be pinned here - https://github.com/queryverse/ExcelReaders.jl/blob/master/src/ExcelReaders.jl#L12
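
To make the version constraint behind this workaround explicit, here is a minimal sketch of a check that could guard a load call. The helper name `xlrd_reads_xlsx` is hypothetical and is not part of ExcelFiles.jl or ExcelReaders.jl; it only encodes the fact that xlrd 2.0 removed support for everything except .xls.

```julia
# Hypothetical helper, not part of ExcelFiles.jl or ExcelReaders.jl.
# xlrd 2.0 removed support for every format except .xls,
# so only 1.x releases can still read .xlsx files.
xlrd_reads_xlsx(version::AbstractString) = VersionNumber(version) < v"2.0.0"

println(xlrd_reads_xlsx("1.2.0"))  # version suggested in the workaround above
println(xlrd_reads_xlsx("2.0.1"))  # a 2.x release
```

With PyCall available, the installed version string can presumably be obtained from `pyimport("xlrd").__version__` and passed to the helper.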

chris-b1, Mar 26 '21 17:03

If speed is your concern for large data files (as it is for me), you can gain roughly a factor of 2 by using pandas via PyCall:

EDIT: There is still an error in this function, sorry

using PyCall, DataFrames
pd = pyimport("pandas")

function read_excel(f; kwargs...)
  pdf = pd.read_excel(f; kwargs...)
  DataFrame(Any[pdf.values[:, i] for i in 1:size(pdf.values, 2)], Symbol.(pdf.columns))
end

Forcing the openpyxl engine, as recommended by xlrd, again gives worse performance:

julia> @time DataFrame(load(f, "Tabelle1"));
  0.149713 seconds (221.37 k allocations: 6.085 MiB, 12.28% gc time)

julia> @time read_excel(f);
  0.077299 seconds (1.12 k allocations: 2.093 MiB)

julia> @time read_excel(f, engine = "openpyxl");
  0.135302 seconds (1.13 k allocations: 2.094 MiB)
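
For reference, the speedups implied by the three timings above can be worked out directly. This is only arithmetic on the single-run `@time` figures quoted in this comment; the variable names are illustrative.

```julia
# Wall-clock times (seconds) copied from the @time output above.
t_load     = 0.149713  # DataFrame(load(f, "Tabelle1"))
t_pandas   = 0.077299  # read_excel(f), default engine
t_openpyxl = 0.135302  # read_excel(f, engine = "openpyxl")

# Speedup relative to the ExcelFiles.jl load call.
speedup_pandas   = round(t_load / t_pandas, digits = 2)
speedup_openpyxl = round(t_load / t_openpyxl, digits = 2)

println("pandas (default engine): ", speedup_pandas, "x")   # 1.94x
println("pandas (openpyxl):       ", speedup_openpyxl, "x") # 1.11x
```

So the default engine is a bit under 2x faster here, while openpyxl is only about 1.1x faster than the original load call.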

hhaensel, Mar 31 '21 23:03