ExcelProvider icon indicating copy to clipboard operation
ExcelProvider copied to clipboard

Feature request: read more rows to infer column type

Open jackfoxy opened this issue 7 years ago • 5 comments

Description

I don't know how many rows the TP reads before attempting to infer column types, but it is not enough. Type inference is frequently incorrect because the TP did not recognize (for instance) that a blank cell appears in the column, which should render it an option type, or that a decimal number appears further down a column, and so the type should not be int.

Conduct some experiments to come up with a greater number of rows consumed to do type inference, but not so many it impacts performance too much. More rows is better.

Known workarounds

ForceString - not a nice alternative

jackfoxy avatar Aug 15 '18 00:08 jackfoxy

Please, this is also my issue. Can we get a bit more extended feedback here= Is it really the number of rows that the TP reads? Please - at least - clarify how many are they in the current release as it is, before suspending the whole request as an enhancement. Thank you very much for understanding.

Btw, as far as I can see, if a blank cell appears in a column and the other cells are int (for example) the inferred type is obj while it should be an int option (in this example)...

I share here my workaround based on TryGetValue and this helper:

let getOptionFromExcel (optional:obj) =
    match optional with
    | :? int as a -> Some (double a)
    | :? double as d -> Some d
    | _ -> None

as well as

let getDecimalOptionFromExcel : obj -> decimal option =
    getDoubleOptionFromExcel >> Option.map decimal

Usage examples

SizeTON = e.TryGetValue 5 "Size TON" |> getDecimalOptionFromExcel
SizeCbm = e.TryGetValue 6 "Size cbm" |> getDecimalOptionFromExcel
 QtyTons = 
     e.TryGetValue 5 "Size TON"
     |> getOptionFromExcel
     |> function
     | Some n -> decimal n
     | None -> 0m
     // instead of    
     // decimal e.``Size TON``; //https://github.com/fsprojects/ExcelProvider/issues/56

giuliohome avatar Feb 06 '20 10:02 giuliohome

@giuliohome when I looked, the number of lines was easy to find in the code. IIRC it was something like 25 or 50. I haven't used this TP in well over a year, so I have no current feedback. My memory is it could be a lot better. That said, even with improvements I have no plans to use it in the future.

jackfoxy avatar Feb 06 '20 18:02 jackfoxy

Hi @jackfoxy thanks for your reply. It' strange, I have a blank after 3 or 4 lines and the field is interpreted as a float/double... and it throws an exception on the blank/row Will double check it again, if I overlooked anything or if I introduced the bug in my part of code (actually I like my "workaround" anyway...)

giuliohome avatar Feb 06 '20 18:02 giuliohome

@giuliohome I recall other typing issues. You probably found one I experienced. I filed 3 issues, all on the same day, which addressed the most egregious problems I was having. I think it was not long after this I dismantled all the Excel processing and banned Excel file submission in our company. That's how I solved the problem.

jackfoxy avatar Feb 07 '20 00:02 jackfoxy

@jackfoxy thanks for sharing your experience and feel free to unsubscribe from the issue.

Really, I was writing to the maintainers of this project.

giuliohome avatar Feb 07 '20 05:02 giuliohome