Type construction very slow for CSV files with a large number of variables (e.g. 1000+)

Open fwaris opened this issue 7 years ago • 2 comments

For a CSV file with 1000+ columns, type construction takes several minutes, even when the file contains only a small number of rows.

Memory consumption also approaches 8 GB.

I routinely encounter files with hundreds and sometimes 1000+ columns. I'm not sure whether FSharp.Data can be optimized to read wide files faster, but hopefully someone can shed some light. I don't have time to dig into it right now.

Faisal

fwaris avatar Sep 10 '18 16:09 fwaris

Here is an anonymized version of the CSV sample file that seems to cause this issue:

data_sample.zip

Sample code:

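open FSharp.Data

// Path to the sample file; it must be a literal so the type provider can read it at compile time
[<Literal>]
let data_file = @"data_sample.csv"

// This is the slow step: the provider generates one property per column
type Tdata = CsvProvider<data_file>

let tdata = Tdata.GetSample()
// Number of columns in the sample
tdata.Headers.Value.Length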
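
For anyone who wants to reproduce this without the attachment, a synthetic wide file may be enough. The script below is only a sketch: the file name `wide_sample.csv`, the `var1..varN` column names, the integer values, and the helper `wideCsvLines` are all hypothetical and are not taken from data_sample.zip, so the real file may behave differently (for example if it contains mixed or missing values).

```fsharp
// Sketch: generate a synthetic wide CSV for reproducing the slowdown.
// All names and values here are hypothetical, not taken from data_sample.zip.
open System.IO

/// Builds the lines of a CSV with `cols` columns and `rows` data rows.
/// The header is var1,var2,...; the cell at (row r, column c) holds r * c.
let wideCsvLines (cols: int) (rows: int) : string list =
    let header = String.concat "," [ for c in 1 .. cols -> sprintf "var%d" c ]
    let dataRow r = String.concat "," [ for c in 1 .. cols -> string (r * c) ]
    header :: [ for r in 1 .. rows -> dataRow r ]

// 1000 columns and 10 rows, matching the "wide but short" shape described in the issue
File.WriteAllLines("wide_sample.csv", wideCsvLines 1000 10)
```

Pointing the `data_file` literal at the generated file and varying the column count (say 100, 500, 1000) should show how type construction time grows with width.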

fwaris avatar Sep 12 '18 13:09 fwaris

If you have profiling tools, you might want to profile fsc.exe while it compiles this file and show the top inclusive and top exclusive methods.

dsyme avatar Sep 14 '18 10:09 dsyme