Data.Analysis : float/double values don't convert properly with . decimals in csv file
If I have a csv dataframe that looks like this:
Date;Value 12-04-1989;20,7 03-11-1990;22,1
Then my two float values will properly be loaded when I call DataFrame.LoadCsv (with values of 20.7 and 22.1)
But if my separator is the default one (',') and my floats are with a dot instead, they will get ignored and I will get 207 and 221 as values.
The issue comes from here : https://github.com/dotnet/corefxlab/blob/master/src/Microsoft.Data.Analysis/DataFrame.cs#L488
Since if you write object value = Convert.ChangeType("20.7", typeof(double)); you will get 207 as a result.
Apparently you need to add CultureInfo.InvariantCulture to the function if your double/float has dots instead of commas:
https://stackoverflow.com/questions/7725020/how-to-use-convert-changetype-when-conversiontype-is-decimal-and-input-is-40
So a potential fix could be to check whether the type of the column is a double/float and call ChangeType in the right way depending on whether the value(s) is described with dots or commas. I can open a branch for this if the maintainers agree with this approach.
So a potential fix could be to check whether the type of the column is a double/float and call ChangeType in the right way depending on whether the value(s) is described with dots or commas. I can open a branch for this if the maintainers agree with this approach.
We may also think about accepting a CultureInfo in the public API. That way the caller can specify what culture to use to parse numbers. The caller is going to know best whether they want CurrentCulture or InvariantCulture.
@chriss2401 - if you have a fix and test cases, feel free to send a PR and @pgovind and I can take a look.
Note also dotnet/corefxlab#2927 is re-doing how LoadCsv is implemented. So we may want to consider that as well.
@eerhardt Good idea. I can implement this in a couple of days unless you guys decide to add it to dotnet/corefxlab#2927 , which in that case I can just close the issue.
I think taking a CultureInfo in the public API is the right approach. @chriss2401 : If you have a fix and unit tests, I don't mind accepting it into either DataFrame.LoadCsv or dotnet/corefxlab#2927 (feel free to push a commit to that PR). If you make a new PR, I'll port your fix to dotnet/corefxlab#2927 when I go through there next :)
Sounds good @pgovind , I'll push this and dotnet/corefxlab#2902 in dotnet/corefxlab#2927 in the next couple of days. I think I will clean up some duplicate code I saw as well that I saw in that PR :)
Will this ever be fixed? For people outside of the US, the DataFrame.LoadCsv method doesn't exist in practice.
I wouldn't mind re-taking a look at this at some point, just don't know exactly when that would be :)