Casting strings to double using `with { it.toDouble()}` and `toDouble()` gives different results
Reproduce
- Take the ramen dataset: https://www.kaggle.com/code/sujan97/complete-analysis-of-ramen-ratings/input
val df = DataFrame.readCSV("ramen-ratings.csv").renameToCamelCase()
- Convert the `stars` column to a double type:
df.filter { !stars.startsWith("Un") }.convert { stars }.toDouble()
Expected
df.filter { !stars.startsWith("Un") }.convert { stars }.with { it.toDouble() }
result:
| | review# | brand | variety | style | country | stars | topTen |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3,75 | null |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Se... | Pack | Taiwan | 1,00 | null |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2,25 | null |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2,75 | null |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 3,75 | null |
Actual
| | review# | brand | variety | style | country | stars | topTen |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 375,0 | null |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Se... | Pack | Taiwan | 1,0 | null |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 225,0 | null |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 275,0 | null |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 375,0 | null |
Version and Environment
Name: kotlin-jupyter-kernel, Version: 0.11.0.385
dataframe version: 0.12.1
Thanks @devcrocod
I'm sorry, I cannot reproduce it directly. It returns the same result for me.
It might be a locale issue (I see your Doubles use "," instead of "."). `convert` relies on `parse` to parse Strings. `parse` defaults to your system locale, which in your case interprets "," as the decimal separator and "." as the thousands separator.
This may differ from the stdlib's `String.toDouble()` function, which you call in the other case.
I feel like this is intended behavior, though a bit unfortunate in this example.
Since you're trying to parse a String, I'd recommend using `parse`, as you can define extra `ParserOptions`, such as a `Locale`.
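To illustrate the locale effect described above without DataFrame itself, here is a minimal stdlib-only sketch. It assumes that a locale-aware parser behaves like `java.text.NumberFormat`, and it uses `Locale.GERMANY` as a stand-in for a system locale where "," is the decimal separator and "." is the grouping separator. `parseWithLocale` is a hypothetical helper written for this example, not part of the DataFrame API.

```kotlin
import java.text.NumberFormat
import java.util.Locale

// Hypothetical helper (not a DataFrame function): parses `text` with the
// number format of the given locale, similar to what a locale-aware parser
// might do. Throws java.text.ParseException if the text cannot be parsed.
fun parseWithLocale(text: String, locale: Locale): Double =
    NumberFormat.getInstance(locale).parse(text).toDouble()

// Locale-aware parsing: in Locale.GERMANY, "." is a grouping separator,
// so it is skipped and "3.75" is read as 375.
val germanResult = parseWithLocale("3.75", Locale.GERMANY) // 375.0

// The same text parsed with an explicit US locale keeps "." as the decimal point.
val usResult = parseWithLocale("3.75", Locale.US) // 3.75

// The stdlib String.toDouble() ignores the locale and always expects ".".
val stdlibResult = "3.75".toDouble() // 3.75

// It also rejects "," as a decimal separator.
val stdlibComma = "3,75".toDoubleOrNull() // null
```

Passing the locale explicitly, as in `parseWithLocale("3.75", Locale.US)`, makes the result independent of the machine the code runs on, which is the same idea as supplying a `Locale` through `ParserOptions`.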
Yes, this is specifically a locale problem. But I expect to get the same result from both of these:
df.filter { !stars.startsWith("Un") }.convert { stars }.with { it.toDouble() },
df.filter { !stars.startsWith("Un") }.convert { stars }.toDouble()
Because in my opinion, `toDouble()` is just a shortcut for `with`.
I know it should be, but I'd argue our solution is "better" because it takes the locale into account. It's the stdlib `toDouble()` function that should change, but that's not something we can do.