Mismatch between ctypes and value types
I have this code snippet:
const danfojs = require("danfojs-node");
(async () => {
  let data = await danfojs.readCSV(
    `${__dirname}/plan.csv`,
    {
      dynamicTyping: {
        "variation": false
      }
    }
  )
  console.log(
    data.columns.map((item, index) => {
      return {
        [item]: data.ctypes.values[index]
      }
    })
  )
  console.log(
    data
      .column("variation")
      .unique()
      .values[0]
  )
  console.log(
    typeof data
      .column("variation")
      .unique()
      .values[0]
  )
})()
The output is:
[
...
{ market_segment: 'string' },
{ variation: 'int32' },
{ '': 'int32' }
]
00
string
As you can see, ctypes reports the variation column as 'int32'.
However, all values in the column are strings, e.g. '00'.
I expected the ctype to also be 'string', to reflect the actual values in the column.
I'm also having this issue with my AVG and SUM in https://github.com/javascriptdata/danfojs/discussions/666
I can confirm this bug. dynamicTyping: { "variation": false } is passed through to PapaParse, which correctly parses the variation column as strings (hence typeof on a variation value returns string). However, DataFrame infers the dtype with $typeChecker in shared/utils.ts. That method decides purely on if (!isNaN(Number(ele))) -> numeric, and it takes only the data array as a parameter, not the options passed to PapaParse. A string such as '00' therefore passes the numeric check, and the dtype is inferred incorrectly.
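To make the mismatch concrete, here is a minimal sketch of the numeric check in isolation. naiveInferDtype is a hypothetical stand-in written for this comment, not the actual danfojs source, and it always reports 'int32' for numeric-looking data, which is a simplification.

```javascript
// Hypothetical stand-in for the numeric check described above; NOT danfojs source.
function naiveInferDtype(values) {
  // A column counts as numeric when every element survives Number() without becoming NaN.
  const allNumeric = values.every((ele) => !isNaN(Number(ele)));
  // Simplification: the real library distinguishes more dtypes than these two.
  return allNumeric ? "int32" : "string";
}

// Strings, as PapaParse returns them when dynamicTyping is off for the column.
const variation = ["00", "01", "02"];

console.log(typeof variation[0]);          // string
console.log(naiveInferDtype(variation));   // int32
console.log(naiveInferDtype(["a", "00"])); // string
```

The values stay strings, but the check only asks whether they could be converted to numbers, so the reported dtype and the runtime type disagree.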
However, inferDtype is called throughout the library, so fixing this bug would require a significant internal API change. I'm not sure it is worth refactoring before we discuss it with the maintainer.
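For discussion, this is roughly the kind of signature change I mean. It is only a sketch: inferDtypeWithOverrides, its parameters, and the 'int32' fallback are all assumptions of mine, not existing danfojs API, and it handles only the object form of dynamicTyping (PapaParse also accepts a boolean or a function).

```javascript
// Hypothetical sketch; the name and signature are assumptions, NOT danfojs API.
function inferDtypeWithOverrides(values, columnName, dynamicTyping = {}) {
  // If the caller disabled dynamic typing for this column, trust the parsed strings.
  if (dynamicTyping[columnName] === false) return "string";
  // Otherwise fall back to the same numeric check as before (simplified to two dtypes).
  const allNumeric = values.every((ele) => !isNaN(Number(ele)));
  return allNumeric ? "int32" : "string";
}

const options = { variation: false };
console.log(inferDtypeWithOverrides(["00", "01"], "variation", options)); // string
console.log(inferDtypeWithOverrides(["00", "01"], "other", options));     // int32
```

Every caller of the inference routine would need to pass the column name and parser options along, which is why the change reaches so far into the internals.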