
Mismatch between ctypes and value types

Open spleshakov opened this issue 2 years ago • 2 comments

I have this code snippet

const danfojs = require("danfojs-node");

(async () => {
	let data = await danfojs.readCSV(
		`${__dirname}/plan.csv`,
		{
			dynamicTyping: {
				"variation": false
			}
		}
	)

	console.log(
		data.columns.map((item, index) => {
			return {
				[item]: data.ctypes.values[index]
			}
		})
	)
	console.log(
		data
			.column("variation")
			.unique()
			.values[0]
	)
	console.log(
		typeof data
			.column("variation")
			.unique()
			.values[0]
	)
})()

The output is

[
  ...
  { market_segment: 'string' },
  { variation: 'int32' },
  { '': 'int32' }
]
00
string

As you can see, ctypes reports the variation column as 'int32'; however, all values in the column are strings, e.g. '00'.

My expectation is that the ctype should also be 'string', to reflect the actual values in the column.

plan.csv

spleshakov, May 31 '23 15:05

I'm also having this issue with my AVG and SUM in https://github.com/javascriptdata/danfojs/discussions/666

ZandercraftGames, Apr 28 '25 19:04

I can confirm this bug comes from dynamicTyping: { "variation": false } being passed to PapaParse. PapaParse correctly parses the variation column as strings (hence typeof on a variation value returns string). However, DataFrame infers the dtype via $typeChecker in shared/utils.ts, which decides purely from if (!isNaN(Number(ele))) -> numeric. That method accepts only the data array as a parameter, not the additional options passed to PapaParse, so the inferred dtype is wrong.
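To make the mismatch concrete, here is a minimal sketch of that kind of check in plain JavaScript. The function naiveInferDtype and its int32/float32 split are hypothetical stand-ins written for illustration, not the actual danfojs source; only the !isNaN(Number(ele)) test is taken from the description above.

```javascript
// Hypothetical stand-in for the numeric check described above.
// This is NOT the danfojs implementation, only an illustration of the idea.
function naiveInferDtype(values) {
  // Anything Number() can parse counts as numeric, even when it is a string.
  const allNumeric = values.every((ele) => !isNaN(Number(ele)));
  if (!allNumeric) return "string";
  // Assumed integer/float split, chosen here only to mirror the 'int32' label.
  const allIntegers = values.every((ele) => Number.isInteger(Number(ele)));
  return allIntegers ? "int32" : "float32";
}

// Values as a parser would return them when dynamic typing is off: strings.
const variation = ["00", "01", "02"];

const inferred = naiveInferDtype(variation); // dtype guessed from content
const actual = typeof variation[0];          // type of the stored value

console.log(inferred, actual);
```

Because the check only looks at whether each element can be converted to a number, a column of numeric-looking strings is labelled as numeric while the stored values remain strings, which matches what the snippet in the original report shows.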

However, inferDtype is called throughout the library, so fixing this bug would mean a significant internal API change. I'm not sure it is worth refactoring before we can discuss it with the maintainer.

junduck, Nov 12 '25 17:11