
NPE in FrameColumnImpl.schema property

Open koperagen opened this issue 1 year ago • 3 comments

The NPE happens in this line:

values.mapNotNull { it.takeIf { it.nrow > 0 }?.schema() }.intersectSchemas()
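
For context, here is a minimal sketch of how a line of this shape can fail on the JVM with "Parameter specified as non-null is null". It assumes the null reaches the list through an unchecked cast. `Frame`, `framesWithHiddenNull`, and `nonEmptyCounts` are hypothetical stand-ins, not the library's types or functions.

```kotlin
// Hypothetical stand-in for a data frame; NOT the library's DataFrame type.
class Frame(val rows: List<String>)

// Public extension property, similar in shape to `nrow`.
// On the JVM the compiler adds a not-null check on the receiver inside the getter.
val Frame.nrow: Int get() = rows.size

// Builds a list that is typed as non-null but actually holds a null,
// mimicking what a parser could do through an unchecked cast (an assumption).
@Suppress("UNCHECKED_CAST")
fun framesWithHiddenNull(): List<Frame> =
    listOf(Frame(listOf("a")), null) as List<Frame>

// Mirrors the shape of the failing line: nothing checks for null until
// the `nrow` getter is entered, so the failure surfaces there.
fun nonEmptyCounts(values: List<Frame>): List<Int> =
    values.mapNotNull { it.takeIf { it.nrow > 0 }?.nrow }

fun main() {
    val outcome = runCatching { nonEmptyCounts(framesWithHiddenNull()) }
    println(outcome.exceptionOrNull())
}
```

The type system sees `List<Frame>`, so no null-safety warning appears at the call site. The error only shows up at runtime, which would match the `getNrow, parameter <this>` frame in the trace.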

Code that leads to it: image

I briefly looked into csv.kt and found that only the tryParseImpl method could potentially create a FrameColumn and put a null there. This still needs to be confirmed. Another possible cause is the read method itself, which tries to parse the file as JSON, CSV, TSV, Excel, and other formats until one succeeds. So if the file cannot be parsed as CSV, it moves on to the next format and can produce a strange result too.
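
The "try each format until one succeeds" behaviour can be sketched roughly as below. This is only an illustration of the pattern, not the library's actual `read` implementation. The parser functions, their acceptance rules, and their order are all assumptions made for the example.

```kotlin
// Hypothetical parser type: returns a format label on success, throws on failure.
typealias Parser = (String) -> String

// Toy acceptance rules, made up for this sketch.
fun parseAsJson(text: String): String {
    val t = text.trimStart()
    require(t.startsWith("{") || t.startsWith("[")) { "not JSON" }
    return "json"
}

fun parseAsCsv(text: String): String {
    require(text.contains(',')) { "not CSV" }
    return "csv"
}

// A permissive last resort that accepts any input.
fun parseAsLines(text: String): String = "lines"

// Assumed order, for illustration only.
val defaultParsers: List<Parser> = listOf(::parseAsJson, ::parseAsCsv, ::parseAsLines)

// Tries each parser in turn and returns the first successful result.
fun readWithFallback(text: String, parsers: List<Parser> = defaultParsers): String {
    for (parser in parsers) {
        val attempt = runCatching { parser(text) }
        if (attempt.isSuccess) return attempt.getOrThrow()
    }
    error("No parser accepted the input")
}

fun main() {
    println(readWithFallback("a,b"))
    println(readWithFallback("plain text"))
}
```

With a chain like this, input that one parser rejects is silently handed to the next one. A file meant as CSV could then end up interpreted by a different, more permissive parser.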

Full stack trace below

The problem is found in one of the loaded libraries: check library converters (fields callbacks)
java.lang.NullPointerException: Parameter specified as non-null is null: method org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow, parameter <this>
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library converters (fields callbacks)
	at org.jetbrains.kotlinx.jupyter.exceptions.CompositeReplExceptionKt.throwLibraryException(CompositeReplException.kt:50)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:68)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:94)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:93)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll(logging.kt:42)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll$default(logging.kt:41)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:93)
	at org.jetbrains.kotlinx.jupyter.repl.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:14)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:500)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl$evalEx$1.invoke(repl.kt:478)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.withEvalContext(repl.kt:441)
	at org.jetbrains.kotlinx.jupyter.ReplForJupyterImpl.evalEx(repl.kt:478)
	at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:320)
	at org.jetbrains.kotlinx.jupyter.messaging.ProtocolKt$shellMessagesHandler$2$res$1.invoke(protocol.kt:314)
	at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:38)
	at org.jetbrains.kotlinx.jupyter.JupyterExecutorImpl$runExecution$execThread$1.invoke(execution.kt:33)
	at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
Caused by: java.lang.NullPointerException: Parameter specified as non-null is null: method org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow, parameter <this>
	at org.jetbrains.kotlinx.dataframe.DataFrameKt.getNrow(DataFrame.kt)
	at org.jetbrains.kotlinx.dataframe.impl.columns.FrameColumnImpl$schema$1.invoke(FrameColumnImpl.kt:43)
	at org.jetbrains.kotlinx.dataframe.impl.columns.FrameColumnImpl$schema$1.invoke(FrameColumnImpl.kt:42)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at org.jetbrains.kotlinx.dataframe.impl.schema.UtilsKt.extractSchema(Utils.kt:92)
	at org.jetbrains.kotlinx.dataframe.impl.schema.UtilsKt.extractSchema(Utils.kt:26)
	at org.jetbrains.kotlinx.dataframe.api.SchemaKt.schema(schema.kt:17)
	at org.jetbrains.kotlinx.dataframe.impl.codeGen.ReplCodeGeneratorImpl.process(ReplCodeGeneratorImpl.kt:50)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.updateAnyFrameVariable(Integration.kt:132)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.access$updateAnyFrameVariable(Integration.kt:73)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:295)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:290)
	at org.jetbrains.kotlinx.jupyter.api.libraries.FieldHandlerFactory.createUpdateExecution$lambda$0(FieldHandlerFactory.kt:38)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.executeEx(FieldsProcessorImpl.kt:88)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.access$executeEx(FieldsProcessorImpl.kt:1)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:47)
	... 15 more

koperagen, Feb 15 '24 20:02

So indeed, a JSON value in one cell plus a null value in another causes the issue when reading CSV.

koperagen, Feb 20 '24 10:02

val df2 = DataFrame.readDelimStr("""name
"[""str""]"
null
""")

koperagen, Feb 20 '24 14:02

What is the expected result? A FrameColumn cannot contain nulls, right?

Should we:

  • throw an exception, because a FrameColumn cannot contain null
  • convert null to an empty dataframe
  • not parse the value as JSON, and keep it as a String
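
To make the three options concrete, here is a rough sketch using a hypothetical `NestedFrame` stand-in rather than the library's types. The function names are invented for illustration, and none of this is the actual parsing code.

```kotlin
// Hypothetical stand-in for a nested data frame; NOT the library's type.
data class NestedFrame(val rows: List<String>)

// Option 1: reject nulls outright with an exception.
fun rejectNulls(cells: List<NestedFrame?>): List<NestedFrame> =
    cells.map { it ?: throw IllegalArgumentException("A frame column cannot contain null") }

// Option 2: replace each null with an empty frame.
fun nullsToEmpty(cells: List<NestedFrame?>): List<NestedFrame> =
    cells.map { it ?: NestedFrame(emptyList()) }

// Option 3: if any cell is null or fails to parse, skip JSON parsing
// for the whole column and keep the raw String values.
fun keepStringsIfAnyNull(raw: List<String?>, parse: (String) -> NestedFrame?): List<Any?> {
    val parsed = raw.map { cell -> cell?.let(parse) }
    return if (parsed.any { it == null }) raw else parsed
}

fun main() {
    val cells = listOf(NestedFrame(listOf("str")), null)
    println(nullsToEmpty(cells))
    println(runCatching { rejectNulls(cells) }.exceptionOrNull())
    println(keepStringsIfAnyNull(listOf("[\"str\"]", null)) { NestedFrame(listOf(it)) })
}
```

Option 2 keeps the column a frame column but loses the distinction between a missing value and an empty frame. Option 3 preserves the input exactly, at the cost of not producing a nested structure.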

Jolanrensen, Oct 16 '24 17:10