NeoCSV icon indicating copy to clipboard operation
NeoCSV copied to clipboard

NeoCSVReader: Add more context to error messages

Open JoachimTuchel opened this issue 5 years ago • 3 comments

I wonder if it would make sense to add a bit more context to error messages by counting the columns and maybe even lines that are parsed by NeoCSVReader.

I mean errors like this: "Parse error on column 5 of row 7: UndefinedObject doesNotUnderstand asNumber" "Parse error on column #custNo in row 15: No customer found with number 15"

In order to sse if this is feasible, I took a look at #readNextRecordAsObject and I think something along these lines would at least help with keeping track of the column number (the name of the field accessor is not available here, but maybe a column count is enough to help analyse a problem). Here is a quick example as a start:

readNextRecordAsObject

	| object conversionBlock idx |

	conversionBlock := [:each :obj :input :fieldIndex |
		[each value: obj value: input]
			on: Error
			do: [:err |
				Error signal: (
					'Error parsing column %1: %2' bindWith: fieldIndex asString with: err description)]].

	object := recordClass new.
	idx := 1.
	fieldAccessors
		do: [:each | | rawValue |
			rawValue := self readField.
			idx := idx + 1.
			"nil field accessors are used to ignore fields"
			each ifNotNil: [
				rawValue = emptyFieldValue
					ifTrue: [
						emptyFieldValue = #passNil
							ifTrue: [conversionBlock value: each value: object value: nil value: idx]]
					ifFalse: [conversionBlock value: each value: object value: rawValue value: idx]]]
		separatedBy: [self handleSeparator].
	self handleEndOfRecord.
	^object

This is an implementation from VA Smalltalk, so please bear with me if #bindWith:with: and #description are not what you'd use in Pharo. I guess you get the picture anyways.

The idea is that I simply count the columns (there is no doWithIndex:separatedBy: , so I count by hand) and simply augment any errors thrown in the converter block with this info...

I know a 4-parameter Block is not exactly elegant code, but it's the STTCPW.

Keeping track of the parsed line is not so easy, however. But this is just an idea, maybe people don't think this is useful at all.

JoachimTuchel avatar Jan 06 '21 10:01 JoachimTuchel

I forgot that the error should also include the input that couldn't be converted, like in

Error signal: ('Error parsing input "%3" in column #%1: %2' bindWith: fieldIndex asString with: err description with: input)

This way it is much easier to search in the CSV and maybe even see what might be wrong even without having the file at hand. Imagine a message like

"Error parsing input 'NaN" in column #3: Reading a number failed: a digit between 0 and 9 expected"

This might help see immediately what's wrong in many cases when you have to support end users...

Comments?

JoachimTuchel avatar Jan 06 '21 10:01 JoachimTuchel

Yes that makes sense, but it is a bit more work (keeping track of the line and field numbers).

We'll put this on the todo list then.

svenvc avatar Jan 06 '21 11:01 svenvc

I am glad you like the idea.

I'll use my suggested fix here in our application for a while. It does keep track of the column number at least, so that error messages from customers are a bit easier to handle.

I need to move to something else for now, but I'll also take a look at the line number stuff... The code I suggest currently only works for reading as objects, but can be adapted to NeoCSVReader>>#readNextRecordWithFieldAccessors: the same way (which indicates a method with 4 parameters is probably better than a Block), imo.

JoachimTuchel avatar Jan 06 '21 12:01 JoachimTuchel