csv-validator

Bounded error collection

Open marksteele opened this issue 7 years ago • 2 comments

Would it be possible to implement an error collector that short-circuits after a configurable number of errors, instead of the current all-or-nothing approach? For example: fail after 100 errors and return the error messages.

This would be useful when validating large files that might contain many errors, and it would avoid out-of-memory (OOM) issues.

Alternatively, it would be nice to collect the first N errors, and then possibly statistics on the total number of errors (e.g. found 500 validation errors on column A, 355 on column B, etc.).
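To make the second idea concrete, here is a minimal sketch of a collector that stores only the first N messages and keeps per-column counts for everything else. The class and method names (BoundedErrorCollector, record, limitReached) are hypothetical and are not part of the csv-validator API; it also assumes each failure can be attributed to a column name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of csv-validator.
final class BoundedErrorCollector {
    private final int maxStored;
    private final List<String> firstErrors = new ArrayList<>();
    private final Map<String, Integer> countsByColumn = new LinkedHashMap<>();
    private int total = 0;

    BoundedErrorCollector(int maxStored) {
        if (maxStored < 0) {
            throw new IllegalArgumentException("maxStored must be >= 0");
        }
        this.maxStored = maxStored;
    }

    // Counts every error, but keeps the message only for the first maxStored.
    void record(String column, String message) {
        total++;
        countsByColumn.merge(column, 1, Integer::sum);
        if (firstErrors.size() < maxStored) {
            firstErrors.add(message);
        }
    }

    // True once at least maxStored errors have been seen; a caller that
    // wants to short-circuit can stop validating at this point.
    boolean limitReached() {
        return total >= maxStored;
    }

    List<String> firstErrors() {
        return Collections.unmodifiableList(firstErrors);
    }

    Map<String, Integer> countsByColumn() {
        return Collections.unmodifiableMap(countsByColumn);
    }

    int totalErrors() {
        return total;
    }
}
```

Memory use is bounded by N stored messages plus one counter per column, regardless of how many errors the file contains. A caller could either stop as soon as limitReached() returns true, or keep going to get complete statistics.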

marksteele, Jan 07 '19 18:01

In the Java API, it seems like a good idea to replace both the resulting List<FailMessage> and the ProgressCallback with a single callback mechanism that is notified on each validation and can then either return a flag or throw an exception to indicate that validation should stop.
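As a rough illustration of the flag-returning variant, something along these lines might work. All names here (ValidationListener, StopAfterLimit, ValidationLoop) are made up for the sketch, failures are reduced to plain strings, and the loop is only a stand-in for the validator; none of this reflects the existing csv-validator API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical callback; the name and signature are assumptions.
interface ValidationListener {
    // Called once per failure. Return false to ask the validator to stop.
    boolean onFailure(String message);
}

// Example listener: keeps messages and requests a stop at the limit.
final class StopAfterLimit implements ValidationListener {
    private final int limit;
    private final List<String> messages = new ArrayList<>();

    StopAfterLimit(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
    }

    @Override
    public boolean onFailure(String message) {
        messages.add(message);
        // Continue only while fewer than 'limit' failures have been stored.
        return messages.size() < limit;
    }

    List<String> messages() {
        return Collections.unmodifiableList(messages);
    }
}

// Stand-in for the validator's main loop, to show how the flag is honoured.
final class ValidationLoop {
    // Delivers failures one by one and stops as soon as the listener
    // returns false. Returns the number of failures delivered.
    static int run(Iterable<String> failures, ValidationListener listener) {
        int delivered = 0;
        for (String failure : failures) {
            delivered++;
            if (!listener.onFailure(failure)) {
                break;
            }
        }
        return delivered;
    }
}
```

With this shape, the caller decides what to keep: a listener can store everything, store nothing and only count, or stop early as above, so the validator itself never has to hold the full list of failures.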

I don't have any time to implement this personally, but if someone is interested, I could suggest a design...

adamretter, Jan 08 '19 04:01

Happy to hear a design and we'll add it to our backlog.

alexgreenDP, Jan 08 '19 16:01