Cannot handle quoted line breaks
tass does not seem to handle line breaks between quotes. It gives this error:
Error: CSV error: record 0 (line: 289, byte: 64591): found record with 1 fields, but the previous record has 30 fields
While I do think it is bad form to have raw line breaks in a CSV file, it does happen, e.g. in contact list exports from MailChimp.
I'm considering this an error as xsv and csview handle the same file without error.
Yeah... I suppose I agree: this is part of the CSV standard, and tass should support it.
(Aside: It's really quite an unfortunate misfeature, isn't it? Random access in a CSV file would be very simple otherwise: just jump to a location and scan forward and back for the surrounding newline characters. But because a newline may not be a row delimiter, you can't safely interpret anything without reading all the way from the start of the file. This is not a problem for tass though, as we build a complete index of all rows anyway, starting at the beginning of the file.)
Ok, there are a couple of sub-issues here:
- [ ] The indexer needs to be a bit more CSV-aware. Currently it simply looks for newlines and adds them all to the index. Instead, it should skip newlines which are inside quotes (which in turn means understanding escaped quotes). This is probably not too hard to do, but will slow the indexer down a lot. It's not as bad as it sounds though, since indexing doesn't block the UI.
- [ ] The rendering code needs to know how to display cells containing newlines. The current code assumes that every line in your terminal corresponds to a single CSV row, so this step might involve a lot of new code. Certainly not insurmountable though.
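For the first sub-issue, here is a minimal sketch of what a quote-aware index pass could look like. It is not tass's actual indexer: the function name `index_records` and the choice to return record start offsets are assumptions made for illustration. It also assumes RFC 4180 style quoting, where a literal quote inside a quoted field is written as `""`. Under that assumption, toggling a flag on every quote byte is enough, because a doubled quote toggles the flag twice and leaves it unchanged. Backslash-escaped quotes are not handled.

```rust
/// Hypothetical helper: returns the byte offset at which each CSV record starts.
/// Newlines inside quoted fields are not treated as record boundaries.
fn index_records(data: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    if data.is_empty() {
        return starts;
    }
    // The first record always starts at offset 0.
    starts.push(0);

    let mut in_quotes = false;
    for (i, &b) in data.iter().enumerate() {
        match b {
            // Every quote byte flips the state. An escaped quote ("") flips it
            // twice, so the state after the pair is the same as before it.
            b'"' => in_quotes = !in_quotes,
            // Only an unquoted newline ends a record. Skip a trailing newline
            // at the very end of the input so no empty record is indexed.
            b'\n' if !in_quotes => {
                if i + 1 < data.len() {
                    starts.push(i + 1);
                }
            }
            _ => {}
        }
    }
    starts
}
```

The extra cost over a plain newline scan is one boolean and one extra comparison per byte, so the slowdown may be smaller than feared, though that would need measuring on real files.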
I agree with your aside; the problem here has to do with quoting, which is a form of in-band signaling. The really sad thing is that the ASCII standard actually set aside 4 code points to ensure that this was never an issue: Text File formats – ASCII Delimited Text – Not CSV or TAB delimited text | Ronald Duncan's Blog
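To illustrate the point about the reserved code points: ASCII defines FS (0x1C), GS (0x1D), RS (0x1E) and US (0x1F) as separators. A rough sketch of reading "ASCII delimited text" follows, assuming US separates fields and RS separates records. The function name `parse_ascii_delimited` is made up for this example. Since the separators never appear in ordinary text, no quoting or escaping is needed, and field values can contain commas, quotes and newlines freely.

```rust
/// ASCII Unit Separator (0x1F), used here between fields.
const UNIT_SEP: char = '\u{1F}';
/// ASCII Record Separator (0x1E), used here between records.
const RECORD_SEP: char = '\u{1E}';

/// Hypothetical helper: splits ASCII delimited text into records and fields.
/// Empty records (such as the one after a trailing separator) are dropped.
fn parse_ascii_delimited(text: &str) -> Vec<Vec<&str>> {
    text.split(RECORD_SEP)
        .filter(|rec| !rec.is_empty())
        .map(|rec| rec.split(UNIT_SEP).collect())
        .collect()
}
```

With this layout, finding the record around an arbitrary byte offset only requires scanning for the nearest 0x1E in each direction, which is the cheap random access the aside wishes CSV had.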
Also, I found csvlens which handles the file properly. It's interesting but I still find tass compelling.
I wonder if some of Rust's zero-allocation parsers like nom or rcsv (for CSV parsing specifically) might be able to handle quoted line feeds while keeping speed and memory use optimal.