
long-term nlp column cleanup

Open ClimbsRocks opened this issue 9 years ago • 0 comments

it's kind of icky manual work, but:

we'd have to do this right at the start, after reading in the dataDescription rows to figure out that we have an nlp column, but before we do anything else.

we could go through the whole raw document.

for each row, skip past the known number of commas before the nlp column, and then count back the known number of commas from the end of the row to find where the nlp column stops.

then concat everything left in between together. then strip out all quotes, newline characters, etc.
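a rough sketch of that comma-counting idea, in Python just for illustration. `extract_nlp_field`, `nlp_index`, and `total_columns` are made-up names, not anything in the codebase. it assumes we already have one logical row as a string (even if it has newlines inside it), and that none of the other columns contain commas themselves:

```python
import re


def extract_nlp_field(row, nlp_index, total_columns):
    """Pull the free-text column out of one raw row by counting commas.

    Hypothetical helper: assumes only the nlp column can contain extra
    commas, quotes, or newlines, and that every other column is clean.
    """
    parts = row.split(",")
    if len(parts) < total_columns:
        raise ValueError("row has fewer fields than expected")

    # Number of columns that come after the nlp column.
    columns_after = total_columns - nlp_index - 1
    end = len(parts) - columns_after

    # Everything between the leading and trailing columns is nlp text,
    # so glue it back together with the commas we split on.
    nlp_text = ",".join(parts[nlp_index:end])

    # Replace quotes and newline characters with spaces, then collapse
    # runs of whitespace.
    cleaned = re.sub(r'["\r\n]+', " ", nlp_text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    return parts[:nlp_index], cleaned, parts[end:]


# One row with an unbalanced quote, an extra comma, and a newline
# inside the nlp column (column 1 of 3).
raw_row = '1,"great product, but "pricey\nand slow,4.5'
before, nlp_text, after = extract_nlp_field(raw_row, nlp_index=1, total_columns=3)
```

this breaks as soon as any other column has a comma in it, which is part of why it's icky.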

or, we could just find a proper csv parser that can handle things like unbalanced quotes with commas, etc.
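for comparison, here's roughly what the parser route looks like, using Python's standard `csv` module as a stand-in (the column names and sample data are invented). it handles commas and newlines inside properly quoted fields; truly unbalanced quotes are a harder case and may still need the manual cleanup above or a more lenient parser:

```python
import csv
import io


def read_rows(raw_text):
    """Parse raw CSV text into a list of rows (each a list of strings)."""
    # newline="" lets the csv module handle line endings itself, so
    # newlines inside quoted fields stay part of the field.
    return list(csv.reader(io.StringIO(raw_text, newline="")))


# Made-up sample: the review column has a comma and a newline in it,
# but its quotes are balanced.
raw = 'id,review,rating\n1,"great product, but pricey\nand slow",4.5\n'
rows = read_rows(raw)
header, first_row = rows[0], rows[1]
```

here `first_row` comes back as three fields, with the comma and newline kept inside the review text.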

ClimbsRocks, Mar 19 '16 20:03