pgfutter

Does not accept escaped " in CSV

Open houshuang opened this issue 10 years ago • 3 comments

I don't know if this is part of the official CSV specification (if there is one), but it would be useful to handle escaped quotation marks. For example, pgfutter chokes on this line:

"cV7QpZd-EeSKzSIAC0cT7w@2","cV7QpZd-EeSKzSIAC0cT7w",2,5,"{\"typeName\":\"cml\",\"definition\":{\"dtdId\":\"assess/1\",\"value\":\"<co-content><text>For the gene At3g59490, retrieve the corresponding protein sequence from TAIR</text><text>(http://www.arabidopsis.org/tools/bulk/sequences/index.jsp). Remember to choose the correct dataset and output option.</text><text>Now, navigate to BLASTP at NCBI and paste your genes sequence into the “query sequence” box. Set the database to “non-redundant protein sequences (nr)”, keep all settings at default, and click BLAST.</text><text>Take note of the top match (ortholog) for each of the other species, for 20 different species excluding your query species. Which species’ gene is most closely related to your query gene?</text></co-content>\"}}",2015-02-17 22:14:27.187

However, when I remove all " with sed, it imports beautifully.

houshuang, Dec 12 '15 15:12

That's the Golang CSV reader, which has a weird escaping rule compared to the rest of the world: https://golang.org/pkg/encoding/csv

"the ""word"" is true","a ""quoted-field"""

results in

{`the "word" is true`, `a "quoted-field"`}
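A minimal Go sketch of this behavior, assuming the standard `encoding/csv` reader with its default (strict) settings. `parseLine` is a hypothetical helper, and the second input is a shortened stand-in for the failing row in the report, not the original data:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLine (hypothetical helper) reads a single CSV record using
// encoding/csv with its default settings.
func parseLine(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	return r.Read()
}

func main() {
	// Doubled quotes are the escape form the reader understands.
	rec, err := parseLine(`"the ""word"" is true","a ""quoted-field"""`)
	fmt.Printf("%q (err: %v)\n", rec, err)

	// Backslash-escaped quotes are not treated as escapes, so the first
	// inner quote is seen as a stray quote and Read returns an error.
	_, err = parseLine(`"{\"typeName\":\"cml\"}",2`)
	fmt.Println("backslash-escaped input:", err)
}
```

The reader has no notion of a backslash escape, so `\"` is simply a backslash followed by a quote that is not doubled and not at the end of the field.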

I will do more research into whether the Go CSV reader can be configured to support custom escape characters.

lukasmartinelli, Dec 12 '15 16:12

I wouldn't call Golang's CSV reader weird, that's just the weird CSV format. RFC 4180 says

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc"
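One possible workaround, sketched here under stated assumptions, is to rewrite backslash escapes into the RFC 4180 doubled-quote form before the line reaches the reader. This is not something pgfutter does; `backslashToRFC4180` and `parseBackslashCSV` are hypothetical names, and the sketch assumes the export uses backslash only to escape quotes (`\"`) and backslashes (`\\`). The sample row is a shortened stand-in for the one in the report:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// backslashToRFC4180 (hypothetical) rewrites \\ to \ and \" to "".
// Assumption: backslash is used only to escape quotes and backslashes.
// Replacer scans left to right without overlapping matches, so \\"
// becomes \" (an unescaped backslash followed by a real quote).
var backslashToRFC4180 = strings.NewReplacer(`\\`, `\`, `\"`, `""`)

// parseBackslashCSV converts the escapes, then reads one record
// with the standard encoding/csv reader.
func parseBackslashCSV(line string) ([]string, error) {
	converted := backslashToRFC4180.Replace(line)
	return csv.NewReader(strings.NewReader(converted)).Read()
}

func main() {
	line := `"id@2","id",2,5,"{\"typeName\":\"cml\",\"value\":\"a, b\"}",2015-02-17 22:14:27.187`
	rec, err := parseBackslashCSV(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(rec), rec[4])
}
```

Unlike deleting every quote, this keeps the commas inside the JSON field from being read as field separators.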

leofidus, Aug 22 '16 22:08

I wouldn't call Golang's CSV reader weird, that's just the weird CSV format. RFC 4180 says

Okay, that makes sense. But it would be cool if it were configurable (perhaps it is and I just haven't found out).
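On configurability: as far as I know, `csv.Reader` in `encoding/csv` has no escape-character setting. The closest option is `LazyQuotes`, which tolerates stray quotes but still does not treat backslash as an escape. A small sketch of why that is not enough, with `parseLazy` as a hypothetical helper and a made-up input that is meant to be a single field containing `x","y`:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLazy (hypothetical helper) reads one record with LazyQuotes enabled.
func parseLazy(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.LazyQuotes = true // tolerate bare quotes inside fields
	return r.Read()
}

func main() {
	// Intended as one field, but the quote in \", is followed by a comma,
	// so the reader treats it as the end of the field.
	rec, err := parseLazy(`"x\",\"y"`)
	fmt.Printf("%d fields: %q (err: %v)\n", len(rec), rec, err)
}
```

The input parses without an error but is split at the comma, so rows like the one in the report would import with the wrong number of columns.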

lukasmartinelli, Aug 23 '16 07:08