CodableCSV icon indicating copy to clipboard operation
CodableCSV copied to clipboard

Add support for field delimiter detection

Open PoshAlpaca opened this issue 3 years ago • 2 comments

Is your feature request related to a problem?

When using CodableCSV to load user-provided CSV files, one currently needs to ask the user which field delimiter is used in their file.

Describe the solution you'd like

It would be nice if CodableCSV had an option to automatically infer the field delimiter from the provided file.

I saw that this feature is on the roadmap, along with row delimiter detection and header detection. There are also already some references to it in the code, with the idea to use auto-detection when the field delimiter is set to nil in the reader's configuration.

I'd be happy to contribute this feature. My idea was to port the dialect detection code from the CleverCSV Python library to Swift.

Describe alternatives you've considered

An alternative would be to use the library directly, however that would introduce a dependency to the project, and, more importantly, I'm not quite sure how good Swift's support is for calling Python code. I guess it wouldn't work on iOS, for example?

@dehesa what do you think?

PoshAlpaca avatar Feb 17 '22 18:02 PoshAlpaca

Hey @PoshAlpaca,

Delimiter inference is indeed something I always wanted to do and plan for, but never really got into doing it. To be honest, I haven't even begin to think how to approach the problem. So, if you want to research it and come up with a solution, I will be more than happy to review it.

The inference code is supposed to live here. You would probably want to expand the switch statement to indicate which delimiter has the user input; i.e.:

  • you know the field delimiter, but not the row delimiter, or
  • you know the row delimiter, but not the field delimiter, or
  • you know neither.

I don't want to add dependencies to the project, so if you want to take this over, I would ask you to write Swift code directly.

dehesa avatar Feb 17 '22 23:02 dehesa