
Wrong delimiter detection under certain conditions

Open rusproject opened this issue 2 years ago • 1 comment

Conditions under which the problem occurs:

  1. The CSV being parsed has the ; delimiter.
  2. The $delimiter parameter is not specified when calling the parse() or import() methods (i.e., it remains 'auto').
  3. The CSV contains a cell with the character sequence ", (a double quote immediately followed by a comma), like this:

Actual CSV contents:

"cell 1";"cell 2"
"cell 3";"cell ""4"", causing problem"

In these circumstances, the delimiter is auto-detected as , although the actual delimiter is ;. As a result, each row is never split on ; and the parsed array looks like this:

array (
  0 => 
  array (
    0 => 'cell 1;cell 2',
  ),
  1 => 
  array (
    0 => 'cell 3;cell "4", causing problem',
  ),
)

Instead of this:

array (
  0 => 
  array (
    0 => 'cell 1',
    1 => 'cell 2',
  ),
  1 => 
  array (
    0 => 'cell 3',
    1 => 'cell "4", causing problem',
  ),
)
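
As a cross-check of the expected result, the two sample rows can be parsed with PHP's built-in str_getcsv() and an explicit ; separator. This is only a minimal sketch and does not use SimpleCSV itself; the $lines and $rows names are illustrative, and the empty escape argument assumes PHP 7.4 or later.

```php
<?php
// The two sample rows from this issue, one CSV line per element.
$lines = [
    '"cell 1";"cell 2"',
    '"cell 3";"cell ""4"", causing problem"',
];

$rows = [];
foreach ($lines as $line) {
    // Separator ';', enclosure '"', and an empty escape string so that
    // only doubled quotes ("") are treated as an escaped quote.
    $rows[] = str_getcsv($line, ';', '"', '');
}

var_export($rows);
```

With the delimiter given explicitly, the comma inside the quoted cell stays part of the cell content.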

As far as I understand, the problem is that the following 'if' condition does not account for files in which $this->_enclosure . ',' (", in this case) occurs inside actual cell content, where the quote is escaped by doubling it, so the raw CSV contains $this->_enclosure . $this->_enclosure . ',' ("", in this case):

// detect delimiter
if ( strpos($this->_csv, $this->_enclosure . ',' ) !== false ) {
  $this->_delimiter = ',';
} // else ...
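
The false positive can be reproduced outside the class with plain strpos(). This is a standalone sketch: $csv, $enclosure, $pos and $context are local stand-ins, not the library's properties.

```php
<?php
// Sample from this issue: ';' is the real delimiter, and the last cell
// contains an escaped quote followed by a comma.
$csv = <<<'CSV'
"cell 1";"cell 2"
"cell 3";"cell ""4"", causing problem"
CSV;
$enclosure = '"';

// Same test as the library's condition: look for enclosure + ','.
$pos = strpos($csv, $enclosure . ',');

// Show the match together with the character in front of it.
$context = ($pos !== false && $pos > 0) ? substr($csv, $pos - 1, 3) : '';
echo $context, PHP_EOL; // "",
```

The only match is the second half of an escaped quote inside a cell, not the end of an enclosed cell followed by a delimiter.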

So I made a quick fix that additionally checks that the character immediately before the match is NOT the _enclosure character:

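// detect delimiter

// quick fix for wrong delimiter detection in files where a cell contains an escaped quote followed by a comma (the `"",` case)
$pos = strpos($this->_csv, $this->_enclosure . ',' );
// character before the match; empty when there is no match or the match is at offset 0
$prev_char = ( $pos !== false && $pos > 0 ) ? substr($this->_csv, $pos - 1, 1) : '';

if ( $pos !== false && $prev_char !== $this->_enclosure ) {
  $this->_delimiter = ',';
} // else ...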

It works for me and solves this one particular case, but it only inspects the first match, does not cover other combinations of enclosures and delimiters inside cells, and does not handle edge cases such as empty enclosed cells ("","cell 2"). Please consider implementing a proper fix in your class.
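
One possible direction for a more general check, offered only as a sketch and not as the library's actual fix: count each candidate delimiter only while outside an enclosed cell, then pick the most frequent one. The function name detectDelimiter(), the candidate list, and the fallback to ',' are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: guess the delimiter by counting candidates that
 * appear outside of enclosed (quoted) cells.
 */
function detectDelimiter(
    string $csv,
    array $candidates = [',', ';', "\t", '|'],
    string $enclosure = '"'
): string {
    $counts = array_fill_keys($candidates, 0);
    $inQuotes = false;
    $length = strlen($csv);

    for ($i = 0; $i < $length; $i++) {
        $char = $csv[$i];
        if ($char === $enclosure) {
            // A doubled enclosure ("") toggles twice, so the state is
            // unchanged after an escaped quote or an empty cell.
            $inQuotes = !$inQuotes;
        } elseif (!$inQuotes && isset($counts[$char])) {
            $counts[$char]++;
        }
    }

    // Highest count first; fall back to ',' when nothing was found.
    arsort($counts);
    $best = array_key_first($counts);

    return ($best !== null && $counts[$best] > 0) ? (string) $best : ',';
}

$sample = <<<'CSV'
"cell 1";"cell 2"
"cell 3";"cell ""4"", causing problem"
CSV;

echo detectDelimiter($sample), PHP_EOL;          // ;
echo detectDelimiter('"","cell 2"'), PHP_EOL;    // ,
```

This handles the `"",` case and empty enclosed cells, but it assumes well-formed quoting and scans the whole input, so it may need a size limit for large files.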

rusproject, Aug 26 '23 10:08

Fixed in 1.0: https://github.com/shuchkin/simplecsv

shuchkin, Aug 26 '23 20:08