Unable to convert string to the requested encoding when reading sav files with long strings
Hi,
When reading a sav file that contains a long string with an international character (756 characters to be precise; with 755 the error does not show up), ReadStat gives the error:
Unable to convert string to the requested encoding (invalid byte sequence)
Attached is an example sav file. The sav file was produced with pyreadstat.
Thanks in advance!
original report: https://github.com/Roche/pyreadstat/issues/128
Note: initially I reported that the error was on writing, but it is on reading!
A csv version of the file is also attached.
Another observation: a very similar file with only one character of difference (first variable name "aaaaa3" instead of "aaaaa2") does not raise the error. Example file attached: eg3.sav.zip
Are UTF-8 strings being provided to the writer?
Yes
As mentioned in #260, this error can be reproduced without any international character (using only 'a's in this example) if the length of the string is at least 757. Reproducing it also requires that the numerical values be NaNs. See #260 for C code to reproduce the issue.
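
For anyone who prefers to try this from Python, here is a minimal sketch of the conditions described above. It is not the C code from #260. The column names `text` and `num`, the single-row layout, and the helper names are assumptions made for illustration, and whether the error actually shows up will likely depend on the ReadStat/pyreadstat version installed.

```python
import math


def build_repro_columns(n_chars=757, n_rows=1):
    """Build column data matching the conditions described in this thread.

    Column names and row count are hypothetical; only the string length
    (at least 757 'a's) and the NaN numeric values come from the report.
    """
    return {
        "text": ["a" * n_chars] * n_rows,  # long ASCII-only string
        "num": [math.nan] * n_rows,        # numeric column holding NaN
    }


def roundtrip(path, columns):
    """Write the columns to a sav file with pyreadstat, then read it back.

    Requires pandas and pyreadstat; imported here so the data-building
    part above stays usable without them.
    """
    import pandas as pd
    import pyreadstat

    df = pd.DataFrame(columns)
    pyreadstat.write_sav(df, path)
    # On affected versions the reported error would be raised on this read.
    return pyreadstat.read_sav(path)
```

Calling `roundtrip("repro.sav", build_repro_columns())` and then repeating with `n_chars=755` should make it possible to compare the two cases side by side.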