
fix: csv/tsv encoding

Open jaluma opened this issue 1 year ago • 1 comment

While testing Unstructured with CSV/TSV files, I ran into problems when the file is not UTF-8 encoded. This fix makes partitioning work for such files as long as the client passes the correct encoding. Tested with ISO-8859-1 files, and it works as expected :)

Crash error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 80: invalid continuation byte
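
For context, here is a minimal sketch of the failure mode and of the general idea behind the fix, using only the Python standard library. It is not the code from this PR: `read_csv_rows` and the sample bytes are hypothetical, and the sketch only assumes that the caller knows the file's encoding and passes it along.

```python
import csv
import io

# Hypothetical sample data: "ó" is the single byte 0xF3 in ISO-8859-1,
# which is not a valid UTF-8 sequence when followed by an ASCII letter.
RAW = "id,name\n1,Ramón\n".encode("iso-8859-1")


def read_csv_rows(data: bytes, encoding: str = "utf-8") -> list[list[str]]:
    """Decode raw CSV bytes with the caller-supplied encoding, then parse rows."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Decoding with the default (UTF-8) raises UnicodeDecodeError.
try:
    read_csv_rows(RAW)
    default_failed = False
except UnicodeDecodeError as exc:
    default_failed = True
    print(f"utf-8 decode failed: {exc}")

# Passing the right encoding lets the same bytes parse cleanly.
rows = read_csv_rows(RAW, encoding="iso-8859-1")
print(rows)
```

The same approach applies to TSV by passing `delimiter="\t"` to `csv.reader`.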

jaluma avatar Jul 09 '24 10:07 jaluma

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: unstructured/partition/csv.py

Function: partition_csv
Unhandled Issue: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 54: invalid continuation byte ...
Event Count: 3

sentry[bot] avatar Jul 09 '24 10:07 sentry[bot]