unstructured
fix: csv/tsv encoding
While testing Unstructured with CSV/TSV files, I ran into problems when the file is not UTF-8 encoded. This fix is meant to make partitioning work for such files as long as the client passes the correct encoding. Tested with ISO-8859-1 files and it works as expected :)
Crash error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 80: invalid continuation byte
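For reference, the failure mode can be reproduced with the standard library alone. This is a minimal sketch, not the code in this PR: `read_csv_rows` is a hypothetical helper that only illustrates decoding with a caller-supplied encoding, and the sample data is made up.

```python
import csv
import io


def read_csv_rows(data, encoding="utf-8"):
    """Hypothetical helper: decode raw CSV bytes with the caller's encoding."""
    text = data.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample: accented characters encoded as ISO-8859-1, not UTF-8.
raw = "nombre,ciudad\nJosé,Córdoba\n".encode("iso-8859-1")

# Decoding with the default UTF-8 raises, as in the crash above.
decode_failed = False
try:
    read_csv_rows(raw)
except UnicodeDecodeError:
    decode_failed = True

# Passing the encoding the file was actually written in succeeds.
rows = read_csv_rows(raw, encoding="iso-8859-1")
```

The point is only that the bytes decode cleanly once the caller states the real encoding; guessing the encoding automatically is a separate problem and out of scope here.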
🔍 Existing Issues For Review
Your pull request is modifying functions with the following pre-existing issues:
📄 File: unstructured/partition/csv.py
| Function | Unhandled Issue |
|---|---|
| partition_csv | UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 54: invalid continuation byte ... Event Count: 3 |