bayeslite
Incomprehensible UnicodeDecodeError
Download the file t1.csv. The offending character is in the last column of the last line.
probcomp-1:/scratch/fsaad/sandbox/preproc% cat t1.csv
tag,version,custom,abstract,datatype,iord,crdr,tlabel
RedeemableCommonStockMember,0001654954-17-000551,1,0,member,D,,Redeemable Common Stock
RedeemableCommonStockValue,0001654954-17-000551,1,0,monetary,I,C,"Common stock subject to possible redemption, at $200,004; 38,364 shares issued and outstanding at redemption value as of October 31, 2016, none as of October 31, 2015"
SupplementalDisclosureOfNoncashInvestingAndFinancingActivitiesAbstract,0001654954-17-000551,1,1,,,,SUPPLEMENTAL DISCLOSURE OF NON�_CASH INVESTING AND FINANCING ACTIVITIES:
Loading the data in bayeslite gives:
probcomp-1:/scratch/fsaad/sandbox/preproc% python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bayeslite
>>> bdb = bayeslite.bayesdb_open(':memory:')
>>> bdb.execute('CREATE TABLE t FROM \'t1.csv\'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 228, in execute
    self.tracer, self._do_execute, string, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 236, in _maybe_trace
    return meth(string, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 277, in _do_execute
    cursor = bql.execute_phrase(self, phrase, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bql.py", line 113, in execute_phrase
    bdb, phrase.name, phrase.csv, header=True, create=True)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/read_csv.py", line 37, in bayesdb_read_csv_file
    ifnotexists=ifnotexists)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/read_csv.py", line 121, in bayesdb_read_csv
    bdb.sql_execute(sql, [unicode(v, 'utf8').strip() for v in row])
  File "/scratch/fsaad/.pyenv2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8c in position 30: invalid start byte
>>>
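The error message reports a byte position but not which field or row it belongs to, which is what makes it hard to act on. The following is a minimal sketch of how the failing byte could be located outside bayeslite. `find_undecodable` is a hypothetical helper, not part of bayeslite, and the sample bytes are an assumption reconstructed from the traceback (byte 0x8c at position 30 of the last field) and the `cat` output above; the real file may contain different bytes after that position.

```python
def find_undecodable(data, encoding='utf-8'):
    """Return (position, byte value) of the first byte that fails to
    decode, or None if the whole byte string decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.start is the offset of the first offending byte.
        return e.start, ord(data[e.start:e.start + 1])
    return None


# Assumed contents of the last field of the last row of t1.csv.
field = b'SUPPLEMENTAL DISCLOSURE OF NON\x8c_CASH'

print(find_undecodable(field))
# Decoding with errors='replace' substitutes U+FFFD instead of raising.
print(repr(field.decode('utf-8', 'replace')))
```

Running such a check per field while reading the CSV would allow an error message that names the row and column, rather than only the byte offset within one value.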