UnicodeDecodeError: 'charmap' codec can't decode byte 0x81
I'm trying to read a database produced by an ancient version of the ACDsee photo manager program (don't ask). When I try to read it simply as:
from dbfread import DBF
table = DBF('asset.dbf')
for record in table:
    print(record)
I get ValueError: Unknown field type: '7'.
I followed the advice in another issue and created a field parser as:
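from dbfread import DBF, FieldParser
class TestFieldParser(FieldParser):
    def parse7(self, field, data):
        # Return the raw bytes of the unknown '7' field type unchanged
        return data
table = DBF('asset.dbf', parserclass=TestFieldParser)
for record in table:
    print(record)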
This produces the stack trace below. Googling for the error suggests that maybe the file is being read with the wrong encoding. Is there an easy way to try reading e.g. with UTF-8?
Traceback (most recent call last):
  File "/mnt/acdsee/ACDsee/./dumpdb.py", line 10, in <module>
    for record in table:
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/dbf.py", line 314, in _iter_records
    items = [(field.name,
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/dbf.py", line 315, in <listcomp>
    parse(field, read(field.length))) \
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 79, in parse
    return func(field, data)
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 87, in parseC
    return self.decode_text(data.rstrip(b'\0 '))
  File "/mnt/acdsee/ACDsee/venv/lib/python3.9/site-packages/dbfread/field_parser.py", line 45, in decode_text
    return decode_text(text, self.encoding, errors=self.char_decode_errors)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.9/lib/python3.9/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 35: character maps to <undefined>
I'm not sure I have an answer for you, but have you tried specifying the encoding?
If you know which encoding the file uses, you may be able to read it this way. The DBF() constructor's optional second argument is the encoding:
# Try UTF-8:
table = DBF('asset.dbf', 'UTF-8')
for record in table:
    print(record)
# Or try Latin-1:
table = DBF('asset.dbf', 'Latin-1')
for record in table:
    print(record)
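To see why the choice of codec matters, here is a small standalone sketch (no dbfread needed) that tries a few candidate encodings on a byte string containing 0x81. The sample bytes and the helper name try_encodings are made up for illustration and are not taken from the actual file. cp437 and cp850 are included only as guesses for what a DOS-era program might have used.

```python
def try_encodings(sample, candidates):
    """Map each candidate encoding to the decoded text, or None if strict decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = sample.decode(name)  # errors='strict' is the default
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: 0x81 is the byte the traceback complains about.
sample = b"M\x81nchen"
results = try_encodings(sample, ["cp1252", "utf-8", "latin-1", "cp437", "cp850"])
for name, text in results.items():
    print(f"{name:8} -> {text!r}")
```

cp1252 and UTF-8 both reject the byte. Latin-1 always succeeds because it maps every byte value to a character (here the invisible control character U+0081), and cp437/cp850 turn it into 'ü'. Which result is right depends on what the file actually contains.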
Another option is to set the char_decode_errors handler. The argument defaults to 'strict' when unspecified.
So this...
table = DBF('asset.dbf')
Is the same as...
table = DBF('asset.dbf', char_decode_errors='strict')
But you could relax this requirement by specifying a more forgiving error handler (see Python's Error Handlers docs for more options):
table = DBF('asset.dbf', char_decode_errors='replace')
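Judging by the traceback, char_decode_errors is passed along as the errors argument of the decode call, so the effect of each handler can be previewed with plain bytes.decode(). This is only a sketch on a made-up sample, not on data from the real file:

```python
raw = b"M\x81nchen"  # hypothetical sample; 0x81 is undefined in cp1252

# 'replace' substitutes U+FFFD (the replacement character) for the bad byte
replaced = raw.decode("cp1252", errors="replace")
# 'ignore' silently drops the bad byte
ignored = raw.decode("cp1252", errors="ignore")
# 'backslashreplace' keeps the byte value visible as an escape sequence
escaped = raw.decode("cp1252", errors="backslashreplace")

print(repr(replaced))
print(repr(ignored))
print(repr(escaped))
```

'backslashreplace' can be handy while investigating, since it shows exactly which bytes failed to decode instead of hiding them.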
You might settle on some combination of the two... defining an expected encoding and loosening the error handling behavior:
table = DBF('asset.dbf', 'Latin-1', char_decode_errors='replace')
for record in table:
    print(record)
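One thing to watch for: the file in the question also needs the custom parser for field type '7', so the parserclass argument probably has to stay in place alongside the encoding options. A rough sketch, where open_table and RawType7Parser are hypothetical names and the Latin-1/'replace' defaults are just one possible starting point:

```python
def open_table(path, encoding="latin-1", errors="replace"):
    """Open a DBF file, keeping a custom parser for the unknown field type '7'."""
    # Imported here so the sketch can be loaded even where dbfread is not installed.
    from dbfread import DBF, FieldParser

    class RawType7Parser(FieldParser):
        def parse7(self, field, data):
            # Unknown field type '7': hand back the raw bytes untouched.
            return data

    return DBF(
        path,
        encoding=encoding,
        parserclass=RawType7Parser,
        char_decode_errors=errors,
    )
```

Iterating over open_table('asset.dbf') should then yield records the same way as in the snippets above.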