Define all blob types for text unicode decoding

Open saaj opened this issue 11 years ago • 0 comments

Here's code that illustrates the issue (Python 2.7, MySQL 5.5.32):

import MySQLdb

connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
cursor     = connection.cursor()

cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")

print cursor.fetchone() # (u'abcd\u0451', 'abcd\xd1\x91')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)

As you can see, the b column is returned as a byte string instead of unicode, even though FLAG.BINARY is not set. Unicode decoding works fine for FIELD_TYPE.VAR_STRING (253) and FIELD_TYPE.BLOB (252), but not for FIELD_TYPE.LONG_BLOB (251), which is the type ExtractValue returns.
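To make the "FLAG.BINARY is not set" claim checkable, here is a small sketch that tests the BINARY bit in the flag masks printed above. The bit values are the ones the MySQL client protocol uses (MySQLdb exposes them as FLAG.NOT_NULL and FLAG.BINARY); the helper name has_binary_flag is made up for this illustration and is not part of MySQLdb.

```python
# Bit values of two column flags from the MySQL client protocol; MySQLdb
# exposes them as MySQLdb.constants.FLAG.NOT_NULL and FLAG.BINARY.
NOT_NULL = 1
BINARY = 128


def has_binary_flag(flags):
    """Return True when the BINARY bit is set in a column's flag mask.

    Hypothetical helper for illustration only, not a MySQLdb function.
    """
    return bool(flags & BINARY)


# Flag masks as printed by cursor.description_flags in the example above.
description_flags = (1, 0)
binary_by_column = dict(zip(("s", "b"), map(has_binary_flag, description_flags)))
```

Neither mask has the BINARY bit set, so both columns are text and both should be eligible for unicode decoding.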

Here's the workaround.

import MySQLdb
import MySQLdb.converters as conv
import MySQLdb.constants as const

connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
connection.converter[const.FIELD_TYPE.LONG_BLOB] = connection.converter[const.FIELD_TYPE.BLOB]
cursor = connection.cursor()

cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")

print cursor.fetchone() # (u'abcd\u0451', u'abcd\u0451')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)

The workaround also shows that the current value converter design needs improvement. The unicode decoders are registered in the connection constructor, whereas most converters are defined in MySQLdb.converters. _get_string_decoder is also defined inside the constructor. So it's impossible to pass an extended decoder dict through the conv constructor argument, and the only way is to patch each connection instance individually.
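Until that changes, the per-instance patch can at least be applied to every blob type at once. The following is a rough sketch, not MySQLdb API: share_blob_decoder is a hypothetical helper. It assumes only that the connection has a dict-like converter attribute keyed by field type code, as in the workaround above. The numeric codes are the MySQL protocol values that MySQLdb.constants.FIELD_TYPE also defines.

```python
# Field type codes from the MySQL protocol; in MySQLdb they are available as
# MySQLdb.constants.FIELD_TYPE.TINY_BLOB, MEDIUM_BLOB, LONG_BLOB and BLOB.
TINY_BLOB, MEDIUM_BLOB, LONG_BLOB, BLOB = 249, 250, 251, 252


def share_blob_decoder(connection):
    """Reuse the BLOB converter entry for the other blob field types.

    Hypothetical helper for illustration only. `connection` is any object
    with a dict-like `converter` attribute, such as a MySQLdb connection
    opened with a charset so that the BLOB entry includes a unicode decoder.
    """
    blob_entry = connection.converter[BLOB]
    for field_type in (TINY_BLOB, MEDIUM_BLOB, LONG_BLOB):
        # Copy list entries so a later change to one field type's converters
        # does not silently affect the others.
        if isinstance(blob_entry, list):
            connection.converter[field_type] = list(blob_entry)
        else:
            connection.converter[field_type] = blob_entry
    return connection
```

It would be called right after MySQLdb.connect(...), before any cursor is created, in place of the single LONG_BLOB assignment in the workaround. Unlike that assignment, it copies the list rather than aliasing it.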

saaj • Nov 28 '14 09:11