Define all blob types for text unicode decoding
Here's the code to illustrate the issue (Python 2.7, MySQL 5.5.32):
import MySQLdb
connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
cursor = connection.cursor()
cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")
print cursor.fetchone() # (u'abcd\u0451', 'abcd\xd1\x91')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)
As you can see, column b is returned as a byte string instead of unicode, even though FLAG.BINARY is not set. Unicode decoding works for FIELD_TYPE.VAR_STRING (253) and FIELD_TYPE.BLOB (252), but not for FIELD_TYPE.LONG_BLOB (251), which is the type ExtractValue returns.
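The behaviour can be modelled in a few lines. This is a simplified sketch, not MySQLdb's actual code. It assumes converters are stored per type code as lists of (flag mask, function) pairs, which is how MySQLdb.converters lays out the string types. The names convert, converters, decode_utf8 and keep_bytes are hypothetical, and the numeric constants are copied from MySQLdb.constants so the example runs without a database.

```python
# Values copied from MySQLdb.constants (FLAG.BINARY and FIELD_TYPE.*).
FLAG_BINARY = 128
VAR_STRING, BLOB, LONG_BLOB = 253, 252, 251


def decode_utf8(raw):
    """Stand-in for the unicode decoder installed when charset='utf8'."""
    return raw.decode('utf8')


def keep_bytes(raw):
    """Stand-in for the converter applied to binary columns."""
    return raw


# Hypothetical converter table: note that LONG_BLOB has no entry at all.
converters = {
    VAR_STRING: [(FLAG_BINARY, keep_bytes), (None, decode_utf8)],
    BLOB: [(FLAG_BINARY, keep_bytes), (None, decode_utf8)],
}


def convert(raw, field_type, flags):
    """Pick the first rule whose mask matches the column flags.

    A mask of None acts as the default rule. A type code with no
    entry is returned untouched, i.e. as a byte string.
    """
    rules = converters.get(field_type)
    if rules is None:
        return raw
    for mask, func in rules:
        if mask is None or (mask & flags):
            return func(raw)
    return raw


raw = b'abcd\xd1\x91'
assert convert(raw, VAR_STRING, 1) == u'abcd\u0451'   # decoded
assert convert(raw, BLOB, FLAG_BINARY) == raw         # binary, left alone
assert convert(raw, LONG_BLOB, 0) == raw              # no entry, left alone
```

In this model the last assertion is the problem case: the column is not binary, but because its type code has no converter entry the decoder is never consulted.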
Here's the workaround:
import MySQLdb
import MySQLdb.converters as conv
import MySQLdb.constants as const
connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
# Reuse the BLOB converters (including the unicode decoder) for LONG_BLOB.
connection.converter[const.FIELD_TYPE.LONG_BLOB] = connection.converter[const.FIELD_TYPE.BLOB]
cursor = connection.cursor()
cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")
print cursor.fetchone() # (u'abcd\u0451', u'abcd\u0451')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)
The workaround also shows that the current value converter design needs improvement. The unicode decoders are set in the connection constructor, whereas most converters are defined in MySQLdb.converters. _get_string_decoder is also defined inside the constructor. So it is impossible to pass an extended decoder dict through the conv constructor argument, and the only way is to patch each connection instance individually.
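Until that is changed, the per-instance patch can at least be wrapped in a helper that covers every blob type code at once. The sketch below is an assumption about how one might do it, not part of MySQLdb: decode_all_blob_types is a hypothetical name, and the type codes are the numeric values of FIELD_TYPE.TINY_BLOB, MEDIUM_BLOB, LONG_BLOB and BLOB, repeated here so the block has no dependencies. It only relies on the connection exposing a converter dict, as the workaround above does.

```python
# Numeric values of the MySQLdb.constants.FIELD_TYPE blob members.
TINY_BLOB, MEDIUM_BLOB, LONG_BLOB, BLOB = 249, 250, 251, 252


def decode_all_blob_types(connection):
    """Give every blob type code the converter list already used for BLOB.

    `connection` is any object with a `converter` dict that already has
    an entry for BLOB (hypothetical helper, applied per connection).
    """
    blob_rules = connection.converter[BLOB]
    for field_type in (TINY_BLOB, MEDIUM_BLOB, LONG_BLOB):
        # Copy the list so a later change to one type code does not
        # silently affect the others.
        connection.converter[field_type] = list(blob_rules)
    return connection
```

With a real connection this would be called once, right after connecting, for example decode_all_blob_types(MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')). Copying the list rather than sharing it is a design choice of this sketch; the workaround above shares a single list between BLOB and LONG_BLOB.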