Incorrect handling of Unicode queries
When searching like this: Body.search.query(u'привет')
There are always zero results, while the same search from the command line returns hundreds. This is due to double (or even triple) UTF-8 encoding done somewhere in the guts of django-sphinx/sphinxapi.
There are instances of pointless code like unicode(string).encode('utf-8'). The problem is that if string is already a unicode object, this code ends up producing a unicode object holding its UTF-8 representation and then encoding that with UTF-8 again, creating garbage. I've fixed this spot in the code, but the string is still double-encoded somewhere else. :(
This code is pointless anyway: even if it worked as intended, it would be a no-op for bytestrings (take a bytestring, convert it to unicode, convert it back to the same bytestring). But instead of being a harmless no-op, it turns unicode input into garbage.
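The double-encoding effect can be reproduced in a few lines. This is a sketch in Python 3 syntax (where str plays the role of Python 2's unicode); the latin-1 round-trip simulates code that mistakes already-encoded UTF-8 bytes for text and encodes them a second time:

```python
query = 'привет'                 # text (a `unicode` object in Python 2)
once = query.encode('utf-8')     # the correct UTF-8 bytes Sphinx expects

# Simulate the buggy second encoding pass: the UTF-8 bytes are treated
# as text and run through .encode('utf-8') again.
twice = once.decode('latin-1').encode('utf-8')

assert twice != once             # mojibake: Sphinx will never match this
# The damage is reversible only if you know it happened:
assert twice.decode('utf-8').encode('latin-1') == once
```

This is why the query silently returns zero results: the index contains the single-encoded form, but the client sends the double-encoded bytes.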
This patch somewhat mitigates the problem by allowing searches with UTF-8-encoded strings:
--- models.py.orig 2010-09-11 17:14:01.000000000 +0400
+++ models.py 2010-09-11 17:32:18.000000000 +0400
@@ -289,7 +289,9 @@
return self._clone(**kwargs)
def query(self, string):
- return self._clone(_query=unicode(string).encode('utf-8'))
+ if isinstance(string, unicode):
+ string = string.encode('utf-8')
+ return self._clone(_query=string)
def group_by(self, attribute, func, groupsort='@group desc'):
return self._clone(_groupby=attribute, _groupfunc=func, _groupsort=groupsort)
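The idea of the patch, restated as a standalone sketch in Python 3 syntax (the normalize_query helper is hypothetical, not part of django-sphinx; in Python 2 the isinstance check would test against unicode instead of str):

```python
def normalize_query(string):
    # Encode only real text; pass already-encoded bytestrings through
    # untouched, so they cannot be UTF-8-encoded a second time.
    if isinstance(string, str):          # Python 2: isinstance(string, unicode)
        return string.encode('utf-8')
    return string

# Text is encoded exactly once...
assert normalize_query('привет') == 'привет'.encode('utf-8')

# ...and bytes are left alone instead of being re-encoded into garbage.
raw = 'привет'.encode('utf-8')
assert normalize_query(raw) == raw
```

The key design point is the type check: encoding is driven by what the caller passed in, rather than unconditionally round-tripping every query through unicode().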