Hi all,
I am rather new to Xapian; I only recently tried to include it in
my application, so bear with me if this has already been discussed.
I was playing with QueryParser and noticed that it expects its
input to be in ISO-8859-1 encoding: characters above 0x80 are
transliterated, and are not considered letters. For example,
using the single word "bože" (in UTF-8 encoding) as input for
parse_query, the resulting query is something like:
Xapian::Query((boaa:(pos=1) OR e:(pos=2)))
which makes parse_query quite unusable for UTF-8 strings (or
indeed, for any encoding other than ISO-8859-1).
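To illustrate why the word gets split, here is a small standalone sketch. It does not use Xapian at all: latin1_is_letter and split_words are hypothetical helpers, and the letter test is only a rough approximation of ISO-8859-1, not Xapian's actual is_tab table. It assumes the query word contains the two-byte UTF-8 sequence 0xC5 0xBE (the letter ž). Read as ISO-8859-1, 0xC5 is the letter Å, while 0xBE is the fraction sign ¾, which is not a letter and so ends the term.

```cpp
#include <string>
#include <vector>

// Rough approximation of "is a letter" under ISO-8859-1.
// Assumption: this is NOT Xapian's real table, only an illustration.
inline bool latin1_is_letter(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return true;
    // 0xC0..0xFF are accented letters, except the multiplication sign
    // (0xD7) and the division sign (0xF7).
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

// Split a byte string into maximal runs of bytes accepted by the predicate.
template <typename Pred>
std::vector<std::string> split_words(const std::string& input, Pred is_word_byte) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : input) {
        if (is_word_byte(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Calling split_words(std::string("bo\xC5\xBE" "e"), latin1_is_letter) yields two terms: "bo" followed by the byte 0xC5, and "e". That matches the shape of the query above once the Å is transliterated to "aa".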
I tried to disable the transliteration in
accentnormalisingitor.h and modified common/utils.h to contain:
inline bool C_isalpha(char ch) {
    using namespace Xapian::Internal;
    return (static_cast<unsigned char>(ch) >= 0x80) ||
           (is_tab[static_cast<unsigned char>(ch)] & (IS_UPPER|IS_LOWER));
}
inline bool C_isalnum(char ch) {
    using namespace Xapian::Internal;
    return (static_cast<unsigned char>(ch) >= 0x80) ||
           (is_tab[static_cast<unsigned char>(ch)] &
            (IS_UPPER|IS_LOWER|IS_DIGIT));
}
The reasoning is that most characters above 0x80 are meant as letters, with
very few exceptions (non-breaking spaces and punctuation), and
people generally do not write queries using those characters.
Of course, the same effect can be achieved by modifying is_tab.
Now queries in my application work as expected :-)
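For anyone who wants to try the idea outside the Xapian tree, here is a minimal standalone sketch of the same rule. The names is_word_byte and tokenize_bytes are made up for this example and are not part of Xapian. It only shows that once every byte >= 0x80 counts as a word character, a multi-byte UTF-8 sequence stays inside a single term.

```cpp
#include <string>
#include <vector>

// Hypothetical predicate mirroring the proposed rule: ASCII letters and
// digits are word characters, and so is every byte >= 0x80 (which covers
// all lead and continuation bytes of multi-byte UTF-8 sequences).
inline bool is_word_byte(unsigned char c) {
    return c >= 0x80 ||
           (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

// Split a byte string into maximal runs of word bytes.
inline std::vector<std::string> tokenize_bytes(const std::string& input) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char c : input) {
        if (is_word_byte(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}
```

With this rule, tokenize_bytes("bo\xC5\xBE" "e") returns the whole word as one term. The known cost is the one mentioned above: a UTF-8 non-breaking space (0xC2 0xA0) or non-ASCII punctuation would be glued into the neighbouring word.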
I would suggest making transliteration optional (or, failing that, removing it,
since it does more harm than good), and treating
all characters above 0x80 as letters (at least, there is no
better solution unless full Unicode support is implemented, and THAT is
probably not worth the effort).
What do you say?
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me
spread!