2024 Jan 04
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...e "Mitsubishi UFJ Factors Limited" bank.
Using word segmentation in Xapian 1.5, this causes the following terms to be indexed:
ファクター
三菱
株式会社
ＵＦＪ
Note the last term, which starts with 'FULLWIDTH LATIN CAPITAL LETTER U' (U+FF35). Xapian's Unicode library correctly assigns this the UPPERCASE_LETTER category and indexes it verbatim.
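The category assignment can be checked independently of Xapian. The following is a minimal sketch using Python's standard unicodedata module rather than Xapian's own Unicode library, on the assumption that the two agree for this codepoint:

```python
import unicodedata

ch = "\uFF35"  # the codepoint named in the report

# Official Unicode character name for the codepoint.
name = unicodedata.name(ch)

# General category; "Lu" is the Unicode abbreviation for uppercase letter.
category = unicodedata.category(ch)

print(name, category)
```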
However, querying for ＵＦＪ produces the query Query(ｕｆｊ@1). That is, it queries for the lowercase form, which seems to be the result of unconditional lower-casing at https://github.com/xapian/xapian/blob/master/xapian-core/queryparser/queryparser.lemony#L1459. A...
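The mismatch described above can be illustrated with a toy model. This is not Xapian's code: `indexed_terms` and `parse_query_term` are hypothetical stand-ins. It assumes the indexer stores the term verbatim, the query parser lower-cases every term, and the term in question is fullwidth "UFJ".

```python
# Toy model of the reported mismatch; not Xapian's actual implementation.

# Assumed example term: fullwidth "UFJ", stored verbatim by the indexer.
FULLWIDTH_UFJ = "\uFF35\uFF26\uFF2A"
indexed_terms = {FULLWIDTH_UFJ}


def parse_query_term(text: str) -> str:
    """Hypothetical stand-in for a query parser that lower-cases every term."""
    return text.lower()


query_term = parse_query_term(FULLWIDTH_UFJ)

# Lower-casing maps each fullwidth capital to its fullwidth small letter.
is_lowercased = query_term == "\uFF55\uFF46\uFF4A"

# The lower-cased query term is not in the index, so the lookup misses.
matches = query_term in indexed_terms

print(is_lowercased, matches)
```

Under these assumptions the query term and the indexed term differ only in case, which is enough to make an exact term lookup fail.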