2024 Jan 07
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...r/word-breaker.cc
@@ -103,7 +103,7 @@ is_unbroken_script(unsigned p)
// FE30..FE4F; CJK Compatibility Forms
0xFE30 - 1, 0xFE4F,
// FF00..FFEF; Halfwidth and Fullwidth Forms
- 0xFF00 - 1, 0xFFEF,
+ //0xFF00 - 1, 0xFFEF,
// 1AFF0..1AFFF; Kana Extended-B
// 1B000..1B0FF; Kana Supplement
// 1B100..1B12F; Kana Extended-A
If we're fixing it this way, we should check this list for other
instances of the same problem (and doing so would probably reveal
whether that assumption is valid).
Cheers,
Olly
2024 Jan 04
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS on input that contains characters from the Unicode Halfwidth and Fullwidth Forms block (https://unicode.org/charts/PDF/UFF00.pdf).
Since I haven't yet decided whether and how to fix this in Xapian, I haven't opened a pull request. Because Trac is currently offline, I could not file a bug. I hope it's OK to post my analysis here first,