search for: is_unbroken_script

Displaying 3 results from an estimated 3 matches for "is_unbroken_script".

2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...ode - I tried this and it fixes your case at least: diff --git a/xapian-core/queryparser/word-breaker.cc b/xapian-core/queryparser/word-breaker.cc index 8108523ccd53..4fabc23f4b56 100644 --- a/xapian-core/queryparser/word-breaker.cc +++ b/xapian-core/queryparser/word-breaker.cc @@ -103,7 +103,7 @@ is_unbroken_script(unsigned p) // FE30..FE4F; CJK Compatibility Forms 0xFE30 - 1, 0xFE4F, // FF00..FFEF; Halfwidth and Fullwidth Forms - 0xFF00 - 1, 0xFFEF, + //0xFF00 - 1, 0xFFEF, // 1AFF0..1AFFF; Kana Extended-B // 1B000..1B0FF; Kana Supplement // 1B100..1B12F; Kana Extended-A If we're fixing it th...
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
...'s a couple of unit tests that check for this. diff --git a/xapian-core/queryparser/word-breaker.cc b/xapian-core/queryparser/word-breaker.cc index 8108523ccd53..6122dcdccc97 100644 --- a/xapian-core/queryparser/word-breaker.cc +++ b/xapian-core/queryparser/word-breaker.cc @@ -102,8 +102,10 @@ is_unbroken_script(unsigned p) 0xF900 - 1, 0xFAFF, // FE30..FE4F; CJK Compatibility Forms 0xFE30 - 1, 0xFE4F, - // FF00..FFEF; Halfwidth and Fullwidth Forms - 0xFF00 - 1, 0xFFEF, + // FF00..FF60: Fullwidth Numbers, Latin Characters, Punctuation + // FF61..FF64: Halfwidt...
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf). Since I am undecided yet if and how to fix this in Xapian I haven't come up with a pull request. Because trac currently is offline, I could not file a bug. I hope it's OK to post my analysis here first,