search for: flag_word_break

Displaying 5 results from an estimated 5 matches for "flag_word_break".

2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf). Since I am undecided yet if and how to fix this in Xapian I haven't come up with a pull request. Because trac currently is offline, I could not file a bug. I hope it's O...
2024 Jan 09
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Mon, Jan 08, 2024 at 02:01:46PM +0100, Robert Stepanek wrote: > Removing the whole block will cause word-breaker to not correctly > handle halfwidth Katakana, such as "??????????" which it would treat > as a single term, whereas it should be two: ??????and ????). > > My pull request causes word-breaker to only handle halfwidth Katakana > and Hangul codepoints as
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote: > I've restarted trac. I now created a pull request: https://github.com/xapian/xapian/pull/329 Should I create a trac issue, too? > Assuming the latter is valid, just removing this block (or removing the > parts of it which are Lu or Ll) should fix the problem as then > tokenisation will switch mode - I tried this and it fixes
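The "Lu or Ll" remark in the snippet above refers to Unicode general categories (uppercase and lowercase letters). As a rough illustration only, the following sketch uses Python's standard unicodedata module to list which codepoints in the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF) fall into those categories. It is not Xapian code and does not reproduce the word-breaker logic or the merged fix; the helper names cased_letters_in_block and to_ranges are hypothetical, chosen for this example.

```python
import unicodedata

# Halfwidth and Fullwidth Forms block, as referenced in the thread.
BLOCK_START, BLOCK_END = 0xFF00, 0xFFEF


def cased_letters_in_block(start=BLOCK_START, end=BLOCK_END):
    """Return codepoints in [start, end] whose general category is Lu or Ll.

    Hypothetical helper for illustration; not part of Xapian.
    """
    return [
        cp
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) in ("Lu", "Ll")
    ]


def to_ranges(codepoints):
    """Collapse a sorted list of codepoints into inclusive (first, last) ranges."""
    ranges = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run of consecutive codepoints.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Start a new run.
            ranges.append((cp, cp))
    return ranges


if __name__ == "__main__":
    for first, last in to_ranges(cased_letters_in_block()):
        print(f"U+{first:04X}..U+{last:04X}")
```

Running the script prints the cased-letter ranges in the block according to the Unicode data shipped with the Python interpreter in use. The same approach can be pointed at other ranges to check them for cased letters, which is the follow-up discussed later in the thread.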
2024 Jan 10
2
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Tue, Jan 9, 2024, at 3:28 AM, Olly Betts wrote: > Thanks, that looks good - now merged. Thanks! > Did you already check the other ranges for cased letters? I can but if > you have already there's not much point. I did not. If you find time, that'd be great. Otherwise I can make room for it in the next days. > > The fullwidth "????? ??????" tests suggests to
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Thu, Jan 04, 2024 at 05:50:22PM +0100, Robert Stepanek wrote: > Since I am undecided yet if and how to fix this in Xapian I haven't > come up with a pull request. Because trac currently is offline, I > could not file a bug. I hope it's OK to post my analysis here first, > I'll be happy to follow up reporting that bug proper later (should we > conclude that it actually