Displaying 5 results from an estimated 5 matches for "flag_word_break".
2024 Jan 04
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
I think I found a bug in Xapian 1.5 when using FLAG_WORD_BREAKS for input that contains characters in Unicode Halfwidth and Fullwidth Forms (https://unicode.org/charts/PDF/UFF00.pdf).
Since I am undecided yet if and how to fix this in Xapian I haven't come up with a pull request. Because trac currently is offline, I could not file a bug. I hope it's O...
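The report concerns code points in the Unicode Halfwidth and Fullwidth Forms block, and a later reply in the thread singles out the parts of that block which are Lu or Ll. As a rough illustration only, the sketch below uses Python's standard `unicodedata` module (not Xapian, and not the code from the pull request) to print the general category of a few code points from that block. The sample selection and the helper names `general_categories` and `is_cased_letter` are assumptions made for this example.

```python
import unicodedata

# A few hand-picked code points from the Halfwidth and Fullwidth Forms
# block (U+FF00..U+FFEF). The selection is illustrative, not exhaustive.
SAMPLES = {
    "fullwidth_upper_a": "\uFF21",        # FULLWIDTH LATIN CAPITAL LETTER A
    "fullwidth_lower_a": "\uFF41",        # FULLWIDTH LATIN SMALL LETTER A
    "fullwidth_digit_0": "\uFF10",        # FULLWIDTH DIGIT ZERO
    "halfwidth_katakana_a": "\uFF71",     # HALFWIDTH KATAKANA LETTER A
    "halfwidth_hangul_kiyeok": "\uFFA1",  # HALFWIDTH HANGUL LETTER KIYEOK
}


def general_categories(samples):
    """Map each sample name to its Unicode general category (e.g. 'Lu')."""
    return {name: unicodedata.category(ch) for name, ch in samples.items()}


def is_cased_letter(ch):
    """True if the character is an uppercase (Lu) or lowercase (Ll) letter."""
    return unicodedata.category(ch) in ("Lu", "Ll")


if __name__ == "__main__":
    for name, category in general_categories(SAMPLES).items():
        print(f"{name}: {category}")
```

In this sample the fullwidth Latin letters are cased (Lu, Ll), while the halfwidth Katakana and Hangul letters are Lo, which is the distinction the thread's discussion turns on.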
2024 Jan 09
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Mon, Jan 08, 2024 at 02:01:46PM +0100, Robert Stepanek wrote:
> Removing the whole block will cause word-breaker to not correctly
> handle halfwidth Katakana, such as "??????????" which it would treat
> as a single term, whereas it should be two: ?????? and ????).
>
> My pull request causes word-breaker to only handle halfwidth Katakana
> and Hangul codepoints as
2024 Jan 08
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Sun, Jan 7, 2024, at 7:45 PM, Olly Betts wrote:
> I've restarted trac.
I now created a pull request: https://github.com/xapian/xapian/pull/329 Should I create a trac issue, too?
> Assuming the latter is valid, just removing this block (or removing the
> parts of it which are Lu or Ll) should fix the problem as then
> tokenisation will switch mode - I tried this and it fixes
2024 Jan 10
2
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Tue, Jan 9, 2024, at 3:28 AM, Olly Betts wrote:
> Thanks, that looks good - now merged.
Thanks!
> Did you already check the other ranges for cased letters? I can but if
> you have already there's not much point.
I did not. If you find time, that'd be great. Otherwise I can make room for it in the next days.
> > The fullwidth "????? ??????" tests suggests to
2024 Jan 07
1
Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
On Thu, Jan 04, 2024 at 05:50:22PM +0100, Robert Stepanek wrote:
> Since I am undecided yet if and how to fix this in Xapian I haven't
> come up with a pull request. Because trac currently is offline, I
> could not file a bug. I hope it's OK to post my analysis here first,
> I'll be happy to follow up reporting that bug proper later (should we
> conclude that it actually