Displaying 2 results from an estimated 2 matches for "codepoint_is_cjk".
2016 Sep 19
Pull requests: CJK words and Snippet generator
...though.
> The main issue is that new codepoints get added (and the odd one changes
> category) in each new Unicode version, so if you're using different
> Unicode versions at index time and at search time, the terms you get
> won't match each other. [...] If Xapian's CJK::codepoint_is_cjk() and ICU have different ideas of
> what's in CJK, the results might be odd, and will likely vary depending
> on the exact combination of Unicode versions
ICU currently only word-breaks text that `codepoint_is_cjk` has already
identified as CJK text, so there shouldn't be a gap between searc...
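To make the version-mismatch concern quoted above concrete, here is a minimal sketch of a range-based CJK test. It is an illustration only and not Xapian's actual `CJK::codepoint_is_cjk()`: the function names `in_range`, `is_cjk_base` and `is_cjk_newer` are made up for this example, and the block list is deliberately incomplete. The ranges themselves are taken from the Unicode code charts. Extension E was added in Unicode 8.0, so a table built against an older Unicode version would not know about it.

```cpp
// Hypothetical sketch, NOT Xapian's real CJK::codepoint_is_cjk().
// All names here are assumptions made for illustration.

constexpr bool in_range(char32_t cp, char32_t lo, char32_t hi) {
    return cp >= lo && cp <= hi;
}

// A table that only knows a few long-established blocks.
constexpr bool is_cjk_base(char32_t cp) {
    return in_range(cp, 0x3040, 0x309F)    // Hiragana
        || in_range(cp, 0x30A0, 0x30FF)    // Katakana
        || in_range(cp, 0x3400, 0x4DBF)    // CJK Unified Ideographs Extension A
        || in_range(cp, 0x4E00, 0x9FFF)    // CJK Unified Ideographs
        || in_range(cp, 0xAC00, 0xD7AF);   // Hangul Syllables
}

// A table built against a newer Unicode version that also knows
// CJK Unified Ideographs Extension E (added in Unicode 8.0).
constexpr bool is_cjk_newer(char32_t cp) {
    return is_cjk_base(cp) || in_range(cp, 0x2B820, 0x2CEAF);
}
```

With these two tables, U+2B820 is treated as CJK by `is_cjk_newer` but not by `is_cjk_base`. If one table were used at index time and the other at search time, text containing that character would be segmented differently on each side, which is the kind of term mismatch the quoted message warns about.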
2016 Sep 07
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote:
> I think my main concerns are about efficiency (since that's a major
> motivation for the current implementation, so slowing it down would be
> annoying), and whether we can just make this the standard behaviour
> rather than adding an option.
The current implementation is O(n) and I took care to keep it at that.
For the proposed term