
2016 Sep 19
Pull requests: CJK words and Snippet generator
...though.

> The main issue is that new codepoints get added (and the odd one changes
> category) in each new Unicode version, so if you're using different
> Unicode versions at index time and at search time, the terms you get
> won't match each other.
> [...]
> If Xapian's CJK::codepoint_is_cjk() and ICU have different ideas of
> what's in CJK, the results might be odd, and will likely vary depending
> on the exact combination of Unicode versions

Since ICU currently only word-breaks text that `codepoint_is_cjk` has already identified as CJK text, there shouldn't be a gap between searc...
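To make the version-skew concern above concrete, here is a minimal Python sketch of a range-table classifier in the spirit of `CJK::codepoint_is_cjk()`, plus a single-pass splitter that separates out the CJK runs (the only text that would be handed to ICU for word-breaking). This is not Xapian's code: the range tables, `split_runs`, and the Python signatures are hypothetical, and the tables cover only a handful of blocks. The point is that an indexer built against the "old" table and a query parser built against the "new" one split the same string differently, so the terms they produce won't match.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) codepoint

# Hypothetical, deliberately incomplete table standing in for an older Unicode version.
OLD_CJK_RANGES: List[Range] = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]

# Hypothetical "newer" table that also knows about a later-added block.
NEW_CJK_RANGES: List[Range] = OLD_CJK_RANGES + [
    (0x30000, 0x3134F),  # CJK Unified Ideographs Extension G
]


def codepoint_is_cjk(cp: int, ranges: List[Range]) -> bool:
    """Return True if cp falls inside any range of the given table."""
    return any(first <= cp <= last for first, last in ranges)


def split_runs(text: str, ranges: List[Range]) -> List[Tuple[bool, str]]:
    """Split text into maximal (is_cjk, substring) runs in one left-to-right pass."""
    runs: List[Tuple[bool, str]] = []
    start = 0
    for i in range(1, len(text) + 1):
        start_is_cjk = codepoint_is_cjk(ord(text[start]), ranges)
        # Close the current run at end of text or when the classification changes.
        if i == len(text) or codepoint_is_cjk(ord(text[i]), ranges) != start_is_cjk:
            runs.append((start_is_cjk, text[start:i]))
            start = i
    return runs
```

With the old table, `split_runs("abc\u4e2d\u6587\U00030000", OLD_CJK_RANGES)` treats the final ideograph as non-CJK and returns three runs; with the new table the same character joins the preceding CJK run and only two runs come back.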
2016 Sep 07
Pull requests: CJK words and Snippet generator
On Tue, Sep 6, 2016, at 09:16, Olly Betts wrote:
> I think my main concerns are about efficiency (since that's a major
> motivation for the current implementation, so slowing it down would be
> annoying), and whether we can just make this the standard behaviour
> rather than adding an option.

The current implementation is O(n), and I took care to keep it that way. For the proposed term...
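The efficiency requirement discussed here is that CJK term generation stay linear in the input length. As an illustration only, here is a single-pass Python sketch that emits character unigrams and adjacent bigrams for one CJK run. The helper name `cjk_ngrams` is hypothetical, and the unigram-plus-bigram scheme is an assumption about an n-gram style tokenizer, not code taken from the pull request or from Xapian.

```python
from typing import List


def cjk_ngrams(run: str) -> List[str]:
    """Return unigrams and bigrams for a run of CJK characters, in text order.

    Each character is visited once and yields at most two terms, so an
    n-character run produces at most 2n - 1 terms: the work is O(n).
    """
    terms: List[str] = []
    for i, ch in enumerate(run):
        terms.append(ch)  # unigram for this character
        if i + 1 < len(run):
            terms.append(run[i:i + 2])  # bigram with the following character
    return terms
```

For a three-character run such as `"\u4e2d\u6587\u5b57"` this yields five terms (three unigrams interleaved with two bigrams). Because each bigram is a fixed-width slice, no backtracking or rescanning of the input is needed.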