Greg Banks
2013-Mar-13 05:48 UTC
[Xapian-devel] patch - Some CJK codepoints are also punctuation
-- Greg.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xapian-some-cjk-codepoints-are-also-punctuation.patch
Type: text/x-patch
Size: 1499 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20130313/4da8b0f9/attachment.bin>
Olly Betts
2013-Mar-16 00:43 UTC
[Xapian-devel] patch - Some CJK codepoints are also punctuation
This seems a sensible change, but it really needs some test coverage.

Do you have any examples of CJK text with punctuation where this change makes a difference?

Cheers,
Olly
Greg Banks
2013-Mar-16 03:12 UTC
[Xapian-devel] patch - Some CJK codepoints are also punctuation
On 16/03/2013, at 11:43, Olly Betts <olly at survex.com> wrote:

> This seems a sensible change, but it really needs some test coverage.
>
> Do you have any examples of CJK text with punctuation where this change
> makes a difference?

There should be one in the unit tests that are added in the next patch that I sent (probably still awaiting moderation because it is larger than 40K). Basically there was some Chinese text which had a "full width exclamation mark" character in it.

Greg.
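
For readers following the thread without the patch, here is a rough illustration of the kind of input being described. It is only a sketch in Python using the standard unicodedata module, not Xapian's own code, and the sample string is a hypothetical example rather than the text from the unit tests. It shows that the fullwidth exclamation mark (U+FF01) has Unicode general category "Po" (punctuation) while the surrounding Chinese characters are "Lo" (letters), so code that treats a whole CJK range as word characters may need to check for punctuation separately.

```python
import unicodedata

# Hypothetical sample (not taken from the patch or its tests):
# two Han characters, a fullwidth exclamation mark, two more Han characters.
SAMPLE = "\u4f60\u597d\uff01\u4e16\u754c"


def is_punctuation(ch):
    """True if the character's Unicode general category is a punctuation class (P*)."""
    return unicodedata.category(ch).startswith("P")


def describe(text):
    """Return (codepoint, general category, East Asian width) for each character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.category(ch), unicodedata.east_asian_width(ch))
        for ch in text
    ]


def split_on_punctuation(text):
    """Split text into runs of non-punctuation characters."""
    runs, current = [], ""
    for ch in text:
        if is_punctuation(ch):
            if current:
                runs.append(current)
            current = ""
        else:
            current += ch
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(row)
    print(split_on_punctuation(SAMPLE))
```

Running the script prints one row per character, then the two runs of Han characters on either side of the exclamation mark. How Xapian itself classifies these codepoints depends on the patch, which is not reproduced in this thread.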