Hello,

I'm trying to debug the search command with the Japanese 'ISO-2022-JP' codec.

ISO-2022-JP text needs its character case preserved until it is converted to UTF-8.

For example:
* '\033[$B%s\033[(B' is the character pronounced 'N'
* '\033[$B%S\033[(B' is the character pronounced 'Bi'

This causes trouble in searches with Japanese text.
I found that add_new() in imap/imap-search.c uppercases strings, and fixed it. But the search results weren't affected.. ;-)

I hope I can fix this problem in a few days, but I'm not familiar with Dovecot's source. If you have any hints, please tell me.

thanks,
--
Kazuo Moriwaka
moriwaka at valinux.co.jp
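To make the case problem concrete, here is a minimal Python sketch (not Dovecot code). It assumes the standard ISO-2022-JP escape sequences ESC $ B and ESC ( B, written without the '[' shown in the message, and relies on Python's built-in iso2022_jp codec. The variable names are made up for illustration.

```python
# Sketch only: shows why ASCII-uppercasing raw ISO-2022-JP bytes changes the text.
ESC = b"\x1b"

# Assumption: standard escapes ESC $ B (enter JIS X 0208) and ESC ( B (back to ASCII).
raw_key = ESC + b"$B%s" + ESC + b"(B"

# bytes.upper() only changes ASCII letters, so the 's' inside the
# two-byte character becomes 'S' and now selects a different character.
uppercased_key = raw_key.upper()

original_char = raw_key.decode("iso2022_jp")
corrupted_char = uppercased_key.decode("iso2022_jp")

# Print code points rather than glyphs so the output is terminal-independent.
print(f"U+{ord(original_char):04X} U+{ord(corrupted_char):04X}")
```

The first code point is katakana 'N' (U+30F3) and the second is katakana 'Bi' (U+30D3), matching the two examples in the message: uppercasing before conversion silently turns one search key into another.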
On 20.12.2004, at 04:04, Kazuo Moriwaka wrote:

> I'm trying to debug the search command with the Japanese 'ISO-2022-JP' codec.
>
> ISO-2022-JP text needs its character case preserved until it is
> converted to UTF-8.
>
> For example:
> * '\033[$B%s\033[(B' is the character pronounced 'N'
> * '\033[$B%S\033[(B' is the character pronounced 'Bi'
>
> This causes trouble in searches with Japanese text.
> I found that add_new() in imap/imap-search.c uppercases strings,
> and fixed it. But the search results weren't affected.. ;-)

I'll fix add_new(), but I'm not sure what else could be wrong. That value gets passed as the key parameter to message_body_search() and message_header_search_init(). Those call charset_to_ucase_utf8_string() to get an uppercase UTF-8 string from it, which is then compared to the text found in messages.

Did you check whether the ISO-2022-JP text is converted correctly to UTF-8 at all? Looking at charset_to_ucase_utf8() in lib-charset/charset-iconv.c might show something.
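The ordering described above (convert to UTF-8 first, uppercase afterwards, then compare) can be sketched in Python as follows. This is only an illustration under stated assumptions: to_ucase_utf8() and key_matches() are hypothetical helpers invented here, not Dovecot's charset_to_ucase_utf8_string() or its search functions, and the escape sequences are the standard ESC $ B and ESC ( B.

```python
def to_ucase_utf8(raw: bytes, charset: str) -> bytes:
    """Hypothetical helper: decode first, uppercase second, emit UTF-8."""
    return raw.decode(charset).upper().encode("utf-8")


def key_matches(key: bytes, key_charset: str, text: bytes, text_charset: str) -> bool:
    """Hypothetical helper: substring match on the normalized UTF-8 forms."""
    return to_ucase_utf8(key, key_charset) in to_ucase_utf8(text, text_charset)


ESC = b"\x1b"
# Assumption: standard ISO-2022-JP escapes; "%s" is katakana 'N' in JIS X 0208.
key = ESC + b"$B%s" + ESC + b"(B"
subject = b"re: " + ESC + b"$B%s" + ESC + b"(B"

# Key kept intact until conversion: the search finds the character.
good = key_matches(key, "iso2022_jp", subject, "iso2022_jp")

# Key uppercased as raw bytes before conversion: it now means 'Bi' and misses.
bad = key_matches(key.upper(), "iso2022_jp", subject, "iso2022_jp")

# ASCII parts still match case-insensitively, since uppercasing happens after decoding.
ascii_hit = key_matches(b"RE", "ascii", subject, "iso2022_jp")

print(good, bad, ascii_hit)
```

The point of the sketch is that case folding is only safe once the bytes have been decoded; any uppercasing applied to the raw key earlier in the pipeline defeats the later conversion step.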
Hello,

From: Timo Sirainen <tss at iki.fi>
Subject: Re: [Dovecot] Japanese Search
Date: Mon, 20 Dec 2004 06:54:15 +0200

> On 20.12.2004, at 04:04, Kazuo Moriwaka wrote:
>
> > I'm trying to debug the search command with the Japanese 'ISO-2022-JP' codec.
> >
> > ISO-2022-JP text needs its character case preserved until it is
> > converted to UTF-8.
> >
> > For example:
> > * '\033[$B%s\033[(B' is the character pronounced 'N'
> > * '\033[$B%S\033[(B' is the character pronounced 'Bi'
> >
> > This causes trouble in searches with Japanese text.
> > I found that add_new() in imap/imap-search.c uppercases strings,
> > and fixed it. But the search results weren't affected.. ;-)
>
> I'll fix add_new(), but I'm not sure what else could be wrong. That
> value gets passed as the key parameter to message_body_search() and
> message_header_search_init(). Those call charset_to_ucase_utf8_string()
> to get an uppercase UTF-8 string from it, which is then compared to the
> text found in messages.
>
> Did you check whether the ISO-2022-JP text is converted correctly to
> UTF-8 at all? Looking at charset_to_ucase_utf8() in
> lib-charset/charset-iconv.c might show something.

Thank you for your fix, and I'm sorry for my mistake. The problem is already fixed by the add_new() fix. I didn't notice because I had mixed up the binary files.

Now I can search headers with Japanese strings :-)
# I tested some Subject and From headers.

But searching message bodies still doesn't work. I'll look into it.

thanks,
--
Kazuo Moriwaka
moriwaka at valinux.co.jp