thr3ads.net - dovecot - [Dovecot] Japanese Search [Dec 2004]


Kazuo Moriwaka

2004-Dec-20 02:04 UTC

[Dovecot] Japanese Search

Hello,

I'm trying to debug the SEARCH command with Japanese text in the
ISO-2022-JP charset.

ISO-2022-JP text must keep its byte case until it has been converted
to UTF-8.

For example:
 * '\033$B%s\033(B' is the character pronounced 'N'
 * '\033$B%S\033(B' is the character pronounced 'Bi'

This causes trouble when searching in Japanese.
I found that add_new() in imap/imap-search.c uppercases the strings
and fixed that, but the search results were not affected.. ;-)
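
The effect described above can be reproduced outside Dovecot. The following is a minimal sketch, not Dovecot code: it uses Python's built-in iso2022_jp codec as a stand-in for the server's charset conversion, and it assumes the standard ISO-2022-JP escape sequences from RFC 1468 (ESC $ B to enter JIS X 0208, ESC ( B to return to ASCII). The helper name decode_key is hypothetical.

```python
# Illustration only: Python's iso2022_jp codec stands in for Dovecot's
# charset conversion. Nothing here is taken from the Dovecot source.
ESC = b"\x1b"


def decode_key(raw: bytes) -> str:
    """Decode an ISO-2022-JP search key into a Unicode string."""
    return raw.decode("iso2022_jp")


# Search key containing the two-byte code "%s" (0x25 0x73).
key = ESC + b"$B%s" + ESC + b"(B"

# Uppercasing the raw bytes before conversion turns "%s" into "%S",
# which is the code for a different character.
broken = key.upper()

print(ascii(decode_key(key)))     # '\u30f3' (katakana N)
print(ascii(decode_key(broken)))  # '\u30d3' (katakana BI)
```

The two outputs differ, so a key that is uppercased while still in ISO-2022-JP can no longer match the text the user typed.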

I hope to fix this problem within a few days,
but I don't know Dovecot's source well.
If you have any hints, please tell me.

thanks,
-- 
Kazuo Moriwaka 
moriwaka at valinux.co.jp

Timo Sirainen

2004-Dec-20 04:54 UTC

On 20.12.2004, at 04:04, Kazuo Moriwaka wrote:
> I'm trying to debug the SEARCH command with Japanese text in the
> ISO-2022-JP charset.
>
> ISO-2022-JP text must keep its byte case until it has been converted
> to UTF-8.
>
> For example:
>  * '\033$B%s\033(B' is the character pronounced 'N'
>  * '\033$B%S\033(B' is the character pronounced 'Bi'
>
> This causes trouble when searching in Japanese.
> I found that add_new() in imap/imap-search.c uppercases the strings
> and fixed that, but the search results were not affected.. ;-)
I'll fix add_new(), but I'm not sure what else could be there.. That 
value gets passed as key parameter to message_body_search() and 
message_header_search_init(). Those call charset_to_ucase_utf8_string() 
to get an uppercase utf-8 string from it which is then compared to text 
found in messages.
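
The order of operations described here (convert to UTF-8 first, uppercase afterwards, then compare) can be sketched as follows. This is an illustration under stated assumptions, not the Dovecot implementation: to_ucase_utf8 and header_matches are hypothetical names, Python's iso2022_jp codec replaces the real charset layer, and the actual signature of charset_to_ucase_utf8_string() is not shown in this thread.

```python
# Hypothetical sketch of "convert, then uppercase, then compare".
# Function names are made up for illustration; this is not Dovecot code.
ESC = b"\x1b"


def to_ucase_utf8(raw: bytes, charset: str) -> bytes:
    """Decode raw bytes from charset, uppercase the text, encode as UTF-8."""
    return raw.decode(charset).upper().encode("utf-8")


def header_matches(key: bytes, key_charset: str,
                   header: bytes, header_charset: str) -> bool:
    """Case-insensitive substring match done on uppercase UTF-8 bytes."""
    needle = to_ucase_utf8(key, key_charset)
    haystack = to_ucase_utf8(header, header_charset)
    return needle in haystack


key = ESC + b"$B%s" + ESC + b"(B"                  # katakana N
subject = b"Re: " + ESC + b"$B%s" + ESC + b"(B test"  # contains katakana N
other = b"Re: " + ESC + b"$B%S" + ESC + b"(B test"    # contains katakana BI

print(header_matches(key, "iso2022_jp", subject, "iso2022_jp"))      # True
print(header_matches(key, "iso2022_jp", other, "iso2022_jp"))        # False
print(header_matches(b"test", "ascii", subject, "iso2022_jp"))       # True
```

Because uppercasing happens only after decoding, ASCII text is still matched case-insensitively while the two-byte Japanese codes are left intact.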

Did you check if the ISO-2022-JP text is converted correctly to UTF-8 
at all? Looking at charset_to_ucase_utf8() in 
lib-charset/charset-iconv.c might show something.
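
One way to check the conversion independently of the server is to dump the UTF-8 bytes produced for a known key and compare them with the expected code points. A small sketch, again using Python's iso2022_jp codec rather than Dovecot's iconv wrapper; utf8_hex is a hypothetical helper name.

```python
# Sanity check for the ISO-2022-JP to UTF-8 step, independent of Dovecot.
def utf8_hex(raw: bytes, charset: str = "iso2022_jp") -> str:
    """Return the UTF-8 encoding of the decoded text as a hex string."""
    return raw.decode(charset).encode("utf-8").hex()


print(utf8_hex(b"\x1b$B%s\x1b(B"))  # e383b3 (U+30F3)
print(utf8_hex(b"\x1b$B%S\x1b(B"))  # e38393 (U+30D3)
```

If the bytes logged by the server for the same key differ from these, the problem is in the conversion step rather than in the comparison.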

Kazuo Moriwaka

2004-Dec-20 07:08 UTC

Hello,

From: Timo Sirainen <tss at iki.fi>
Subject: Re: [Dovecot] Japanese Search
Date: Mon, 20 Dec 2004 06:54:15 +0200
> On 20.12.2004, at 04:04, Kazuo Moriwaka wrote:
> 
> > I'm trying to debug the SEARCH command with Japanese text in the
> > ISO-2022-JP charset.
> >
> > ISO-2022-JP text must keep its byte case until it has been converted
> > to UTF-8.
> >
> > For example:
> >  * '\033$B%s\033(B' is the character pronounced 'N'
> >  * '\033$B%S\033(B' is the character pronounced 'Bi'
> >
> > This causes trouble when searching in Japanese.
> > I found that add_new() in imap/imap-search.c uppercases the strings
> > and fixed that, but the search results were not affected.. ;-)
> 
> I'll fix add_new(), but I'm not sure what else could be there.. That
> value gets passed as key parameter to message_body_search() and 
> message_header_search_init(). Those call charset_to_ucase_utf8_string() 
> to get an uppercase utf-8 string from it which is then compared to text 
> found in messages.
> 
> Did you check if the ISO-2022-JP text is converted correctly to UTF-8 
> at all? Looking at charset_to_ucase_utf8() in 
> lib-charset/charset-iconv.c might show something.

Thank you for your fix.
And I'm sorry for my mistake.

The add_new() fix already solves this problem.
I didn't notice because I had mixed up the binaries I was testing.

Now I can search headers with Japanese strings :-)
# I tested some Subject and From headers.
But searching message bodies still doesn't work. I'll look into it.

thanks,
-- 
Kazuo Moriwaka 
moriwaka at valinux.co.jp
