thr3ads.net - dovecot - [Dovecot] Sieve and locale (Japanese) [Sep 2009]


Jorgen Lundman

2009-Sep-11 02:18 UTC

[Dovecot] Sieve and locale (Japanese)

I have set up Dovecot 1.2 'deliver' to use Dovecot's Sieve as well (both
versions are the latest from dovecot.org).

I am curious whether Sieve can handle Japanese, or locales in general, in the
language. In particular the Subject header, whose encoding is even more complicated.

if header :contains "subject" ["test", "テスト"] {

So say I want to file based on the above, "test" and "tesuto" (テスト, in UTF-8).
Most Japanese mail is sent in ISO-2022-JP: will it use iconv to convert it to
UTF-8 before the test? And will it then handle the Subject encoding? (Mail can
also be in EUC-JP, but that is unusual.)

Or is it something on the TODO list?




-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)

Timo Sirainen

2009-Sep-11 02:36 UTC


[Dovecot] Sieve and locale (Japanese)

On Sep 10, 2009, at 10:18 PM, Jorgen Lundman wrote:
> if header :contains "subject" ["test", "テスト"] {
>
> So say I want to file based on the above, "test" and "tesuto" (テスト, in
> UTF-8). Most Japanese mail is sent in ISO-2022-JP: will it use iconv to
> convert it to UTF-8 before the test? And will it then handle the Subject
> encoding? (Mail can also be in EUC-JP, but that is unusual.)
All text, including the subject, is converted to UTF-8 with iconv before
comparing, as long as Dovecot was compiled with iconv support (which happens
automatically if the iconv development files were found at build time).
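A minimal sketch of the behaviour Timo describes, using Python's standard email package and codecs as a stand-in for Dovecot's iconv-based conversion (this is not Dovecot's or Pigeonhole's code). The helpers encoded_word(), decode_subject() and header_contains() are hypothetical names for illustration, and header_contains() only roughly imitates header :contains with Sieve's default i;ascii-casemap comparator. The point it shows: the same Subject encoded in ISO-2022-JP, EUC-JP or UTF-8 is first decoded to one Unicode string, so a single UTF-8 key such as テスト matches all three.

```python
import base64
import string
from email.header import decode_header, make_header

# Sieve's default comparator, i;ascii-casemap, folds only ASCII letters.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def encoded_word(text: str, charset: str) -> str:
    """Build one RFC 2047 'B' encoded word in the given charset (demo helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decode_subject(raw: str) -> str:
    """Decode any encoded words and convert each charset to a Unicode str."""
    return str(make_header(decode_header(raw)))


def header_contains(raw: str, keys: list[str]) -> bool:
    """Rough analogue of: header :contains "subject" [keys...]

    Comparison happens on the decoded text, never on the raw bytes.
    """
    subject = decode_subject(raw).translate(_ASCII_FOLD)
    return any(key.translate(_ASCII_FOLD) in subject for key in keys)


keys = ["test", "テスト"]
for charset in ("iso-2022-jp", "euc-jp", "utf-8"):
    raw = encoded_word("テストのお知らせ", charset)
    print(charset, header_contains(raw, keys))  # True for all three
print(header_contains("Re: TEST results", keys))  # True (ASCII case folded)
```

Comparing the raw header bytes instead would fail, because テスト is a different byte sequence in each of the three charsets.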

Jorgen Lundman

2009-Sep-11 02:42 UTC


[Dovecot] Sieve and locale (Japanese)

> All text, including subject, is converted to UTF-8 with iconv before
> comparing, as long as Dovecot was compiled with iconv support
> (automatically as long as iconv devel files were found).
I should have known it would just work, and be done right. Thanks, Timo.

Lund



Stephan Bosch

2009-Sep-13 08:01 UTC


[Dovecot] Sieve and locale (Japanese), string length

Jorgen Lundman wrote:
> Damn, I apologise for the noise now, but I did manage to run into one
> problem:
> 
> Subject: ??????????????????????????????? 
> ???? ?????????????????????
> 
> Subject: 
> =?UTF-8?B?5pel5pys6Kqe44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?[...]
> 
> So it seems the longest word test is 16 UTF-8 chars (or 48 bytes?). So as
> long as we use short words, it should be OK.
> 
> Out of curiosity, can we increase this limit?

Well, this looks very much like a bug to me. And, conveniently, it does
not look like a bug in Sieve :o). By adding a small debug line I found
that the Dovecot function mail_get_headers_utf8() returns the following
for the above string:

?????????????? ???????????????? ????? 
??????????? ??????????

Note the additional spaces between the characters. They appear because of
the 75-character limit on RFC2047 encoded words: as seen above, the encoded
subject is split into several encoded words, folded onto separate RFC2822
header lines.

Timo: As far as I understand RFC2047, the (<CRLF>)<SPC> sequence between
adjacent RFC2047 encoded words is not supposed to be decoded as a space. I
admit that it is not a very well-specified RFC. The examples say that if a
space is required where encoded words are joined, that space must be encoded
inside the encoded words themselves.

Regards,

Stephan
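The joining rule Stephan cites can be sketched the same way, again with Python's email package as a stand-in rather than Dovecot's C code. RFC2047 (section 6.2) says whitespace separating two adjacent encoded words is ignored when decoding for display, and Python's email.header.decode_header() drops it. In the sketch, fold_encoded_words() and naive_decode() are hypothetical helpers: the first splits a subject into several UTF-8 encoded words folded with CRLF SPACE, the second reproduces the reported symptom by keeping each fold as a space. It only illustrates the expected result; it is not a patch for mail_get_headers_utf8().

```python
import base64
from email.header import decode_header, make_header

PREFIX, SUFFIX = "=?UTF-8?B?", "?="


def fold_encoded_words(text: str, chars_per_word: int = 12) -> str:
    """Split text into several UTF-8 'B' encoded words joined by CRLF SPACE.

    Splitting on characters (not bytes) keeps every multi-byte UTF-8
    sequence inside one encoded word. With 12 three-byte characters per
    word: 36 bytes -> 48 base64 chars -> 60-char encoded words (<= 75).
    """
    words = []
    for i in range(0, len(text), chars_per_word):
        chunk = text[i:i + chars_per_word].encode("utf-8")
        words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
    return "\r\n ".join(words)


def naive_decode(raw: str) -> str:
    """Hypothetical buggy decoder: turns each fold into a visible space."""
    parts = [
        base64.b64decode(word[len(PREFIX):-len(SUFFIX)]).decode("utf-8")
        for word in raw.split()
    ]
    return " ".join(parts)


def rfc2047_decode(raw: str) -> str:
    """Ignores whitespace between adjacent encoded words (RFC 2047, 6.2)."""
    return str(make_header(decode_header(raw)))


subject = "日本語" + "ら" * 40
raw = fold_encoded_words(subject)
print(naive_decode(raw).count(" "))  # 3 spurious spaces (4 encoded words)
print(rfc2047_decode(raw) == subject)  # True
```

Because each encoded word here contains only whole characters, joining the decoded pieces with no separator reproduces the original subject exactly.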
