Andrew Richards
2011-Sep-21 22:59 UTC
[Dovecot] Dovecot failing to parse some UTF-8 encoded attachment filenames, returning empty string instead
Hi,

I'm seeing a strange problem with some attachment filenames that are UTF-8 encoded. The problem seems to be related to spaces and/or unusual characters in filenames, like accented characters (or perhaps just to filenames if UTF-8 encoded; I've not explored that fully). These filenames are shown as empty strings in IMAP using Dovecot. I've attached a sample message that exhibits this problem, trimmed down to fairly bare essentials. By comparison I find that (for example) Courier happily returns the filename (still encoded). Although I suspect the problem lies within Dovecot, it may be an underlying Unicode or other component that's at the root of the problem.

I can replicate this by putting the attached message in a mailbox (I'm using Maildir format mailboxes, so I just drop the raw file in Maildir/new and change the ownership of the file to match the mailbox owner). Then a pretend IMAP session to show the problem,

$ telnet localhost 143
Trying ::1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN] Dovecot ready.
0 login some.one at test.domain password
0 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS] Logged in
0 select inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft \*)] Flags permitted.
* 4 EXISTS
* 0 RECENT
* OK [UNSEEN 1] First unseen.
* OK [UIDVALIDITY 1316621730] UIDs valid
* OK [UIDNEXT 8] Predicted next UID
* OK [HIGHESTMODSEQ 1] Highest
0 OK [READ-WRITE] Select completed.
0 fetch 4 body
* 4 FETCH (BODY (("text" "html" ("charset" "iso-8859-15") NIL NIL "base64" 278 5)("application" "octet-stream" ("name" "") NIL NIL "base64" 18) "mixed"))
0 OK Fetch completed.
0 logout
* BYE Logging out
0 OK Logout completed.
Connection closed by foreign host.
$

especially note the ("name" "") part showing a supposedly empty filename.

I've observed this behaviour on the following versions of Dovecot,
- 1.2.9 on Ubuntu 10.04LTS (pre-compiled version)
- 1.2.17 on Fedora 13 (pre-compiled version)
- 2.0.15 on Fedora 13 (from source)

I don't think the Dovecot configuration is relevant, but I've put it below for good measure for the 2.0.15 setup.

Any ideas on what might be causing this?

Best regards,

Andrew.

# dovecot -n
# 2.0.15: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.34.9-69.fc13.i686.PAE i686 Fedora release 13 (Goddard)
auth_debug = yes
default_login_user = nobody
log_path = /var/log/dovecot.log
passdb {
  args = /usr/local/bin/checkcdb
  driver = checkpassword
}
protocols = imap pop3
service auth {
  user = root
}
service imap-login {
  inet_listener imap {
    ssl = no
  }
}
service pop3-login {
  inet_listener pop3 {
    ssl = no
  }
}
ssl = no
userdb {
  driver = prefetch
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: troublesome-dovecot-message
Type: application/octet-stream
Size: 971 bytes
Desc: not available
URL: <http://dovecot.org/pipermail/dovecot/attachments/20110921/22d6038f/attachment-0004.obj>
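For anyone wanting to repeat the reproduction step without the scrubbed attachment, the following is a minimal Python sketch that delivers a stand-in message into a Maildir using the standard library's mailbox module. The message below is not the original sample: its addresses, subject, boundary and body are made up, and only the name= and filename= parameters are taken from the headers quoted later in this thread. The encoded-word is written with its ?= terminator, which is an assumption, since the archived quote appears to have lost the final character. The helper name deliver is hypothetical, and the sketch does not change file ownership, so that step would still need doing by hand.

```python
import mailbox
import os
import tempfile

# Stand-in message (NOT the original attachment). Only the unquoted
# encoded-word parameters come from the thread; everything else is made up.
SAMPLE_MESSAGE = b"""\
From: sender@example.com
To: recipient@example.com
Subject: unquoted encoded-word in attachment filename
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b1"

--b1
Content-Type: text/plain; charset="us-ascii"

body
--b1
Content-Type: application/octet-stream;
 name==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
Content-Disposition: attachment;
 filename==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
Content-Transfer-Encoding: base64

dGVzdAo=
--b1--
"""


def deliver(maildir_path: str, raw: bytes) -> str:
    """Add raw message bytes to a Maildir (created if missing); return its key."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        # Raw bytes (not a MaildirMessage) are written to tmp/ and then
        # moved into new/, like a freshly delivered message.
        return box.add(raw)
    finally:
        box.close()


# Demo in a throwaway directory; point maildir at a real mailbox to test a server.
maildir = os.path.join(tempfile.mkdtemp(), "Maildir")
key = deliver(maildir, SAMPLE_MESSAGE)
print(key, os.listdir(os.path.join(maildir, "new")))
```

Once the file is in the real mailbox's new/ directory with the right ownership, the "fetch N body" command from the session above should show how the server reports the name parameter.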
Timo Sirainen
2011-Sep-21 23:45 UTC
[Dovecot] Dovecot failing to parse some UTF-8 encoded attachment filenames, returning empty string instead
On 22.9.2011, at 1.59, Andrew Richards wrote:

> I'm seeing a strange problem with some attachment filenames that are
> UTF-8 encoded. The problem seems to be related to spaces and/or
> unusual characters in filenames, like accented characters (or perhaps
> just to filenames if UTF-8 encoded; I've not explored that fully).

The problem is that the client sends it wrong:

> Content-Type: application/octet-stream;
>  name==?UTF-8?B?dGhpc19mYWlscy50eHQ=?
> Content-Disposition: attachment;
>  filename==?UTF-8?B?dGhpc19mYWlscy50eHQ=?

These are both wrong. First of all they are illegal because they have = and ? characters, from RFC 2045:

> parameter := attribute "=" value
> value := token / quoted-string
> token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
>             or tspecials>
> tspecials :=  "(" / ")" / "<" / ">" / "@" /
>               "," / ";" / ":" / "\" / <">
>               "/" / "[" / "]" / "?" / "="
>               ; Must be in quoted-string,
>               ; to use within parameter values

Also from RFC 2047 (encoded-word is the =?UTF-8?...?= thing):

> + An 'encoded-word' MUST NOT be used in parameter of a MIME
>   Content-Type or Content-Disposition field, or in any structured
>   field body except within a 'comment' or 'phrase'.

The proper way to do this would be to use RFC 2184, which looks something like this:

> Content-Disposition: attachment;
>  filename*=iso-8859-1''p%E4%E4

Looks like Apple Mail also sends:

> Content-Type: application/octet-stream;
>  name="=?iso-8859-1?Q?p=E4=E4?="

That is inside a quoted-string, so it's not broken, but clients aren't really supposed to decode that string in there either.

Anyway .. I'll check tomorrow if I can easily add code to workaround your problem. If it's just a minor change I'll do it.
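
The two rules quoted above can be illustrated with a short Python sketch. This is not Dovecot's code, only an illustration under stated assumptions: the helper names is_token and extended_param are hypothetical, the token check follows the RFC 2045 grammar as quoted, and the extended form is produced with the standard library's email.utils.encode_rfc2231 (RFC 2231 being the later revision of RFC 2184). The first test string is the encoded-word with its ?= terminator, as described in the RFC 2047 remark.

```python
from email.utils import encode_rfc2231

# tspecials from RFC 2045; none of these may appear in an unquoted token.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """Return True if value is a valid RFC 2045 token (usable without quotes)."""
    if not value:
        return False
    for ch in value:
        code = ord(ch)
        # Reject CTLs and SPACE (0-32), DEL and non-ASCII (127+), and tspecials.
        if code <= 32 or code >= 127 or ch in TSPECIALS:
            return False
    return True


def extended_param(name: str, value: str, charset: str = "iso-8859-1") -> str:
    """Format a parameter in the extended form name*=charset''percent-encoded."""
    return f"{name}*={encode_rfc2231(value, charset)}"


print(is_token("=?UTF-8?B?dGhpc19mYWlscy50eHQ=?="))   # False: contains = and ?
print(is_token("this_fails.txt"))                      # True
print(extended_param("filename", "p\u00e4\u00e4"))     # filename*=iso-8859-1''p%E4%E4
```

The last line reproduces the filename*= example given above; a sender would use that form (or a plain token or quoted-string for ASCII names) instead of placing an encoded-word in the parameter.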