Andrew Richards
2011-Sep-21  22:59 UTC
[Dovecot] Dovecot failing to parse some UTF-8 encoded attachment filenames, returning empty string instead
Hi,
I'm seeing a strange problem with some attachment filenames that are
UTF-8 encoded. The problem seems to be related to spaces and/or
unusual characters in filenames, such as accented characters (or perhaps
it affects any UTF-8 encoded filename; I've not explored that fully).
These filenames are shown as empty strings in IMAP using Dovecot. I've
attached a sample message that exhibits this problem, trimmed down to
fairly bare essentials. By comparison I find that (for example)
Courier happily returns the filename (still encoded). Although I
suspect the problem lies within Dovecot, it may be an underlying
Unicode or other component that's at the root of the problem.
I can replicate this by putting the attached message in a mailbox (I'm
using Maildir format mailboxes, so I just drop the raw file in
Maildir/new and change the ownership of the file to match the mailbox
owner). A manual IMAP session over telnet then shows the problem:
$ telnet localhost 143
Trying ::1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
IDLE AUTH=PLAIN] Dovecot ready.
0 login  some.one at test.domain password
0 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND
UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1
CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH
LIST-STATUS] Logged in
0 select inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft \*)]
Flags permitted.
* 4 EXISTS
* 0 RECENT
* OK [UNSEEN 1] First unseen.
* OK [UIDVALIDITY 1316621730] UIDs valid
* OK [UIDNEXT 8] Predicted next UID
* OK [HIGHESTMODSEQ 1] Highest
0 OK [READ-WRITE] Select completed.
0 fetch 4 body
* 4 FETCH (BODY (("text" "html" ("charset"
"iso-8859-15") NIL NIL
"base64" 278 5)("application" "octet-stream"
("name" "") NIL NIL
"base64" 18) "mixed"))
0 OK Fetch completed.
0 logout
* BYE Logging out
0 OK Logout completed.
Connection closed by foreign host.
$
Note especially the ("name" "") part, which shows a supposedly
empty filename.
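For anyone wanting to check captured FETCH responses for the same symptom from a script, here is a rough Python sketch. It only pattern-matches the response text; the helper names (name_params, has_empty_name) are made up for illustration, and it assumes the parameter value arrives as a simple quoted string (no IMAP literals or escaped quotes), which is a simplification rather than a full BODY parser.

```python
import re
from typing import List

# Matches a "name" body parameter and captures its quoted value.
# Assumption: the value is a plain quoted string with no escaped quotes.
NAME_PARAM = re.compile(r'"name"\s+"([^"]*)"', re.IGNORECASE)


def name_params(fetch_response: str) -> List[str]:
    """Return every "name" parameter value found in a FETCH BODY response."""
    return NAME_PARAM.findall(fetch_response)


def has_empty_name(fetch_response: str) -> bool:
    """True if any body part reports an empty "name" parameter."""
    return any(value == "" for value in name_params(fetch_response))


# The response from the session above, joined onto one line.
SAMPLE = (
    '* 4 FETCH (BODY (("text" "html" ("charset" "iso-8859-15") NIL NIL '
    '"base64" 278 5)("application" "octet-stream" ("name" "") NIL NIL '
    '"base64" 18) "mixed"))'
)

if __name__ == "__main__":
    print(has_empty_name(SAMPLE))
```

The same check should work on output from any IMAP client library, provided the response is first turned into a single string.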
I've observed this behaviour on the following versions of Dovecot:
 - 1.2.9 on Ubuntu 10.04LTS (pre-compiled version)
 - 1.2.17 on Fedora 13 (pre-compiled version)
 - 2.0.15 on Fedora 13 (from source)
I don't think the Dovecot configuration is relevant, but for good
measure I've included the one from the 2.0.15 setup below.
Any ideas on what might be causing this?
Best regards,
Andrew.
# dovecot -n
# 2.0.15: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.34.9-69.fc13.i686.PAE i686 Fedora release 13 (Goddard)
auth_debug = yes
default_login_user = nobody
log_path = /var/log/dovecot.log
passdb {
  args = /usr/local/bin/checkcdb
  driver = checkpassword
}
protocols = imap pop3
service auth {
  user = root
}
service imap-login {
  inet_listener imap {
    ssl = no
  }
}
service pop3-login {
  inet_listener pop3 {
    ssl = no
  }
}
ssl = no
userdb {
  driver = prefetch
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: troublesome-dovecot-message
Type: application/octet-stream
Size: 971 bytes
Desc: not available
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20110921/22d6038f/attachment-0004.obj>
Timo Sirainen
2011-Sep-21  23:45 UTC
[Dovecot] Dovecot failing to parse some UTF-8 encoded attachment filenames, returning empty string instead
On 22.9.2011, at 1.59, Andrew Richards wrote:
> I'm seeing a strange problem with some attachment filenames that are
> UTF-8 encoded. The problem seems to be related to spaces and/or
> unusual characters in filenames, like accented characters (or perhaps
> just to filenames if UTF-8 encoded; I've not explored that fully).
The problem is that the client sends it wrong:
> Content-Type: application/octet-stream;
>  name==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
> Content-Disposition: attachment;
>  filename==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
These are both wrong. First of all they are illegal because they have = and ? characters, from RFC 2045:
> parameter := attribute "=" value
> value := token / quoted-string
> token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
>            or tspecials>
> tspecials := "(" / ")" / "<" / ">" / "@" /
>              "," / ";" / ":" / "\" / <">
>              "/" / "[" / "]" / "?" / "="
>              ; Must be in quoted-string,
>              ; to use within parameter values
Also from RFC 2047 (encoded-word is the =?UTF-8?...?= thing):
> + An 'encoded-word' MUST NOT be used in parameter of a MIME
>   Content-Type or Content-Disposition field, or in any structured
>   field body except within a 'comment' or 'phrase'.
The proper way to do this would be to use RFC 2184, which looks something like this:
> Content-Disposition: attachment;
>  filename*=iso-8859-1''p%E4%E4
Looks like Apple Mail also sends:
> Content-Type: application/octet-stream;
>  name="=?iso-8859-1?Q?p=E4=E4?="
That is inside a quoted-string, so it's not broken, but clients aren't really supposed to decode that string in there either. Anyway .. I'll check tomorrow if I can easily add code to work around your problem. If it's just a minor change I'll do it.
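To make the three points above concrete, here is a small Python sketch using only the standard library. It checks a parameter value against the RFC 2045 token rule, decodes the offending encoded-word to show what filename the client meant, and builds the extended-parameter form (RFC 2184, since superseded by RFC 2231). The helper names is_token and decode_word are invented for this illustration; this is not how Dovecot itself parses headers.

```python
from email.header import decode_header
from email.utils import encode_rfc2231

# RFC 2045 tspecials; these may only appear inside a quoted-string.
TSPECIALS = '()<>@,;:\\"/[]?='


def is_token(value: str) -> bool:
    """True if value is a valid RFC 2045 token: one or more printable
    US-ASCII characters, excluding SPACE, CTLs and tspecials."""
    return bool(value) and all(
        33 <= ord(ch) <= 126 and ch not in TSPECIALS for ch in value
    )


def decode_word(encoded: str) -> str:
    """Decode RFC 2047 encoded-word text into a str."""
    parts = []
    for data, charset in decode_header(encoded):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


# The unquoted parameter value the client sent.
RAW = "=?UTF-8?B?dGhpc19mYWlscy50eHQ=?="

if __name__ == "__main__":
    # Contains "=" and "?", so it is not a legal unquoted value.
    print("valid token:", is_token(RAW))
    # What the client was trying to say.
    print("intended filename:", decode_word(RAW))
    # The extended-parameter form for a non-ASCII name.
    print("filename*=" + encode_rfc2231("p\u00e4\u00e4", "iso-8859-1"))
```

A sending client would emit the last form as the value of a `filename*` parameter, while a plain ASCII name such as this_fails.txt could simply be sent unencoded, as a token or quoted-string.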