Hello everybody, hope you?re all well! We have a really strange bug with our dovecot setup. After weeks and month of debugging we?re finally reached the point when we ask you for help. Our basic setup is quite simple: a single dovecot 2.3.10 (0da0eff44) install on Debian 10 with LDAP as user/passdb and sieve, imap_sieve and mail_crypt as mail_plugins. For the mail_crypt we have a special rule which allows us to enable crypting for each user individually. See this line from dovecot-ldap.conf.ext: pass_attrs = userPassword=password,=userdb_mail_plugins=%{if;%{real_lport};eq;993;imap_sieve;} %{if;%{real_lport};eq;143;imap_sieve;} notify mail_log %{if;%{ldap:encryptMail};eq;TRUE;mail_crypt; },=userdb_mail_crypt_private_password=%w As you can see for IMAP connections we enable imap_sieve and if the LDAP attribute ?encryptMail? is set to TRUE we enable ?mail_crypt? as mail_plugin. This works fine except that around 5-10 times a day the dovecot server crashes during a LMTP connection. It crashes in the last steps of LMTP which result that the mail get delivered to Maildir but Postfix doesn?t get a positive response and delivers the mail again after a few minutes. So all of our users get some mails twice or even more times. This is what the log says in the moment of crashing: dovecot: lmtp(53852): Panic: Module context mail_crypt_user_module missing *** backtrace *** (see: https://pastebin.com/YCiFtxmy) dovecot: lmtp(53852): Fatal: master: service(lmtp): child 53852 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps) Here I?ve pasted the LTMP session in mail.log: https://pastebin.com/A4ughNUF Obviously it looks like it has something to do with our mail_crypt setup. But the thing is that although we?re monitoring this for months we cannot find a condition which cause this. It seems to happen more frequent for mails with lot of people in To or CC (large headers), but definitively not all of them? Some more things we?ve already thought on: First my guess was that the mail_crypt plugin isn?t loaded in some case but in the backtrace are things from ?lib10_mail_crypt_plugin.so?, so it should be loaded, right? On the other hand a non-crashing LTMP session looks exactly the same in log except it has three additional lines in mail.log: Debug: Loading modules from directory: /usr/lib/dovecot/modules Debug: Module loaded: /usr/lib/dovecot/modules/lib15_notify_plugin.so Debug: Module loaded: /usr/lib/dovecot/modules/lib20_mail_log_plugin.so My college already had a look in the source code, but this didn?t make us much smarter. Here his references: mail context is defined here: https://github.com/dovecot/core/blob/81b5b188c478ec36bea8bda8fcad1e5f32ac612b/src/plugins/mail-crypt/mail-crypt-plugin.c#L453 and removed again here: https://github.com/dovecot/core/blob/81b5b188c478ec36bea8bda8fcad1e5f32ac612b/src/plugins/mail-crypt/mail-crypt-plugin.c#L397 Does any of the experts on this list have an idea what could cause these crashes and how to fix them? Thanks in advance! Best, Andi
On 29. Mar 2020, at 15.15, Andi Krischer <ak at michaelpelzer.de> wrote:> We have a really strange bug with our dovecot setup. After weeks and month of debugging we?re finally reached the point when we ask you for help. > > Our basic setup is quite simple: a single dovecot 2.3.10 (0da0eff44) install on Debian 10 with LDAP as user/passdb and sieve, imap_sieve and mail_crypt as mail_plugins.Was this happening with old Dovecot versions, or did it start after v2.3.10 upgrade?> This is what the log says in the moment of crashing: > dovecot: lmtp(53852): Panic: Module context mail_crypt_user_module missing > *** backtrace *** (see: https://pastebin.com/YCiFtxmy)It shows mailbox_free() in the backtrace, so it's probably crashing in mail_crypt_mailbox_close()'s MAIL_CRYPT_CONTEXT(box). But looking at the code, I can't see how that could be possible. Maybe there is some kind of memory corruption or something.> dovecot: lmtp(53852): Fatal: master: service(lmtp): child 53852 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)Could you get core dumps enabled? See the link. And from the core dump send at least: bt full fr 6 (or maybe 5 or 7, whichever makes the following command work) p *box Also please send your doveconf -n output.
On 29. Mar 2020, at 16:07, Timo Sirainen <timo at sirainen.com> wrote:> > On 29. Mar 2020, at 15.15, Andi Krischer <ak at michaelpelzer.de> wrote: >> We have a really strange bug with our dovecot setup. After weeks and month of debugging we?re finally reached the point when we ask you for help. >> >> Our basic setup is quite simple: a single dovecot 2.3.10 (0da0eff44) install on Debian 10 with LDAP as user/passdb and sieve, imap_sieve and mail_crypt as mail_plugins. > > Was this happening with old Dovecot versions, or did it start after v2.3.10 upgrade?Yes, we implemented this setup with v2.3.5.1 on Debian 9 and had the issue right from the beginning.>> This is what the log says in the moment of crashing: >> dovecot: lmtp(53852): Panic: Module context mail_crypt_user_module missing >> *** backtrace *** (see: https://pastebin.com/YCiFtxmy) > > It shows mailbox_free() in the backtrace, so it's probably crashing in mail_crypt_mailbox_close()'s MAIL_CRYPT_CONTEXT(box). But looking at the code, I can't see how that could be possible. Maybe there is some kind of memory corruption or something. > >> dovecot: lmtp(53852): Fatal: master: service(lmtp): child 53852 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps) > > Could you get core dumps enabled? See the link. And from the core dump send at least: > > bt full > fr 6 (or maybe 5 or 7, whichever makes the following command work) > p *box > > Also please send your doveconf -n output.Core dump: https://pastebin.com/g4VkMXsT (hope this is everything - I?m not that familiar with core dumps?) Doveconf: https://pastebin.com/JnhXkYUA Hope this helps!
On 29. Mar 2020, at 17.07, Timo Sirainen <timo at sirainen.com> wrote:> >> This is what the log says in the moment of crashing: >> dovecot: lmtp(53852): Panic: Module context mail_crypt_user_module missing >> *** backtrace *** (see: https://pastebin.com/YCiFtxmy) > > It shows mailbox_free() in the backtrace, so it's probably crashing in mail_crypt_mailbox_close()'s MAIL_CRYPT_CONTEXT(box). But looking at the code, I can't see how that could be possible. Maybe there is some kind of memory corruption or something.Looks like there's a bug when some users have mail_crypt plugin loaded and other users don't. If LMTP delivers a mail to both users within the same session, it crashes at deinit. A workaround could be to tell MTA to deliver to max recipient at a time. We'll get this fixed also to some future release. You could also apply a patch: diff --git a/src/plugins/mail-crypt/mail-crypt-plugin.c b/src/plugins/mail-crypt/mail-crypt-plugin.c index 43ece3d3b..a37fb4a35 100644 --- a/src/plugins/mail-crypt/mail-crypt-plugin.c +++ b/src/plugins/mail-crypt/mail-crypt-plugin.c @@ -345,7 +345,7 @@ static void mail_crypt_mailbox_close(struct mailbox *box) { struct mail_crypt_mailbox *mbox = MAIL_CRYPT_CONTEXT(box); struct mail_crypt_user *muser - MAIL_CRYPT_USER_CONTEXT_REQUIRE(box->storage->user); + MAIL_CRYPT_USER_CONTEXT(box->storage->user); if (mbox->pub_key != NULL) dcrypt_key_unref_public(&mbox->pub_key);