(Background: Relatively new to dovecot; looking to do a transparent
replacement of long-established UW-IMAP on a cluster of Linux boxes which
NFS-mount a shared "/var/spool/mail".)

With rc8, where I had already increased "login_max_processes_count" from
the default 128 to 1024, we had still hit the issue of too many logins
crashing dovecot, so that trial had only lasted a couple of hours.

With rc10, I doubled this to try to avoid the problem (I didn't want to
risk testing the new code that addressed the problem... sorry!). We ran
for almost a full working day. Good! Because of a few issues (below) I
then backed off.

1. "User unknown": We use NIS for our passwd information. On the earlier
rc8 test we had had several occurrences of "User unknown" (from "deliver")
giving "dsn=5..." for perfectly valid users. So for this rc10 test I
applied a local patch so these were reduced to "EX_TEMPFAIL" (dsn=4...).
(This was triggered, as expected, a few times and subsequent delivery
attempts succeeded.) I strongly suspect that this is some sort of issue
with FC5, probably "nscd", and nothing to do with dovecot. Hints would be
nice, but from the dovecot perspective you may probably ignore this item.

2. For one particular user, "deliver" consistently gave:
   Failed to create storage for '...' with mail 'mbox:/HOME_DIRECTORY_USED_BUT_NOT_GIVEN_BY_USERDB:INBOX=...

I think this is ultimately due to something strange in the user's
".forward" file. I'd be delighted to follow this up with anyone else who
might have seen it. Although in one sense we may be drifting off-topic, in
another sense I suspect that there is scope for adjusting "deliver" to
handle this more gracefully.

3. There were several occurrences of:
   IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space)
   child 30842 (imap) killed with signal 6

This looks particularly awkward. Any thoughts?

4. There were two occurrences of:
   IMAP(...): file ../../../src/lib-index/mail-index.c: line 1801 (mail_index_move_to_memory): assertion failed: (index->fd == -1)
   child 20493 (imap) killed with signal 6

Again, this looks particularly awkward. Any thoughts?

For these last two items, note that the indexes are currently NFS-shared
alongside the INBOX area.

I'm still not clear on how to regard the concept of indexes, as applied to
a small cluster of machines, and handling simultaneous updates to INBOXes
(analogous to the vital importance of INBOX locking for such updates).

If one imagines the IMAP daemon (and pop and deliver) as file-clients of
the (NFS-shared) INBOXes on a fileserver, do the indexes belong very close
to the INBOXes (fileserver) or to the dovecot software (file client)? So
should I have the indexes on the fileserver (one instance), or should they
be on each cluster machine's private storage (possibly several instances;
one per cluster machine)? I've got them on the server; would they be
better on the cluster clients? (Might that be the cause and fix of these
two problems?)

-- 
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: Durham University :
: http://www.dur.ac.uk/t.d.lee/ South Road :
: Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :
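
Editorial sketch for item 1 above: the local patch itself is not shown in the message, so the following is only a minimal illustration of the idea, not the actual change to "deliver". It assumes the permanent failure was reported as EX_NOUSER (67 in BSD sysexits.h) and that the patch swapped it for EX_TEMPFAIL (75), which a sendmail-style MTA treats as "queue and retry". The function name and the injectable `lookup` parameter are hypothetical.

```python
import pwd

# Exit codes from BSD sysexits.h, as interpreted by sendmail-style MTAs.
EX_OK = 0
EX_NOUSER = 67     # permanent failure: the MTA bounces the message (dsn=5...)
EX_TEMPFAIL = 75   # temporary failure: the MTA queues and retries (dsn=4...)


def delivery_exit_code(username, lookup=pwd.getpwnam,
                       treat_unknown_as_temporary=True):
    """Return the exit code a delivery agent might report for `username`.

    `lookup` is injectable so the behaviour can be exercised without a
    real passwd/NIS backend (hypothetical parameter, for illustration).
    """
    try:
        lookup(username)
    except KeyError:
        # A failed NIS/nscd lookup looks exactly like a missing user here,
        # so the cautious choice is to ask the MTA to retry later.
        if treat_unknown_as_temporary:
            return EX_TEMPFAIL
        return EX_NOUSER
    return EX_OK
```

The trade-off is that mail for a genuinely nonexistent user then sits in the queue until the MTA's retry period expires instead of bouncing at once.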
dovecot-request at dovecot.org wrote:
> Date: Fri, 20 Oct 2006 13:26:03 +0100 (BST)
> From: David Lee <t.d.lee at durham.ac.uk>
> Subject: [Dovecot] 1.0.rc10 status report
>
> 1. "User unknown": We use NIS for our passwd information. On the earlier
> rc8 test we had had several occurrences of "User unknown" (from "deliver")
> giving "dsn=5..." for perfectly valid users. So for this rc10 test I
> applied a local patch so these were reduced to "EX_TEMPFAIL" (dsn=4...).
> (This was triggered, as expected, a few times and subsequent delivery
> attempts succeeded.) I strongly suspect that this is some sort of issue
> with FC5, probably "nscd" and nothing to do with dovecot. Hints would be
> nice, but from the dovecot perspective you may probably ignore this item.

I've had similar "User unknown" errors with nscd in the past. I was using
dovecot -> getpwent -> nscd -> nss_ldap -> LDAP.

I found that whenever the LDAP server was restarted, nscd did not promptly
re-establish its persistent LDAP connection, and it gave "user unknown"
replies for at least a few minutes. Restarting nscd fixed the problem
immediately. Running without nscd also fixed the problem. There was no
problem with pam_ldap authentication. As a result, the system lost track
of its uncached users for a short while.

My workaround was a crontab job that kept /etc/passwd up to date, together
with the entry 'passwd: ldap files' in /etc/nsswitch.conf:

    (cat /etc/passwd ; getent passwd) | sort -u > /etc/passwd.tmp && mv /etc/passwd.tmp /etc/passwd

> So should I have the indexes on the fileserver (one instance), or should they
> be on each cluster machine's private storage (possibly several instances;
> one per cluster machine)?

The suggestion is that if you need to have mailboxes over NFS, use local
disk for indexes. If you read the complete list archive you will find many
people who have had trouble with indexes over NFS.
Local indexes carry a performance penalty only when the same mailbox is
accessed from different IMAP servers (especially concurrently).

apap
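
Editorial sketch of the crontab workaround described in this message: merge the local passwd entries with whatever the name service currently returns, deduplicate whole lines, and swap the result into place with a rename so readers never see a half-written file. This is an illustration under assumptions, not the poster's script: the function names are hypothetical, and the ordering is Python's default string order rather than the locale-dependent order of `sort -u`.

```python
import os
import tempfile


def merge_passwd_lines(local_lines, directory_lines):
    """Union of both sources, deduplicated by whole line and sorted."""
    merged = {line.rstrip("\n")
              for line in list(local_lines) + list(directory_lines)}
    merged.discard("")  # ignore blank lines
    return sorted(merged)


def write_atomically(path, lines, mode=0o644):
    """Write to a temp file in the same directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("".join(line + "\n" for line in lines))
        # mkstemp creates the file with mode 0600; a passwd file is
        # normally world-readable, hence the explicit chmod.
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

As with the shell one-liner, deduplication is per line: a user whose local entry differs from the directory entry in any field ends up listed twice, and entries removed from the directory are never removed from the merged file.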
On Oct 20, 2006, at 5:26 AM, David Lee wrote:
> 3. There were several occurrences of:
>    IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space)
>    child 30842 (imap) killed with signal 6
>
> This looks particularly awkward. Any thoughts?

I'm also getting this error fairly often. I first installed Dovecot with
1.0rc10 last Wednesday after running UW-IMAP for a few years, under
RedHat 9 Linux, using mbox format. The primary client I'm using is Apple
MacOS X Mail; the box has between 3 and 5 regular IMAP users.

I added the following code right before the assert to try to assist with
debugging (assert line copied in for context), in the file
src/lib-storage/index/mbox/mbox-sync-rewrite.c at line 405:

+	if (need_space != (uoff_t)-mails[idx].space)
+		i_info("Need_space: %d idx: %d mails[idx].space: %d",need_space,idx,mails[idx].space);
	i_assert(need_space == (uoff_t)-mails[idx].space);

I know I'm not doing a proper cast of the variables, but here's the log
message I've gotten about 17 times since installing the debug version
about an hour ago:

Oct 23 01:24:58 dragonlair dovecot: IMAP(dalvenja): Need_space: -12 idx: 6 mails[idx].space: -1

(which is particularly odd, since the three numbers are always the same.)

I've only halfheartedly tried to track back what's going on in the daemon
to get to this point. I know that it seems to happen when I delete mails
with the client (which moves the mails to a Trash folder and then either
sets the D status flag or removes them from the Inbox entirely); beyond
that I'm not sure what's happening exactly. (If anyone knows and can give
me a couple of hints as to what to try or look for, I can do some more
debugging.)

It doesn't appear to be doing anything bad, other than slowing down mail
client operations a bit; OS X Mail seems to be smart enough to know when
the IMAP server didn't do what it wants and to repeat the operations, and
I don't believe I've had any real issues as a result of the assert
failure.

Just as a data point, when I first converted, I did get the following
types of messages as well:

Oct 18 22:08:06 dragonlair dovecot: IMAP(dalvenja): mbox sync: Expunged message reappeared in mailbox /var/spool/mail/dalvenja (UID 1 < 34486, seq=1, idx_msgs=0)
Oct 18 22:08:57 dragonlair dovecot: IMAP(dalvenja): mbox sync: UID inserted in the middle of mailbox /var/spool/mail/dalvenja (34486 > 1, seq=1, idx_msgs=5607)
Oct 18 22:16:52 dragonlair dovecot: IMAP(dalvenja): UIDs broken with partial sync in mbox file /var/spool/mail/dalvenja

but those don't appear to have shown up since (I assume Dovecot is
correcting them as it finds them).

Other than that, Dovecot is certainly much faster than UW-IMAP; I'm very
pleased with the speedup I've gotten from it so far. It has also reduced
the incidents of OS X Mail "giving up" on operations; with UW-IMAP, if I
tried to do too many operations at once (delete 5 messages, delete 5 more
messages, pull up a message, delete 5 more, pull up another), it would
sometimes give up, put all the deleted messages back, and try to resync
itself with the IMAP server. So far I haven't had any of those
occurrences when using Dovecot.

Thanks all,
-dalvenjah
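
Editorial sketch of what the assertion quoted in this message compares, read literally: `need_space` must equal the negated `mails[idx].space` after that value is cast to the unsigned type `uoff_t`. The Python below only emulates the unsigned cast with a bit mask. The 64-bit width is an assumption about the build, and the helper names are hypothetical; nothing here is taken from the Dovecot sources beyond the expression itself.

```python
UOFF_T_BITS = 64  # assumption: uoff_t is a 64-bit unsigned type on this build
UOFF_T_MASK = (1 << UOFF_T_BITS) - 1


def as_uoff_t(value):
    """Emulate a C cast to an unsigned integer of UOFF_T_BITS bits."""
    return value & UOFF_T_MASK


def assertion_holds(need_space, space):
    """Mirror `need_space == (uoff_t)-mails[idx].space`."""
    return as_uoff_t(need_space) == as_uoff_t(-space)


# Illustrative values only (not taken from the log above):
print(assertion_holds(12, -12))  # a 12-byte need matched by space == -12
print(assertion_holds(12, -1))   # mismatch, which would trip the assert
```

One consequence worth keeping in mind when reading the debug output: passing an unsigned value wider than `int` to a `%d` conversion is undefined in C, so the logged numbers may not reflect the real operands, as the poster already suspects.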
>> Axel Thimm wrote:
>>> On Mon, Oct 23, 2006 at 11:04:18AM +0300, Apostolis Papagiannakis wrote:
>>>> I've had similar "User unknowns" with nscd in the past. I was using
>>>> dovecot ->getpwent -> nscd -> nss_ldap -> LDAP.
>>>
>>> Are you using ldapi?
>>
>> Oops, I think I sent my previous post with unreadable HTML formatting. I
>> hope this one is OK.
>>
>> In /etc/ldap.conf (nss_ldap conf file) I use two ldap servers as
>> "ldaps" URIs.
>>
>> # /etc/ldap.conf
>> uri ldaps://ldap1.auth.gr/ ldaps://ldap2.auth.gr/
>>
>> apap
>
> You need to make sure that the user nscd is running as has proper
> permissions to the required resources (r/w on ldapi sockets, read on
> ldaps' ca certs and the like). Turn on the debug level in ldap.conf
> (nss_ldap's, not openssl's) and sudo to the nscd user/group to test
> the access.
>
> Also nscd doesn't use rootbinddn, it uses binddn.

I think file permissions have always been OK, because nscd and nss_ldap
usually work fine. The problem appears only when the LDAP connection
breaks (e.g. a remote LDAP server restart). We don't use rootbinddn at
all.

Anyway, I just checked the latest version of nss_ldap and I see that
interesting new relevant options are now available (e.g.
nss_connect_policy persist/oneshot). I will give it a try and report back
in a few days.

Definitely, nss_ldap's bad behaviour is not really a dovecot problem.
Dovecot has been rock solid here, serving 30000 users (4000 different
active users every day) on a single server.

apap
On Fri, 2006-10-20 at 13:26 +0100, David Lee wrote:
> 1. "User unknown": We use NIS for our passwd information. On the earlier
> rc8 test we had had several occurrences of "User unknown" (from "deliver")
> giving "dsn=5..." for perfectly valid users. So for this rc10 test I
> applied a local patch so these were reduced to "EX_TEMPFAIL" (dsn=4...).
> (This was triggered, as expected, a few times and subsequent delivery
> attempts succeeded.) I strongly suspect that this is some sort of issue
> with FC5, probably "nscd" and nothing to do with dovecot. Hints would be
> nice, but from the dovecot perspective you may probably ignore this item.

Yea. Dovecot only does a getpwent() call which can't really be used wrong.

> 2. For one particular user, the "deliver" consistently gave:
>    Failed to create storage for '...' with mail 'mbox:/HOME_DIRECTORY_USED_BUT_NOT_GIVEN_BY_USERDB:INBOX=...
>
> I think this is ultimately due to something strange in the user ".forward"
> file. I'd be delighted to follow this up with anyone else who might have
> seen it. Although in one sense we may be drifting off-topic, in another
> sense I suspect that there is scope for adjusting "deliver" to handle this
> more gracefully.

Is deliver executed from the .forward file? In that case the HOME
environment variable isn't set, and deliver doesn't assume that it's going
to deliver to the current local user, so it doesn't look up the home
directory by itself.

> 3. There were several occurrences of:
>    IMAP(...): file ../../../../../src/lib-storage/index/mbox/mbox-sync-rewrite.c: line 405 (mbox_sync_read_and_move): assertion failed: (need_space == (uoff_t)-mails[idx].space)
>    child 30842 (imap) killed with signal 6
>
> This looks particularly awkward. Any thoughts?

In case you missed it, this fixes it:
http://dovecot.org/patches/1.0/dovecot-1.0.rc10-mbox-keywords-fix.patch

> 4. There were two occurrences of:
>    IMAP(...): file ../../../src/lib-index/mail-index.c: line 1801 (mail_index_move_to_memory): assertion failed: (index->fd == -1)
>    child 20493 (imap) killed with signal 6
>
> Again, this looks particularly awkward. Any thoughts?

The moving to memory code isn't perfect, but normally it shouldn't even be
done. I think there are only two reasons:

1) Filesystem quota / out of disk space in general
2) mbox_min_index_size

> For these last two items, note that the indexes are currently NFS-shared
> alongside the INBOX area.
>
> I'm still not clear on how to regard the concept of indexes, as applied to
> a small cluster of machines, and handling simultaneous updates to INBOXes
> (analogous to the vital importance of INBOX locking for such updates).
>
> If one imagines the IMAP daemon (and pop and deliver) as file-clients of
> the (NFS-shared) INBOXes on a fileserver, do the indexes belong very close
> to the INBOXes (fileserver) or the dovecot software (file client)? So
> should I have the indexes on the fileserver (one instance), or should they
> be on each cluster machine's private storage (possibly several instances;
> one per cluster machine)? I've got them on the server; would they be
> better on the cluster clients? (Might that be the cause and fix of these
> two problems?)

Indexes contain metadata of the mailboxes, so if you're using multiple
different computers to read/write the same user's mailbox, then it's
better to keep them in NFS. If you can make only a single computer access
the same user's mailbox most of the time, then it's probably faster to
keep them on local disk. Otherwise, if you kept them on local disk on
different computers, you'd waste time synchronizing the indexes separately
for each computer that accesses the mailbox.
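
Editorial sketch of the two index layouts discussed in this thread, expressed through the mail location string (the "mbox:...:INBOX=..." value visible in the deliver error quoted above). Dovecot's location syntax accepts an optional INDEX= parameter to place index files elsewhere; the helper function and every path below are hypothetical and chosen only for illustration, and "%u" is Dovecot's username variable.

```python
def mbox_location(mail_root, inbox, index_dir=None):
    """Build a Dovecot-style mbox location string.

    Without `index_dir`, no INDEX= part is emitted and the indexes stay
    with the mail storage; with it, they are redirected to that path.
    """
    parts = ["mbox:" + mail_root, "INBOX=" + inbox]
    if index_dir is not None:
        parts.append("INDEX=" + index_dir)
    return ":".join(parts)


# Indexes shared over NFS together with the mail (hypothetical paths).
shared = mbox_location("/home/%u/mail", "/var/spool/mail/%u")

# Indexes on each cluster machine's private disk (hypothetical path).
local = mbox_location("/home/%u/mail", "/var/spool/mail/%u",
                      index_dir="/var/local/dovecot-indexes/%u")

print(shared)
print(local)
```

Which layout is preferable depends on the access pattern described in the reply above: shared indexes when several machines regularly touch the same user's mailbox, local indexes when one machine handles a given user most of the time.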