Greetings all,

I'm trying to get an understanding of a problem we are facing here. We're currently running dovecot 1.0-beta3 and have a long-standing issue of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).

Here's what is happening:

The machine hangs and the system load climbs as high as 80.0+. Yet the system response is not affected: the command line still responds instantly. There are multiple running dovecot PIDs, even if I stop the service. If I try to kill or kill -9 the PIDs, they will not die. The machine is DOA and must be forcefully restarted. Issuing a reboot will cause the machine to hang when it attempts to unmount network shares.

Here's the setup:

- Dovecot 1.0-beta3
- lock_method = dotlock
- mmap_disable = yes

/var/mail is stored locally on the mail server and accessed via NFS by ALL remote machines. All remote machines have /var/mail symlinked to the NFS share on Mail. /home on Mail is NFS-mounted from another set of servers, where the IMAP mail folders reside in mbox format. All client machines have /home symlinked to the second NFS server. In other words, there are a lot of NFS shares, and one mail transaction can involve 3 machines.

What I'm trying to find out is the current state of NFS locking with Dovecot. This system hang happens 1-3 times a week. The current /home NFS mounts are served from SGI machines running IRIX 6.5. Clients are all Linux (Debian) 2.4 or Linux (Ubuntu) 2.6.

Is our setup too much for Dovecot to handle? Are there other variables we're not looking at here?

Thanks everyone.

-- 
Nate Sanders                        nate at ima.umn.edu
Associate Systems Manager           (612) 624 - 4353
http://www.ima.umn.edu/
Institute for Mathematics and its Applications
University of Minnesota
400 Lind Hall, 207 Church St. SE
Minneapolis, MN 55455-0463

Nate Sanders wrote:
> Greetings all,
>
> I'm trying to get an understanding of a problem we are facing here.
> We're currently running dovecot 1.0-beta3 and have a long-standing issue
> of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).
>
> Here's what is happening:
>
> The machine hangs and the system load climbs as high as 80.0+. Yet the
> system response is not affected: the command line still responds
> instantly. There are multiple running dovecot PIDs, even if I stop the
> service. If I try to kill or kill -9 the PIDs, they will not die. The
> machine is DOA and must be forcefully restarted. Issuing a reboot will
> cause the machine to hang when it attempts to unmount network shares.
>

It sounds like NFS is dying one way or another -- likely due to a bug on either the client side (you could try compiling a newer 2.4 or 2.6 kernel) or the server side (I know jack about NFS on IRIX).

If you look at the tasks in ps or top, the 'state' column is probably 'D', indicating an uninterruptible sleep (which usually means the process is hung waiting for an I/O request to complete). Are there any messages in the kernel log indicating NFS timeouts?

Specifying 'intr' in the NFS mount options might enable you to actually kill the running dovecot processes, unmount, and remount, but that won't solve your real problem.

-- 
Ben Winslow <rain at bluecherry.net>
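
For anyone wanting to check the 'D' state mentioned above without eyeballing top, here is a minimal sketch in Python. It assumes a procps-style ps that accepts `-eo pid,stat,comm`. The function names `parse_ps_output` and `find_uninterruptible` are made up for illustration and are not part of Dovecot or any existing tool.

```python
import subprocess


def parse_ps_output(text):
    """Return (pid, stat, command) tuples for processes whose state starts with 'D'.

    Expects the output of `ps -eo pid,stat,comm`, including its header line.
    """
    stuck = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)     # pid, stat, rest of line as command
        if len(parts) < 3:
            continue                    # ignore blank or malformed lines
        pid, stat, comm = parts
        if stat.startswith("D"):        # 'D' = uninterruptible sleep
            stuck.append((int(pid), stat, comm.strip()))
    return stuck


def find_uninterruptible():
    """Run ps (assumed to be procps-compatible) and report D-state processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_output(result.stdout)
```

Calling `find_uninterruptible()` on the mail server during a hang should list the dovecot processes if they really are blocked on I/O. An empty list would suggest the problem lies elsewhere.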

On Mon, 2006-05-01 at 15:56 -0500, Nate Sanders wrote:
> Here's the setup:
>
> - Dovecot 1.0-beta3
> - lock_method = dotlock
> - mmap_disable = yes

So if you're using mboxes, what about mbox_read_locks and mbox_write_locks? It may help to change them to dotlocks as well.
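
As a rough illustration of the suggestion above, the relevant lines in dovecot.conf might look something like the fragment below. This is an untested sketch: it assumes `dotlock` is accepted as a value for both settings in this Dovecot version, and it simply repeats the two settings already quoted from the original setup.

```
# Settings quoted from the original post
lock_method = dotlock
mmap_disable = yes

# Assumed change: use dotlocks for the mbox files themselves as well
mbox_read_locks = dotlock
mbox_write_locks = dotlock
```

Every program that touches the same mbox files over NFS would need to use a compatible locking scheme for this to help, so check the delivery agent's settings too.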