Jorgen Lundman
2009-Aug-26 02:06 UTC
[Dovecot] Load spikes on NFS server, multiple index updaters.
We are occasionally seeing the NFS server's load shoot up to 60+ (it is normally under 1.0). I have been hunting this for a while, and I believe it comes down to "deliver".

System setup:

  NFS servers: x4540, Solaris 10 x64, ZFS over NFS.
  NFS clients: Solaris 10 x64, postfix-2.4.1 with dovecot-1.1.11 deliver.

When I check nfsstat per process, I see 4 processes (in this case on vmx04) taking up the majority of NFS ops:

  root@vmx04:/var/tmp# ./nfsclientstats.pl
  process  read  write  readdir  getattr  setattr  lookup  access  create  remove  rename  mkdir  rmdir
  24303       0      0        1       19        0     190     171       0       0       0      0      0
  24551       0      0        1       18        0     180     162       0       0       0      0      0
  26099       0      0        1       18        0     180     162       0       0       0      0      0
  295         0      0        1       18        0     180     162       0       0       0      0      0
  6793        3      0        0        0        0       5       5       0       0       0      0      0
  7234        0      1        0        2        0       9       9       0       1       0      0      0

Checking what these processes are doing, I find the following:

  26099: getdents64(8, 0xCE7A4000, 8192) = 8136
  26099: stat64("/export/censored/mail/cur/1223013930.V4700010I69f93eM483098.vmx02.unix:2,S", 0x08047930) = 0
  26099: stat64("/export/censored/mail/cur/1222066290.V4700007I67562bM241839.vmx04.unix:2,S", 0x08047930) = 0
  26099: stat64("/export/censored/mail/cur/1225325373.V4700008I94a1f9M286935.vmx04.unix:2,S", 0x08047930) = 0
  26099: stat64("/export/censored/mail/cur/1236170307.V4700002I67ca03M310418.vmx06.unix:2,", 0x08047930) = 0
  26099: stat64("/export/censored/mail/cur/1223581462.V4700011I69ffd6M720814.vmx02.unix:2,S", 0x08047930) = 0

Very well, so it is rebuilding the dovecot.index, or recalculating the user's quota usage. Is the directory large?

  root@vmx04# ls -l /export/censored/mail/cur/ | wc -l
  199626

You bet! But what is annoying is that if I also check processes 24303, 24551 and 295, they are scanning the SAME user's directory.
  295: stat64("/export/censored/mail/cur/1230544947.V4700004I11d1d7fM492433.vmx04.unix:2,S", 0x08047930) = 0
  295: stat64("/export/censored/mail/cur/1223003964.V4700007I68932dM763546.vmx04.unix:2,S", 0x08047930) = 0

So on vmx04 we have 4 processes working in one user's giant directory, and on the other vmx clients, many more.

Could the semantics of re-computing dovecot.index be changed so that the first "deliver" process locks it to do the work, and subsequent deliver processes return temporary failures until the work has finished? Or has this already been addressed in dovecot-1.1.18?

Advice please.

Lund

--
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
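The locking scheme proposed above ("first deliver does the work, the rest return a temporary failure") could look roughly like the following. This is a hypothetical illustration only, not how Dovecot's deliver behaves: the names try_begin_rebuild, deliver and rebuild.lock are made up for this sketch. It assumes an O_EXCL lock-file create, which is atomic over NFS only with NFSv3 or later, and it returns 75 (EX_TEMPFAIL from sysexits.h), which Postfix treats as "defer and retry later". Stale locks left by a crashed process are not handled.

```python
import os

EX_TEMPFAIL = 75  # from sysexits.h; Postfix defers the message and retries later


def try_begin_rebuild(lock_path):
    """Atomically create lock_path. Return an fd if we won, None if it already exists."""
    try:
        # O_CREAT | O_EXCL: the create fails if the file exists, so only one
        # process can succeed (atomic over NFS only with NFSv3 or later).
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return None


def end_rebuild(fd, lock_path):
    """Release the lock by closing and removing the lock file."""
    os.close(fd)
    os.unlink(lock_path)


def deliver(maildir, rebuild):
    """Hypothetical deliver step: rebuild under the lock, or tell the MTA to retry.

    Stale locks left behind by a crashed process are NOT handled here."""
    lock_path = os.path.join(maildir, "rebuild.lock")  # hypothetical lock file name
    fd = try_begin_rebuild(lock_path)
    if fd is None:
        return EX_TEMPFAIL  # another deliver is already scanning this Maildir
    try:
        rebuild(maildir)  # e.g. the expensive readdir() + stat() pass
        # ...the message itself would be delivered here...
    finally:
        end_rebuild(fd, lock_path)
    return 0
```

With this, only one of the 4 processes seen on vmx04 would walk the directory; the other deliveries would sit in the Postfix queue until the next retry.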
Timo Sirainen
2009-Aug-26 03:24 UTC
On Aug 25, 2009, at 10:06 PM, Jorgen Lundman wrote:

> 26099: stat64("/export/censored/mail/cur/1223013930.V4700010I69f93eM483098.vmx02.unix:2,S", 0x08047930) = 0

Are these old files, or why don't they contain the ,S=1234 in the filename? That would help a lot when recalculating Maildir++ quota.

> Could the semantics to 're-computing dovecot.index' be done such
> that the first "deliver" process locks it to do the work, and sub-
> sequent deliver processes will return temporary failures, until the
> work has finished.

But Maildir++ quota is supposed to work without locks.. :) Do you need Maildir++ quota at all? With v1.2 you could use dict quota with the file backend. It'll use dovecot.index.cache when recalculating quota, although it doesn't do that unless the quota is lost for some reason (so, almost never).
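A minimal sketch of why the ,S=<size> field matters: when a Maildir filename carries its size, a quota recalculation can sum sizes straight from the readdir() results and only needs a stat() (an NFS round trip) for files without it, such as the ":2,S"-only names in the truss output. The helpers size_from_filename and maildir_usage are hypothetical, not Dovecot code, and the sketch assumes S= appears among the comma-separated fields after the unique name and before the ":2," info suffix.

```python
import os


def size_from_filename(name):
    """Return the size encoded as ,S=<bytes> in a Maildir filename, or None."""
    base = name.split(":", 1)[0]       # drop the ":2,<flags>" info suffix
    for field in base.split(",")[1:]:  # fields after the unique part
        if field.startswith("S="):
            try:
                return int(field[2:])
            except ValueError:
                return None
    return None


def maildir_usage(maildir):
    """Sum message sizes in cur/ and new/, calling stat() only when needed.

    Returns (total_bytes, number_of_stat_calls_issued)."""
    total = 0
    stat_calls = 0
    for sub in ("cur", "new"):
        subdir = os.path.join(maildir, sub)
        try:
            names = os.listdir(subdir)
        except FileNotFoundError:
            continue
        for name in names:
            size = size_from_filename(name)
            if size is None:
                stat_calls += 1  # the expensive path seen in the truss output
                try:
                    size = os.stat(os.path.join(subdir, name)).st_size
                except FileNotFoundError:
                    continue  # expunged or renamed between readdir and stat
            total += size
    return total, stat_calls
```

For a cur/ directory of roughly 200,000 files, that is the difference between about 200,000 stat() calls per recalculation and none, multiplied by every deliver process doing the same scan.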