Jorgen Lundman
2009-Aug-26  02:06 UTC
[Dovecot] Load spikes on NFS server, multiple index updaters.
We are occasionally experiencing trouble where the NFS server's load 
will shoot over 60. (Normal is below 1.0.)
I have been hunting this for a while, and I believe it comes down to 
"deliver".
System setup:
NFS servers: x4540 Solaris 10 x64 ZFS over NFS.
NFS clients: Solaris 10 x64 postfix-2.4.1 with dovecot-1.1.11 deliver.
What appears to happen, when I check nfsstat per process, is that I 
see 4 processes (in this case on vmx04) taking up the majority of NFS ops:
root@vmx04:/var/tmp# ./nfsclientstats.pl
process    read write readdir getattr setattr lookup access create remove rename mkdir rmdir
24303         0     0       1      19       0    190    171      0      0      0     0     0
24551         0     0       1      18       0    180    162      0      0      0     0     0
26099         0     0       1      18       0    180    162      0      0      0     0     0
295           0     0       1      18       0    180    162      0      0      0     0     0
6793          3     0       0       0       0      5      5      0      0      0     0     0
7234          0     1       0       2       0      9      9      0      1      0     0     0
Checking what these processes are doing, I find the following happening:
26099: getdents64(8, 0xCE7A4000, 8192) = 8136
26099: stat64("/export/censored/mail/cur/1223013930.V4700010I69f93eM483098.vmx02.unix:2,S", 0x08047930) = 0
26099: stat64("/export/censored/mail/cur/1222066290.V4700007I67562bM241839.vmx04.unix:2,S", 0x08047930) = 0
26099: stat64("/export/censored/mail/cur/1225325373.V4700008I94a1f9M286935.vmx04.unix:2,S", 0x08047930) = 0
26099: stat64("/export/censored/mail/cur/1236170307.V4700002I67ca03M310418.vmx06.unix:2,", 0x08047930) = 0
26099: stat64("/export/censored/mail/cur/1223581462.V4700011I69ffd6M720814.vmx02.unix:2,S", 0x08047930) = 0
Very well, so it is rebuilding the dovecot.index, or recalculating the 
user's quota usage.
Is the directory large?
root@vmx04# ls -l /export/censored/mail/cur/|wc -l
   199626
You bet! But what is annoying is that if I also check processes 24303, 
24551 and 295, they are scanning the SAME user's directory.
295: stat64("/export/censored/mail/cur/1230544947.V4700004I11d1d7fM492433.vmx04.unix:2,S", 0x08047930) = 0
295: stat64("/export/censored/mail/cur/1223003964.V4700007I68932dM763546.vmx04.unix:2,S", 0x08047930) = 0
So, in vmx04 we have 4 processes working in one user's giant directory, 
and on the other vmx clients, many more.
Could the semantics of 're-computing dovecot.index' be changed such that 
the first "deliver" process locks it to do the work, and subsequent 
deliver processes return temporary failures until the work has 
finished?
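To make the proposal concrete, here is a rough sketch of the "first one in does the work, the rest defer" idea. It is not how deliver works; the lock file name and the function names are made up for illustration, and it assumes exclusive file creation (O_CREAT|O_EXCL) is atomic on the NFS mount, which should be checked for the NFS version in use. It also does nothing about stale locks left by a crashed process.

```python
import os

# sysexits.h "temporary failure" code; an MTA normally requeues on this.
EX_TEMPFAIL = 75


def try_lock(lock_path):
    """Return an fd if we created the lock file, None if someone else holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return None


def rebuild_or_defer(maildir, rebuild):
    """Run rebuild(maildir) if we get the lock, otherwise report a temp failure."""
    lock_path = os.path.join(maildir, "rebuild.lock")  # hypothetical file name
    fd = try_lock(lock_path)
    if fd is None:
        # Another process is already scanning this mailbox: do not scan again.
        return EX_TEMPFAIL
    try:
        rebuild(maildir)
        return 0
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

With something like this, only one of the four processes on vmx04 would walk the 199626-entry directory; the other three would exit with a temporary failure and the mail would be retried from the queue later.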
Has it already been addressed in dovecot-1.1.18?
Advice please.
Lund
-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Timo Sirainen
2009-Aug-26  03:24 UTC
[Dovecot] Load spikes on NFS server, multiple index updaters.
On Aug 25, 2009, at 10:06 PM, Jorgen Lundman wrote:
> 26099: stat64("/export/censored/mail/cur/1223013930.V4700010I69f93eM483098.vmx02.unix:2,S", 0x08047930) = 0
Are these old files, or why don't they contain the ,S=1234 in the filename? That would help a lot when recalculating Maildir++ quota.
> Could the semantics to 're-computing dovecot.index' be done such
> that the first "deliver" process locks it to do the work, and
> subsequent deliver processes will return temporary failures, until the
> work has finished.
But Maildir++ quota is supposed to work without locks.. :) Do you need Maildir++ quota at all? With v1.2 you could use dict quota with file backend. It'll use dovecot.index.cache when recalculating quota, although it doesn't do that unless the quota is lost for some reason (so about never).
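
To illustrate why the ,S=1234 in the filename matters: when the size is encoded in the name, a quota recalculation can add up sizes from the directory listing alone, and only needs a stat() (one NFS round trip each) for files that lack it. The following is a minimal sketch of that idea, not Dovecot's actual code; the function names are hypothetical, and stat_size stands in for whatever does the real stat call.

```python
import re

# Matches the ",S=<bytes>" field that may appear in a Maildir base filename.
_SIZE_RE = re.compile(r",S=(\d+)")


def size_from_name(filename):
    """Return the size encoded in the filename, or None if it is absent."""
    # Everything after ':' is the flags section (e.g. ":2,S"), not a size.
    base = filename.split(":", 1)[0]
    match = _SIZE_RE.search(base)
    return int(match.group(1)) if match else None


def quota_usage(filenames, stat_size):
    """Sum message sizes, calling stat_size(name) only when the name has no size.

    Returns (total_bytes, number_of_stat_calls).
    """
    total = 0
    stat_calls = 0
    for name in filenames:
        size = size_from_name(name)
        if size is None:
            size = stat_size(name)
            stat_calls += 1
        total += size
    return total, stat_calls
```

For the filenames in the trace above, size_from_name() returns None for every entry, which is consistent with one stat64() per message being issued over NFS.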