Hi,

In this thread I have posted several messages to freebsd-fs@ describing our problem with FreeBSD 7.1 NFS clients. New information has appeared over time, and having it all spread across several messages may be confusing, so I want to summarise here what we see and know. I am also CCing freebsd-stable@ in the hope of drawing more attention to this problem, as it looks very interesting and challenging to me :-)

I have found on the Internet that other people have observed a similar problem with a FreeBSD 6.2 client:

http://forums.freebsd.org/showthread.php?t=1697

So, on some of our FreeBSD 7.1 NFS clients (and it looks like we have had a similar case with 6.3), which have several NFS mounts to the same CentOS 5.3 NFS server (mount options: rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6), at some moment access to one of the NFS mounts gets stuck, while access to the other mounts keeps working.

In all cases we have observed so far, the first process to get stuck was a php script (or two) that was appending to a log file. In tcpdump we see that every write to the file causes the following sequence of RPCs: ACCESS - READ - WRITE - COMMIT. At some moment this stops right after a READ RPC call and its successful reply. (A minimal appender that exercises the same pattern is sketched after this message.)

After this, tcpdump still shows successful readdir/access/lookup/fstat calls from our other utilities, which just check the presence of some files, and they keep working (df works too). The php process at this point is in bo_wwait, invalidating the buffer cache [1].

If at this time we try accessing the share with mc, it hangs acquiring the vnode lock held by the php process [2], and after that any operation on this NFS share hangs (df hangs too).

If instead some other process is started that appends to some other file on this share, then the first process "unfreezes" too (starting from the WRITE RPC, so there are no retransmits).

With my limited knowledge of this complicated kernel subsystem, my hypothesis of what is going on is the following. On some nfs_write() the client does successful ACCESS - READ RPCs but for some reason does not issue the WRITE that would flush the dirty buffer to the server (it aborts somewhere, or maybe in bdwrite(), which calls bd_wakeup(), and bd_wakeup() decides that we don't have enough dirty buffers?). At this stage the buffer appears to be unlinked from bufqueues [3], so when bufdaemon runs it does not flush the buffer. The next write() call to this file causes the process to get stuck invalidating the dirty buffer. The buffer is still reachable by an nfsiod via the nmp structure [3], and when the next process writes to another file, an nfsiod is started and flushes this dirty buffer.

[1]: The stuck php process:

(kgdb) bt
#0 sched_switch (td=0xc839e000, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1 0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:440
#2 0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3 0xc07f460c in sleepq_catch_signals (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4 0xc07f4ebd in sleepq_wait_sig (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5 0xc07cb047 in _sleep (ident=0xc90c9ee8, lock=0xc90c9e8c, priority=333, wmesg=0xc0b731ed "bo_wwait", timo=0) at /usr/src/sys/kern/kern_synch.c:224
#6 0xc0827295 in bufobj_wwait (bo=0xc90c9ec4, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870
#7 0xc0966307 in nfs_flush (vp=0xc90c9e04, waitfor=1, td=0xc839e000, commit=1) at /usr/src/sys/nfsclient/nfs_vnops.c:2989
#8 0xc09667ce in nfs_fsync (ap=0xed3c38ec) at /usr/src/sys/nfsclient/nfs_vnops.c:2725
#9 0xc0aee5d2 in VOP_FSYNC_APV (vop=0xc0c2b920, a=0xed3c38ec) at vnode_if.c:1007
#10 0xc0827864 in bufsync (bo=0xc90c9ec4, waitfor=1, td=0xc839e000) at vnode_if.h:538
#11 0xc083f354 in bufobj_invalbuf (bo=0xc90c9ec4, flags=1, td=0xc839e000, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1066
#12 0xc083f6e2 in vinvalbuf (vp=0xc90c9e04, flags=1, td=0xc839e000, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1142
#13 0xc094f216 in nfs_vinvalbuf (vp=0xc90c9e04, flags=Variable "flags" is not available.
) at /usr/src/sys/nfsclient/nfs_bio.c:1326
#14 0xc0951825 in nfs_write (ap=0xed3c3bc4) at /usr/src/sys/nfsclient/nfs_bio.c:918
#15 0xc0aef956 in VOP_WRITE_APV (vop=0xc0c2b920, a=0xed3c3bc4) at vnode_if.c:691
#16 0xc0850097 in vn_write (fp=0xc9969b48, uio=0xed3c3c60, active_cred=0xcb901600, flags=0, td=0xc839e000) at vnode_if.h:373
#17 0xc07f9d17 in dofilewrite (td=0xc839e000, fd=6, fp=0xc9969b48, auio=0xed3c3c60, offset=-1, flags=0) at file.h:256
#18 0xc07f9ff8 in kern_writev (td=0xc839e000, fd=6, auio=0xed3c3c60) at /usr/src/sys/kern/sys_generic.c:401
#19 0xc07fa06f in write (td=0xc839e000, uap=0xed3c3cfc) at /usr/src/sys/kern/sys_generic.c:317
#20 0xc0ad9c75 in syscall (frame=0xed3c3d38) at /usr/src/sys/i386/i386/trap.c:1090
#21 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

[2] The mc process stuck acquiring the _vn_lock held by the php thread above:

  at /usr/src/sys/kern/sched_ule.c:1944
* 178 Thread 100340 (PID=40443: mc) sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not availabl .

(kgdb) thr 178
[Switching to thread 178 (Thread 100340)]#0 sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
1944            cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0 sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available.
) at /usr/src/sys/kern/sched_ule.c:1944
#1 0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.
) at /usr/src/sys/kern/kern_synch.c:440
#2 0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3 0xc07f4946 in sleepq_wait (wchan=0xc90c9e5c) at /usr/src/sys/kern/subr_sleepqueue.c:580
#4 0xc07cb056 in _sleep (ident=0xc90c9e5c, lock=0xc0c77d18, priority=80, wmesg=0xc0b80b92 "nfs", timo=0) at /usr/src/sys/kern/kern_synch.c:226
#5 0xc07adf5a in acquire (lkpp=0xed56b7f0, extflags=Variable "extflags" is not available.
) at /usr/src/sys/kern/kern_lock.c:151
#6 0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384
#7 0xc0832470 in vop_stdlock (ap=0xed56b840) at /usr/src/sys/kern/vfs_default.c:305
#8 0xc0aef4f6 in VOP_LOCK1_APV (vop=0xc0c1d5c0, a=0xed56b840) at vnode_if.c:1618
#9 0xc084ed86 in _vn_lock (vp=0xc90c9e04, flags=8194, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at vnode_if.h:851
#10 0xc0841d84 in vget (vp=0xc90c9e04, flags=8194, td=0xc9810af0) at /usr/src/sys/kern/vfs_subr.c:2061
#11 0xc08355b3 in vfs_hash_get (mp=0xc6b472cc, hash=3326873010, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/vfs_hash.c:81
#12 0xc09534d4 in nfs_nget (mntp=0xc6b472cc, fhp=0xc97be078, fhsize=20, npp=0xed56b9f0, flags=2) at /usr/src/sys/nfsclient/nfs_node.c:120
#13 0xc0964a05 in nfs_lookup (ap=0xed56ba84) at /usr/src/sys/nfsclient/nfs_vnops.c:947
#14 0xc0aefbe6 in VOP_LOOKUP_APV (vop=0xc0c2b920, a=0xed56ba84) at vnode_if.c:99
#15 0xc0836841 in lookup (ndp=0xed56bb48) at vnode_if.h:57
#16 0xc083756f in namei (ndp=0xed56bb48) at /usr/src/sys/kern/vfs_lookup.c:219
#17 0xc0844fef in kern_lstat (td=0xc9810af0, path=0x48611280 <Address 0x48611280 out of bounds>, pathseg=UIO_USERSPACE, sbp=0xed56bc18) at /usr/src/sys/kern/vfs_syscalls.c:2169
#18 0xc08451af in lstat (td=0xc9810af0, uap=0xed56bcfc) at /usr/src/sys/kern/vfs_syscalls.c:2152
#19 0xc0ad9c75 in syscall (frame=0xed56bd38) at /usr/src/sys/i386/i386/trap.c:1090
#20 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#21 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) fr 6
#6 0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384
384             error = acquire(&lkp, extflags, (LK_HAVE_EXCL | LK_WANT_EXCL), &contested, &waitstart);
(kgdb) p *lkp
$2 = {lk_object = {lo_name = 0xc0b80b92 "nfs", lo_type = 0xc0b80b92 "nfs", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}},
  lk_interlock = 0xc0c77d18, lk_flags = 33816640, lk_sharecount = 0, lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51,
  lk_lockholder = 0xc839e000, lk_newlock = 0x0}

[3] struct nfsmount of the "problem" share:

(kgdb) p *nmp
$4 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", lo_type = 0xc0b808ee "NFSmount lock", lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0},
  nm_flag = 35399, nm_state = 1310720, nm_mountp = 0xc6b472cc, nm_numgrps = 16, nm_fh = "\001\000\000\000\000\223\000\000\001@\003\n", '\0' <repeats 115 times>, nm_fhsize = 12,
  nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0, rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0}, rc_sdrtt = {0, 0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0, rc_authtype = 0, rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0},
  nm_so = 0xc6e81d00, nm_sotype = 1, nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc6948640, nm_timeo = 6000, nm_retry = 2, nm_srtt = {15, 15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, nm_timeouts = 0, nm_deadthresh = 9,
  nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1, nm_wcommitsize = 1177026,
  nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60, nm_verf = "J??W\000\004o?", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 0xda8058e0}, nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775,
  nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, nm_nfstcpstate = {rpcresid = 0, flags = 1, sock_send_inprog = 0},
  nm_hostname = "172.30.10.92\000/var/www/app31", '\0' <repeats 60 times>, nm_clientid = 0, nm_fsid = {val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0}

Buffers on it:

(kgdb) p *nmp->nm_bufq.tqh_first
$7 = {b_bufobj = 0xc7324960, b_bcount = 31565, b_caller1 = 0x0, b_data = 0xde581000 " valid_lines:", ' ' <repeats 29 times>, "1341\n invalid_lines:", ' ' <repeats 27 times>, "1556\n total_lines:", ' ' <repeats 29 times>, "2897\n\n Error summary:\n Inactive pr"...,
  b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0', b_iooffset = 196608, b_resid = 0, b_iodone = 0, b_blkno = 384, b_offset = 196608, b_bobufs = {tqe_next = 0x0, tqe_prev = 0xc7324964}, b_left = 0x0, b_right = 0x0, b_vflags = 0,
  b_freelist = {tqe_next = 0xda805894, tqe_prev = 0xc725d3c0}, b_qindex = 0, b_flags = 536870948, b_xflags = 2 '\002',
  b_lock = {lk_object = {lo_name = 0xc0b73635 "bufwait", lo_type = 0xc0b73635 "bufwait", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77b50, lk_flags = 262144, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 0, lk_lockholder = 0xfffffffe, lk_newlock = 0x0},
  b_bufsize = 31744, b_runningbufspace = 0, b_kvabase = 0xde581000 " valid_lines:", ' ' <repeats 29 times>, "1341\n invalid_lines:", ' ' <repeats 27 times>, "1556\n total_lines:", ' ' <repeats 29 times>, "2897\n\n Error summary:\n Inactive pr"...,
  b_kvasize = 32768, b_lblkno = 6, b_vp = 0xc73248a0, b_dirtyoff = 31512, b_dirtyend = 31565, b_rcred = 0x0, b_wcred = 0xcebec400, b_saveaddr = 0xde581000, b_pager = {pg_reqpage = 0},
  b_cluster = {cluster_head = {tqh_first = 0xda917ec8, tqh_last = 0xda888e94}, cluster_entry = {tqe_next = 0xda917ec8, tqe_prev = 0xda888e94}},
  b_pages = {0xc3726e90, 0xc448dca8, 0xc2a55b98, 0xc3bf1a28, 0xc3467ff0, 0xc3299600, 0xc28db130, 0xc2301398, 0x0 <repeats 24 times>}, b_npages = 8, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0}

The data in b_data are entries from our log file. Note that b_qindex is 0, but bufqueues[0] is empty:

(kgdb) p bufqueues[0]
$8 = {tqh_first = 0x0, tqh_last = 0xc0c83e20}

Also, doesn't it look strange that lk_lockholder of b_lock points to an invalid location (0xfffffffe)?

--
Mikolaj Golub
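In case someone wants to exercise the same write pattern, below is a minimal appender in C, roughly what our php scripts do when they append to their logs (the path, message format and timing are just made-up example values). Running it against a file on one of the NFS mounts and watching tcpdump should show the per-append ACCESS - READ - WRITE - COMMIT sequence described above.

/* append.c -- minimal log appender; path and timing are arbitrary examples. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/mnt/nfs/test.log";
        char line[128];
        int fd, i;

        fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd == -1) {
                perror("open");
                return (1);
        }
        for (i = 0; i < 1000; i++) {
                snprintf(line, sizeof(line), "entry %d\n", i);
                if (write(fd, line, strlen(line)) == -1) {
                        perror("write");
                        break;
                }
                usleep(100 * 1000);     /* roughly 10 small appends per second */
        }
        close(fd);
        return (0);
}

Whether a single appender like this can trigger the hang on its own is unclear; in our case several mounts were being written to at the same time (see the top output later in the thread).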
On Tue, 19 Jan 2010 10:02:57 +0200 Mikolaj Golub wrote:

> I have found on the Internet that other people have observed a similar
> problem with a FreeBSD 6.2 client:
>
> http://forums.freebsd.org/showthread.php?t=1697

Reading this through carefully, it looks like that poster did not actually experience our problem (processes getting stuck); he just described the behaviour of the FreeBSD client when appending to a file.

--
Mikolaj Golub
On Tue, 19 Jan 2010 10:02:57 +0200 Mikolaj Golub wrote:

> So, on some of our FreeBSD 7.1 NFS clients (and it looks like we have had
> a similar case with 6.3), which have several NFS mounts to the same
> CentOS 5.3 NFS server (mount options: rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6),
> at some moment access to one of the NFS mounts gets stuck, while access
> to the other mounts keeps working.
>
> In all cases we have observed so far, the first process to get stuck was
> a php script (or two) that was appending to a log file. In tcpdump we see
> that every write to the file causes the following sequence of RPCs:
> ACCESS - READ - WRITE - COMMIT. At some moment this stops right after a
> READ RPC call and its successful reply.
>
> After this, tcpdump still shows successful readdir/access/lookup/fstat
> calls from our other utilities, which just check the presence of some
> files, and they keep working (df works too). The php process at this
> point is in bo_wwait, invalidating the buffer cache [1].
>
> If at this time we try accessing the share with mc, it hangs acquiring
> the vnode lock held by the php process [2], and after that any operation
> on this NFS share hangs (df hangs too).
>
> If instead some other process is started that appends to some other file
> on this share, then the first process "unfreezes" too (starting from the
> WRITE RPC, so there are no retransmits).

So it looks to me like the problem is that the affected nfsmount eventually ends up in this state:

(kgdb) p *nmp
$1 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", lo_type = 0xc0b808ee "NFSmount lock", lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0},
  nm_flag = 35399, nm_state = 1310720, nm_mountp = 0xc6b472cc, nm_numgrps = 16, nm_fh = "\001\000\000\000\000\223\000\000\001@\003\n", '\0' <repeats 115 times>, nm_fhsize = 12,
  nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0, rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0}, rc_sdrtt = {0, 0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0, rc_authtype = 0, rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0},
  nm_so = 0xc6e81d00, nm_sotype = 1, nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc6948640, nm_timeo = 6000, nm_retry = 2, nm_srtt = {15, 15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, nm_timeouts = 0, nm_deadthresh = 9,
  nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1, nm_wcommitsize = 1177026,
  nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60, nm_verf = "J??W\000\004o?", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 0xda8058e0}, nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775,
  nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, nm_nfstcpstate = {rpcresid = 0, flags = 1, sock_send_inprog = 0},
  nm_hostname = "172.30.10.92\000/var/www/app31", '\0' <repeats 60 times>, nm_clientid = 0, nm_fsid = {val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0}

We have a non-empty nm_bufq and nm_bufqiods = 1, but there is actually no nfsiod thread running for this mount, which is wrong: nm_bufq will not be emptied until some other process starts writing to this nfsmount and thereby starts an nfsiod thread for it.

Reviewing the code for how this could happen, I see the following path. Could someone confirm or disprove it?
In nfs_bio.c:nfs_asyncio() we have:

1363            mtx_lock(&nfs_iod_mtx);
...
1374            /*
1375             * Find a free iod to process this request.
1376             */
1377            for (iod = 0; iod < nfs_numasync; iod++)
1378                    if (nfs_iodwant[iod]) {
1379                            gotiod = TRUE;
1380                            break;
1381                    }
1382
1383            /*
1384             * Try to create one if none are free.
1385             */
1386            if (!gotiod) {
1387                    iod = nfs_nfsiodnew();
1388                    if (iod != -1)
1389                            gotiod = TRUE;
1390            }

Let's consider the situation when a new nfsiod is created. nfs_nfsiod.c:nfs_nfsiodnew() unlocks nfs_iod_mtx before creating the nfssvc_iod thread:

179             mtx_unlock(&nfs_iod_mtx);
180             error = kthread_create(nfssvc_iod, nfs_asyncdaemon + i, NULL, RFHIGHPID,
181                 0, "nfsiod %d", newiod);
182             mtx_lock(&nfs_iod_mtx);

And nfs_nfsiod.c:nfssvc_iod() does the following:

226             mtx_lock(&nfs_iod_mtx);
...
238             nfs_iodwant[myiod] = curthread->td_proc;
239             nfs_iodmount[myiod] = NULL;
...
244             error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT | PCATCH,
245                 "-", timo);

Suppose that at this moment another nfs_asyncio() request for a different nfsmount arrives and that thread takes nfs_iod_mtx. It will find nfs_iodwant[iod] set in the "for" loop and will use the freshly created iod. When the first thread finally returns from nfs_nfsiodnew(), it will insert its buffer into nmp->nm_bufq, but the nfsiod will be processing the other nmp.

It looks like the fix for this situation would be to check nfs_iodwant[iod] after nfs_nfsiodnew():

--- nfs_bio.c.orig      2010-01-22 15:38:02.000000000 +0000
+++ nfs_bio.c   2010-01-22 15:39:58.000000000 +0000
@@ -1385,7 +1385,7 @@ again:
          */
         if (!gotiod) {
                 iod = nfs_nfsiodnew();
-                if (iod != -1)
+                if ((iod != -1) && (nfs_iodwant[iod] == NULL))
                         gotiod = TRUE;
         }

The scenario described here could well be our case: we have 7 NFS mounts on the problem host, and cron starts one or two scripts for every mount at the same time.
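To make the window easier to see outside the kernel, here is a small user-space sketch of the interleaving described above. It is only an illustration under simplifying assumptions, not kernel code: a single iod slot, with iod_idle standing in for nfs_iodwant[], a pthread mutex for nfs_iod_mtx, bufqiods[] for nm_bufqiods, and sleeps inserted purely to make the window easy to hit.

/* race.c -- user-space illustration of the suspected nfs_asyncio() race. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t iod_mtx = PTHREAD_MUTEX_INITIALIZER; /* ~ nfs_iod_mtx */
static int iod_idle;            /* stands in for nfs_iodwant[0] != NULL */
static int bufqiods[2];         /* stands in for nmp->nm_bufqiods, per mount */

/* The new "nfsiod" registers itself as idle (and in the kernel would sleep). */
static void *
iod_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&iod_mtx);
        iod_idle = 1;
        pthread_mutex_unlock(&iod_mtx);
        return (NULL);
}

/* Mimics nfs_nfsiodnew(): the mutex is dropped around thread creation. */
static int
nfsiodnew(void)
{
        pthread_t tid;

        pthread_mutex_unlock(&iod_mtx);
        pthread_create(&tid, NULL, iod_thread, NULL);
        pthread_detach(tid);
        sleep(1);               /* widen the race window so it always hits */
        pthread_mutex_lock(&iod_mtx);
        return (0);             /* "created iod 0" */
}

/* Mimics the iod-selection part of nfs_asyncio() for mount m. */
static void
asyncio(int m, int may_create)
{
        int gotiod = 0, iod = -1;

        pthread_mutex_lock(&iod_mtx);
        if (iod_idle) {                 /* the "find a free iod" scan */
                iod = 0;
                gotiod = 1;
        } else if (may_create) {
                iod = nfsiodnew();      /* lock dropped and retaken inside */
                if (iod != -1)          /* the check the patch above changes */
                        gotiod = 1;
        }
        if (gotiod) {
                iod_idle = 0;           /* claim the iod ...               */
                bufqiods[m]++;          /* ... and count it as serving us  */
        }
        pthread_mutex_unlock(&iod_mtx);
}

static void *
mount_a(void *arg)
{
        (void)arg;
        asyncio(0, 1);          /* mount A finds no idle iod and creates one */
        return (NULL);
}

int
main(void)
{
        pthread_t tid;

        pthread_create(&tid, NULL, mount_a, NULL);
        usleep(200 * 1000);     /* by now A is inside nfsiodnew() with the
                                 * lock dropped and the new iod has gone idle */
        asyncio(1, 0);          /* mount B grabs the idle iod in its scan */
        pthread_join(tid, NULL);
        printf("one iod exists, but bufqiods: mount A = %d, mount B = %d\n",
            bufqiods[0], bufqiods[1]);
        return (0);
}

With the stock "iod != -1" check, both "mounts" end up accounting the single worker as serving them, which matches the nm_bufqiods = 1 / no-nfsiod state shown above; the patch tries to detect the stolen iod by re-checking nfs_iodwant[iod] after nfs_nfsiodnew() returns.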
So we had something like this in top (the cron tasks started at 23:02):

last pid: 64884;  load averages: 0.28, 0.34, 0.24   up 0+22:15:41  23:02:04
300 processes: 6 running, 259 sleeping, 1 stopped, 17 zombie, 17 waiting
CPU: 10.2% user,  0.0% nice,  7.6% system,  1.0% interrupt, 81.2% idle
Mem: 174M Active, 2470M Inact, 221M Wired, 136M Cache, 112M Buf, 251M Free
Swap: 8192M Total, 8192M Free

64793 app12  -1  0 23352K 11980K nfsreq 0  0:00  1.07% php
64789 app16  -1  0 21304K 11084K nfsreq 0  0:00  0.98% php
64784 app16  -1  0 19256K  9696K nfsreq 2  0:00  0.88% php
64768 app20  -1  0 19256K  9300K nfsreq 0  0:00  0.78% php
64759 app20  -1  0 18232K  8888K nfsreq 1  0:00  0.78% php
64722 app31  -1  0 20280K  9956K nfsreq 0  0:00  0.68% php
64781 app18  -1  0 19256K  9412K nfsreq 3  0:00  0.68% php
64778 app26  -1  0 18232K  8840K nfsreq 1  0:00  0.68% php
64800 app8   -1  0 18232K  8664K nfsreq 3  0:00  0.68% php
64728 app31  -1  0 18232K  8752K nfsreq 0  0:00  0.59% php
64795 app18  -1  0 18232K  8676K nfsreq 1  0:00  0.59% php
64777 app22  -1  0 18232K  8984K nfsreq 0  0:00  0.49% php
 2342 app31  -4  0 22236K  7780K nfs    1  0:13  0.00% icoms_agent_cox215
58920 root    8  -     0K     8K -      2  0:08  0.00% nfsiod 0
 2334 app31  -4  0 18908K  6356K nfs    1  0:05  0.00% icoms_agent_cox001
64297 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 1
64298 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 2
64303 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 3
64874 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 12
64870 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 9
64866 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 4
64873 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 11
64867 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 5
64869 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 8
64872 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 10
64868 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 7
64871 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 6

last pid: 64967;  load averages: 0.42, 0.37, 0.25   up 0+22:15:46  23:02:09
295 processes: 7 running, 251 sleeping, 1 stopped, 19 zombie, 17 waiting
CPU: 69.1% user,  0.0% nice,  8.3% system,  1.5% interrupt, 21.1% idle
Mem: 376M Active, 2488M Inact, 226M Wired, 124M Cache, 106M Buf, 37M Free
Swap: 8192M Total, 8192M Free

64793 app12  99  0 86840K 59968K CPU3   3  0:02 16.55% php
64768 app20  -1  0 57144K 38424K nfsreq 1  0:02 15.19% php
64722 app31  99  0 61240K 41228K CPU0   0  0:02 15.19% php
64781 app18  -1  0 54072K 35612K nfsreq 2  0:02 13.67% php
64789 app16  -1  0 48952K 31660K nfsreq 3  0:01 10.60% php
64777 app22  -1  0 43832K 27876K nfsreq 0  0:01  9.86% php
64784 app16  -1  0 45880K 29648K nfsreq 0  0:01  9.77% php
64759 app20  -7  0 36664K 22792K bo_wwa 0  0:01  8.25% php
64800 app8   -7  0 24376K 13596K bo_wwa 1  0:01  2.39% php
64795 app18  -7  0 23352K 12788K bo_wwa 3  0:00  1.37% php
58920 root   -1  -     0K     8K nfsreq 2  0:08  0.00% nfsiod 0
64303 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 3
64866 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 4
64297 root   -1  -     0K     8K nfsreq 2  0:00  0.00% nfsiod 1
64298 root   -1  -     0K     8K nfsreq 3  0:00  0.00% nfsiod 2
64873 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 11
64868 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 7
64867 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 5
64871 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 6
64947 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 14
64950 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 17
64870 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 9
64869 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 8
64949 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 16
64874 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 12
64872 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 10
64952 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 19
64948 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 15
64951 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 18
64946 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 13

last pid: 64968;  load averages: 0.54, 0.39, 0.26   up 0+22:15:51  23:02:14
289 processes: 7 running, 243 sleeping, 1 stopped, 21 zombie, 17 waiting
CPU: 28.7% user,  0.0% nice,  8.8% system,  1.1% interrupt, 61.4% idle
Mem: 404M Active, 2503M Inact, 224M Wired, 83M Cache, 107M Buf, 37M Free
Swap: 8192M Total, 8192M Free

64793 app12  -1  0   148M   106M nfsreq 1  0:07 41.55% php
64722 app31  -7  0 61240K 41232K bo_wwa 1  0:03 14.26% php
64768 app20  -7  0 57144K 38424K bo_wwa 3  0:03 13.67% php
64781 app18  -7  0 54072K 35612K bo_wwa 0  0:02 11.18% php
64789 app16  -7  0 48952K 31660K bo_wwa 0  0:02  7.96% php
64784 app16  -7  0 45880K 29648K bo_wwa 0  0:02  7.76% php
64777 app22  -7  0 43832K 27876K bo_wwa 0  0:01  6.40% php
64759 app20  -7  0 36664K 22792K bo_wwa 0  0:01  4.79% php
58920 root   -1  -     0K     8K nfsreq 2  0:08  0.00% nfsiod 0
64867 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 5
64873 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 11
64303 root   -1  -     0K     8K nfsreq 0  0:00  0.00% nfsiod 3
64866 root   -1  -     0K     8K nfsreq 1  0:00  0.00% nfsiod 4
64297 root   -1  -     0K     8K nfsreq 0  0:00  0.00% nfsiod 1
64298 root   -1  -     0K     8K nfsreq 3  0:00  0.00% nfsiod 2
64871 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 6
64868 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 7
64869 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 8
64947 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 14
64872 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 10
64874 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 12
64870 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 9
64949 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 16
64950 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 17
64948 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 15
64951 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 18
64946 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 13
64952 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 19

last pid: 64969;  load averages: 0.50, 0.39, 0.25   up 0+22:15:56  23:02:19
269 processes: 6 running, 219 sleeping, 1 stopped, 26 zombie, 17 waiting
CPU: 11.9% user,  0.0% nice,  5.8% system,  0.8% interrupt, 81.5% idle
Mem: 264M Active, 2504M Inact, 232M Wired, 83M Cache, 112M Buf, 169M Free
Swap: 8192M Total, 8192M Free

64793 app12  -1  0   148M   106M nfsreq 3  0:08 33.69% php
64789 app16  -7  0 48952K 31660K bo_wwa 0  0:02  4.98% php
64784 app16  -7  0 45880K 29648K bo_wwa 0  0:02  4.88% php
58920 root    8  -     0K     8K -      1  0:08  0.20% nfsiod 0
64867 root    8  -     0K     8K -      2  0:00  0.10% nfsiod 5
64303 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 3
64297 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 1
64873 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 11
64866 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 4
64871 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 6
64298 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 2
64868 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 7
64869 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 8
64947 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 14
64872 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 10
64874 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 12
64870 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 9
64949 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 16
64950 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 17
64948 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 15
64951 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 18
64946 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 13
64952 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 19

last pid: 64970;  load averages: 0.46, 0.38, 0.25   up 0+22:16:02  23:02:25
263 processes: 5 running, 212 sleeping, 1 stopped, 28 zombie, 17 waiting
CPU:  8.7% user,  0.0% nice,  3.1% system,  0.3% interrupt, 87.9% idle
Mem: 160M Active, 2502M Inact, 232M Wired, 83M Cache, 112M Buf, 274M Free
Swap: 8192M Total, 8192M Free

64789 app16  -7  0 48952K 31660K bo_wwa 0  0:02  3.27% php
64784 app16  -7  0 45880K 29648K bo_wwa 0  0:02  3.17% php
58920 root    8  -     0K     8K -      1  0:08  0.00% nfsiod 0
64867 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 5
64303 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 3
64297 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 1
64873 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 11
64866 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 4
64871 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 6
64298 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 2
64868 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 7
64869 root    8  -     0K     8K -      0  0:00  0.00% nfsiod 8
64947 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 14
64872 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 10
64874 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 12
64870 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 9
64949 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 16
64950 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 17
64948 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 15
64951 root    8  -     0K     8K -      3  0:00  0.00% nfsiod 18
64946 root    8  -     0K     8K -      2  0:00  0.00% nfsiod 13
64952 root    8  -     0K     8K -      1  0:00  0.00% nfsiod 19

And these two php processes stayed hung until 23:05, when another process started that wrote to another file on this NFS mount.

--
Mikolaj Golub
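For what it's worth, until the root cause is fixed, the observation that appending to some other file on the stuck share "unfreezes" things suggests a crude workaround: periodically append a byte to a scratch file on each NFS mount so that a stuck nm_bufq gets an nfsiod again. Below is an untested sketch; the scratch paths and the interval are arbitrary example values, not something we actually run.

/* nfs-poke.c -- crude, untested workaround sketch based on the observation
 * above: writing to some other file on a stuck mount starts an nfsiod for
 * it, which then drains nm_bufq. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static const char *scratch[] = {        /* example paths, one per NFS mount */
        "/var/www/app31/.nfs-poke",
        "/var/www/app12/.nfs-poke",
};

int
main(void)
{
        unsigned i;
        int fd;

        for (;;) {
                for (i = 0; i < sizeof(scratch) / sizeof(scratch[0]); i++) {
                        fd = open(scratch[i], O_WRONLY | O_APPEND | O_CREAT, 0600);
                        if (fd == -1) {
                                perror(scratch[i]);
                                continue;
                        }
                        (void)write(fd, ".", 1);        /* one-byte append */
                        close(fd);
                }
                sleep(60);      /* poke each mount once a minute */
        }
        return (0);
}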