Tim Chen
2008-Sep-15 15:57 UTC
Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
I am currently running a mail server that uses a NetApp filer as backend storage. From time to time the whole system gets stuck for 3-5 minutes, but after that everything recovers normally. During the "stuck" period, ps auxw shows 200-300 mail delivery agent (MDA) processes in "D" status, and df does not respond either.

System configuration:

1. NFS server: NetApp FAS3020
2. NFS client: acting as an SMTP/POP3/IMAP server
   OS: FreeBSD 7.0-STABLE (almost 7.1-PRERELEASE)
   hardware: IBM x3550 server
   network interface:
     bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xc8000000-0xc9ffffff irq 18 at device 0.0 on pci4
     miibus0: <MII bus> on bce0
     brgphy0: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus0
     brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
     bce1: Ethernet address: 00:1a:64:--:--:--
     bce1: [ITHREAD]
     bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x04000305); Flags( MFW MSI )
   ifconfig:
     bce1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
             options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
             ether 00:1a:64:--:--:--
             inet 192.168.1.166 netmask 0xffffff00 broadcast 192.168.1.255
             media: Ethernet autoselect (1000baseTX <full-duplex>)
             status: active
   software: postfix 2.5.4, courier-imap 4.4.1, maildrop 2.0.4

After further investigation, I found that the problem is most severe with NFS over TCP and an MTU of 9000. If the NFS mount is changed to either UDP with MTU 9000 or TCP with MTU 1500, things improve significantly: the sudden hangs drop from once every 10-15 minutes to once every several hours.

Another observation is that the freezes happen more often when server load is high, especially during working hours, so I believe the problem is closely related to server (or NFS) load.

I modified the source code of the MDA (maildrop) and added some debug code to identify the problem.
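The maildrop changes themselves are not shown in the post. As a rough, hypothetical illustration of that kind of instrumentation, the Python sketch below times a blocking fcntl-based lock (through fcntl.lockf) and the fstat/stat calls around a made-up delivery step, deliver_with_timing. The function name, the "mbox" file name, and the one-second SLOW_SECONDS threshold are all assumptions for the example, not part of maildrop.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager

SLOW_SECONDS = 1.0  # assumption: anything above ~1 s is treated as a stall


@contextmanager
def timed(label, log):
    """Append (label, elapsed_seconds) to `log` for the wrapped block."""
    start = time.monotonic()
    try:
        yield
    finally:
        log.append((label, time.monotonic() - start))


def deliver_with_timing(path, log):
    """Hypothetical stand-in for one MDA delivery: lock `path` exclusively
    with an fcntl-based lock, stat it, then unlock, timing each step."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        with timed("fcntl-lock", log):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            with timed("fstat", log):
                os.fstat(fd)
            with timed("stat", log):
                os.stat(path)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    log = []
    # Swap the temporary directory for a directory on the NFS mount.
    with tempfile.TemporaryDirectory() as d:
        deliver_with_timing(os.path.join(d, "mbox"), log)
    for label, secs in log:
        flag = "  SLOW" if secs > SLOW_SECONDS else ""
        print(f"{label:<10} {secs:.6f} s{flag}")
```

Run against a file on the NFS mount during busy hours, per-call timings like these would show whether the delay sits in the lock request or in the attribute fetch.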
What I found is:

1) The MDA processing time is always close to 0 seconds (under 1 second) when things work normally.
2) The MDA processing time can reach 30 seconds when the system gets stuck. If incoming mail keeps arriving, later messages may take up to 200 seconds to complete. At this point, ps auxw shows the MDAs in "D" status.
3) A detailed trace shows that the processing time is spent waiting in the fcntl (lock) and stat (fstat) calls.

One more thing to note: I have tried turning rpc.statd and rpc.lockd on and off, mounting with -L, and even compiling NFSLOCKD into the kernel. None of it helped; things still got stuck when using NFS over TCP with MTU 9000.

We already run many mail servers with different hardware on FreeBSD 6-STABLE. They run the same software, only older versions, and none of them shows any of the behavior described above.

Based on all of the above experiments and observations, I suspect something may be wrong with:

1) the NFS client or network stack in FreeBSD 7,
2) fcntl/stat over NFS, or
3) the bce driver.

I need your help or suggestions to solve this problem. Thanks very much.

Sincerely,
Tim Chen
John Baldwin
2008-Sep-15 20:49 UTC
Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> Currently I am running a mail server that uses a NetApp filer as backend
> storage. From time to time, the whole system gets stuck for 3-5 minutes,
> but after that, everything recovers normally. During the "stuck" period,
> ps auxw shows 200-300 mail delivery agent (MDA) processes in "D" status.
> The command df does not respond either.

Can you use 'ps axl' to determine the wait message ("wchan") of the stuck threads when they hang? If it is "lockf", make sure you have an up-to-date RELENG_6 kernel, as there was a recent fix for a "lockf" hang. Alternatively, if things are stuck in "nfsreq", it may be useful to use tcpdump to look at the NFS requests your client is making. nfsstat can also be useful, as you can see which counters are increasing during a hang.

-- 
John Baldwin
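With 200-300 processes stuck at once, reading the wait channels by eye is tedious. A minimal sketch, assuming ps axl prints one header line followed by whitespace-separated columns with COMMAND last, tallies the processes in "D" state by wait channel. It looks the channel column up by name (MWCHAN on FreeBSD, WCHAN on some other systems); that header naming is an assumption about the output format, and stuck_wchans is a hypothetical helper, not an existing tool.

```python
import subprocess
from collections import Counter


def stuck_wchans(ps_output):
    """Count processes whose state starts with "D" (uninterruptible wait),
    grouped by wait channel.

    Assumes `ps axl`-style text: one header line, then one line per process
    with whitespace-separated columns and COMMAND last. The wait-channel
    column is found by header name (MWCHAN on FreeBSD, WCHAN elsewhere)."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    wchan_name = "MWCHAN" if "MWCHAN" in header else "WCHAN"
    if wchan_name not in header or "STAT" not in header:
        raise ValueError("no WCHAN/MWCHAN or STAT column in ps header")
    wchan_col, stat_col = header.index(wchan_name), header.index("STAT")
    counts = Counter()
    for line in lines[1:]:
        # COMMAND may contain spaces, so split only as many times as needed.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip truncated lines
        if fields[stat_col].startswith("D"):
            counts[fields[wchan_col]] += 1
    return counts


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "axl"], capture_output=True,
                             text=True, check=True).stdout
        # "lockf" points toward the lock hang; "nfsreq" toward stalled NFS RPCs.
        for wchan, n in stuck_wchans(out).most_common():
            print(f"{n:5d}  {wchan}")
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"could not read ps output: {exc}")
```

Running it a few times during a hang shows at a glance whether the MDAs are piling up on "lockf" or on "nfsreq", which decides between the two lines of investigation suggested above.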
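Spotting "which counters are increasing" means comparing two nfsstat snapshots. The sketch below assumes a layout loosely modeled on FreeBSD's nfsstat: section headings ending in ":" and rows of column labels, each followed by a row of integers. That layout, and the helpers parse_counters and increased, are assumptions for illustration; rows whose label and value counts do not match (for example a label containing a space) are skipped rather than guessed at.

```python
def parse_counters(text):
    """Parse nfsstat-style text into {"Section/Label": value}.

    Assumed layout (check it against your nfsstat output): section headings
    end with ':', and each row of column labels is followed by a row of
    integers. Rows whose label and value counts differ are skipped."""
    counters = {}
    section = ""
    labels = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            section, labels = line[:-1], None
            continue
        tokens = line.split()
        if all(t.isdigit() for t in tokens):
            # A value row: pair it with the label row just before it.
            if labels is not None and len(labels) == len(tokens):
                for label, value in zip(labels, tokens):
                    counters[f"{section}/{label}"] = int(value)
            labels = None
        else:
            labels = tokens
    return counters


def increased(before, after):
    """Counters that grew between two snapshots, largest increase first."""
    deltas = ((k, v - before.get(k, 0)) for k, v in after.items())
    return sorted(((k, d) for k, d in deltas if d > 0),
                  key=lambda kv: (-kv[1], kv[0]))
```

One way to use it: save the output of nfsstat -c when a hang starts and again a minute or so later, run parse_counters on both, and print increased(before, after) to see which operations and RPC counters moved while the MDAs were stuck.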