Hi All,

I am looking for advice on how to cure a constantly-crashing NFS
server which crashes every few hours, or at least, every few days. The
kernel log file (below) points toward NFS as a likely cause.

The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
using a 3Ware 9000 controller to produce two RAID1 devices; these are
then striped (RAID0) in software to form a RAID 10 device. We're
using a 2.6 kernel, xfs filesystem, and NFS3/UDP.

We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel. This kernel
has xfs extensions, and we're running the xfs filesystem for /home
(obtained from CentOS website).

In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.

NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server,
async clients, noac.

This system has been serving /home in this configuration since October
2005; we've seen it crash rarely, but uptimes were usually on the
order of months. This past week, it can't seem to remain up for much
longer than about a day.

Kernel log file containing the crash:

Feb 12 05:52:03 tier2-home kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb 12 05:52:03 tier2-home kernel: printing eip:
Feb 12 05:52:03 tier2-home kernel: 00000000
Feb 12 05:52:03 tier2-home kernel: *pde = f561f067
Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]
Feb 12 05:52:03 tier2-home kernel: PREEMPT SMP
Feb 12 05:52:03 tier2-home kernel: Modules linked in: nfs nfsd exportfs lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc xfs dm_mirror dm_mod button battery ac uhci_hcd shpchp e1000 floppy ext3 jbd 3w_9xxx 3w_xxxx sd_mod scsi_mod
Feb 12 05:52:03 tier2-home kernel: CPU: 0
Feb 12 05:52:03 tier2-home kernel: EIP: 0060:[<00000000>] Not tainted VLI
Feb 12 05:52:03 tier2-home kernel: EFLAGS: 00010286 (2.6.9-22.0.1.106.unsupportedsmp)
Feb 12 05:52:03 tier2-home kernel: EIP is at 0x0
Feb 12 05:52:03 tier2-home kernel: eax: e48ec050 ebx: c040a260 ecx: 00000000 edx: ecf89344
Feb 12 05:52:03 tier2-home kernel: esi: ecf89344 edi: f4ad4f00 ebp: 00000000 esp: f4ad4ee4
Feb 12 05:52:03 tier2-home kernel: ds: 007b es: 007b ss: 0068
Feb 12 05:52:03 tier2-home kernel: Process nfsd (pid: 2839, threadinfo=f4ad4000 task=f4afd6b0)
Feb 12 05:52:03 tier2-home kernel: Stack: c01649d7 e48ec050 ffffffff c3e300b3 0028fd9d ecf89c2c c0164a4b 0028fd9d
Feb 12 05:52:03 tier2-home kernel: 00000003 c3e300b0 ecf89c2c e48ec0c0 e48ec050 ecf89c2c f4ac6804 f8c8bdf3
Feb 12 05:52:03 tier2-home kernel: c3e300b0 c31d6a00 f7dc1700 c31d6a00 f89902e7 f4ab6000 f4ac6800 f4ac69d4
Feb 12 05:52:03 tier2-home kernel: Call Trace:
Feb 12 05:52:03 tier2-home kernel: [<c01649d7>] __lookup_hash+0x70/0x89
Feb 12 05:52:03 tier2-home kernel: [<c0164a4b>] lookup_one_len+0x54/0x63
Feb 12 05:52:03 tier2-home kernel: [<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f89902e7>] svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
Feb 12 05:52:03 tier2-home kernel: [<f8c93351>] nfsd3_proc_lookup+0xa9/0xb3 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c952aa>] nfs3svc_decode_diropargs+0x0/0xfa [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c896a2>] nfsd_dispatch+0xba/0x170 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f898d459>] svc_process+0x41b/0x6ce [sunrpc]
Feb 12 05:52:03 tier2-home kernel: [<f8c89482>] nfsd+0x1cc/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<f8c892b6>] nfsd+0x0/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel: [<c0104205>] kernel_thread_helper+0x5/0xb
Feb 12 05:52:03 tier2-home kernel: Code: Bad EIP value.
Feb 12 05:52:03 tier2-home kernel: <0>Fatal exception: panic in 5 seconds
Feb 13 08:50:06 tier2-home kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 13 08:50:06 tier2-home kernel: Linux version 2.6.9-22.0.1.106.unsupportedsmp (buildcentos at louisa.home.local) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Sun Nov 6 13:58:14 CST 2005
Feb 13 08:50:06 tier2-home kernel: BIOS-provided physical RAM map:

...and the reboot continues normally.
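
A note for readers working through a trace like the one above: the frames that name a module in square brackets show which loadable modules were on the stack when the oops hit. What follows is only a minimal Python sketch and not something from the original post. The names `parse_call_trace` and `modules_in_trace` are made up for illustration, and the regular expression assumes the frame layout seen in this log (`[<address>] symbol+0xoffset/0xsize [module]`).

```python
import re

# Assumed frame layout, taken from the log in this thread, e.g.
#   [<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 [nfsd]
# The trailing "[module]" is absent for frames in the core kernel image.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+"
    r"(?P<symbol>[A-Za-z_]\w*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(log_text):
    """Return (symbol, module) pairs in stack order; module is None for core kernel frames."""
    return [(m.group("symbol"), m.group("module")) for m in FRAME_RE.finditer(log_text)]


def modules_in_trace(frames):
    """Return the distinct module names from the frames, in first-seen order."""
    seen = []
    for _symbol, module in frames:
        if module is not None and module not in seen:
            seen.append(module)
    return seen


# A few lines copied from the log above. The EIP line is included to show
# that it is not mistaken for a call-trace frame.
SAMPLE = (
    "Feb 12 05:52:03 tier2-home kernel: EIP: 0060:[<00000000>] Not tainted VLI\n"
    "Feb 12 05:52:03 tier2-home kernel: [<c01649d7>] __lookup_hash+0x70/0x89\n"
    "Feb 12 05:52:03 tier2-home kernel: [<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 [nfsd]\n"
    "Feb 12 05:52:03 tier2-home kernel: [<f89902e7>] svcauth_unix_set_client+0xa7/0xb5 [sunrpc]\n"
)

if __name__ == "__main__":
    frames = parse_call_trace(SAMPLE)
    for symbol, module in frames:
        print(symbol, module or "(core kernel)")
    print("modules on the stack:", modules_in_trace(frames))
```

Run against the full trace posted here, the only modules it would report are nfsd and sunrpc; the two topmost frames (`__lookup_hash`, `lookup_one_len`) belong to the core kernel.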
Andrew Zahn wrote:
> Hi All,
>
> I am looking for advice on how to cure a constantly-crashing NFS
> server which crashes every few hours, or at least, every few days. The
> kernel log file (below) points toward NFS as a likely cause.
>
> The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
> using a 3Ware 9000 controller to produce two RAID1 devices; these are
> then striped (RAID0) in software to form a RAID 10 device. We're
> using a 2.6 kernel, xfs filesystem, and NFS3/UDP.
>
> We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel. This kernel
> has xfs extensions, and we're running the xfs filesystem for /home
> (obtained from CentOS website).
>
> In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.
>
> NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server,
> async clients, noac.
>
> This system has been serving /home in this configuration since October
> 2005; we've seen it crash rarely, but uptimes were usually on the
> order of months. This past week, it can't seem to remain up for much
> longer than about a day.
>
> Kernel log file containing the crash:
>
> Feb 12 05:52:03 tier2-home kernel: Unable to handle kernel NULL
> pointer dereference at virtual address 00000000
> Feb 12 05:52:03 tier2-home kernel: printing eip:
> Feb 12 05:52:03 tier2-home kernel: 00000000
> Feb 12 05:52:03 tier2-home kernel: *pde = f561f067
> Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]
> Feb 12 05:52:03 tier2-home kernel: PREEMPT SMP
> Feb 12 05:52:03 tier2-home kernel: Modules linked in: nfs nfsd
> exportfs lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc xfs dm_mirror
> dm_mod button battery ac uhci_hcd shpchp e1000 floppy ext3 jbd 3w_9xxx
> 3w_xxxx sd_mod scsi_mod
> Feb 12 05:52:03 tier2-home kernel: CPU: 0
<snip>

Hmmm...does it also crash if you run a non-SMP kernel?

Did you update the kernel around the same time as the instability began?

Cheers,
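
On the question of whether the kernel changed around the time the crashes started: the kernel logs a "Linux version ..." line at every boot, so the boot and oops lines in the log can be put on one timeline and the version strings compared. The following is a rough Python sketch under stated assumptions, not anything posted in the thread. The function name `boot_and_oops_events` is hypothetical, the line layout is assumed to match the klogd output shown in the original post, and the year has to be passed in because these timestamps do not carry one.

```python
import re
from datetime import datetime

# Assumed syslog layout, as in the log quoted in this thread:
#   "Feb 12 05:52:03 tier2-home kernel: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (?P<msg>.*)$"
)
VERSION_RE = re.compile(r"Linux version (\S+)")


def boot_and_oops_events(lines, year):
    """Return (when, kind, detail) tuples for boot banners and oops lines.

    kind is "boot" (detail = kernel version string) or "oops"
    (detail = the oops message). Lines that do not match are skipped.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        message = match.group("msg")
        # The year is supplied by the caller; the log itself omits it.
        when = datetime.strptime(f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S")
        version = VERSION_RE.search(message)
        if version:
            events.append((when, "boot", version.group(1)))
        elif message.startswith("Oops:"):
            events.append((when, "oops", message))
    return events


if __name__ == "__main__":
    sample = [
        "Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]",
        "Feb 13 08:50:06 tier2-home kernel: Linux version "
        "2.6.9-22.0.1.106.unsupportedsmp (buildcentos at louisa.home.local)",
    ]
    for when, kind, detail in boot_and_oops_events(sample, 2006):
        print(when.isoformat(), kind, detail)
```

If every "boot" entry going back to the stable months shows the same version string, a kernel update is unlikely to be what changed.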

On Monday 13 February 2006 22:44, Andrew Zahn wrote:
> Hi All,
>
> I am looking for advice on how to cure a constantly-crashing NFS server
> which crashes every few hours, or at least, every few days. The kernel
> log file (below) points toward NFS as a likely cause.
>
> The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
> using a 3Ware 9000 controller to produce two RAID1 devices; these are
> then striped (RAID0) in software to form a RAID 10 device. We're using
> a 2.6 kernel, xfs filesystem, and NFS3/UDP.
>
> We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel. This kernel
> has xfs extensions, and we're running the xfs filesystem for /home

One thing to consider is that the xfs module in current centosplus
kernels is the same as kernel.org 2.6.9, that is, ancient. I never got
2.6.9 xfs stable for non-trivial loads and configurations.

/Peter

> (obtained from CentOS website).
>
> In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.
>
> NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server,
> async clients, noac.
>
> This system has been serving /home in this configuration since October
> 2005; we've seen it crash rarely, but uptimes were usually on the order
> of months. This past week, it can't seem to remain up for much longer
> than about a day.
>
> Kernel log file containing the crash:
>
...

--
------------------------------------------------------------
  Peter Kjellström               |
  National Supercomputer Centre  |
  Sweden                         | http://www.nsc.liu.se