thr3ads.net - CentOS - [CentOS] Help with disk server stability issues [Feb 2006]


Andrew Zahn

2006-Feb-13 21:44 UTC

[CentOS] Help with disk server stability issues

Hi All,

I am looking for advice on how to cure an NFS server that crashes every 
few days, and sometimes every few hours. The kernel log file (below) 
points toward NFS as a likely cause.

The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
using a 3ware 9000 controller to produce two RAID1 devices; these are 
then striped (RAID0) in software to form a RAID 10 device.  We're using 
a 2.6 kernel, xfs filesystem, and NFS3/UDP.

We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel (obtained from 
the CentOS website).  This kernel has xfs extensions, and /home is on an 
xfs filesystem.

In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.

NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server, 
async clients, noac.

This system has been serving /home in this configuration since October 
2005; we've seen it crash rarely, but uptimes were usually on the order 
of months.  This past week, it can't seem to remain up for much longer 
than about a day.

Kernel log file containing the crash:

Feb 12 05:52:03 tier2-home kernel: Unable to handle kernel NULL pointer 
dereference at virtual address 00000000
Feb 12 05:52:03 tier2-home kernel:  printing eip:
Feb 12 05:52:03 tier2-home kernel: 00000000
Feb 12 05:52:03 tier2-home kernel: *pde = f561f067
Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]
Feb 12 05:52:03 tier2-home kernel: PREEMPT SMP
Feb 12 05:52:03 tier2-home kernel: Modules linked in: nfs nfsd exportfs 
lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc xfs dm_mirror dm_mod 
button battery ac uhci_hcd shpchp e1000 floppy ext3 jbd 3w_9xxx 3w_xxxx 
sd_mod scsi_mod
Feb 12 05:52:03 tier2-home kernel: CPU:    0
Feb 12 05:52:03 tier2-home kernel: EIP:    0060:[<00000000>]    Not 
tainted VLI
Feb 12 05:52:03 tier2-home kernel: EFLAGS: 00010286 
(2.6.9-22.0.1.106.unsupportedsmp)
Feb 12 05:52:03 tier2-home kernel: EIP is at 0x0
Feb 12 05:52:03 tier2-home kernel: eax: e48ec050   ebx: c040a260   ecx: 
00000000   edx: ecf89344
Feb 12 05:52:03 tier2-home kernel: esi: ecf89344   edi: f4ad4f00   ebp: 
00000000   esp: f4ad4ee4
Feb 12 05:52:03 tier2-home kernel: ds: 007b   es: 007b   ss: 0068
Feb 12 05:52:03 tier2-home kernel: Process nfsd (pid: 2839, 
threadinfo=f4ad4000 task=f4afd6b0)
Feb 12 05:52:03 tier2-home kernel: Stack: c01649d7 e48ec050 ffffffff 
c3e300b3 0028fd9d ecf89c2c c0164a4b 0028fd9d
Feb 12 05:52:03 tier2-home kernel:        00000003 c3e300b0 ecf89c2c 
e48ec0c0 e48ec050 ecf89c2c f4ac6804 f8c8bdf3
Feb 12 05:52:03 tier2-home kernel:        c3e300b0 c31d6a00 f7dc1700 
c31d6a00 f89902e7 f4ab6000 f4ac6800 f4ac69d4
Feb 12 05:52:03 tier2-home kernel: Call Trace:
Feb 12 05:52:03 tier2-home kernel:  [<c01649d7>] __lookup_hash+0x70/0x89
Feb 12 05:52:03 tier2-home kernel:  [<c0164a4b>] lookup_one_len+0x54/0x63
Feb 12 05:52:03 tier2-home kernel:  [<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 
[nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<f89902e7>] 
svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
Feb 12 05:52:03 tier2-home kernel:  [<f8c93351>] 
nfsd3_proc_lookup+0xa9/0xb3 [nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<f8c952aa>] 
nfs3svc_decode_diropargs+0x0/0xfa [nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<f8c896a2>] 
nfsd_dispatch+0xba/0x170 [nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<f898d459>] svc_process+0x41b/0x6ce 
[sunrpc]
Feb 12 05:52:03 tier2-home kernel:  [<f8c89482>] nfsd+0x1cc/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<f8c892b6>] nfsd+0x0/0x332 [nfsd]
Feb 12 05:52:03 tier2-home kernel:  [<c0104205>] 
kernel_thread_helper+0x5/0xb
Feb 12 05:52:03 tier2-home kernel: Code:  Bad EIP value.
Feb 12 05:52:03 tier2-home kernel:  <0>Fatal exception: panic in 5 seconds
Feb 13 08:50:06 tier2-home kernel: klogd 1.4.1, log source = /proc/kmsg 
started.
Feb 13 08:50:06 tier2-home kernel: Linux version 
2.6.9-22.0.1.106.unsupportedsmp (buildcentos at louisa.home.local) (gcc 
version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Sun Nov 6 13:58:14 CST 
2005
Feb 13 08:50:06 tier2-home kernel: BIOS-provided physical RAM map:

...and the reboot continues normally.
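
For anyone triaging a log like the one above, a small script can pull the 
call-trace frames out of the oops and show which module each frame belongs 
to. The following is only a minimal sketch: the function name 
parse_call_trace and both regular expressions are assumptions made for 
illustration, written against the 2.6-era i386 frame format seen in this 
log ("[<address>] symbol+0xoff/0xlen [module]"). It also undoes the mail 
line wrapping by joining continuation lines before matching.

```python
import re

# Syslog prefix as it appears in the log above, for example
# "Feb 12 05:52:03 tier2-home kernel: ". Continuation lines created by
# mail wrapping do not carry it. (Pattern is an illustrative assumption.)
SYSLOG_PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s*"
)

# One call-trace frame: "[<f8c8bdf3>] nfsd_lookup+0x31c/0x3a8 [nfsd]".
# The trailing "[module]" is missing for symbols built into the kernel.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+"
    r"(?P<symbol>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(log_text):
    """Return a list of (symbol, module) tuples found in an oops log.

    Frames without a module tag are reported with module "kernel".
    """
    # Strip the syslog prefix where present, then join everything with
    # single spaces so that frames split across two lines match again.
    parts = [SYSLOG_PREFIX.sub("", line).strip()
             for line in log_text.splitlines()]
    joined = " ".join(p for p in parts if p)
    return [(m.group("symbol"), m.group("module") or "kernel")
            for m in FRAME.finditer(joined)]


if __name__ == "__main__":
    # Hypothetical usage: read a saved copy of the log from stdin.
    import sys
    for symbol, module in parse_call_trace(sys.stdin.read()):
        print(f"{module:10s} {symbol}")
```

Run against the trace above, the first two frames (__lookup_hash, 
lookup_one_len) are core VFS code and the remaining ones up to 
kernel_thread_helper come from nfsd and sunrpc, which is consistent with 
the crash happening in an NFS lookup.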

Chris Mauritz

2006-Feb-13 22:08 UTC


Andrew Zahn wrote:
> Hi All,
>
> I am looking for advice on how to cure a constantly-crashing NFS 
> server which crashes every few hours, or at least, every few days. The 
> kernel log file (below) points toward NFS as a likely cause.
>
> The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
> using a 3Ware 9000 controller to produce two RAID1 devices; these are 
> then striped (RAID0) in software to form a RAID 10 device.  We're 
> using a 2.6 kernel, xfs filesystem, and NFS3/UDP.
>
> We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel.  This kernel 
> has xfs extensions, and we're running the xfs filesystem for /home 
> (obtained from CentOS website).
>
> In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.
>
> NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server, 
> async clients, noac.
>
> This system has been serving /home in this configuration since October 
> 2005; we've seen it crash rarely, but uptimes were usually on the 
> order of months.  This past week, it can't seem to remain up for much 
> longer than about a day.
>
> Kernel log file containing the crash:
>
> Feb 12 05:52:03 tier2-home kernel: Unable to handle kernel NULL 
> pointer dereference at virtual address 00000000
> Feb 12 05:52:03 tier2-home kernel:  printing eip:
> Feb 12 05:52:03 tier2-home kernel: 00000000
> Feb 12 05:52:03 tier2-home kernel: *pde = f561f067
> Feb 12 05:52:03 tier2-home kernel: Oops: 0000 [#1]
> Feb 12 05:52:03 tier2-home kernel: PREEMPT SMP
> Feb 12 05:52:03 tier2-home kernel: Modules linked in: nfs nfsd 
> exportfs lockd md5 ipv6 autofs4 i2c_dev i2c_core sunrpc xfs dm_mirror 
> dm_mod button battery ac uhci_hcd shpchp e1000 floppy ext3 jbd 3w_9xxx 
> 3w_xxxx sd_mod scsi_mod
> Feb 12 05:52:03 tier2-home kernel: CPU:    0
<snip>

Hmmm...does it also crash if you run a non-SMP kernel?  Did you update 
the kernel around the same time as the instability began?

Cheers,

Peter Kjellström

2006-Feb-14 12:54 UTC


On Monday 13 February 2006 22:44, Andrew Zahn wrote:
> Hi All,
>
> I am looking for advice on how to cure a constantly-crashing NFS server
> which crashes every few hours, or at least, every few days. The kernel
> log file (below) points toward NFS as a likely cause.
>
> The system disk is a 3ware 8000 series RAID1 mirror. The data disk is
> using a 3Ware 9000 controller to produce two RAID1 devices; these are
> then striped (RAID0) in software to form a RAID 10 device.  We're using
> a 2.6 kernel, xfs filesystem, and NFS3/UDP.
>
> We're running CentOS 4.2 with a 2.6.9-22.0.1.106 kernel.  This kernel
> has xfs extensions, and we're running the xfs filesystem for /home
> (obtained from CentOS website).

One thing to consider is that the xfs module in current centosplus kernels is 
the same as kernel.org 2.6.9, that is, ancient. I never got 2.6.9 xfs stable 
for non-trivial loads and configurations.

/Peter

>
> In "lsmod" I see both 3w_xxxx and 3w_9xxx modules.
>
> NFS is over UDP, jumbo frames (9000), 32k rsize/wsize, async server,
> async clients, noac.
>
> This system has been serving /home in this configuration since October
> 2005; we've seen it crash rarely, but uptimes were usually on the order
> of months.  This past week, it can't seem to remain up for much longer
> than about a day.
>
> Kernel log file containing the crash:
> ...
-- 
------------------------------------------------------------
  Peter Kjellström               |
  National Supercomputer Centre  |
  Sweden                         | http://www.nsc.liu.se
