thr3ads.net - CentOS - [CentOS] raid5 crash [Jul 2005]


Farkas Levente

2005-Jul-07 09:45 UTC

[CentOS] raid5 crash

Hi,
After we switched our servers from CentOS 3 to CentOS 4 (aka RHEL 4), one
of our servers has been crashing about once a week without any oops. This
happens with both the normal kernel-2.6.9-11.EL and
kernel-2.6.9-11.106.unsupported. We replaced the motherboard, the RAID
controller, and the cables, and it still crashes. Finally we set up
netdump, and yesterday we got a crash log and a core file. It seems there
is a bug in the raid5 code of the kernel.
This is our backup server, with 8 x 200GB HDDs in RAID 5 (for the data)
plus 2 x 40GB HDDs in RAID 1 (for the system) on a 3ware 8xxx RAID
controller. I attached the netdump log of the last crash.
How can I fix it?
Yours,

-- 
   Levente                               "Si vis pacem para bellum!"
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log
URL:
<http://lists.centos.org/pipermail/centos/attachments/20050707/11321ba7/attachment-0002.ksh>

James Olin Oden

2005-Jul-09 15:58 UTC


[CentOS] raid5 crash

On 7/7/05, Farkas Levente <lfarkas at bppiac.hu> wrote:
> Hi,
> After we switched our servers from CentOS 3 to CentOS 4 (aka RHEL 4), one
> of our servers has been crashing about once a week without any oops. This
> happens with both the normal kernel-2.6.9-11.EL and
> kernel-2.6.9-11.106.unsupported. We replaced the motherboard, the RAID
> controller, and the cables, and it still crashes. Finally we set up
> netdump, and yesterday we got a crash log and a core file. It seems there
> is a bug in the raid5 code of the kernel.
> This is our backup server, with 8 x 200GB HDDs in RAID 5 (for the data)
> plus 2 x 40GB HDDs in RAID 1 (for the system) on a 3ware 8xxx RAID
> controller. I attached the netdump log of the last crash.
> How can I fix it?
> Yours,

Hi,

I have seen similar (but not quite the same) crashes in the RAID code on
RHEL 3 kernels. They typically occurred because of a race condition
between something updating the linked lists of RAID devices and
something trying to read them. For RHEL 3, my co-workers and I found
where one particular race condition had been fixed in the 2.6 kernel and
backported the fix to the RHEL 3 kernel. Ultimately this patch was
included in one of the updates for the RHEL 3 kernel.
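
To make the kind of race described above concrete, here is a minimal
userspace sketch in Python. It is only an illustration, not the kernel's
md/raid5 code and not the patch mentioned above: the names DeviceList and
run_demo and the "disk0".."disk7" entries are made up for the example. One
thread repeatedly unlinks and relinks a list entry while another walks the
list. Because both take the same lock, the reader never sees a
half-updated list; without the lock it could follow a stale link.

```python
import threading


class Node:
    """One entry in a singly linked list of device names."""

    def __init__(self, name):
        self.name = name
        self.next = None


class DeviceList:
    """Hypothetical device list; one lock guards every read and update."""

    def __init__(self):
        self.head = None
        self._lock = threading.Lock()

    def add(self, name):
        node = Node(name)
        with self._lock:
            node.next = self.head
            self.head = node

    def remove(self, name):
        with self._lock:
            prev, cur = None, self.head
            while cur is not None:
                if cur.name == name:
                    if prev is None:
                        self.head = cur.next
                    else:
                        prev.next = cur.next
                    return True
                prev, cur = cur, cur.next
            return False

    def snapshot(self):
        # Walk the list under the same lock the writers use.
        with self._lock:
            names = []
            cur = self.head
            while cur is not None:
                names.append(cur.name)
                cur = cur.next
            return names


def run_demo(rounds=1000):
    devices = DeviceList()
    for i in range(8):
        devices.add(f"disk{i}")
    inconsistent = []

    def writer():
        for _ in range(rounds):
            devices.remove("disk3")
            devices.add("disk3")

    def reader():
        for _ in range(rounds):
            names = devices.snapshot()
            # A consistent view has 7 or 8 entries and no duplicates.
            if len(names) not in (7, 8) or len(set(names)) != len(names):
                inconsistent.append(names)

    threads = [threading.Thread(target=writer),
               threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return devices.snapshot(), inconsistent
```

Calling run_demo() returns the final list contents and any inconsistent
views the reader observed; with the lock in place the second value should
always be empty.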

Anyway, it is likely that your problem is yet another race condition.
What I would suggest is to get a box configured with true RHEL 4 and
reproduce the crash there. Once you have reproduced it, file a Bugzilla
report with Red Hat. We have had very good success with this approach for
a number of kernel bugs we found in the CentOS 3/RHEL 3 kernels. Fixes
have not always come quickly, but they generally do come.

Good luck...james

Farkas Levente

2005-Jul-11 08:14 UTC


[CentOS] raid5 crash

Although this mail created a long thread, does anybody have a good
solution to the original question?

Farkas Levente wrote:
> Hi,
> After we switched our servers from CentOS 3 to CentOS 4 (aka RHEL 4), one
> of our servers has been crashing about once a week without any oops. This
> happens with both the normal kernel-2.6.9-11.EL and
> kernel-2.6.9-11.106.unsupported. We replaced the motherboard, the RAID
> controller, and the cables, and it still crashes. Finally we set up
> netdump, and yesterday we got a crash log and a core file. It seems there
> is a bug in the raid5 code of the kernel.
> This is our backup server, with 8 x 200GB HDDs in RAID 5 (for the data)
> plus 2 x 40GB HDDs in RAID 1 (for the system) on a 3ware 8xxx RAID
> controller. I attached the netdump log of the last crash.
> How can I fix it?
> Yours,

-- 
   Levente                               "Si vis pacem para bellum!"
