aurelien.degremont@cea.fr
2007-Mar-01 08:39 UTC
[Lustre-devel] [Bug 11149] EM64T, woodcrest: stack dump being called in qos_calc_rr()
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11149

We hit a very similar bug in Lustre 1.5.97 when one OST index is skipped using --index.

RHEL4 u3, EM64T
Kernel: 2.6.9-34.EL_l47smp
Lustre 1.5.97

"Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
[<ffffffffa0475265>] :lov:alloc_idx_array+4264"

The bug is easy to reproduce:
# mkfs.lustre --ost --index 0 ... /dev/ost0
# mkfs.lustre --ost --index 1 ... /dev/ost1
--- Skip the OST #2
# mkfs.lustre --ost --index 3 ... /dev/ost3

- Start this FS (mgs, mds, the 3 osts)
- Try an 'lfs df' on a client. It stops at OST #1.
- Try some I/O -> Oops on the MDS.

From a quick look at alloc_qos() and alloc_specific(), it seems that one or both of those functions loop over an array, from 0 to number_of_ost, and hit a NULL pointer when reaching OST #2.

It seems some checks are needed.

A question: is it considered 'stable' to use a Lustre FS with a missing OST index like this? Should it be possible?
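The failure mode suspected above (a loop from 0 to the number of OSTs that dereferences an empty slot) can be illustrated with a minimal sketch. This is not the Lustre code and not the patch attached to the bug: `struct tgt_desc` and `collect_usable` are hypothetical names, and the sketch assumes the target table simply holds NULL wherever no OST was registered for that index.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-OST target descriptor. */
struct tgt_desc {
    int index;   /* OST index assigned at format time (e.g. via --index) */
    int active;  /* non-zero if the target is currently usable */
};

/*
 * Collect the indices of usable targets from a possibly sparse table.
 * tgts[i] is NULL when no OST was registered with index i.
 * Writes at most out_len indices to out and returns how many were written.
 */
static size_t collect_usable(struct tgt_desc *const *tgts, size_t tgt_count,
                             int *out, size_t out_len)
{
    size_t n = 0;

    for (size_t i = 0; i < tgt_count && n < out_len; i++) {
        if (tgts[i] == NULL)   /* gap in the index space: skip, do not dereference */
            continue;
        if (!tgts[i]->active)  /* registered but not usable */
            continue;
        out[n++] = tgts[i]->index;
    }
    return n;
}
```

With a table of four slots holding OSTs 0, 1 and 3 and a NULL in slot 2, the function returns the three indices 0, 1, 3 instead of faulting at slot 2. Without the NULL check, the read of `tgts[2]->active` would be a NULL pointer dereference of the kind reported in the oops.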
nathan@clusterfs.com
2007-Mar-01 10:25 UTC
[Lustre-devel] [Bug 11149] EM64T, woodcrest: stack dump being called in qos_calc_rr()
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |10694
              nThis|                            |
             Status|NEW                         |ASSIGNED
            Version|unspecified                 |b1_5

This definitely needs to be fixed before release. Serge, were you using --index also?
Nathaniel Rutman
2007-Mar-01 18:14 UTC
[Lustre-devel] [Bug 11149] EM64T, woodcrest: stack dump being called in qos_calc_rr()
This definitely should work. I have attached a patch to the bug.

aurelien.degremont@cea.fr wrote:
> We hit a very similar bug in Lustre 1.5.97 when one OST index is skipped using --index.
> RHEL4 u3, EM64T
> Kernel: 2.6.9-34.EL_l47smp
> Lustre 1.5.97
>
> "Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
> [<ffffffffa0475265>] :lov:alloc_idx_array+4264"
>
> The bug is easy to reproduce:
> # mkfs.lustre --ost --index 0 ... /dev/ost0
> # mkfs.lustre --ost --index 1 ... /dev/ost1
> --- Skip the OST #2
> # mkfs.lustre --ost --index 3 ... /dev/ost3
>
> - Start this FS (mgs, mds, the 3 osts)
> - Try an 'lfs df' on a client. It stops at OST #1.
> - Try some I/O -> Oops on the MDS.
>
> From a quick look at alloc_qos() and alloc_specific(), it seems that one or both of
> those functions loop over an array, from 0 to number_of_ost, and hit a NULL
> pointer when reaching OST #2.
>
> It seems some checks are needed.
>
> A question: is it considered 'stable' to use a Lustre FS with a missing OST
> index like this? Should it be possible?