Wendy Cheng
2013-Oct-12 03:28 UTC
[Patch] Fix Client Kernel Crash with Mis-configured Index Numbering
Ref: http://lists.lustre.org/pipermail/lustre-devel/2013-October/004270.html

I'm not really convinced the "index" setting of mkfs.lustre needs to start at "0". However, at a minimum, the client kernel should not crash. The attached patch does this minimum fix; compiled and tested with GIT master branch.

Recreated by:

server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1
client> mount.lustre -o flock 192.168.20.46@o2ib0:/lus1 /mnt/lustre

The client mount crashes at lmv_get_info() without the changes:

<1>[ 215.946538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
<1>[ 215.946572] IP: [<ffffffffa07445cb>] lmv_get_info+0x32b/0x560 [lmv]
<0>[ 215.947090] Call Trace:
<4>[ 215.947143] [<ffffffffa0655b70>] ll_fill_super+0x1f40/0x4330 [lustre]
<4>[ 215.947214] [<ffffffffa02cf527>] ? lustre_start_mgc+0x227/0x2a90 [obdclass]
<4>[ 215.947275] [<ffffffffa02d3d60>] lustre_fill_super+0xa20/0x22f0 [obdclass]
<4>[ 215.947304] [<ffffffff810de91f>] ? set_anon_super+0x0/0xe0
<4>[ 215.947361] [<ffffffffa02d3340>] ? lustre_fill_super+0x0/0x22f0 [obdclass]
<4>[ 215.947380] [<ffffffff810df601>] mount_nodev+0x50/0x84
<4>[ 215.947437] [<ffffffffa02cc5d9>] lustre_mount+0x29/0x30 [obdclass]
<4>[ 215.947454] [<ffffffff810df009>] vfs_kern_mount+0xa8/0x1f3
<4>[ 215.947471] [<ffffffff810df1bc>] do_kern_mount+0x4d/0xe1
<4>[ 215.947489] [<ffffffff810f54d7>] do_mount+0x67d/0x6d5
<4>[ 215.947507] [<ffffffff810f57cc>] sys_mount+0x84/0xbd
<4>[ 215.947527] [<ffffffff81002aab>] system_call_fastpath+0x16/0x1b

Signed-off-by: Wendy Cheng <wendy.cheng-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

diff --git a/lustre/lmv/lmv_obd.c b/lustre/lmv/lmv_obd.c
index 3091bfb..5f4a18b 100644
--- a/lustre/lmv/lmv_obd.c
+++ b/lustre/lmv/lmv_obd.c
@@ -2443,6 +2443,16 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
 	RETURN(rc);
 
 	/*
+	 * In the case of mis-configured OSS, instead of crashing
+	 * the kernel during client mount, give them a warning and
+	 * gracefully back out mount process w/ -ENXIO error.
+	 */
+	if (lmv->tgts[0] == NULL) {
+		CDEBUG(D_IOCTL, "NULL index\n");
+		RETURN(-ENXIO);
+	}
+
+	/*
 	 * Forwarding this request to first MDS, it should know LOV
 	 * desc.
 	 */
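A note on the faulting address in the oops above: 0x28 is what one would expect when code reads a member that sits 0x28 bytes into a structure through a NULL base pointer, which fits the empty lmv->tgts[0] slot the patch checks for. The sketch below only shows that arithmetic in plain C. The struct, its field names, and the 40-byte leading field are assumptions chosen to illustrate the offset; the real Lustre target descriptor layout is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a target descriptor. This is NOT the real
 * Lustre layout; the 40-byte leading field is an assumption made so
 * that the next member lands at offset 0x28.
 */
struct tgt_desc {
	char  uuid[40];	/* assumed 40-byte identifier field */
	void *exp;	/* assumed pointer member at offset 0x28 */
};

/*
 * Address the CPU would touch when reading base->exp.
 * With base == 0 (a NULL descriptor) this is just the member offset.
 */
uintptr_t exp_field_address(uintptr_t base)
{
	return base + offsetof(struct tgt_desc, exp);
}
```

Calling exp_field_address(0) yields the member offset itself, which is how a NULL structure pointer can show up in an oops as a small non-zero address.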
Dilger, Andreas
2013-Oct-12 06:59 UTC
Re: [Patch] Fix Client Kernel Crash with Mis-configured Index Numbering
Hi Wendy,

Thanks for the patch. Could you please file a ticket at https://jira.hpdd.intel.com/ and submit the patch to our Gerrit repo (with minor tweaks as suggested below) so it is included in the next Lustre release. For more details please see:

https://wiki.hpdd.intel.com/display/PUB/Submitting+Changes

You are totally correct that no user input should crash the kernel. The support for multiple MDTs in the same filesystem is relatively new (previously only MDT index 0 was allowed), and I guess nobody has ever tested what you did.

Cheers, Andreas

On 2013-10-11, at 21:29, "Wendy Cheng" <s.wendy.cheng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Ref: http://lists.lustre.org/pipermail/lustre-devel/2013-October/004270.html
>
> I'm not really convinced the "index" setting of mkfs.lustre needs to start at "0". However, at a minimum, the client kernel should not crash. The attached patch does this minimum fix; compiled and tested with GIT master branch.
>
> Recreated by:
>
> server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
> server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1
> client> mount.lustre -o flock 192.168.20.46@o2ib0:/lus1 /mnt/lustre
>
> The client mount crashes at lmv_get_info() without the changes:
>
> <1>[ 215.946538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> <1>[ 215.946572] IP: [<ffffffffa07445cb>] lmv_get_info+0x32b/0x560 [lmv]
> <0>[ 215.947090] Call Trace:
> <4>[ 215.947143] [<ffffffffa0655b70>] ll_fill_super+0x1f40/0x4330 [lustre]
> <4>[ 215.947214] [<ffffffffa02cf527>] ? lustre_start_mgc+0x227/0x2a90 [obdclass]
> <4>[ 215.947275] [<ffffffffa02d3d60>] lustre_fill_super+0xa20/0x22f0 [obdclass]
> <4>[ 215.947304] [<ffffffff810de91f>] ? set_anon_super+0x0/0xe0
> <4>[ 215.947361] [<ffffffffa02d3340>] ? lustre_fill_super+0x0/0x22f0 [obdclass]
> <4>[ 215.947380] [<ffffffff810df601>] mount_nodev+0x50/0x84
> <4>[ 215.947437] [<ffffffffa02cc5d9>] lustre_mount+0x29/0x30 [obdclass]
> <4>[ 215.947454] [<ffffffff810df009>] vfs_kern_mount+0xa8/0x1f3
> <4>[ 215.947471] [<ffffffff810df1bc>] do_kern_mount+0x4d/0xe1
> <4>[ 215.947489] [<ffffffff810f54d7>] do_mount+0x67d/0x6d5
> <4>[ 215.947507] [<ffffffff810f57cc>] sys_mount+0x84/0xbd
> <4>[ 215.947527] [<ffffffff81002aab>] system_call_fastpath+0x16/0x1b
>
> Signed-off-by: Wendy Cheng <wendy.cheng-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> diff --git a/lustre/lmv/lmv_obd.c b/lustre/lmv/lmv_obd.c
> index 3091bfb..5f4a18b 100644
> --- a/lustre/lmv/lmv_obd.c
> +++ b/lustre/lmv/lmv_obd.c
> @@ -2443,6 +2443,16 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
>  	RETURN(rc);
>
>  	/*
> +	 * In the case of mis-configured OSS, instead of crashing

This comment should read "misconfigured MDT" ...

> +	 * the kernel during client mount, give them a warning and
> +	 * gracefully back out mount process w/ -ENXIO error.
> +	 */
> +	if (lmv->tgts[0] == NULL) {
> +		CDEBUG(D_IOCTL, "NULL index\n");

"NULL target for MDT0\n"

> +		RETURN(-ENXIO);
> +	}
> +
> +	/*
>  	 * Forwarding this request to first MDS, it should know LOV
>  	 * desc.
>  	 */
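With the two review comments applied, the guard can be modelled in userspace roughly as follows. This is a sketch only: `struct lmv_model`, `struct tgt_desc`, `MAX_TGTS` and `forward_to_first_mdt()` are hypothetical stand-ins, and `fprintf` takes the place of the kernel's CDEBUG. Only the NULL check on slot 0, the -ENXIO return and the message text come from the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TGTS 4	/* hypothetical table size, for illustration only */

/* Hypothetical target descriptor; the real one carries far more state. */
struct tgt_desc {
	int index;
};

/* Hypothetical model of the client-side table of MDT targets. */
struct lmv_model {
	struct tgt_desc *tgts[MAX_TGTS];
};

/*
 * Model of "forward the request to the first MDT".
 * In the case of a misconfigured MDT (nothing registered at index 0),
 * warn and back out with -ENXIO instead of dereferencing a NULL slot.
 */
int forward_to_first_mdt(const struct lmv_model *lmv)
{
	if (lmv->tgts[0] == NULL) {
		fprintf(stderr, "NULL target for MDT0\n");
		return -ENXIO;
	}

	/* Stand-in for the real forwarding call: report which target was used. */
	return lmv->tgts[0]->index;
}
```

With only index 1 populated, as in the reproduction above, slot 0 stays NULL and the call returns -ENXIO rather than dereferencing it.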
Wendy Cheng
2013-Oct-13 13:42 UTC
Re: [Patch] Fix Client Kernel Crash with Mis-configured Index Numbering
On Fri, Oct 11, 2013 at 11:59 PM, Dilger, Andreas <andreas.dilger-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Hi Wendy,
> Thanks for the patch. Could you please file a ticket at
> https://jira.hpdd.intel.com/ and submit the patch to our Gerrit repo (with
> minor tweaks as suggested below) so it is included in the next Lustre
> release. For more details please see:
>
> https://wiki.hpdd.intel.com/display/PUB/Submitting+Changes
>

Thanks ... Will do the check-in sometime next week when I'm back in the office.

-- Wendy