This panic seems to be generic regardless of the platform, though I'm actually on Intel Xeon Phi Lustre (client) nodes.

New to Lustre, I mistakenly thought the "index" option of mkfs.lustre was for software RAID, so I formatted one of the server disks as follows:

server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1

The client mount immediately crashed at lmv_get_info(). The attached patch fixed that particular panic ... but it unfortunately crashed at an assertion further down the path. I'll be travelling next week, so I might give up pursuing this issue. The disks have since been re-formatted with index=0; things seem to work fine and performance numbers have been collected. Three questions here:

1. What is this "index" option all about?
2. Is the problem worth fixing, or is it a user error?
3. The performance numbers (again, NOT Xeon Phi specific) surprise me. Would this list be a good place to ask questions?

-- Wendy

_______________________________________________
Lustre-devel mailing list
Lustre-devel-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-devel
On 10/11/13 10:11 AM, "Wendy Cheng" <s.wendy.cheng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> This panic seems to be generic regardless of the platform, though I'm
> actually on Intel Xeon Phi Lustre (client) nodes.
>
> New to Lustre, I mistakenly thought the "index" option of mkfs.lustre
> was for software RAID, so I formatted one of the server disks as follows:
>
> server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=1 /dev/sdd1
> server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sde1
>
> The client mount immediately crashed at lmv_get_info(). The attached
> patch fixed that particular panic ... but it unfortunately crashed at an
> assertion further down the path. I'll be travelling next week, so I might
> give up pursuing this issue. The disks have since been re-formatted with
> index=0; things seem to work fine and performance numbers have been
> collected. Three questions here:
>
> 1. What is this "index" option all about?
> 2. Is the problem worth fixing, or is it a user error?
> 3. The performance numbers (again, NOT Xeon Phi specific) surprise me.
> Would this list be a good place to ask questions?

1. --index is used to enumerate OSTs and MDTs when using DNE. The index MUST be unique, and indexes must not have gaps. So, you should do this:

server> mkfs.lustre --reformat --fsname=lus1 --mgs --mdt --index=0 /dev/sdd1   # first MDT
server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=0 /dev/sde1   # first OST

If you add a second OST partition:

server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=1 /dev/sdfoo   # second OST

And a third:

server> mkfs.lustre --reformat --ost --fsname=lus1 --mgsnode=192.168.20.46@o2ib0 --index=2 /dev/sdbar   # third OST

2. You must fix this, or things won't work. I would suggest starting again and doing a reformat, etc.

3. Surprise you how?

HPDD-discuss is likely a better list for these sorts of questions; lustre-devel is for code development.

Cliffw
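The unique, gap-free rule above can be sanity-checked before formatting anything. A minimal shell sketch (the check_indexes helper is hypothetical, not part of the Lustre tools — it just encodes "indexes must be exactly 0..N-1"):

```shell
# Hypothetical helper, not part of Lustre: verify that a set of
# --index values is unique and gap-free, i.e. exactly 0..N-1.
check_indexes() {
    got=$(printf '%s\n' "$@" | sort -n | uniq)
    want=$(seq 0 $(( $# - 1 )))
    [ "$got" = "$want" ]
}

check_indexes 0 1 2 && echo "ok: 0 1 2 is gap-free"
check_indexes 1 2   || echo "bad: 1 2 skips index 0"
check_indexes 0 0 1 || echo "bad: 0 0 1 has a duplicate"
```

Run against the commands in the original post (a lone --index=1 OST), this check fails, which matches the crash Wendy saw.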
On Fri, Oct 11, 2013 at 10:41 AM, White, Cliff <cliff.white-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> 1. --index is used to enumerate OSTs and MDTs when using DNE.
> The index MUST be unique, and indexes must not have gaps.

I see ... indexes must not have gaps. However, a user error can crash the kernel. Does that sound right?

> 3. Surprise you how?
>
> HPDD-discuss is likely a better list for these sorts of questions;
> lustre-devel is for code development.

Thanks ... I'll move the discussion there sometime next week. It looks to me like Lustre is doing sync to the disks all the time, vs. other network filesystems (e.g. NFS) that cache quite aggressively.

-- Wendy
On 10/11/13 10:59 AM, "Wendy Cheng" <s.wendy.cheng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I see ... indexes must not have gaps. However, a user error can crash
> the kernel. Does that sound right?

Well, creating the filesystem is normally done by admins, not users, but yes, it shouldn't crash. Lustre-devel is the place for your patch; sorry I wasn't clear. -discuss is more for the 'why are there indexes' type of questions. :)

> Thanks ... I'll move the discussion there sometime next week. It looks
> to me like Lustre is doing sync to the disks all the time, vs. other
> network filesystems (e.g. NFS) that cache quite aggressively.

Yes, Lustre by design does direct IO to disk, and does not cache data on the servers. Some caching can be enabled, but in general, no, you should not see the servers caching. However, the clients should be using the normal Linux block cache; if the clients are not caching, there may be an issue with your setup.

Cliffw
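The buffered-vs-synchronous distinction being discussed can be seen with plain dd on any Linux box. This is a generic illustration, not Lustre-specific, and the /tmp file names are arbitrary:

```shell
# Generic illustration (not Lustre-specific): a buffered write returns
# as soon as the data is in the page cache; adding conv=fsync makes dd
# flush to stable storage before exiting, like a server doing sync IO.

# Buffered: data may still be only in the page cache when dd returns.
dd if=/dev/zero of=/tmp/buffered.img bs=1M count=16 2>/dev/null

# Synchronous: dd calls fsync() on the output file before exiting,
# so it is typically much slower on real disks.
dd if=/dev/zero of=/tmp/synced.img bs=1M count=16 conv=fsync 2>/dev/null

stat -c '%n %s' /tmp/buffered.img /tmp/synced.img
```

Timing the two variants on a spinning disk usually shows the fsync case running far slower, which is roughly the gap Wendy's numbers reflect between sync-heavy Lustre servers and an aggressively caching NFS setup.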
On Fri, Oct 11, 2013 at 11:30 AM, White, Cliff <cliff.white-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> Yes, Lustre by design does direct IO to disk, and does not cache data on
> the servers.

I see ... direct IO. The data makes sense now :) Thanks!

-- Wendy