I just got myself into a bit of a mess this morning by doing things that sure seemed reasonable at the time, but turned out not to be.

I've been experimenting with a small ram-based (!) lustre fs, essentially feeding some diskless nodes from a lustre fs whose OSTs are ramdisk partitions on a separate batch of servers. To test all this, I formatted up the RAM OSTs and MDT, pointed them all at my existing (disk-based) MGS, just because it was already there, and tried it out. Everything worked perfectly.

But it turned out that I hadn't made my RAM MDT big enough, so I tried to shut down the RAM pieces, reformat them, and start them up again with bigger sizes. Nope. I couldn't mount the MDT; mount.lustre complained that my service index was already in use.

I realized that the MGS remembered the previous MDT, and was booting me out because it thought the new one was a name collision. So I figured I should rebuild the MGS to make it forget, and let everybody register. Nope. Zapping the MGS after other components such as OSTs have registered with it causes them to fail to reregister, because they claim they already have.

The error message in the log of the MGS machine suggested that I could straighten it out by running tunefs.lustre to reset the OSTs' idea of who the MGS was. Nope. tunefs.lustre told me it couldn't help me because I had extended fs properties, and that the problem should be fixed in a newer release, which I should download.

The net effect of all this was that I started over and rebuilt my disk-based lustre fs from scratch. When I get all the pieces of the test system online, I'll go back and start over on the RAM-based one. I'll definitely use a completely separate MGS this time, to avoid polluting the main one.

I think the lesson here is that if you propose to use separate lustre filesystems sharing an MGS, you need to be careful about it. It might have worked if I'd done cleaner shutdowns of my RAM-based pieces, but what if you lose a machine? I don't think it's safe to assume that one fs will always be shut down in the way you expect. It also might have worked if I'd had the right version of tunefs.lustre; I didn't have time to go through the process of upgrading it all to see whether a newer version would have fixed that particular issue.

I believe that in the general case, the issue is around sharing a single MGS between a filesystem that's expected to live for a while and one that's expected to get reconstituted frequently. Given that the MGS remembers stuff about all the filesystems that are or have been attached to it, that seems like a pretty dodgy proposition.

Perhaps there's also a corollary issue, around what happens if you lose, say, the disk on your MGS. What's the procedure meant to be to build up a fresh one and get everybody talking to each other again?

I'd be interested in hearing from anybody else who's using an MGS for more than one fs concurrently, and about what the experts at CFS say is the right way to do that.
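P.S. For reference, the RAM-side setup was roughly the following. This is a sketch from memory; the node name, ramdisk devices, mount points, and fsname below are placeholders, and the exact mkfs.lustre flags are worth double-checking against the man page for your release:

  # MDT server: ramdisk-backed MDT, registering with the existing disk-based MGS
  mkfs.lustre --fsname=ramfs --mdt --mgsnode=mgs01@tcp0 /dev/ram0
  mount -t lustre /dev/ram0 /mnt/ramfs-mdt

  # each OST server: ramdisk-backed OST, same MGS
  mkfs.lustre --fsname=ramfs --ost --mgsnode=mgs01@tcp0 /dev/ram0
  mount -t lustre /dev/ram0 /mnt/ramfs-ost

  # diskless clients
  mount -t lustre mgs01@tcp0:/ramfs /mnt/ramfs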
jrd@jrd.org wrote:

> I realized that the MGS remembered the previous MDT, and was booting me out
> because it thought the new one was a name collision.

Right.

> So I figured I should rebuild the MGS to make it forget, and let everybody
> register.

Rebuilding the MGS will forget all logs for all registered FSs, which is not what you want. This is the correct place to use the "--writeconf" flag to tunefs. Reformatting an already-registered MDT gives an error when you try to remount:

  cfs21:~# mount -t lustre -o loop /tmp/testmdt /mnt/test
  mount.lustre: mount /dev/loop3 at /mnt/test failed: Address already in use
  The target service's index is already in use. (/dev/loop3)

To override this check, use the "--writeconf" flag on the MDT to force destruction of all config files for this FS:

  tunefs.lustre --writeconf /tmp/testmdt

Then the mount will succeed, and you will see in dmesg:

  Lustre: MGS: Logs for fs test were removed by user request.
  All servers must be restarted in order to regenerate the logs.

> Nope.  Zapping the MGS after other components such as OSTs have registered
> with it causes them to fail to reregister, because they claim they already
> have.

Again, you can force re-registration by running tunefs.lustre --writeconf on each server.

> The error message in the log of the MGS machine suggested that I could
> straighten it out by running tunefs.lustre to reset the OSTs' idea of who
> the MGS was.  Nope.  tunefs.lustre told me it couldn't help me because I had
> extended fs properties, and that it should be fixed in a newer release,
> which I should download.

The tunefs.lustre error message should have included:

  Use e2fsprogs-1.38-cfs1 or later, available from
  ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/

which you should do, because older e2fsprogs can't understand the extended attributes used on the OSTs, and therefore can't modify them.

> It also might have worked if I'd had the right version of tunefs.lustre; I
> didn't have time to go through the process of upgrading it all to see if a
> newer version would have fixed that particular issue.

A tunefs.lustre --writeconf should have. There is more information about writeconf here: https://mail.clusterfs.com/wikis/lustre/MountConf

> I believe that in the general case, the issue is around sharing a single MGS
> between a filesystem that's expected to live for a while, and one that's
> expected to get reconstituted frequently.
>
> Perhaps there's also a corollary issue, around what happens if you lose,
> say, the disk on your MGS.  What's the procedure meant to be to build up a
> fresh one and get everybody talking to each other again?

Reformat the MGS, and again, you guessed it, writeconf on each of the servers.
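In other words, the general recovery recipe looks something like this (the device paths are placeholders for your actual MGS/MDT/OST devices and mount points):

  # 1. unmount clients, then OSTs, then the MDT (and the MGS if you reformatted it)

  # 2. regenerate the config logs on every target
  tunefs.lustre --writeconf /dev/mdt_device
  tunefs.lustre --writeconf /dev/ost_device      # repeat for each OST

  # 3. remount in order: MGS first, then MDT, then OSTs, then clients
  mount -t lustre /dev/mgs_device /mnt/mgs
  mount -t lustre /dev/mdt_device /mnt/mdt
  mount -t lustre /dev/ost_device /mnt/ost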
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Thu, 12 Oct 2006 08:59:15 -0700

    Rebuilding the MGS will forget all logs for all registered FSs, which is
    not what you want.

Yeah, I kind of figured that out :-}

    The tunefs.lustre error message should have included:
      Use e2fsprogs-1.38-cfs1 or later, available from
      ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/
    which you should do,

I believe it did. I was in a low-level panic because what was planned to be a short downtime of my test system was turning into a much bigger problem, so at that point I just decided to fall back and use a recipe I knew would work.

I'm still running 1.6b4; if I update to b5, does that stuff come with it, or do I still need to treat it as a separate thing?

Thanks!
John R. Dunning wrote:

> I believe it did.  I was in a low-level panic because what was planned to be
> a short downtime of my test system was turning into a much bigger problem,
> so at that point I just decided to fall back and use a recipe I knew would
> work.
>
> I'm still running 1.6b4; if I update to b5, does that stuff come with it, or
> do I still need to treat it as a separate thing?

The e2fsprogs are separate. Eventually these features will migrate into the standard distros, but they aren't there yet.

b5 has more of the --writeconf stuff worked out, iirc, so it's probably good to migrate. But a word of warning: b4 disks won't run under b5 due to a config file format change. That's the last change we foresee.
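If you aren't sure which e2fsprogs a node has, something like the following should tell you (this assumes an RPM-based system; the package query will differ on other distros):

  rpm -q e2fsprogs
  # you want e2fsprogs-1.38-cfs1 or newer, i.e. the CFS-patched build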
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Thu, 12 Oct 2006 10:22:27 -0700

    The e2fsprogs are separate.  Eventually these features will migrate into
    the standard distros, but they aren't there yet.

Ok.

    b5 has more of the --writeconf stuff worked out, iirc, so it's probably
    good to migrate.  But a word of warning: b4 disks won't run under b5 due
    to a config file format change.  That's the last change we foresee.

That's fine. As long as I have some time to plan the migration/upgrade on my test systems, it should be no problem. Good to know about the incompatibility, though.