Hi all. I believe this has been discussed before, but 15 minutes of googling and searching my mail archives didn't reveal the answer; no doubt when somebody reminds me what it is, I'll get to say D'oh!

I've got a test installation running 1.6b5, and it looks like one of the drives (containing an OST) is on its way out. I've migrated all the data off it (by deactivating it on the MDS and using lfs find to identify all the files that needed to be copied), and now I'm trying to cleanly shut down that OST and make the rest of the system forget about it, at least for a while.

I tried deactivating the device on the OSS, using lctl --device N deactivate, but that gripes "invalid argument". If I just dismount it, the MDS/MGS sit around griping that they're trying to recover it. I could have sworn there was a way to get the system to no longer think that OST is a part of it, but I can't seem to find it now. Anybody got hints?

Thanks in advance...
Deactivate the device on the MDT side for a currently-running server, e.g.:

  13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  lctl --device 13 deactivate

To start a client or MDT with a known-down OST:

  mount -t lustre -o exclude=lustre-OST0001 ...
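Roughly, the whole sequence looks like this; the device number 13 and the client mount source are only examples, so check lctl dl on your own MDS and substitute your real MGS NID, filesystem name, and mount point:

  # On the MDS: find the OSC device for the failing OST, then deactivate it
  lctl dl | grep lustre-OST0001-osc
  #   13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  lctl --device 13 deactivate

  # On a client (or the MDT): mount while excluding the known-down OST
  mount -t lustre -o exclude=lustre-OST0001 mgsnode@tcp0:/lustre /mnt/lustre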
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Fri, 17 Nov 2006 11:39:59 -0800

    Deactivate the device on the MDT side for a currently-running server, e.g.:
      13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
      lctl --device 13 deactivate

Ok, did that. It still shows as UP when I lctl dl, though.

    To start a client or MDT with a known down OST:
      mount -t lustre -o exclude=lustre-OST0001 ...

Ah, ok. So there isn't any way to say "Remove all traces of this OST from the system so that nobody knows it was ever there"?
John R. Dunning wrote:
> From: Nathaniel Rutman <nathan@clusterfs.com>
> Date: Fri, 17 Nov 2006 11:39:59 -0800
>
>     Deactivate the device on the MDT side for a currently-running server, e.g.:
>       13 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>       lctl --device 13 deactivate
>
> Ok, did that. It still shows as UP when I lctl dl, though.
>
Yes, it does. Your question prompted me to take a look at changing that...

For now, you can get to it here:

  cfs21:~# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
  0: lustre-OST0000_UUID ACTIVE
  1: lustre-OST0001_UUID INACTIVE

>     To start a client or MDT with a known down OST:
>       mount -t lustre -o exclude=lustre-OST0001 ...
>
> Ah, ok. So there isn't any way to say "Remove all traces of this OST from the
> system so that nobody knows it was ever there"?
>
That is an eventual planned feature, but isn't implemented yet. You could --writeconf the MDT to nuke the config logs, then restart the servers, and that will truly erase all traces of OSTs that don't restart. Beware, any file that has stripes on such an erased OST will be very confusing to Lustre... Beware #2: I don't claim to have tried this myself.
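As an aside, since you only want it gone "at least for a while": the deactivation is reversible. A sketch, assuming the same device number 13 as above (verify with lctl dl first):

  # On the MDS: re-enable the OSC once the OST is healthy again
  lctl --device 13 activate

  # The state should flip back in target_obd
  cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
  #   1: lustre-OST0001_UUID ACTIVE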
From: Nathaniel Rutman <nathan@clusterfs.com>
Date: Fri, 17 Nov 2006 12:11:25 -0800

    Yes, it does. Your question prompted me to take a look at changing that...

    For now, you can get to it here:
      cfs21:~# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd
      0: lustre-OST0000_UUID ACTIVE
      1: lustre-OST0001_UUID INACTIVE

Ok.

    That is an eventual planned feature, but isn't implemented yet.

Ok.

    You could --writeconf the MDT to nuke the config logs, then restart the
    servers,

Example?

    and that will truly erase all traces of OSTs that don't restart. Beware,
    any file that has stripes on such an erased OST will be very confusing to
    Lustre...

Sure, of course. I suppose to do it really right, you'd want some kind of tool that could examine the MD and gripe about anything that had stripes on the OST in question. But that would be pretty slow.

    Beware #2: I don't claim to have tried this myself.

Understood. Perhaps I'll try this next week, or perhaps I'll just blow it away and rebuild it without the offending unit.

Thanks...
John R. Dunning wrote:
>     You could --writeconf the MDT to nuke the config logs, then restart the
>     servers,
>
> Example?
>
See the wiki:
https://mail.clusterfs.com/wikis/lustre/MountConf#head-18c689130e5184035dcec1e6e2b49597afdab189

I just noticed a regression in my current code (and updated the wiki) - you'll have to tunefs.lustre --writeconf every server disk, not only the MDT, to regen the logs. I have now fixed that so you only need to --writeconf the MDT, but it is always safe to do them all. (Not sure when that regressed.)

> Sure, of course. I suppose to do it really right, you'd want some kind of
> tool that could examine the MD and gripe about anything that had stripes on
> the OST in question. But that would be pretty slow.
>
> Understood. Perhaps I'll try this next week, or perhaps I'll just blow it
> away and rebuild it without the offending unit.
>
I just tried it myself, and it works like a charm. Files on lost OSTs don't actually seem to confuse Lustre at all, they just act corrupted:

  cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
  total 4
  ?--------- ? ?    ?       ?            ? p2
  -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd

Adding a new OST that reuses the old index results in a valid but truncated file:

  cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
  total 4
  -rw-r--r-- 1 root root    0 Nov 17 13:31 p2
  -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd
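In outline, the --writeconf route is roughly the sketch below; the wiki page above is the reference, the device paths and mount points here are placeholders, and everything must be unmounted before you start:

  # With all clients and servers stopped, regenerate the config logs
  # (per the note above, do every server disk to be safe)
  tunefs.lustre --writeconf /dev/sdXX     # placeholder MDT or OST device

  # Restart the MDT first, then the surviving OSTs; the dead OST is
  # simply never remounted, so it drops out of the regenerated config
  mount -t lustre /dev/sdXX /mnt/mdt      # on the MDS
  mount -t lustre /dev/sdYY /mnt/ost      # on each surviving OSS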
On Nov 17, 2006 15:26 -0500, John R. Dunning wrote:
> Sure, of course. I suppose to do it really right, you'd want some kind of
> tool that could examine the MD and gripe about anything that had stripes on
> the OST in question. But that would be pretty slow.

That is what "lfs find -obd ..." does, and you've already done that. As long as the OST is deactivated on the MDS no objects will be created there, but I'd consider doing one last pass before removing it completely (in case it was active while the fs was in use; I don't know how tightly this system is controlled by you).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
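Concretely, that last pass might look like the following, using the OST UUID from earlier in the thread (the file path is just a placeholder):

  # List any files that still have stripes on the deactivated OST
  lfs find --obd lustre-OST0001_UUID /mnt/lustre

  # Spot-check the striping of anything the scan turns up
  lfs getstripe /mnt/lustre/path/to/file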
On Nov 17, 2006 13:37 -0800, Nathaniel Rutman wrote:
> Adding a new OST that reuses the old index results in a valid but
> truncated file:
>   cfs21:~/cfs/b1_5/lustre/tests# ll /mnt/lustre
>   total 4
>   -rw-r--r-- 1 root root    0 Nov 17 13:31 p2
>   -rw-r--r-- 1 root root 1699 Nov 17 12:53 passwd

Hmm, that shouldn't be possible. What should instead happen is that either this OST index is marked "do not use" or the "ost_gen" field in the lov_tgts/lov_ost_data_v1 is incremented to indicate that while the index is the same this is in fact a different OST (that avoids the need to have potentially thousands of empty slots in lov_tgts).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
From: Andreas Dilger <adilger@clusterfs.com>
Date: Fri, 17 Nov 2006 14:43:59 -0700

    That is what "lfs find -obd ..." does, and you've already done that. As
    long as the OST is deactivated on the MDS no objects will be created there,
    but I'd consider doing one last pass before removing it completely (in case
    it was active while the fs was in use, I don't know how tightly this system
    is controlled by you).

In this (test) case, the answer is "totally". In the likely scenarios that I can see for deployment, the answer is also likely to be "totally". In the case where it's an external fs that I'm interfacing to along with other clients, I hope and expect that I can push the problem off onto whoever's managing it.

What got me started on that line of thought was what happens when you have a Lustre fs that lives for a long time. Growing it is easy enough, but what if, for whatever reason, you want to shrink it while leaving it operational? In that case, you might well want to reduce the number of OSTs, so a procedure which allows one to reliably get rid of an OST and tell the system not to expect it to come back seems like a Good Thing (tm).

Maybe I shouldn't worry about it; storage is cheap enough and getting cheaper, so maybe I should only expect things to grow :-}