I am trying to learn how to replace a defective OST with a new one, assuming the old OST cannot be salvaged. I have a test cluster that I am working on.

I deactivated the volume on the MGS using:

    lctl conf_param content-OST0002-osc.osc.active=0

I unlinked all of the bad files by finding the ones on the bad volume.

I formatted a fresh OST using the index number of the bad device:

    mkfs.lustre --reformat --fsname content --ost --mgsnode=4.248.52.81@tcp0 --param="failover.mode=failout" --index=02 /dev/md6

Then I tried to mount the freshly formatted OST into the cluster. Unfortunately, I end up with an error:

    mount.lustre: mount /dev/md6 at /lustre_raw_ost_one failed: Address already in use
    The target service's index is already in use. (/dev/md6)

How can I re-use the index number to prevent always having a "dead" point in my cluster?

Thanks!

--
Andrew
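One way to locate the files with objects on the dead OST before unlinking them is lfs find's --obd filter. A minimal sketch, assuming the filesystem is mounted at /mnt/content and the dead target's UUID is content-OST0002_UUID (both are guesses based on the names used in this thread, so confirm them with lfs df first):

    # Confirm the OST UUIDs and which one is missing/inactive
    lfs df /mnt/content

    # List every file that has objects on the bad OST, then unlink them
    lfs find --obd content-OST0002_UUID /mnt/content > /tmp/bad_files
    xargs -r rm < /tmp/bad_files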
I ran into a similar problem a few weeks ago. You need to run:

    tunefs.lustre --writeconf /dev/.............

on the MDT/MGS after unmounting it. Maybe there is another way to do that without unmounting the MDT/MGS, but I'm not sure.

Cheers.
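For reference, --writeconf regenerates the configuration logs for the whole filesystem, which is why everything has to come down and go back up in order. A rough sketch of the sequence; the device paths and mount points here are placeholders, not the ones from this cluster:

    # Unmount every client, then every OST, then the MDT/MGS
    umount /mnt/content            # on each client
    umount /lustre_raw_ost_one     # on each OSS, for each OST
    umount /mnt/mdt                # on the MDS

    # Regenerate the config logs on every server target
    tunefs.lustre --writeconf /dev/mdt_device   # on the MDS
    tunefs.lustre --writeconf /dev/md6          # on each OSS, for each OST

    # Remount in order: MGS/MDT first, then OSTs, then clients
    mount -t lustre /dev/mdt_device /mnt/mdt
    mount -t lustre /dev/md6 /lustre_raw_ost_one
    mount -t lustre 4.248.52.81@tcp0:/content /mnt/content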
Are you saying to wipe the entire cluster? That says it wipes the config logs for the fs, but I am not entirely sure what that means...

--
Andrew

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org On Behalf Of Mailer PH
> Sent: Monday, March 17, 2008 12:07 PM
> To: Lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] how to replace a bad OST.
>
> I ran into a similar problem a few weeks ago. You need to run:
>
>     tunefs.lustre --writeconf /dev/.............
>
> on the MDT/MGS after unmounting it. Maybe there is another way to do
> that without unmounting the MDT/MGS, but I'm not sure.
>
> Cheers.
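The config logs in question are the per-filesystem logs kept in the CONFIGS directory on the MGS/MDT device; they record which targets belong to the filesystem, and --writeconf erases them so they are rebuilt from the targets at the next mount. One way to peek at them on an ldiskfs target, assuming a hypothetical device path:

    # On the MGS, inspect the stored configuration logs read-only
    debugfs -c -R 'ls -l CONFIGS/' /dev/mdt_device
    # Expect per-filesystem logs such as content-client, content-MDT0000, content-OST0002, ...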
Well,

That did work. I also had to unmount all of the OSTs and clients to get it to function, however.

Does anyone know, is there a way to do this without resetting the entire file system?

--
Andrew
I thought it worked, but I seem to have lost the content of my entire test cluster except the subdirectories...
On Mar 17, 2008 11:29 -0600, Lundgren, Andrew wrote:
> I formatted a fresh OST using the index number of the bad device:
>
> mkfs.lustre --reformat --fsname content --ost --mgsnode=4.248.52.81@tcp0 --param="failover.mode=failout" --index=02 /dev/md6

You do not necessarily want to add the new OST in the same slot as the
old one. There are a few complications with doing that, in particular:
- the MDS will think that the new OST has objects up to what the old OST
  had, and when the new OST is first started it will recreate them.
  That will take a long time, and waste a lot of space on the OST, maybe
  all of the inodes in the whole filesystem.
- if you missed removing some of the bad files by accident, they will
  think that the new OST is the same as the old one. Not fatal, but
  you would probably prefer to get an IO error back instead of just
  a zero-length file.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
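A sketch of the alternative Andreas describes: format the replacement OST at the next unused index and leave the dead index permanently deactivated. The index value, device, and mount point below are examples only, not taken from this cluster:

    # Format the replacement at a new, unused index (03 is just an example)
    mkfs.lustre --reformat --fsname content --ost \
        --mgsnode=4.248.52.81@tcp0 --param="failover.mode=failout" \
        --index=03 /dev/md6

    # Mount the new OST as usual
    mount -t lustre /dev/md6 /lustre_raw_ost_one

    # On the MGS: keep the dead OST0002 deactivated so the MDS
    # stops allocating objects on it (already done earlier in this thread)
    lctl conf_param content-OST0002-osc.osc.active=0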
What is the best way to do this then, when you know an OST cannot be recovered and you don't want your cluster to contain a point that is offline?

--
Andrew