Morning

We have some failing hardware in an OSS (md RAID5 over 6 x 750GB SATA disks) and I would like to migrate all the data off to a new OSS. This looks straightforward enough, but can you do it semi-live?

What I want to do is mount the Lustre partition read-only while it is still being served up and rsync the data off to a new OSS (which will take ~24hrs). Then, bring the entire cluster down, do the last rsync (which will hopefully be fast), turn the old OSS off and bring the new replacement OSS into the cluster. In this way, downtime will be minimised.

Will this work? Has anybody tried such a scheme?

Thanks.

--
Dr Stuart Midgley
sdm900@gmail.com
I inquired about this a while back and got the following:

"In order to minimize downtime, it would also be possible to use the ext2 "dump" program in order to do device-level backups (including the extended attributes) while the filesystem is in use. This backup would not be 100% coherent with the actual filesystem.

The problem with running rsync to do the final sync step is that this has no understanding of extended attributes. For current (1.4.8) versions of Lustre the OST EAs are not required for the correct operation of the filesystem (they are redundant information to assist recovery in case of corruption), but in the future that may not be true."

The (offline) method of migrating an OST is here: https://bugzilla.clusterfs.com/show_bug.cgi?id=4633 but after reading the above I guess you should probably run the getfattr/setfattr commands on the OSTs as well as the MDT.

Stephen

----- "Stuart Midgley" <sdm900@gmail.com> wrote:
> We have some failing hardware in an oss (md raid5 over 6 x 750GB sata
> disks) and I would like to migrate all the data off to a new oss. [...]

--
Stephen Willey
Senior Systems Engineer
Framestore CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
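A sketch of the getfattr/setfattr step referenced in that bug, assuming the old and new OSTs are mounted as ext3/ldiskfs on /mnt/ost and /mnt/newost (those paths, and the backup file name, are placeholders):

    # save all extended attributes (hex-encoded) from the old OST
    cd /mnt/ost
    getfattr -R -d -m '.*' -e hex -P . > /tmp/ea.bak
    # after the file data has been copied, restore the EAs onto the new OST
    cd /mnt/newost
    setfattr --restore=/tmp/ea.bak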
Yes, I had seen the issue of extended attributes and that doesn't worry me that much. It won't be much information for the last copy of data.

I guess my query comes down to how safe it is to mount the raw Lustre partition ro while it is still being served... and then to copy data off it. I appreciate there will be a performance penalty.

Stu.

On 06/06/2007, at 4:52 PM, Stephen Willey wrote:
> I inquired about this a while back and got the following: [...]

--
Dr Stuart Midgley
sdm900@gmail.com
I don't think you can ro mount the fs while it's in use, hence the suggestion of the e2fs dump.

Stephen

----- "Stuart Midgley" <sdm900@gmail.com> wrote:
> I guess my query comes down to how safe it is to mount the raw lustre
> partition ro while it is still being served... and then to copy data
> off it. [...]

--
Stephen Willey
Senior Systems Engineer
Framestore CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
hmmm... ok, thinking about it, this may be exactly what I need.

I can do a dump of the device, pipe it via ssh to the new system straight into restore:

    dump /dev/md1 -f - | ssh username@new_oss "cd /mnt && restore -rf -"

which will take a while (days). Once it completes, I can shut down the cluster, do a final rsync, copy across the extended attributes... and bring the new node up as the old OSS?

Stu.

On 06/06/2007, at 7:34 PM, Stephen Willey wrote:
> I don't think you can ro mount the fs while it's in use, hence the
> suggestion of the e2fs dump.

--
Dr Stuart Midgley
sdm900@gmail.com
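A sketch of how such a pipeline might be run for a multi-day copy (the device, user, host and target mount point are placeholders, not a tested configuration):

    # run inside screen so a dropped ssh session doesn't kill a multi-day copy
    screen -S ost-copy
    # explicit level-0 dump of the whole device, streamed straight into restore on the new OSS
    dump -0 -f - /dev/md1 | ssh username@new_oss 'cd /mnt && restore -rf -'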
How many other OSSs do you have, and do you have enough space to just migrate the data off the OST? You can copy all the files you know are on the OST, and then remove the originals. If you deactivate the failing OST on the MDS, no new files will get placed there. Once you have everything off the OST, the final dump/restore should go fairly quickly.

Evan

> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com On Behalf Of Stuart Midgley
> Sent: Tuesday, June 05, 2007 5:02 PM
> Subject: [Lustre-discuss] failing hardware in ost
>
> We have some failing hardware in an oss (md raid5 over 6 x 750GB sata
> disks) and I would like to migrate all the data off to a new oss. [...]
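A rough sketch of that drain approach (the OST name and paths are placeholders, and this is just the manual copy-and-replace trick, not an official migration tool):

    # list every file that has objects on the failing OST
    lfs find --obd lustre-OST0003_UUID /mnt/lustre > /tmp/ost3.files
    # re-create each file so its objects land on the remaining OSTs,
    # then replace the original (this simple loop ignores hard links
    # and files that are being written while it runs)
    while read f; do
        cp -a "$f" "$f.migrate" && mv "$f.migrate" "$f"
    done < /tmp/ost3.files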
Yes and no. A lot of the data is scratch, so within a month or so I could delete most of it and copy the rest off... but I wanted to move faster than that :)

Deactivating the OST would be preferable anyway; it would mean my dump would be almost 100% up-to-date. I've been trawling the documentation: how do you deactivate an OST (stop new files being written to it) while keeping it online (allowing existing files to be read)?

Thanks
Stu.

On 07/06/2007, at 12:57 AM, Felix, Evan J wrote:
> How many other OSSs do you have, and do you have enough space to just
> migrate the data off the ost? [...]

--
Dr Stuart Midgley
sdm900@gmail.com
On the MDS:

    lctl --device N deactivate

Stuart Midgley wrote:
> Deactivating the ost would be preferable anyway, it would mean my dump
> would be almost 100% up-to-date. I've been trawling the documentation,
> how do you deactivate an ost (stop new files being written to it) while
> keeping it online (allowing existing files to be read)? [...]
This does not appear to be working. The command executes fine...

I run a test script which creates 100 files in quick succession and then run lfs getstripe <dir> to see which OSTs they are on... and they are still going to the ones I've deactivated.

Stu.

> on the mds:
>
> lctl --device N deactivate

--
Dr Stuart Midgley
sdm900@gmail.com
Ok, my mistake, I was putting the OBD number, not the device number. I now have it corrected and files are no longer being created on the OST.

Thanks
Stu.

On 6/7/07, Stuart Midgley <sdm900@gmail.com> wrote:
> This does not appear to be working. The command executes fine...
>
> I run a test script which creates 100 files in quick succession and then
> run lfs getstripe <dir> to see which OSTs they are on... and they are
> still going to the ones I've deactivated. [...]

--
Dr Stuart Midgley
sdm900@gmail.com
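A minimal version of that quick check (the filesystem path is a placeholder):

    # create 100 small files in quick succession...
    for i in $(seq 1 100); do touch /mnt/lustre/stripetest/f$i; done
    # ...then inspect which OSTs their objects were placed on; the
    # deactivated OST's index should no longer appear in the output
    lfs getstripe /mnt/lustre/stripetest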
On Wednesday 06 June 2007, Stuart Midgley wrote:
> hmmm... ok, thinking about it, this may be exactly what I need.
>
> I can do a dump of the device, pipe it via ssh to the new system
> straight into restore
>
> dump /dev/md1 -f - | ssh username@new_oss "cd /mnt && restore -rf -"
>
> which will take a while (days).

With the obvious disadvantage of being non-resumable... I personally dislike stuff that can't be restarted, especially if we're talking days.

/Peter
Just to be clear on this, you did:

    lctl --device /dev/sda deactivate

Did you say that you are using 1.4 or 1.6? Does it make a difference?

Thanks,
Robert

On 6/6/07 10:19 PM, "Stu Midgley" <sdm900@gmail.com> wrote:
> ok, my mistake, I was putting the obd number not the device number. I
> now have it corrected and files are no longer being created on the
> ost. [...]

Robert LeBlanc
BioAg Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
No, I did

    lctl device_list

which lists all the devices and their numbers (I originally misinterpreted the number)... then I did

    lctl --device 7 deactivate
    lctl --device 8 deactivate
    lctl --device 9 deactivate

which deactivated them. My misunderstanding was with the device number: in my first few attempts I put the OBD number (from lfs osts).

Stu.

On 07/06/2007, at 10:09 PM, Robert LeBlanc wrote:
> Just to be clear on this, you did:
>
> lctl --device /dev/sda deactivate
>
> Did you say that you are using 1.4 or 1.6? Does it make a difference? [...]

--
Dr Stuart Midgley
sdm900@gmail.com
Thanks. I'm still learning all about Lustre and this seems like it could come in handy if I ever needed it.

Thanks,
Robert

On 6/7/07 8:26 AM, "Stuart Midgley" <sdm900@gmail.com> wrote:
> no, I did
>
> lctl device_list
>
> which lists all the devices and their numbers... [...]

Robert LeBlanc
BioAg Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
OK, managed to move an OSS to another node. Roughly:

 - Deactivated the OSTs on the broken OSS.
 - Used dump to dump the raw OST to another system while Lustre was live and serving our cluster.
 - Once the dump had finished (~30hrs for 3T), shut all Lustre clients down (except for 1).
 - Used the 1 client to do md5 checksums on around 10% of the files on the broken OST (chosen at random) and saved the result.
 - Unmounted the final client, stopped Lustre on all OSSs and the MDS.
 - Mounted the broken OSS's OST as an ext filesystem (same on the temporary system) and did a final rsync. This took about 15 mins.
 - Shut down the broken OSS and brought the temporary system up in its place.
 - Restarted Lustre with 1 client and checked the md5 checksums to make sure files had been copied reliably.
 - Then got back to work.

Stu.

On 06/06/2007, at 4:52 PM, Stephen Willey wrote:
> I inquired about this a while back and got the following: [...]

--
Dr Stuart Midgley
sdm900@gmail.com
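A sketch of the sampling check described above (OST name, paths, sample fraction and file names are placeholders; the same saved checksum file is verified again after the swap):

    # before the swap: pick a random ~10% of the files on the broken OST and checksum them from a client
    lfs find --obd lustre-OST0003_UUID /mnt/lustre > /tmp/ost.files
    awk 'rand() < 0.1' /tmp/ost.files > /tmp/sample.list
    while read f; do md5sum "$f"; done < /tmp/sample.list > /tmp/ost.md5
    # after the replacement OSS is in place: verify the same files from a client
    md5sum -c /tmp/ost.md5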