Morning

We have some failing hardware in an OSS (md RAID5 over 6 x 750GB SATA disks) and I would like to migrate all the data off to a new OSS. This looks straightforward enough, but can you do it semi-live?

What I want to do is mount the Lustre partition read-only while it is still being served up and rsync the data off to a new OSS (which will take ~24hrs). Then, bring the entire cluster down, do the last rsync (which will hopefully be fast), turn the old OSS off and bring the new replacement OSS into the cluster. In this way, downtime will be minimised.

Will this work? Has anybody tried such a scheme?

Thanks.

--
Dr Stuart Midgley
sdm900@gmail.com
I inquired about this a while back and got the following:

"In order to minimize downtime, it would also be possible to use the ext2 "dump" program in order to do device-level backups (including the extended attributes) while the filesystem is in use. This backup would not be 100% coherent with the actual filesystem.

The problem with running rsync to do the final sync step is that this has no understanding of extended attributes. For current (1.4.8) versions of Lustre the OST EAs are not required for the correct operation of the filesystem (they are redundant information to assist recovery in case of corruption), but in the future that may not be true."

The (offline) method of migrating an OST is here: https://bugzilla.clusterfs.com/show_bug.cgi?id=4633 but after reading the above I guess you should probably run the getfattr/setfattr commands on the OSTs as well as the MDT.

Stephen

----- "Stuart Midgley" <sdm900@gmail.com> wrote:
> We have some failing hardware in an oss (md raid5 over 6 x 750GB sata
> disks) and I would like to migrate all the data off to a new oss. [...]

--
Stephen Willey
Senior Systems Engineer
Framestore CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
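A sketch of the getfattr/setfattr step referenced in that bug, assuming the old and new OSTs are mounted as ext3/ldiskfs on /mnt/ost and /mnt/newost (those paths, and the backup file name, are placeholders):

    # save all extended attributes (hex-encoded) from the old OST
    cd /mnt/ost
    getfattr -R -d -m '.*' -e hex -P . > /tmp/ea.bak
    # after the file data has been copied, restore the EAs onto the new OST
    cd /mnt/newost
    setfattr --restore=/tmp/ea.bak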
Yes, I had seen the issue of extended attributes and that doesn't worry me that much. It won't be much information for the last copy of data.

I guess my query comes down to how safe it is to mount the raw Lustre partition ro while it is still being served... and then to copy data off it. I appreciate there will be a performance penalty.

Stu.

On 06/06/2007, at 4:52 PM, Stephen Willey wrote:
> I inquired about this a while back and got the following: [...]

--
Dr Stuart Midgley
sdm900@gmail.com
I don't think you can ro mount the fs while it's in use, hence the suggestion of the e2fs dump.

Stephen

----- "Stuart Midgley" <sdm900@gmail.com> wrote:
> I guess my query comes down to how safe it is to mount the raw lustre
> partition ro while it is still being served... and then to copy data
> off it. [...]

--
Stephen Willey
Senior Systems Engineer
Framestore CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
hmmm... ok, thinking about it, this may be exactly what I need.

I can do a dump of the device, pipe it via ssh to the new system straight into restore:

    dump /dev/md1 -f - | ssh username@new_oss "cd /mnt && restore -rf -"

which will take a while (days). Once it completes, I can shut down the cluster, do a final rsync, copy across the extended attributes... and bring the new node up as the old OSS?

Stu.

On 06/06/2007, at 7:34 PM, Stephen Willey wrote:
> I don't think you can ro mount the fs while it's in use, hence the
> suggestion of the e2fs dump.

--
Dr Stuart Midgley
sdm900@gmail.com
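A sketch of how such a pipeline might be run for a multi-day copy (the device, user, host and target mount point are placeholders, not a tested configuration):

    # run inside screen so a dropped ssh session doesn't kill a multi-day copy
    screen -S ost-copy
    # explicit level-0 dump of the whole device, streamed straight into restore on the new OSS
    dump -0 -f - /dev/md1 | ssh username@new_oss 'cd /mnt && restore -rf -'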
How many other OSSs do you have, and do you have enough space to just migrate the data off the OST? You can copy all the files you know are on the OST, and then remove the originals. If you deactivate the failing OST on the MDS, no new files will get placed there. Once you have everything off the OST, the final dump/restore should go fairly quickly.

Evan

> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com On Behalf Of Stuart Midgley
> Sent: Tuesday, June 05, 2007 5:02 PM
> Subject: [Lustre-discuss] failing hardware in ost
>
> We have some failing hardware in an oss (md raid5 over 6 x 750GB sata
> disks) and I would like to migrate all the data off to a new oss. [...]
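A rough sketch of that drain approach (the OST name and paths are placeholders, and this is just the manual copy-and-replace trick, not an official migration tool):

    # list every file that has objects on the failing OST
    lfs find --obd lustre-OST0003_UUID /mnt/lustre > /tmp/ost3.files
    # re-create each file so its objects land on the remaining OSTs,
    # then replace the original (this simple loop ignores hard links
    # and files that are being written while it runs)
    while read f; do
        cp -a "$f" "$f.migrate" && mv "$f.migrate" "$f"
    done < /tmp/ost3.files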
Yes and no. A lot of the data is scratch, so within a month or so I could delete most of it and copy the rest off... but I wanted to move faster than that :)

Deactivating the OST would be preferable anyway; it would mean my dump would be almost 100% up-to-date. I've been trawling the documentation: how do you deactivate an OST (stop new files being written to it) while keeping it online (allowing existing files to be read)?

Thanks
Stu.

On 07/06/2007, at 12:57 AM, Felix, Evan J wrote:
> How many other OSSs do you have, and do you have enough space to just
> migrate the data off the ost? [...]

--
Dr Stuart Midgley
sdm900@gmail.com
On the MDS:

    lctl --device N deactivate

Stuart Midgley wrote:
> Deactivating the ost would be preferable anyway, it would mean my dump
> would be almost 100% up-to-date. I've been trawling the documentation,
> how do you deactivate an ost (stop new files being written to it) while
> keeping it online (allowing existing files to be read)? [...]
This does not appear to be working. The command executes fine...

I run a test script which creates 100 files in quick succession and then run lfs getstripe <dir> to see which OSTs they are on... and they are still going to the ones I've deactivated.

Stu.

> on the mds:
>
> lctl --device N deactivate

--
Dr Stuart Midgley
sdm900@gmail.com
Ok, my mistake, I was putting the OBD number, not the device number. I now have it corrected and files are no longer being created on the OST.

Thanks
Stu.

On 6/7/07, Stuart Midgley <sdm900@gmail.com> wrote:
> This does not appear to be working. The command executes fine...
>
> I run a test script which creates 100 files in quick succession and then
> run lfs getstripe <dir> to see which OSTs they are on... and they are
> still going to the ones I've deactivated. [...]

--
Dr Stuart Midgley
sdm900@gmail.com
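A minimal version of that quick check (the filesystem path is a placeholder):

    # create 100 small files in quick succession...
    for i in $(seq 1 100); do touch /mnt/lustre/stripetest/f$i; done
    # ...then inspect which OSTs their objects were placed on; the
    # deactivated OST's index should no longer appear in the output
    lfs getstripe /mnt/lustre/stripetest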
On Wednesday 06 June 2007, Stuart Midgley wrote:
> hmmm... ok, thinking about it, this may be exactly what I need.
>
> I can do a dump of the device, pipe it via ssh to the new system
> straight into restore
>
> dump /dev/md1 -f - | ssh username@new_oss "cd /mnt && restore -rf -"
>
> which will take a while (days).

With the obvious disadvantage of being non-resumable... I personally dislike stuff that can't be restarted, especially if we're talking days.

/Peter
Just to be clear on this, you did:

    lctl --device /dev/sda deactivate

Did you say that you are using 1.4 or 1.6? Does it make a difference?

Thanks,
Robert

On 6/6/07 10:19 PM, "Stu Midgley" <sdm900@gmail.com> wrote:
> ok, my mistake, I was putting the obd number not the device number. I
> now have it corrected and files are no longer being created on the
> ost. [...]

Robert LeBlanc
BioAg Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
No, I did

    lctl device_list

which lists all the devices and their numbers (I originally misinterpreted the number)... then I did

    lctl --device 7 deactivate
    lctl --device 8 deactivate
    lctl --device 9 deactivate

which deactivated them. My misunderstanding was with the device number: in my first few attempts I put the OBD number (from lfs osts).

Stu.

On 07/06/2007, at 10:09 PM, Robert LeBlanc wrote:
> Just to be clear on this, you did:
>
> lctl --device /dev/sda deactivate
>
> Did you say that you are using 1.4 or 1.6? Does it make a difference? [...]

--
Dr Stuart Midgley
sdm900@gmail.com
Thanks. I'm still learning all about Lustre and this seems like it could come in handy if I ever needed it.

Thanks,
Robert

On 6/7/07 8:26 AM, "Stuart Midgley" <sdm900@gmail.com> wrote:
> no, I did
>
> lctl device_list
>
> which lists all the devices and their numbers... [...]

Robert LeBlanc
BioAg Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
OK, managed to move an OSS to another node. Roughly:

 - Deactivated the OSTs on the broken OSS.
 - Used dump to dump the raw OST to another system while Lustre was live and serving our cluster.
 - Once the dump had finished (~30hrs for 3T), shut all Lustre clients down (except for 1).
 - Used the 1 client to do md5 checksums on around 10% of the files on the broken OST (chosen at random) and saved the result.
 - Unmounted the final client, stopped Lustre on all OSSs and the MDS.
 - Mounted the broken OSS's OST as an ext filesystem (same on the temporary system) and did a final rsync. This took about 15 mins.
 - Shut down the broken OSS and brought the temporary system up in its place.
 - Restarted Lustre with 1 client and checked the md5 checksums to make sure files had been copied reliably.
 - Then got back to work.

Stu.

On 06/06/2007, at 4:52 PM, Stephen Willey wrote:
> I inquired about this a while back and got the following: [...]

--
Dr Stuart Midgley
sdm900@gmail.com
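A sketch of the sampling check described above (OST name, paths, sample fraction and file names are placeholders; the same saved checksum file is verified again after the swap):

    # before the swap: pick a random ~10% of the files on the broken OST and checksum them from a client
    lfs find --obd lustre-OST0003_UUID /mnt/lustre > /tmp/ost.files
    awk 'rand() < 0.1' /tmp/ost.files > /tmp/sample.list
    while read f; do md5sum "$f"; done < /tmp/sample.list > /tmp/ost.md5
    # after the replacement OSS is in place: verify the same files from a client
    md5sum -c /tmp/ost.md5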