Nick Jennings
2009-Dec-30 02:33 UTC
[Lustre-discuss] MD1000 woes and OSS migration suggestions
Hi Everyone,

We've been using an MD1000 as our storage array for close to a year now,
just hooked up to one OSS (LVM+ldiskfs). I recently ordered 2 more servers,
one to be hooked up to the MD1000 to help distribute the load, the other to
act as a Lustre client (web node).

The hosting company informs me that the MD1000 was never set up to operate
in split mode (which I asked for in the beginning), so basically only one
server can be connected to it.

I am now faced with a tough call: we can't bring the filesystem down for any
extended period of time (a few minutes is OK, though 0 downtime would be
perfect!) and I'm not sure how to proceed in a way that would cause the
least amount of headache.

The only thing I can think of is to set up a second MD1000 (configured for
split mode), connect it to OSS2 (the new one which is not yet being used),
add it to the Lustre filesystem, and then somehow migrate the data from OSS1
(old MD1000) to OSS2 (new MD1000)... then bring OSS1 offline, connect it to
the second partition of the new MD1000 and bring that end online once more.

I've never done anything like this and am not entirely sure if this is the
best method. Any suggestions, alternatives, docs or things to look out for
would be greatly appreciated.

Thanks,
Nick

--
Nick Jennings
Director of Technology
Creative Motion Design
www.creativemotiondesign.com
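For context on the "add it to the Lustre filesystem" step: a new OST is
added by formatting the device on the new OSS and mounting it, at which
point the MGS registers it. A minimal sketch, assuming a filesystem named
lustre1, an MGS at mds1@tcp0 and /dev/sdb as the new device (all
placeholder values, not taken from Nick's setup):

  # on OSS2: format the new device as an additional OST of the existing filesystem
  mkfs.lustre --ost --fsname=lustre1 --mgsnode=mds1@tcp0 /dev/sdb

  # mount it as a Lustre target; clients will start striping new files onto it
  mount -t lustre /dev/sdb /mnt/lustre/ost_new

Note that existing file data stays on the old OSTs; only new objects land on
the new OST, which is why a separate migration step is still needed.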
Andreas Dilger
2009-Dec-30 22:44 UTC
[Lustre-discuss] MD1000 woes and OSS migration suggestions
On 2009-12-29, at 19:33, Nick Jennings wrote:
> Hi Everyone,

Hi Nick,

> The only thing I can think of is to set up a second MD1000 (configured
> for split mode), connect it to OSS2 (the new one which is not yet being
> used), add it to the Lustre filesystem, and then somehow migrate the data
> from OSS1 (old MD1000) to OSS2 (new MD1000)... then bring OSS1 offline,
> connect it to the second partition of the new MD1000 and bring that end
> online once more.

There is a section in the manual about "manual data migration", which
should let you move the data. Note that this is not 100% transparent to
applications, but it is safe if you know either that files are unused, or
only open for reading.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
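As a rough illustration of what that manual procedure looks like in
practice (a sketch from memory, not quoted from the manual; the device
number, OST UUID, fsname and mount point below are placeholders): the old
OST is deactivated on the MDS so no new objects are allocated on it, then
the files that have objects there are rewritten from a client so their data
moves to the remaining OSTs.

  # on the MDS: find the OSC device number for the OST being drained, then deactivate it
  lctl dl | grep osc
  lctl --device <devno> deactivate

  # on a client: list files with objects on that OST, then copy-and-rename each one
  # so its objects are re-created on the still-active OSTs
  lfs find --obd lustre1-OST0000_UUID /mnt/lustre1 > /tmp/ost0000.files
  while read f; do
      cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
  done < /tmp/ost0000.files

The copy-and-rename is where the "not 100% transparent" caveat comes in: a
file being written to while it is rewritten would race with the copy.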
Aaron Knister
2009-Dec-31 00:34 UTC
[Lustre-discuss] MD1000 woes and OSS migration suggestions
Andreas,

Out of sheer curiosity, I was wondering when the "lfs migrate" feature
mentioned on the lustre arch wiki might show up :) I can't wait to add it
to my lustre utility belt.

-Aaron

On Dec 30, 2009, at 5:44 PM, Andreas Dilger wrote:
> There is a section in the manual about "manual data migration", which
> should let you move the data. Note that this is not 100% transparent
> to applications, but it is safe if you know either that files are
> unused, or only open for reading.
Wojciech Turek
2009-Dec-31 01:55 UTC
[Lustre-discuss] MD1000 woes and OSS migration suggestions
Hi Nick,

I don't think you should invest in a new MD1000 brick just to make it work
in split mode. Split mode doesn't give you much, except that your 15 disks
will be split between two servers; each server won't be able to see the
other half of the MD1000 storage. This is not that great because you don't
get extra redundancy or failover functionality in Lustre.

I think the best approach here would be to buy an MD3000 RAID array
enclosure (which is basically an MD1000 plus two built-in RAID controller
modules). It costs around £1.5k more than an MD1000 but it is definitely
worth it. The MD3000 allows you to connect up to two servers with fully
redundant data paths from each server to any virtual disk configured on the
MD3000 controller; if you follow the link below you will find the cabling
diagram, Figure 2-9, "Cabling Two Hosts (with Dual HBAs) Using Redundant
Data Paths". You can also connect a maximum of four servers with
non-redundant data paths, Figure 2-6, "Cabling Up to Four Hosts with
Nonredundant Data Paths":
http://support.dell.com/support/edocs/systems/md3000/en/2ndGen/HOM/HTML/operate.htm
In addition, you can hook up two extra MD1000 enclosures to a single MD3000
array and they will be managed by the MD3000 RAID controllers, which will
make your life much easier.

In order to migrate your data from the Lustre file system 'lustre1' located
on OSS1, I suggest setting up a brand new Lustre file system 'lustre2' on
OSS2 connected to the MD3000 enclosure, and then, using your third server
acting as a Lustre client, mounting both file systems and copying the data
from lustre1 to lustre2. At some point you will need to make lustre1
quiescent so no new writes are done to it; you can do that by deactivating
all lustre1 OSTs on the MDS, and then you can make a final rsync between
lustre1 and lustre2. Once this is done you can umount lustre1 and lustre2
and then mount lustre2 back under the lustre1 mount point (see the sketch
below).

Once you have your production Lustre filesystem working on lustre2 you can
disconnect the MD1000 from OSS1 and connect it to the MD3000 expansion
ports. You can also connect OSS1 to the MD3000 controller ports. This way
you get extra space from the added MD1000, which you can use to configure
new OSTs and add them to the lustre2 file system. Since both OSS1 and OSS2
can see each other's OSTs (thanks to the MD3000) you can configure Lustre
failover on these servers. If you need more capacity in the future you can
just connect a second MD1000 to your MD3000 controller.

In my cluster I have six (MD3000 + MD1000 + MD1000) triplets configured as
a single large Lustre file system, which provides around 180TB of RAID6
usable space and works pretty well, providing very good aggregated
bandwidth.

If you have more questions don't hesitate to drop me an email. I have a bit
of experience (bad and good) with this Dell hardware and I am happy to
help.

Best regards,

Wojciech
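A minimal sketch of the cutover Wojciech describes, with assumed mount
points, filesystem names and MGS NIDs (mds1@tcp0 for lustre1, mds2@tcp0 for
lustre2) used purely for illustration:

  # on the third server (Lustre client): mount both filesystems
  mount -t lustre mds1@tcp0:/lustre1 /mnt/lustre1
  mount -t lustre mds2@tcp0:/lustre2 /mnt/lustre2

  # bulk copy while lustre1 is still in production
  rsync -av /mnt/lustre1/ /mnt/lustre2/

  # on the lustre1 MDS: deactivate each lustre1 OST so no new writes land there
  lctl dl | grep osc
  lctl --device <devno> deactivate    # repeat for every lustre1 OST

  # back on the client: final catch-up pass, then swap the mount point
  rsync -av --delete /mnt/lustre1/ /mnt/lustre2/
  umount /mnt/lustre1
  umount /mnt/lustre2
  mount -t lustre mds2@tcp0:/lustre2 /mnt/lustre1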
2009/12/30 Nick Jennings <nick at creativemotiondesign.com>:
> [original message quoted in full; trimmed]

--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
Andreas Dilger
2009-Dec-31 21:43 UTC
[Lustre-discuss] MD1000 woes and OSS migration suggestions
On 2009-12-30, at 17:34, Aaron Knister wrote:
> Out of sheer curiosity, I was wondering when the "lfs migrate"
> feature mentioned on the lustre arch wiki might show up :)
> I can't wait to add it to my lustre utility belt.

The plumbing for this feature is being implemented as part of the HSM
project. The HSM functionality will be available in the 2.1 release (end
2010/early 2011), and online migration can be completed after that time.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Does HP offer something similar to what you are saying, Wojciech? It sounds
very impressive.

On Wed, Dec 30, 2009 at 8:55 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> [quoted message trimmed]