Tharindu Rukshan Bamunuarachchi
2009-Jul-28 10:38 UTC
[Lustre-discuss] Lustre with failover configuration
hi all,

we used Sun Cluster for shared storage access and are going to migrate to Lustre. we have two nodes connected to a single 2540 disk array.

we are going to keep OSS1/MDS in node A and OSS2/MDS in node B. please find the attached diagram.

how should I configure Lustre to support failover/failout? ideally a single node failure should be transparent to the Lustre clients. could you please provide commands to configure the MDS/OSS with failover support? should I give the same volume to both MDS and OSS?

cheers,
__
tharindu

(Attachment: lustre-2-nodes.pdf — http://lists.lustre.org/pipermail/lustre-discuss/attachments/20090728/957fc890/attachment-0001.pdf)
Hi Tharindu,

Some comments inline...

Tharindu Rukshan Bamunuarachchi wrote:
> we used sun cluster for shared storage access. we are going to migrate
> to lustre. we have two nodes connected to single 2540 disk array.

Sun Cluster runs on Solaris whereas Lustre is purely Linux. I hope you know that.

> we are going to keep OSS1/MDS in node A and OSS2/MDS in node B.
> Please find attached diagram.

It is generally not a good idea to co-locate the OSS and MDS on the same node; it somewhat defeats the purpose of Lustre, which separates metadata from data. You can still use the shown architecture, but I would advise making the MDS primary on Node A with Node B acting as its failover partner, and similarly making Node B primary for the OSS with Node A as its failover partner.

Also, unlike Sun Cluster, both nodes in a Lustre failover pair cannot talk to the same storage simultaneously; they work as an active-passive pair of nodes. Please refer to the Lustre manual, which explains failover in detail (manual.lustre.org). Searching for "Lustre failover" in Google shows some interesting links with detailed examples. See http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html

> how should I configure lustre to support failover/failout. Ideally
> single node failure should be transparent to Lustre client.
> could you please provide commands to configure MDS/OSS with failover
> support.

See the Lustre manual.

> should I give same volume for both MDS/OSS ?

No. Usually the OSS resides on a RAID6 volume whereas the MDS resides on a RAID10 volume. They can be from the same storage box, the ST2540 in your case. Lustre benefits from having multiple OSS nodes where IO can be striped across OSTs.
Cheers,
-Atul

--
=================================
Atul Vidwansa
Sun Microsystems Australia Pty Ltd
Web: http://blogs.sun.com/atulvid
Email: Atul.Vidwansa at Sun.COM
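As a concrete illustration of the active-passive layout Atul describes, the target setup might look roughly like the following. This is only a sketch: the filesystem name, NIDs (192.168.1.10/11) and LUN device paths are hypothetical, and the exact options should be checked against the Lustre manual for your version.

```shell
# Hypothetical layout: nodeA (192.168.1.10@tcp) is primary MDS,
# nodeB (192.168.1.11@tcp) is primary OSS; each is the other's
# failover partner. LUN paths on the shared ST2540 are made up.

# On nodeA: format the combined MGS/MDT, naming nodeB as its failover node.
mkfs.lustre --fsname=testfs --mgs --mdt \
    --failnode=192.168.1.11@tcp /dev/mapper/mdt_lun

# On nodeB: format the OST, pointing at both possible MGS locations
# and naming nodeA as its failover node.
mkfs.lustre --fsname=testfs --ost \
    --mgsnode=192.168.1.10@tcp --mgsnode=192.168.1.11@tcp \
    --failnode=192.168.1.10@tcp /dev/mapper/ost_lun

# Mount each target on its primary node only (active-passive):
mount -t lustre /dev/mapper/mdt_lun /mnt/mdt    # on nodeA
mount -t lustre /dev/mapper/ost_lun /mnt/ost0   # on nodeB
```

Both nodes can see both LUNs over the shared array, but each target must be mounted on only one node at a time.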
Tharindu Rukshan Bamunuarachchi
2009-Jul-28 14:38 UTC
[Lustre-discuss] Lustre with failover configuration
hi Atul,

few points ...

1. we have migrated to SuSE 11 from Solaris due to various techno/political reasons.
2. I will keep OSS1/MDS2 & OSS2/MDS1 etc. as you have instructed.

One last thing ... so, I can configure Lustre in a way that, when either node_A or node_B is down (e.g. power failure), clients can still access the file system without data loss :-)

thanks a lot for the quick response ...

__
tharindu
On Jul 28, 2009 16:08 +0530, Tharindu Rukshan Bamunuarachchi wrote:
> we have two nodes connected to single 2540 disk array.
> we are going to keep OSS1/MDS in node A and OSS2/MDS in node B.
> Please find attached diagram.

Several notes:
- Lustre does not currently support multiple MDTs in the same filesystem.
- If two servers (e.g. OSS1 + MDT1) are on the same node, then failure of that node is considered a "double failure" and cannot be recovered without IO errors to the client (though data will not be lost).

You would be better off having the MDS on one node (backup on the OSS node) and 2 OSTs on a single OSS node (backup on the MDS node).

Note that Lustre is not really designed as a high-availability 2-node cluster filesystem. It is more targeted at large-scale storage where a single server cannot provide sufficient bandwidth/storage to meet the needs of the connected clients.

> could you please provide commands to configure MDS/OSS with failover
> support.

This is documented in the Lustre manual.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
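On the client side, transparency to a single server failure comes from listing both possible MGS locations in the mount command, so the client can reach the filesystem whichever node currently holds the MGS. A sketch, with hypothetical NIDs and filesystem name:

```shell
# Client mount naming both candidate MGS NIDs; if the first node is
# down, the client retries via the second (addresses are made up).
mount -t lustre 192.168.1.10@tcp:192.168.1.11@tcp:/testfs /mnt/lustre
```

The failover NIDs for each OST/MDT are recorded on the targets themselves (via --failnode at format time), so clients learn them automatically after connecting.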
More comments inline...

Tharindu Rukshan Bamunuarachchi wrote:
> so, I can configure lustre in a way that ...
> when either node_A or node_B is down (e.g. power failure), clients can
> still access file system without data lost :-)

Yes you can. Lustre clients will see a brief hang until failover is done and will then resume their operations (assuming the storage does not need to undergo fsck).

Cheers,
-Atul

--
=================================
Atul Vidwansa
Sun Microsystems Australia Pty Ltd
Web: http://blogs.sun.com/atulvid
Email: Atul.Vidwansa at Sun.COM
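For completeness: in a 2-node pair like this, the failover event itself is simply the surviving node mounting the failed node's target. A sketch (device path hypothetical; in practice this is driven by an HA framework rather than by hand):

```shell
# Sketch of failover after nodeA dies. In production this is triggered
# by an HA framework (e.g. Heartbeat) with STONITH fencing, so the dead
# node can never write to the shared LUN concurrently.

# On nodeB: take over the MDT that was primary on nodeA.
mount -t lustre /dev/mapper/mdt_lun /mnt/mdt

# Clients reconnect automatically once Lustre recovery on the newly
# mounted target completes -- this is the brief hang Atul mentions.
```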