Forwarding.

Begin forwarded message:

> From: Brennan <James.E.Brennan at Sun.COM>
> Date: March 6, 2008 2:36:44 AM PST
> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
> Subject: Lustre Thumper Fault Tolerance
>
> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
> feed a compute cluster, and use SAMFS to go to an SL3000.
> They want the Lustre filesystem to be at least single-fault tolerant
> for a complete Thumper failure. They are willing to double the number
> of Thumpers to achieve this. What are the best practices for this
> configuration?
>
> Jim Brennan
> Digital Media Systems
> Sun Systems Group
> Universal City, CA
> (310)901-8677
Hi,

You can't do this right now. Network striping will be introduced later.

If you really think you need this kind of redundancy, I recommend you
wait for the upcoming JBODs.

Normally Lustre can fail over nodes when required, and in HPC
applications speed might be more important than reliability.

Regards
Mertol

Sent from a mobile device
Mertol Ozyoney

On 06.Mar.2008, at 12:57, Brennan <James.E.Brennan at Sun.COM> wrote:

> Forwarding.
>
> Begin forwarded message:
>
>> From: Brennan <James.E.Brennan at Sun.COM>
>> Date: March 6, 2008 2:36:44 AM PST
>> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
>> Subject: Lustre Thumper Fault Tolerance
>>
>> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
>> feed a compute cluster, and use SAMFS to go to an SL3000.
>> They want the Lustre filesystem to be at least single-fault tolerant
>> for a complete Thumper failure. They are willing to double the number
>> of Thumpers to achieve this. What are the best practices for this
>> configuration?
>>
>> Jim Brennan
>> Digital Media Systems
>> Sun Systems Group
>> Universal City, CA
>> (310)901-8677
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
nathan at robotics.net
2008-Mar-06 14:30 UTC
[Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance
On Thu, 6 Mar 2008, Mertol Ozyoney wrote:

> Hi,
>
> You can't do this right now. Network striping will be introduced later.
>
> If you really think you need this kind of redundancy, I recommend you
> wait for the upcoming JBODs.
>
> Normally Lustre can fail over nodes when required, and in HPC
> applications

Is there any way to get redundancy from Lustre when disk is local to
each node and not shared between nodes? So far the only way I can see to
get a redundant system is to use shared storage and HA. What is the time
frame on beta network striping?

-Nathan
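(For reference: the shared-storage-plus-HA setup Nathan mentions is
usually built roughly as in the sketch below. The device path, filesystem
name, mount point, and NIDs are placeholders invented for illustration,
not values from this thread.)

  # Format the shared LUN as an OST, declaring a second node that can
  # serve it if the primary fails (mkfs.lustre --failnode).
  mkfs.lustre --ost --fsname=lustre --mgsnode=10.0.0.10@tcp0 \
      --failnode=10.0.1.2@tcp0 /dev/mapper/shared_lun0

  # Normal operation: only the primary OSS mounts (and so serves) the OST.
  mount -t lustre /dev/mapper/shared_lun0 /mnt/ost0

  # On primary failure, the HA framework (e.g. Heartbeat) mounts the same
  # LUN on the backup OSS; clients reconnect to the --failnode NID.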
Brian J. Murrell
2008-Mar-06 14:48 UTC
[Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance
On Thu, 2008-03-06 at 08:30 -0600, nathan at robotics.net wrote:

> Is there any way to get redundancy from Lustre when disk is local to
> each node and not shared between nodes?

Achieving that is the entire subject of this thread. Go back to my first
response: DRBD. I mistakenly said earlier that DRBD requires more than a
2x total investment; that of course is wrong. For HA one is going to have
the second node's hardware anyway, whether it's shared disk or otherwise.
The extra hardware cost of DRBD is 2x the disk plus the cost of the
interconnect between the nodes (for DRBD). This could effect a net
savings, depending on the cost of the shared disk you would use
otherwise.

> So far the only way I can see to get a redundant system is to use
> shared storage and HA.

DRBD is kind of a poor man's shared storage. It's not really, but it
achieves fairly close to the same goal.

b.
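(A minimal sketch of the DRBD approach Brian describes, assuming DRBD 8.x
and Lustre 1.6-style mkfs.lustre/mount. The hostnames oss1/oss2, disks,
IP addresses, resource name, and mount point are all invented for
illustration; check the DRBD and Lustre documentation before using
anything like this in production.)

  # --- /etc/drbd.conf fragment (identical on both OSS nodes): one OST
  # --- backing device mirrored over a dedicated replication link.
  resource ost0 {
    protocol C;                 # synchronous replication
    on oss1 {
      device    /dev/drbd0;
      disk      /dev/sdb;       # local OST disk on oss1
      address   10.0.1.1:7789;  # replication link
      meta-disk internal;
    }
    on oss2 {
      device    /dev/drbd0;
      disk      /dev/sdb;       # mirror disk on oss2
      address   10.0.1.2:7789;
      meta-disk internal;
    }
  }

  # On both nodes: create DRBD metadata and bring the resource up.
  drbdadm create-md ost0
  drbdadm up ost0

  # On the node that will serve the OST first, force the initial sync:
  drbdadm -- --overwrite-data-of-peer primary ost0

  # Format the replicated device as a Lustre OST, with --failnode pointing
  # at the peer, and mount it. On failure the HA framework promotes the
  # peer to DRBD primary and mounts the same /dev/drbd0 there.
  mkfs.lustre --ost --fsname=lustre --mgsnode=10.0.0.10@tcp0 \
      --failnode=10.0.1.2@tcp0 /dev/drbd0
  mount -t lustre /dev/drbd0 /mnt/ost0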
Brennan wrote:

> Forwarding.
>
> Begin forwarded message:
>
>> From: Brennan <James.E.Brennan at Sun.COM>
>> Date: March 6, 2008 2:36:44 AM PST
>> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
>> Subject: Lustre Thumper Fault Tolerance
>>
>> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
>> feed a compute cluster, and use SAMFS to go to an SL3000.
>> They want the Lustre filesystem to be at least single-fault tolerant
>> for a complete Thumper failure. They are willing to double the number
>> of Thumpers to achieve this. What are the best practices for this
>> configuration?
>>
>> Jim Brennan
>> Digital Media Systems
>> Sun Systems Group
>> Universal City, CA
>> (310)901-8677

Why not use another Sun product, rather than Thumper, to build a failover
solution? Why not use 6140s in pairs with the servers to get redundancy?
Is this still more expensive than buying 2x Thumper nodes?

Craig

--
Craig Tierney (craig.tierney at noaa.gov)
Currently there seems to be no way to do this. I think network striping
will be released in version 2.0; please check the roadmap on the Lustre
web site.

Best regards

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at Sun.COM

-----Original Message-----
From: nathan at robotics.net [mailto:nathan at robotics.net]
Sent: Thursday, 06 March 2008 16:30
To: Mertol Ozyoney
Cc: Brennan; cluster-fs-interest at Sun.COM; lustre-discuss at lists.clusterfs.com
Subject: Re: [Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance

On Thu, 6 Mar 2008, Mertol Ozyoney wrote:

> Hi,
>
> You can't do this right now. Network striping will be introduced later.
>
> If you really think you need this kind of redundancy, I recommend you
> wait for the upcoming JBODs.
>
> Normally Lustre can fail over nodes when required, and in HPC
> applications

Is there any way to get redundancy from Lustre when disk is local to
each node and not shared between nodes? So far the only way I can see to
get a redundant system is to use shared storage and HA. What is the time
frame on beta network striping?

-Nathan