Forwarding.

Begin forwarded message:

> From: Brennan <James.E.Brennan at Sun.COM>
> Date: March 6, 2008 2:36:44 AM PST
> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
> Subject: Lustre Thumper Fault Tolerance
>
> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
> feed a compute cluster, and use SAMFS to go to an SL3000.
> They want the Lustre filesystem to be at least single-fault tolerant
> for a complete Thumper failure. They are willing to double the number
> of Thumpers to achieve this. What are the best practices for this
> configuration?
>
> Jim Brennan
> Digital Media Systems
> Sun Systems Group
> Universal City, CA
> (310)901-8677
Hi,

You can't do this right now. Network striping will be introduced later.

If you really think you need this kind of redundancy, I recommend you
wait for the upcoming JBODs.

Normally Lustre can fail over nodes when required, and in HPC
applications speed might be more important than reliability.

Regards
Mertol

Sent from a mobile device
Mertol Ozyoney

On 06.Mar.2008, at 12:57, Brennan <James.E.Brennan at Sun.COM> wrote:

> Forwarding.
>
> Begin forwarded message:
>
>> From: Brennan <James.E.Brennan at Sun.COM>
>> Date: March 6, 2008 2:36:44 AM PST
>> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
>> Subject: Lustre Thumper Fault Tolerance
>>
>> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
>> feed a compute cluster, and use SAMFS to go to an SL3000.
>> They want the Lustre filesystem to be at least single-fault tolerant
>> for a complete Thumper failure. They are willing to double the number
>> of Thumpers to achieve this. What are the best practices for this
>> configuration?
>>
>> Jim Brennan
>> Digital Media Systems
>> Sun Systems Group
>> Universal City, CA
>> (310)901-8677
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
nathan at robotics.net
2008-Mar-06 14:30 UTC
[Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance
On Thu, 6 Mar 2008, Mertol Ozyoney wrote:

> Hi,
>
> You can't do this right now. Network striping will be introduced later.
>
> If you really think you need this kind of redundancy, I recommend you
> wait for the upcoming JBODs.
>
> Normally Lustre can fail over nodes when required, and in HPC
> applications

Is there any way to get redundancy from Lustre when disk is local to
each node and not shared between nodes? So far the only way I can see to
get a redundant system is to use shared storage and HA. What is the time
frame on beta network striping?

-Nathan
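(For reference: the shared-storage-plus-HA setup Nathan mentions is
usually built roughly as in the sketch below. The device path, filesystem
name, mount point, and NIDs are placeholders invented for illustration,
not values from this thread.)

  # Format the shared LUN as an OST, declaring a second node that can
  # serve it if the primary fails (mkfs.lustre --failnode).
  mkfs.lustre --ost --fsname=lustre --mgsnode=10.0.0.10@tcp0 \
      --failnode=10.0.1.2@tcp0 /dev/mapper/shared_lun0

  # Normal operation: only the primary OSS mounts (and so serves) the OST.
  mount -t lustre /dev/mapper/shared_lun0 /mnt/ost0

  # On primary failure, the HA framework (e.g. Heartbeat) mounts the same
  # LUN on the backup OSS; clients reconnect to the --failnode NID.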
Brian J. Murrell
2008-Mar-06 14:48 UTC
[Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance
On Thu, 2008-03-06 at 08:30 -0600, nathan at robotics.net wrote:

> Is there any way to get redundancy from Lustre when disk is local to
> each node and not shared between nodes?

Achieving that is the entire subject of this thread. Go back to my first
response: DRBD. I mistakenly said earlier that DRBD requires more than a
2x total investment; that of course is wrong. For HA one is going to have
the second node's hardware anyway, whether it's shared disk or otherwise.
The extra hardware cost of DRBD is 2x the disk plus the cost of the
interconnect between the nodes (for DRBD). This could effect a net
savings, depending on the cost of the shared disk you would use
otherwise.

> So far the only way I can see to get a redundant system is to use
> shared storage and HA.

DRBD is kind of a poor man's shared storage. It's not really, but it
achieves fairly close to the same goal.

b.
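(A minimal sketch of the DRBD approach Brian describes, assuming DRBD 8.x
and Lustre 1.6-style mkfs.lustre/mount. The hostnames oss1/oss2, disks,
IP addresses, resource name, and mount point are all invented for
illustration; check the DRBD and Lustre documentation before using
anything like this in production.)

  # --- /etc/drbd.conf fragment (identical on both OSS nodes): one OST
  # --- backing device mirrored over a dedicated replication link.
  resource ost0 {
    protocol C;                 # synchronous replication
    on oss1 {
      device    /dev/drbd0;
      disk      /dev/sdb;       # local OST disk on oss1
      address   10.0.1.1:7789;  # replication link
      meta-disk internal;
    }
    on oss2 {
      device    /dev/drbd0;
      disk      /dev/sdb;       # mirror disk on oss2
      address   10.0.1.2:7789;
      meta-disk internal;
    }
  }

  # On both nodes: create DRBD metadata and bring the resource up.
  drbdadm create-md ost0
  drbdadm up ost0

  # On the node that will serve the OST first, force the initial sync:
  drbdadm -- --overwrite-data-of-peer primary ost0

  # Format the replicated device as a Lustre OST, with --failnode pointing
  # at the peer, and mount it. On failure the HA framework promotes the
  # peer to DRBD primary and mounts the same /dev/drbd0 there.
  mkfs.lustre --ost --fsname=lustre --mgsnode=10.0.0.10@tcp0 \
      --failnode=10.0.1.2@tcp0 /dev/drbd0
  mount -t lustre /dev/drbd0 /mnt/ost0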
Brennan wrote:

> Forwarding.
>
> Begin forwarded message:
>
>> From: Brennan <James.E.Brennan at Sun.COM>
>> Date: March 6, 2008 2:36:44 AM PST
>> To: lustre-solutions at sun.com, hpc-aces at sun.com, hpc-storage at sun.com, lustre-discuss at sun.com
>> Subject: Lustre Thumper Fault Tolerance
>>
>> IHAC that wants about 150 TB usable of Thumpers+Lustre, specifically to
>> feed a compute cluster, and use SAMFS to go to an SL3000.
>> They want the Lustre filesystem to be at least single-fault tolerant
>> for a complete Thumper failure. They are willing to double the number
>> of Thumpers to achieve this. What are the best practices for this
>> configuration?
>>
>> Jim Brennan
>> Digital Media Systems
>> Sun Systems Group
>> Universal City, CA
>> (310)901-8677

Why not use another Sun product, rather than Thumper, to build a failover
solution? Why not use 6140s in pairs with the servers to get redundancy?
Is this still more expensive than buying 2x Thumper nodes?

Craig

--
Craig Tierney (craig.tierney at noaa.gov)
Currently there seems to be no way to do this. I think network striping
will be released in version 2.0; please check the roadmap on the Lustre
web site.

Best regards

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +902123352222
Email mertol.ozyoney at Sun.COM

-----Original Message-----
From: nathan at robotics.net [mailto:nathan at robotics.net]
Sent: Thursday, 06 March 2008 16:30
To: Mertol Ozyoney
Cc: Brennan; cluster-fs-interest at Sun.COM; lustre-discuss at lists.clusterfs.com
Subject: Re: [Lustre-discuss] Fwd: Lustre Thumper Fault Tolerance

On Thu, 6 Mar 2008, Mertol Ozyoney wrote:

> Hi,
>
> You can't do this right now. Network striping will be introduced later.
>
> If you really think you need this kind of redundancy, I recommend you
> wait for the upcoming JBODs.
>
> Normally Lustre can fail over nodes when required, and in HPC
> applications

Is there any way to get redundancy from Lustre when disk is local to
each node and not shared between nodes? So far the only way I can see to
get a redundant system is to use shared storage and HA. What is the time
frame on beta network striping?

-Nathan