Hi Anselm,
On Thu, 2006-05-11 at 16:38 +0200, Anselm Strauss wrote:
> hi.
>
> i had the feeling most people are using lustre in striping mode to
> reach high bandwidth. there are certainly good techniques in lustre
> to reach also good availability, but i was still wondering what is
> the typical setup for a lustre system striped over multiple servers
> but not missing too much availability.
> so, if there are people willing to tell a bit about their lustre
> setup and availability issues i would welcome this a lot, and they
> might also get some feedback.
> is it, that
>
> 1) you are not providing a high available storage for the users, e.g.
> forcing them to store only short-lived data on the system which they
> are archiving somewhere else?
Yes. In our case, the somewhere else is supposed to be IBM's HPSS.
Other
clusters, personal storage, large servers, and NFS filers are also
leveraged by the users.
>
> 2) you use one or more fail-overed hosts for each oss?
Yes. One fail-over host per OSS. Active-active.
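Roughly, the shape of it is something like this (a sketch only -- the
hostnames, NIDs, and devices are made up, and the mkfs.lustre-style syntax
shown here may not match what a given Lustre release actually uses):

  # Format an OST on a shared LUN so that either oss1 or oss2 can serve it:
  mkfs.lustre --fsname=scratch --ost --mgsnode=mds1@tcp0 \
      --failnode=oss2@tcp0 /dev/sdb

  # Normal operation: oss1 mounts (serves) the OST.
  oss1# mount -t lustre /dev/sdb /mnt/ost0

  # If oss1 dies: oss2 mounts the same shared LUN and takes over.
  oss2# mount -t lustre /dev/sdb /mnt/ost0

Active-active just means each OSS normally serves its own OSTs while also
standing by for its partner's.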
>
> 3) you use failover and even mirror your disks behind lustre, e.g. drbd?
Nothing else. Just the fail-over.
>
> 4) you do not have redundancy but good soft- and hardware that is
> highly available by itself?
Yes. Kind of. The disks are all attached to hardware RAID 5 controllers.
Host-controller link is FC2 but no switches are present. So, each host
is direct-attached to two RAID controllers. We've run two cable configs
at our site on various machines:
1)
[host A] ----- [ctlr 1] \
                         | - [storage]
[host B] ----- [ctlr 2] /
2)
               [ctlr 1] \
                         | - [storage]
[host A] ----- [ctlr 2] /
         \
          ---- [ctlr 1] \
                         | - [storage]
[host B] ----- [ctlr 2] /
         \
          ---- [ctlr 1] \
                         | - [storage]
In (2), above, the last OSS is connected to the last RAID and the first.
We don't use that cabling anymore, anywhere, that I know of.
>
> we are planning to use lustre for a linux cluster of about 150 nodes.
> what we need for sure is a fast network storage on each node which is
> only possible using multiple oss and striping. but our users are also
> accustomed to store all their research data on the cluster and have
> it regularly backed up. i was thinking of providing a second lustre
> service for archiving data which is not striped therefore slow, but
> highly available. users will have to copy their data from one system
> to the other for computation, and copy back again their results to
> archive them. i'm also not sure how the crash of one oss will affect
> a striped system. i think already striped data is gone but new data
> will be written to the remaining servers. i don't know if lustre is
> able to balance striped data if a crashed server comes back online,
> as it is possible with lvm.
We use multiple Lustre file systems on the same machine to achieve some
form of non-stop availability. Certainly not high-availability of a
given FS, though. Altogether, five Lustre file systems of various sizes and
capabilities are available for production use, ranging from 16 to 64 OSS nodes in
size. There are about 10,000 Lustre clients. Lustre has been in use for
about a year.
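From a client's point of view that just looks like several independent
mounts, something like the following (names are illustrative only, and the
exact client mount syntax depends on the Lustre version):

  mount -t lustre mds-a@tcp0:/scr1 /lus/scr1
  mount -t lustre mds-b@tcp0:/scr2 /lus/scr2

If one file system is down, the others stay usable; that is the extent of
the "non-stop" part.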
We have never contemplated back-ups for our production parallel file
systems. Only home directories.
Right now, the death of an OSS is problematic due to a bug in Lustre on
our machine. A bug unique to our machine architecture, Cray XT3? Maybe
someone from CFS will weigh in and verify that? In any case, fail-over
does not work for us. We try to unmount what we can, remove knowledge of
the problem file system from the compute and login service nodes, and
continue production runs. Not wonderful but functional, mostly.
Depending on what went wrong, data may or may not be lost. If it was an
OSS that died, or the network partitioned, the data is often still there.
When it's the storage hardware, we have lost data on the affected
volume(s).
We have never taken advantage of Lustre's ability to continue operation
after replacing a failed storage volume -- we mount read-only, tell
users to get their surviving data off, reformat, and restart the FS.
To my knowledge, Lustre does not rebalance striped data after a faulty
OSS or storage volume is repaired. We've never tried it though, as I
mentioned above.
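For what it's worth, striping is a per-file/per-directory attribute, so how
much a dead OST hurts depends on how the data was striped when written. A
sketch of checking and setting it (the path is made up, and lfs option
syntax varies between releases -- older ones take positional arguments):

  # New files under this directory get striped across 4 OSTs:
  lfs setstripe -c 4 /lus/scratch/mydir

  # See which OSTs hold the stripes of an existing file:
  lfs getstripe /lus/scratch/mydir/output.dat

  # Per-OST space usage summary:
  lfs df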
Hope something in there is useful.
--Lee
>
> cheers,
> anselm strauss
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>