Hi,

I'm quite new to Lustre and am working through the documentation. One question I have not found an answer to so far:

If the user data gets spread out in chunks over a number of OSTs, and one of the OSTs fails completely - say, all the disks on the fileserver behind that OST are gone for good - how does the cluster recover from that?

It can't be a backup replay from tape or keeping all fileservers as HA pairs, right? Is Lustre doing some kind of RAID across the OSTs? And where can I find some documentation on that?

Thanks,
Thomas
On Friday 08 June 2007 13:41:13 Thomas Roth wrote:
> Hi,
>
> I'm quite new to Lustre and am working through the documentation. One
> question I have not found an answer to so far:
>
> If the user data gets spread out in chunks over a number of OSTs, and one
> of the OSTs fails completely - say, all the disks on the fileserver
> behind that OST are gone for good - how does the cluster recover from that?
>
> It can't be a backup replay from tape or keeping all fileservers as HA
> pairs, right? Is Lustre doing some kind of RAID across the OSTs? And
> where can I find some documentation on that?

Well, I think that's one of the weaknesses of Lustre: it doesn't support RAID over the OSTs :(

I asked Peter Braam about it during the LUG and he told me it's planned for 2008 (I'm no longer sure about the year). Peter also suggested using something like nbd devices. Well, raid5 over enbd would probably work, but afaik enbd development has stalled and I don't know of any better nbd (imho the kernel's built-in nbd is by far too unstable for that purpose).

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
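To illustrate what "raid5 over network block devices" would buy you, here is a minimal sketch of the XOR-parity idea behind RAID 5. It is not Lustre, nbd, or enbd code; the function names `xor_blocks` and `rebuild_missing` are hypothetical, and it assumes equal-sized blocks with exactly one block lost.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Three data blocks, as if each lived on a different storage target.
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(data)

    # Pretend the target holding data[1] is gone for good.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

The sketch only covers the arithmetic. Doing this reliably across servers (consistent parity updates under concurrent writes, rebuilds over the network) is the hard part.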
On Jun 08, 2007 13:41 +0200, Thomas Roth wrote:
> If the user data gets spread out in chunks over a number of OSTs, and one
> of the OSTs fails completely - say, all the disks on the fileserver
> behind that OST are gone for good - how does the cluster recover from that?

Losing an OST this way is of course possible. The risk of losing each Lustre file is a function of how reliable the back-end storage is and how many OSTs the file is striped over. We already recommend striping over only as many OSTs as are needed to achieve the bandwidth the file's usage requires, so if you have a relatively slow network compared to the storage, and don't need many clients to access a file at one time, you can just stripe over a single OST by default.

> It can't be a backup replay from tape or keeping all fileservers as HA
> pairs, right?

Well, RAID is never really a substitute for a backup in any case, because RAID doesn't protect you from "rm -rf *" and other user mistakes. That is as true for Lustre as for other filesystems.

> Is Lustre doing some kind of RAID across the OSTs?

This is something that we are working toward. However, it is far more difficult to implement reliably than it first appears.

Cheers,
Andreas

--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
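The point about stripe count and risk can be made concrete with a rough back-of-the-envelope sketch. It rests on assumptions not stated in the thread: every OST is lost independently with the same probability over some period, and a file is counted as lost as soon as any one of the OSTs holding its stripes is lost. The function name `file_loss_probability` and the 1% figure are purely illustrative.

```python
def file_loss_probability(p_ost_loss: float, stripe_count: int) -> float:
    """Probability that at least one OST holding a stripe of the file is lost.

    Assumes independent OST failures with identical probability p_ost_loss.
    """
    if not 0.0 <= p_ost_loss <= 1.0:
        raise ValueError("p_ost_loss must be between 0 and 1")
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    # The file survives only if every one of its OSTs survives.
    return 1.0 - (1.0 - p_ost_loss) ** stripe_count


if __name__ == "__main__":
    # Illustrative number only: a 1% chance of permanently losing any given OST.
    for stripes in (1, 4, 16):
        risk = file_loss_probability(0.01, stripes)
        print(f"stripe count {stripes:2d}: file loss probability {risk:.4f}")
```

Under these assumptions the per-file risk grows with the stripe count, which is why striping only as wide as the bandwidth requirement demands keeps the exposure of each individual file down.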