thr3ads.net - Lustre discuss - [Lustre-discuss] OST failure, data recovery [Jun 2007]


Thomas Roth

2007-Jun-08 05:41 UTC

[Lustre-discuss] OST failure, data recovery

Hi,

I'm quite new to Lustre, studying all the documentation. A question I
have not found an answer to so far:

If the user data gets spread out in chunks to a number of OSTs, and one 
of the OSTs fails completely - say, all the disks on the fileserver 
behind that OST are gone for good - how does the cluster recover from that?

It can't be a backup replay from tape or keeping all fileservers as HA
pairs, right? Is Lustre doing some kind of RAID across the OSTs? And
where can I find some documentation on that?

Thanks,
Thomas

Bernd Schubert

2007-Jun-08 06:44 UTC

[Lustre-discuss] OST failure, data recovery

On Friday 08 June 2007 13:41:13 Thomas Roth wrote:
> Hi,
>
> I'm quite new to lustre, studying all the documentation. A question I
> have not found an answer to so far:
>
> If the user data gets spread out in chunks to a number of OSTs, and one
> of the OSTs fails completely - say, all the disks on the fileserver
> behind that OST are gone for good - how does the cluster recover from that?
>
> It can't be a backup replay from tape or keeping all fileservers as HA
> pairs, right? Is lustre doing some kind of RAID across the OSTs? And
> where can I find some documentation on that?

Well, I think that's one of the weaknesses of Lustre: it doesn't support RAID
over the OSTs :(
I asked Peter Braam about it during the LUG and he told me it's planned for
2008 (I'm not sure about the year anymore). Peter also suggested using
something like nbd devices. Well, raid5 over enbd would probably work, but
afaik enbd development has stalled and I don't know of any better nbd (imho
the kernel's built-in nbd is by far too unstable for that purpose).


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH

Andreas Dilger

2007-Jun-08 11:35 UTC

[Lustre-discuss] OST failure, data recovery

On Jun 08, 2007  13:41 +0200, Thomas Roth wrote:
> If the user data gets spread out in chunks to a number of OSTs, and one
> of the OSTs fails completely - say, all the disks on the fileserver
> behind that OST are gone for good - how does the cluster recover from that?

Such a failure is of course possible.  The risk of losing each Lustre file
is a function of how reliable the back-end storage is, and how many OSTs a
file is striped over.  We already recommend striping over as few OSTs
as are needed to achieve the bandwidth a file needs for its usage, so if you
have a relatively slow network compared to the storage, and don't need
access to a file by many clients at one time, you can just stripe over
a single OST by default.
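
To put rough numbers on that trade-off, here is a minimal sketch. It assumes
that OSTs fail independently, that each has the same failure probability over
some period, and that losing any one OST holding a stripe loses the file (no
redundancy across OSTs). The function name and the 1% figure are made up for
illustration and are not taken from Lustre or from any measurement.

```python
def file_loss_probability(ost_failure_prob: float, stripe_count: int) -> float:
    """Chance that a file is damaged because at least one of its OSTs is lost.

    Assumptions (illustrative only): OST failures are independent, every OST
    has the same failure probability, and there is no redundancy across OSTs.
    """
    if not 0.0 <= ost_failure_prob <= 1.0:
        raise ValueError("ost_failure_prob must be between 0 and 1")
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    # The file survives only if every OST it is striped over survives.
    survival = (1.0 - ost_failure_prob) ** stripe_count
    return 1.0 - survival


if __name__ == "__main__":
    # Hypothetical 1% chance of losing any given OST over the period.
    for count in (1, 4, 16):
        risk = file_loss_probability(0.01, count)
        print(f"stripe count {count:2d}: file loss risk {risk:.4f}")
```

Under these assumptions the risk per file grows roughly in proportion to the
stripe count while the per-OST failure probability is small, which is why a
narrow stripe is the safer default when the extra bandwidth is not needed.
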

> It can't be a backup replay from tape or keeping all fileservers as HA
> pairs, right?

Well, RAID is never really a substitute for a backup in any case, because
RAID doesn't protect you from "rm -rf *" and other user mistakes.  That is
true for Lustre as much as other filesystems.

> Is lustre doing some kind of RAID across the OSTs?
This is something that we are working toward.  However, it is far more
difficult to implement reliably than it first appears.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
