Adesanya, Adeyemi wrote:
> I'm discussing the proposed architecture for two new Lustre 1.8.x
> filesystems. We plan to use a failover pair of MDS nodes (active-active), with
> each MDS serving an MDT. The MDTs will be housed in external storage, but we
> would like to implement redundancy across more than one storage array by using
> software RAID1.
>
> The Lustre documentation mentions using Linux md to set up software RAID1
> or RAID10 for MDTs. Does the RAID1 implementation in the Lustre 1.8.x RHEL5
> kernel do an adequate job of ensuring consistency across mirrored devices
> (compared to a hardware RAID1 implementation)?
>
Adequate, probably. As correct as hardware RAID, doubtful. Without
special hardware, or doing things that kill performance, there will
always be some corner cases.
The issue is what happens to writes that are in flight when you have a
crash/reboot/power loss: it is possible for them to make it to one disk
but not the other. So it is possible to believe they are on disk, and
proceed accordingly, when they are only on one copy, and they are lost if
that disk fails. Even worse, Linux alternates reads between the mirrors,
so in theory the data could be there one time and gone the next.
The good news is that writes should(!) not be marked as "on disk" until
both disks have acknowledged them. So you could do an md "check", and
if needed a "repair", before e.g. replaying the journal (mounting the
file system, doing fsck, etc.). Even if the md resync takes the older
copy and undoes a write, it should not have been a write that was
expected to have made it to stable storage, so the normal Lustre
recovery mechanisms should be able to replay it. Assuming, that is,
that this is done _before_ you mount the device.
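
The check-then-repair-then-mount sequence described above could look
something like the sketch below. The md device name and mount point are
examples only; `sync_action` and `mismatch_cnt` are the standard md sysfs
controls for triggering and monitoring a check or repair:

```shell
#!/bin/sh
# Example values -- substitute your actual md device and MDT mount point.
MD_DEV=md0
MDT_MOUNT=/mnt/mdt

# Kick off a consistency check of the RAID1 mirror.
echo check > /sys/block/$MD_DEV/md/sync_action

# Wait for the check to complete.
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
    sleep 5
done

# A nonzero mismatch_cnt means the mirrors diverged; repair before mounting.
if [ "$(cat /sys/block/$MD_DEV/md/mismatch_cnt)" -gt 0 ]; then
    echo repair > /sys/block/$MD_DEV/md/sync_action
    while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
        sleep 5
    done
fi

# Only now mount the MDT, so Lustre recovery replays against consistent mirrors.
mount -t lustre /dev/$MD_DEV $MDT_MOUNT
```

This is a procedure fragment that requires a live md array, so it is not
runnable stand-alone; the key point is simply that the check/repair
finishes before the mount.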
Kevin