Adesanya, Adeyemi wrote:
> I'm discussing the proposed architecture for two new Lustre 1.8.x
> filesystems. We plan to use a failover pair of MDS nodes (active-active), with
> each MDS serving an MDT. The MDTs will be housed in external storage, but we
> would like to implement redundancy across more than one storage array by using
> software RAID1.
>
> The Lustre documentation mentions using Linux md to set up software RAID1
> or RAID10 for MDTs. Does the RAID1 implementation in the Lustre 1.8.x RHEL5
> kernel do an adequate job of ensuring consistency across mirrored devices
> (compared to a hardware RAID1 implementation)?
>
Adequate, probably. As correct as hardware RAID, doubtful. Without
special hardware, or doing things that kill performance, there will
always be some corner cases.
The issue is what happens to writes that are in flight when you have a
crash/reboot/power loss: it is possible for them to make it to one disk
but not the other. So it is possible to believe they are on disk, and
proceed accordingly, when they are only on one copy, and they are lost if
that disk fails. Even worse, Linux alternates reads between the mirrors,
so in theory the data could be there one time and gone the next.
The good news is that writes should(!) not be marked as "on disk" until
both disks have acknowledged them. So you could do an md "check", and
if needed a "repair", before e.g. replaying the journal (mounting the
file system, doing fsck, etc.). Even if the md resync takes the older
copy and undoes a write, it should not have been a write that was
expected to have made it to stable storage, so the normal Lustre
recovery mechanisms should be able to replay it. Assuming, that is,
that this is done _before_ you mount the device.
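
The check-then-repair-then-mount sequence described above could look
something like the sketch below. The md device name and mount point are
examples only; `sync_action` and `mismatch_cnt` are the standard md sysfs
controls for triggering and monitoring a check or repair:

```shell
#!/bin/sh
# Example values -- substitute your actual md device and MDT mount point.
MD_DEV=md0
MDT_MOUNT=/mnt/mdt

# Kick off a consistency check of the RAID1 mirror.
echo check > /sys/block/$MD_DEV/md/sync_action

# Wait for the check to complete.
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
    sleep 5
done

# A nonzero mismatch_cnt means the mirrors diverged; repair before mounting.
if [ "$(cat /sys/block/$MD_DEV/md/mismatch_cnt)" -gt 0 ]; then
    echo repair > /sys/block/$MD_DEV/md/sync_action
    while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
        sleep 5
    done
fi

# Only now mount the MDT, so Lustre recovery replays against consistent mirrors.
mount -t lustre /dev/$MD_DEV $MDT_MOUNT
```

This is a procedure fragment that requires a live md array, so it is not
runnable stand-alone; the key point is simply that the check/repair
finishes before the mount.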
Kevin