Does software raid 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the raid 1 software.

Thanks,
Nataraj
Nataraj wrote:
> Does software raid 1 compare checksums or otherwise verify that the same
> bits are coming from both disks during reads? What I'm interested in,
> is whether bit errors that were somehow undetected by the hardware would
> be detected by the raid 1 software.

Under normal operation, each read request goes to one drive or the other. This can double read throughput, since both drives can service different read requests at the same time.

Some RAID implementations do a scrub: in the background, when the disks are otherwise idle, they gradually read all the RAID stripes and validate them. I honestly don't know whether Linux's built-in RAID does this or not. Of course, with RAID-1, if the two blocks disagree, there's no way of knowing which one is correct, only that there is a potential problem.

Some RAID implementations (Sun ZFS, for instance) store a checksum with every block so they can detect corruption immediately. I also know that ZFS does this scrubbing.
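The difference a per-block checksum makes can be sketched in shell. This is only an illustration under stated assumptions: ordinary files stand in for the two mirror copies of one block, `sha256sum` (GNU coreutils) stands in for whatever checksum a filesystem would really use, and the function names `block_sum` and `pick_good_copy` are hypothetical. It is not how ZFS or Linux md is implemented.

```shell
#!/bin/sh
# block_sum FILE: print the SHA-256 digest of FILE.
# Assumes sha256sum from GNU coreutils is installed.
block_sum() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# pick_good_copy EXPECTED_SUM COPY...: print the first copy whose digest
# matches the one recorded at write time; return 1 if no copy matches.
pick_good_copy() {
    expected=$1
    shift
    for copy in "$@"; do
        if [ "$(block_sum "$copy")" = "$expected" ]; then
            echo "$copy"
            return 0
        fi
    done
    return 1
}
```

With only the two copies and no recorded checksum, a comparison can say that they differ but not which one is right. With the checksum recorded when the block was written, `pick_good_copy "$sum" disk0.blk disk1.blk` names the copy that still matches.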
Nataraj wrote:
> Does software raid 1 compare checksums or otherwise verify that the same
> bits are coming from both disks during reads? What I'm interested in,
> is whether bit errors that were somehow undetected by the hardware would
> be detected by the raid 1 software.
>
> Thanks,
> Nataraj

I've been thinking about this as well. The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you could (or rather _should_, regularly) run

    echo check > /sys/block/mdX/md/sync_action

to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these with

    echo repair > /sys/block/mdX/md/sync_action

This applies to at least RAID1 and RAID5.

At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.

BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .

HTH a bit,
Kay
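The check cycle described above can be wrapped in a few small shell functions. This is a minimal sketch, not a tested admin tool: the `sync_action` and `mismatch_cnt` files are the ones named in the message, while the function names and the `SYSFS_ROOT` and `POLL_SECONDS` variables are hypothetical, added so the functions can be tried against a scratch directory. It relies on `sync_action` reading `idle` once a check has finished.

```shell
#!/bin/sh
# SYSFS_ROOT is an assumption added so the functions can be pointed at a
# scratch directory; on a real system leave it at /sys/block.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# md_start_check DEV: ask the md driver to compare all copies of array DEV.
md_start_check() {
    echo check > "$SYSFS_ROOT/$1/md/sync_action"
}

# md_wait_idle DEV: poll until sync_action reports "idle" again.
# Returns 1 if the sync_action file cannot be read at all.
md_wait_idle() {
    [ -r "$SYSFS_ROOT/$1/md/sync_action" ] || return 1
    while [ "$(cat "$SYSFS_ROOT/$1/md/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# md_mismatches DEV: print the mismatch count left by the last check.
md_mismatches() {
    cat "$SYSFS_ROOT/$1/md/mismatch_cnt"
}
```

Run as root, something like `md_start_check md0 && md_wait_idle md0 && md_mismatches md0` would start a check on md0, wait for it to finish, and print the resulting count. A non-zero count only says that the copies disagree somewhere, not which copy is right.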
On Sun, Sep 21, 2008 at 10:26 AM, Nataraj <incoming-centos at rjl.com> wrote:
> Does software raid 1 compare checksums or otherwise verify that the same
> bits are coming from both disks during reads? What I'm interested in,

No. Reads are distributed over disks to increase performance.

> is whether bit errors that were somehow undetected by the hardware would
> be detected by the raid 1 software.

Depends on the type of error. However, the sad thing is, if you use 3 disks for raid 1 the kernel does not do the right thing. Let me explain.

Say you have 3 disks in a raid 1 array. If there is a mismatch then the smart thing to do would be to take a vote of the 3 disks: 2 out of 3 wins (assuming they are not all different). The odd man out should be corrected (if possible). But what actually happens is the highest numbered disk is copied to the others.

I haven't looked at the latest kernel code but if this

http://linas.org/linux/raid.html

is correct then I think the kernel maintainers should address this issue. I don't think it would be hard to implement.

--
Robert Arkiletian
Eric Hamber Secondary, Vancouver, Canada
Fl_TeacherTool http://www3.telus.net/public/robark/Fl_TeacherTool/
C++ GUI tutorial http://www3.telus.net/public/robark/
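The 2-out-of-3 vote proposed above is simple to express. The following is a sketch of the idea only, assuming three ordinary files that stand in for the same block read from each of the three mirror members; the function name `vote3` is hypothetical and nothing here reflects what the kernel md code actually does.

```shell
#!/bin/sh
# vote3 A B C: print the path of a copy that at least one other copy agrees
# with byte for byte. Returns 1 (and prints nothing) if all three differ.
vote3() {
    if cmp -s "$1" "$2" || cmp -s "$1" "$3"; then
        echo "$1"
    elif cmp -s "$2" "$3"; then
        echo "$2"
    else
        return 1
    fi
}
```

The copy that `vote3` prints would be the source for correcting the odd one out. When the function fails, the three copies are all different and a vote cannot decide, which is the caveat noted in the message.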