Hi Folks -

Using CentOS on a server destined to have a dozen SATA drives in it. The server is fine; RAID 5 is set up on groups of 4 SATA drives.

Today we decided to disconnect one SATA drive to simulate a failure. The box trucked on fine... a little too fine. We waited some minutes, but no problem was visible in /proc/mdstat, in /var/log/messages, or on the console.

I ran mdadm --monitor /dev/md0 and no problem was shown.

We rebooted, still without the drive, and finally mdadm --monitor reported that the array was running in a degraded state. We reconnected the SATA drive, and still nothing was reported and nothing changed in the RAID state according to /proc/mdstat.

I expected the box to keep on trucking but to become freaked out with warnings all over the shop. What should I have expected in this case, and what should I do to become aware of evil events like the drive melting remotely?

-Andy
On Tue, 11 Apr 2006 at 6:36pm, Andy Green wrote

> Using CentOS on a server destined to have a dozen SATA drives in it. The
> server is fine; RAID 5 is set up on groups of 4 SATA drives.
>
> Today we decided to disconnect one SATA drive to simulate a failure. The box
> trucked on fine... a little too fine. We waited some minutes, but no problem
> was visible in /proc/mdstat, in /var/log/messages, or on the console.
>
> I ran mdadm --monitor /dev/md0 and no problem was shown.

Did you try doing any I/O to the array? In my limited experience with software RAID, it won't notice a drive missing until it tries to do something with said drive. To really test it, I'd disconnect the drive while you have something disk intensive running. I like <http://people.redhat.com/dledford/memtest.html>, which unpacks and then diffs multiple copies of the Linux source tree. It'll have the system stressed *and* let you know if there are any problems with the array running in degraded mode.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
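On the "how do I become aware" part of the question, here is a minimal sketch (not from the thread) of a check that scans /proc/mdstat text for arrays with missing members. It assumes the usual md status summary such as "[4/3] [UUU_]", meaning 4 members expected and 3 working. The function name degraded_arrays and the SAMPLE text are made up for illustration. As noted above, md may not mark a drive as failed until it does I/O to it, so a check like this can only report what the kernel has already noticed.

```python
import re

# Start of an array stanza, e.g. "md0 : active raid5 sdc1[2] sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# Member summary, e.g. "[4/3] [UUU_]" (4 members expected, 3 working)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Illustrative sample only; real contents come from /proc/mdstat.
SAMPLE = """\
Personalities : [raid5]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      732563712 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, working, flags)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            working = int(status.group(2))
            flags = status.group(3)
            if working < expected or "_" in flags:
                result[current] = (expected, working, flags)
            current = None
    return result


if __name__ == "__main__":
    for name, (expected, working, flags) in sorted(degraded_arrays(SAMPLE).items()):
        print(f"{name} degraded: {working} of {expected} members up [{flags}]")
```

On a real box you would pass in open("/proc/mdstat").read() instead of SAMPLE and run the check from cron, mailing yourself whenever the result is non-empty.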