Hi -
We have had 3 rather major occurrences of ext3 filesystem corruption
lately,
i.e. so bad we couldn't even mount, and fsck didn't help.
I am looking for pointers that could help us investigate the root
cause.
In general...
We are running RedHat WS 3 Update 6, with kernel 2.4.21-40.2.ELsmp or
2.4.21-37.ELsmp.
We have a small SAN system that looks like this:
3 NFS servers, each containing 2 QLogic HBAs, connected to 2
QLogic switches,
connected to an nstor (now Xyratex) 6TB RAID system containing
2 (active-active) controllers.
On the first 2 occasions one of the controllers was failed over.
On the 3rd occasion both SAN switches lost power, and the hosts and RAID lost
communication.
On all occasions the QLogic failover driver tried to start up on the alternate
HBA.
On the first 2 instances we sort of tried to blame the controller.
On the 3rd, that was harder to do, since the RAID system and the hosts stayed
up
but lost communication.
I can provide more detail if anyone has any info on how to proceed.
Thanks
-Sev
--
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
sev at bnl.gov