I am having problems using a Dell PowerVault MD3000 with multipath from a Dell PowerEdge 1950. I have 2 cables connected and mount the partition on the DAS array. I am using RHEL 4.4 with RHCS and a two-node cluster. Only one node is "Active" at a time; it mounts the partition, and if there is an issue RHCS will fence the device and then the other node will mount the partition.

I have now twice run into a problem where my ext3 file system (with journaling) has corrupt inodes. This has actually resulted in a filesystem with #xxxxxxxxx files and directories.

Am I missing something here? Should I be using a different file system? Any help would be appreciated.

Paul Fitzmaurice
Aveksa, Inc
"Where Security meets Compliance"
I don't know much about RHCS, but I think that this is more likely to be a Red Hat problem than an ext3 problem.

1) *IF* RHCS properly locks out the 'dead' system, so that it doesn't manage (at some time after the backup system takes over) to write its caches to the shared drive,
2) and *IF* the failover software isn't too stupid to do things like run the journal, and otherwise do sane FSCK things before mounting,

then you shouldn't have a problem.

My best guess is that a failure of 2) is relatively unlikely, which leaves a failure of 1) as the probable cause. If your primary system does *ANY* writes after the failover starts, then you can probably expect problems like the ones you've seen here. (Does RHCS _physically_ lock out the second system, or is it a software lockout?)

The other question I have is: why is the system failing over? Other than testing, a well-built HA system should almost *never* actually need to fail over. (We're not talking Windows servers here :-} )

HA should be like insurance... You pay up front for it and work to make sure that you never actually have to use what you've paid for.

On 4/3/07, Paul Fitzmaurice <pfitzmaurice at aveksa.com> wrote:
> I am having problems when using a Dell PowerVault MD3000 with multipath from
> a Dell PowerEdge 1950. I have 2 cables connected and mount the partition on
> the DAS Array. I am using RHEL 4.4 with RHCS and a two node cluster. Only
> one node is "Active" at a time, it creates a mount to the partition, and if
> there is an issue RHCS will fence the device and then the other node will
> mount the partition.
>
> I have now run into a problem twice where my ext3 (with Journaling) has
> corrupt inodes. This actually has resulted in a filesystem with #xxxxxxxxx
> files and directories.
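The two conditions in the reply above can be sketched as a takeover routine for the standby node. This is only an illustration, not how RHCS itself works: `take_over`, `peer_is_fenced`, and the device and mount-point arguments are hypothetical names. The sketch assumes the caller has some reliable way to confirm that the failed node can no longer write to the array. The only real commands used are `e2fsck -p` and `mount -t ext3`.

```python
import subprocess

# e2fsck exit status: 0 = no errors, 1 = errors found and corrected.
# Anything else (2, 4, 8, ...) is treated here as "do not mount".
E2FSCK_OK = (0, 1)


def take_over(device, mountpoint, peer_is_fenced, run=subprocess.call):
    """Mount the shared ext3 volume on the standby node, but only when safe.

    peer_is_fenced is a hypothetical callable supplied by the caller; it must
    return True only once the failed node can no longer write to the array.
    run takes an argument list and returns the command's exit status.
    """
    # Condition 1: the old node must be locked out before we touch the disk.
    if not peer_is_fenced():
        raise RuntimeError("peer not fenced; refusing to mount %s" % device)

    # Condition 2: check the file system first (e2fsck replays the journal
    # if the file system needs recovery).
    status = run(["e2fsck", "-p", device])
    if status not in E2FSCK_OK:
        raise RuntimeError("e2fsck exited with status %d" % status)

    status = run(["mount", "-t", "ext3", device, mountpoint])
    if status != 0:
        raise RuntimeError("mount exited with status %d" % status)
```

The ordering is the point of the sketch: if the fencing check cannot be trusted, nothing after it protects the file system, because ext3 is not designed to be mounted read-write by two nodes at once.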
Thanks for the info. Could you help to confirm something? It appears that in some fail-over situations we are mounting the shared partition while the node going down has not completely shut down and done the umount! So one node still has the partition mounted rw while shutting down, and the other node is mounting it and starting up... Could this cause inode and journal corruption?

----- Original Message -----
From: Stephen Samuel <darkonc at gmail.com>
To: Paul Fitzmaurice
Cc: ext3-users at redhat.com <ext3-users at redhat.com>
Sent: Tue Apr 03 15:40:03 2007
Subject: Re: Corrupt inodes on shared disk...