Hello fellows,
I have a problem with a 2-node RHCS cluster (CentOS 4) where node 1
failed and node 2 became active. This already happened last year, and
because of the holidays the customer didn't notice it. The cluster only
provides failover for Apache and has no shared storage.
The customer has now noticed the situation and tried to fix it by
rebooting node 1, which then failed to come back up. The ccsd service
starts but cannot get the full cluster information, so the next service,
cman, hangs forever and the boot never completes. Skipping the cluster
services at boot time (selective startup via the boot parameter confirm)
brings the box up.
ccsd starts up (as a service or by hand, and with the -n parameter), but
logs to syslog that it fails to get cluster infrastructure information.
So the cluster is in an inquorate state. Does anyone experienced with
RHCS know whether I can avoid shutting down node 2 and the Apache
service that the cluster runs? The documentation (manual and FAQ) is
silent about this. I have verified that there is no network / NIC
problem. How do I get the 2-node cluster back into a quorate state?
Thanks for helping.
Cheers
Alexander