Hi Guys,
I had a strange problem yesterday and I'm curious as to what everyone
thinks.
I have a client with a Red Hat Enterprise 2.1 cluster. All quality HP
equipment with an MSA 500 storage array acting as the shared storage
between the two nodes in the cluster.
This cluster is configured for reliability and not load balancing. All
work is handled by one node or the other, not both.
There are two 100GB RAID 5 logical drives in the MSA 500. Linux sees them
as /dev/md2 and /dev/md3 respectively. Running cat /proc/mdstat shows
them as "active multipath" and otherwise healthy.
There is a nightly shell script that runs and backs up information via tar
to a USB external drive. The last thing the script does before unmounting
the USB drive is to run the sync command.
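For context, the script follows roughly the pattern below. This is a
simplified sketch, not the actual script: the function name and the paths
are placeholders, and the mount and unmount steps are shown only as
comments.

```shell
#!/bin/sh
# Simplified sketch of the nightly backup pattern (placeholder names throughout).

# backup_dir SRC ARCHIVE: archive the contents of SRC into ARCHIVE, then sync.
backup_dir() {
    src=$1
    archive=$2

    # mount /mnt/usb    # placeholder: the real script mounts the USB drive first

    # Archive the contents of the source directory.
    tar -cf "$archive" -C "$src" . || return 1

    # Ask the kernel to flush dirty buffers to disk. Plain sync is not limited
    # to the USB drive; it covers every mounted filesystem.
    sync

    # umount /mnt/usb   # placeholder: the real script unmounts the drive last
}
```

A call would look like: backup_dir /some/data /mnt/usb/backup.tar (both
paths made up for illustration).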
Yesterday it was noticed that the backup script was hung. A quick check
via "ps aux" showed the backup script and a sync process still hanging
around from Monday night.
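Processes that cannot be killed are often in uninterruptible sleep (a STAT
value starting with D), waiting on I/O. A small sketch of how to pick those
out of ps output follows; the function name is made up, and it assumes a ps
that accepts the -eo format option.

```shell
# list_blocked: read "ps -eo pid,stat,wchan,args" style output on stdin and
# print the header line plus every process whose STAT column starts with D.
list_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage: ps -eo pid,stat,wchan,args | list_blocked
```

The WCHAN column, where available, gives a hint about what kernel function
each blocked process is waiting in.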
All attempts to stop these processes failed, and it was decided a reboot was
the best fix. All production services were shut down as far as
possible. Because of the locked processes the machine would not shut down
properly and was physically turned off.
On reboot the drives in the server were checked via e2fsck with no
problems.
The shared storage, md2 and md3, also mounted without errors.
All services started properly.
No errors about md2 or md3 were reported in dmesg or /var/log/messages.
Strangely, all files added via Samba after Monday are gone. This is limited
to only one device: md3. Everything else is fine.
Checking the two drives/partitions that make up md3 shows none of the
missing files.
Any brilliant thoughts as to where those files might have gone would be
appreciated. The files lost are not critical so there are no major
problems but it's a puzzle I can't quite figure out.
Shawn