Well, I thought someone might be interested in this. I have a Compaq server with a cpqarray controller in a RAID 5 configuration, some bog-standard video card, no sound card, NO X installed, no applications or modules not included with the distro, and 256 MB of memory. I upgraded it (yes, I know, not supported, but...) from Roswell to 7.2 in the middle of last week and installed all applicable errata updates, including the 2.4.9-13 kernel. All seemed to be running well.

Monday morning at 4:02 am it stopped communicating with my network monitor. We were able to ping it, but when trying to log in it would hang after the username was entered (no password prompt). I had a guy push the "big red button" (the machine is 300 miles away), but it did not come back up. What he got was the GRUB prompt. We rebooted into rescue mode from Anaconda, and it was unable to detect any Linux partitions. fdisk shows what appear to be the correct partition sizes and types, but ALL partitions start and end at 1. fdisk also complains about partitions overlapping and not starting on the correct boundaries.

The good news is that this is not a production machine. We were just getting ready to start testing it in preparation for production, so it is no hardship if we have to reinstall.

Can someone suggest a good way to figure out what happened? Is there any useful information to be gained, for either the developers or myself? I would like to know what happened.

I cc the ext3 list on this because the partitions were converted from ext2 to ext3 during the Roswell upgrade from 7.1.

Ideas, suggestions and comments are welcome,

--
Tom
tdiehl@rogueind.com
Dysfunction: The Only Consistent Feature of All of Your Dissatisfying Relationships is You.
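One way to see exactly what ended up on disk, rather than relying on fdisk's interpretation, is to save the first sector of the array's logical drive and decode it directly. The following is a minimal sketch, not a tool from this thread. It assumes a standard x86 MBR (four primary entries starting at byte offset 446, the 0x55AA signature at offset 510), ignores logical partitions inside an extended partition, and assumes the sector was saved with something like `dd if=/dev/ida/c0d0 of=mbr.bin bs=512 count=1` (treating `/dev/ida/c0d0` as the first cpqarray logical drive is an assumption). The names `parse_mbr`, `find_overlaps` and `report` are hypothetical helpers.

```python
import struct
from dataclasses import dataclass

# Assumed standard x86 MBR layout: four 16-byte primary partition entries
# at offset 446, followed by the 0x55AA boot signature at offset 510.
PART_TABLE_OFFSET = 446
ENTRY_SIZE = 16
SIGNATURE_OFFSET = 510


@dataclass
class Partition:
    index: int         # 1-based slot number in the table
    bootable: bool     # True if the boot flag byte is 0x80
    ptype: int         # partition type byte, e.g. 0x83 for Linux
    chs_start: tuple   # (cylinder, head, sector) as stored in the entry
    chs_end: tuple
    lba_start: int     # first sector of the partition, 0-based
    num_sectors: int

    @property
    def lba_end(self) -> int:
        """Last sector covered by the partition (inclusive)."""
        return self.lba_start + self.num_sectors - 1


def decode_chs(raw: bytes) -> tuple:
    """Unpack the 3-byte CHS field into (cylinder, head, sector)."""
    head = raw[0]
    sector = raw[1] & 0x3F                      # low 6 bits
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]  # 2 high bits + 8 low bits
    return (cylinder, head, sector)


def parse_mbr(sector0: bytes) -> list:
    """Return the non-empty primary partition entries of a 512-byte MBR."""
    if len(sector0) < 512:
        raise ValueError("need at least 512 bytes")
    if sector0[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    parts = []
    for i in range(4):
        off = PART_TABLE_OFFSET + i * ENTRY_SIZE
        # boot flag, CHS start, type, CHS end, LBA start, sector count
        boot, chs_s, ptype, chs_e, lba, count = struct.unpack(
            "<B3sB3sII", sector0[off:off + ENTRY_SIZE])
        if ptype == 0:
            continue  # unused slot
        parts.append(Partition(i + 1, boot == 0x80, ptype,
                               decode_chs(chs_s), decode_chs(chs_e),
                               lba, count))
    return parts


def find_overlaps(parts: list) -> list:
    """Return (a, b) slot-number pairs whose sector ranges intersect."""
    overlaps = []
    for i, a in enumerate(parts):
        for b in parts[i + 1:]:
            if a.lba_start <= b.lba_end and b.lba_start <= a.lba_end:
                overlaps.append((a.index, b.index))
    return overlaps


def report(path: str) -> None:
    """Print the decoded primary entries of a saved MBR and any overlaps."""
    with open(path, "rb") as f:
        table = parse_mbr(f.read(512))
    for p in table:
        print(f"slot {p.index}: type=0x{p.ptype:02x} boot={p.bootable} "
              f"chs {p.chs_start}->{p.chs_end} "
              f"lba {p.lba_start}-{p.lba_end}")
    for a, b in find_overlaps(table):
        print(f"slot {a} overlaps slot {b}")
```

Calling `report("mbr.bin")` prints each primary entry's type, its stored CHS start and end, and its LBA range, followed by any overlapping pairs. Looking at the CHS and LBA fields side by side may help show whether the entries were overwritten wholesale or only partly damaged.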
On Tue, 2001-11-13 at 17:54, Tom Diehl wrote:
> I cc the ext3 list on this because the partitions were converted
> from ext2 to ext3 during the Roswell upgrade from 7.1.

It's incredibly unlikely (in fact I'd say it's impossible) for this to be an ext3 issue. Your most likely candidates are the cpqarray hardware, which occasionally shows some bizarre behaviour, or the kernel having scribbled on the partition table, possibly through a VM fault of some sort.

Do you have the Compaq monitoring stuff on that machine? If so, that's almost certainly your problem. Compaq supply those as binary kernel modules, which they hack on install to match the kernel versioning. I suggest anyone considering using them not do so, and tell the Compaq sales droids to provide source. [I have evidence that they do scribble on random memory, most likely due to a kernel or toolchain version mismatch.]

For the record, I have had Compaq servers with cpqarrays running ext3 for well over a year now.

Nigel.