Hi all. I'm wondering if anyone can shed some light on a strange crashing/rebooting problem I'm having. First, the specs:

Hardware: Dell PowerEdge 2850 rack mounted server, dual 3.4 GHz Xeon, 5 GB memory
Hard drives: LSILogic PERC 4e/Di, configured as RAID 5, with 3 x 40 GB disks
OS: FreeBSD 5.4-RELEASE-p6 for amd64
Other related software: mysql Ver 14.7 Distrib 4.1.14, for portbld-freebsd5.4 (amd64) using 4.3

I currently have hyperthreading enabled, since I'm not too concerned about the security of the system (it's on an internal-only network, with no user accounts other than the administrator, and I figure that if the security issue associated with hyperthreading is the only problem, it wouldn't hurt to get a bit more speed). It's intended to be a single-purpose MySQL server to other client machines via TCP/IP, and is supposed to be a high reliability, fast as possible machine.

But the problem is this. I have it set to run mysqlhotcopy a couple of times during the day to back up the databases. And twice now in the last month or so, when it starts to run, it brings down the server. But the odd thing is that it doesn't lock up indefinitely, or even reboot itself normally. Instead, it suddenly seems to quit as though someone unplugged it and then goes through the boot sequence. It's at a remote location from me, so I haven't been able to see the console while it goes through its problems, but according to /var/log/messages, everything is running fine, and then suddenly it starts to write its initial boot messages:

    sql syslogd: kernel boot file is /boot/kernel/kernel
    sql kernel: Copyright (c) 1992-2005 The FreeBSD Project.
    etc..

There are no logs of any "shutting down" variety, and sure enough, I get

    sql kernel: Mounting root from ufs:/dev/amrd0s1a
    sql kernel: WARNING: / was not properly dismounted
    sql kernel: WARNING: /usr was not properly dismounted

messages written a bit later in the boot sequence.
What gets me is that if the machine was "really" locking up due to a kernel panic or something, I would expect it to stay frozen and not restart itself. But within a couple of minutes of going down hard, it has rebooted itself. There isn't any kind of watchdog timer that reboots the machine after a lockup that I'm not aware of, is there? Because of this, I sometimes don't even realize it's happened until I find that the odd MySQL database needs to be repaired, and then I check the logs and see what's happened.

According to the logs, it's almost as though it's getting physically unplugged midstream, then plugged back in and boots from there. But it's in a locked cabinet in a colocation centre with other machines of mine which aren't having the problem, and it's happened twice now at exactly the same time - right as mysqlhotcopy is about to run.

Considering that this machine is supposed to be high availability, being down for even a couple of minutes like this is a problem. Plus, I really don't like not understanding what's making it go down like it does, and I'm obviously concerned about data corruption to the databases when something like this happens.

Does anyone have any advice on what may be wrong, or something to try? I really have no idea even how to begin to troubleshoot this problem. If you need any more information at all, please let me know.

Thanks for your help!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB T0G 1L0 Canada
Phone: 780-961-2213
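One way to pin down exactly when each hard reset happened is to search the log for the unclean-dismount warnings quoted in the post. The following is only a minimal sketch: the function names and the default log path are illustrative assumptions, and the search patterns are copied from the messages shown above.

```sh
#!/bin/sh
# Sketch: find boots that followed an unclean shutdown.
# Function names are hypothetical; patterns come from the quoted log lines.

# Print every "was not properly dismounted" warning in the given log.
unclean_boots() {
    log=${1:-/var/log/messages}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep 'was not properly dismounted' "$log"
}

# Count unclean boots by counting the warning for the root filesystem,
# which is logged once per such boot.
unclean_boot_count() {
    log=${1:-/var/log/messages}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -c 'WARNING: / was not properly dismounted' "$log"
}
```

Running `unclean_boots /var/log/messages` and comparing the syslog timestamps on the matching lines against the times mysqlhotcopy is scheduled would show whether every reset really lines up with a backup run.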
Hello,

You should be using the IA-64 version of FreeBSD, as these are Intel chips and not AMD chips? FreeBSD 5.4 IA-64 is for the Intel EM64T/Xeon/Itanium CPUs; FreeBSD 5.4 AMD64 is for the AMD 64/x2/Opteron/FX CPUs.

You will have more luck using the IA-64 release. I am not sure if you can cvsup the source and rebuild the IA-64 version, so you might have to do a fresh install, and since you have backups of your databases it should not take you longer than an hour to get it all running smoothly again.

Kind regards,
Jayton Garnett

Dan Charrois wrote:
> Hi all. I'm wondering if anyone can shed some light on a strange
> crashing/rebooting problem I'm having. First, the specs:
>
> Hardware: Dell PowerEdge 2850 rack mounted server, Dual 3.4 Ghz Xeon,
> 5 Gb memory
> [snip]
On 10/25/05, Dan Charrois <dan@syz.com> wrote:
> Hi all. I'm wondering if anyone can shed some light on a strange
> crashing/rebooting problem I'm having. First, the specs:
>
> Hardware: Dell PowerEdge 2850 rack mounted server, Dual 3.4 Ghz Xeon,
> 5 Gb memory

[snip]

You didn't mention, have you run Dell diagnostics on the machine to rule out hardware issues?

Bryan
> On 10/25/05, Dan Charrois <dan@syz.com> wrote:
>
>> Hi all. I'm wondering if anyone can shed some light on a strange
>> crashing/rebooting problem I'm having. First, the specs:
>>
>> Hardware: Dell PowerEdge 2850 rack mounted server, Dual 3.4 Ghz Xeon,
>> 5 Gb memory
>
> [snip]
>
> You didn't mention, have you run Dell diagnostics on the machine to
> rule out hardware issues?

No, I haven't been able to run diagnostics and rule out the hardware, for two reasons. First, the server is located about an hour's drive away, and I haven't had the chance to get to it yet. Of course, this can be fixed. But secondly, I have no idea *how* to run Dell Diagnostics. The "Dell PowerEdge Service and Diagnostic Utilities, Version 4.4" CD that I have insists on being run from Windows, right down to a setup.exe in the root directory and a ReadMe that starts describing how to use the CD as:

    1. Insert the Service and Diagnostic Utilities CD into the CD drive
       on a system running Windows. The setup program should start
       automatically. If it does not, click the Start button, click Run,
       and then type x:setup.exe (where x is the drive letter of your CD
       drive).

This isn't a dual boot machine - its sole task is running FreeBSD for an SQL server, so that's not an option for me. You'd think that they'd have a self-booting CD that would be able to diagnose the machine, since for the life of me, even if I *were* running Windows, I wouldn't be able to figure out how to diagnose a problem if part of the problem made Windows unbootable... I must be missing something.

Dan
Taken from the digest form, so hopefully I won't whack the formatting too badly.....

Dan Charrois <dan@syz.com> wrote:

[snip]

>No, I haven't been able to run diagnostics and rule out the hardware
>for two reasons.. First, the server is located about an hour's drive
>away, and I haven't had the chance to get to it yet. Of course, this
>can be fixed. But secondly, I have no idea *how* to run Dell
>Diagnostics. The "Dell PowerEdge Service and Diagnostic Utilities,
>Version 4.4" CD that I have insists on being run from Windows, right
>down to a setup.exe in the root directory and a ReadMe that starts
>describing how to use the CD as:
>
>[snip]

This is pretty much classic Dell. We've purchased a number of systems without operating systems, on which we run FreeBSD. However, Dell continually operates under the assumption that we are running Windoze or Linux, and expects those to do things like BIOS updates. I'm trying to work it out with them, but it's been pretty painful so far (enough that I'm starting to look at other hardware vendors).

However, in the case of the diagnostics utilities, Dell's a bit more enlightened. These links

    ftp://ftp.dell.com/diags/ED5061A0.tar.gz
    ftp://ftp.dell.com/diags/EI5061A0.ZIP
    ftp://ftp.dell.com/diags/MP1038A0.tar.gz
    ftp://ftp.us.dell.com/diags/MP1038A0.zip

point at the Dell 32-bit diagnostics (first pair) and the memory diagnostics utilities (second pair). The .tar.gz files contain raw floppy images suitable for writing to floppy with a command like

    cat file.img | dd of=/dev/fd0 obs=18k

or something like that. The .ZIP files contain what appear to be ISO images suitable for burning to a CD. Figuring that command out is left as an exercise for the reader. :-)

--
Alan Amesbury
University of Minnesota
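For anyone following along, the two kinds of image described above could be handled with something like the sketch below. It only prints the command it would run, so nothing is written by accident. Several things here are assumptions rather than anything confirmed in this thread: the helper name, the image file names, the use of burncd(8) from the FreeBSD base system for the ISO, and the device names /dev/fd0 and /dev/acd0.

```sh
#!/bin/sh
# Sketch: print (do not run) a command for writing a diagnostics image.
# image_write_cmd is a hypothetical helper; device names are assumptions
# and should be checked against the machine doing the writing.
image_write_cmd() {
    img=$1
    case "$img" in
        *.img)
            # Raw floppy image: same dd options as the command quoted above.
            echo "dd if=$img of=/dev/fd0 obs=18k"
            ;;
        *.iso|*.ISO)
            # ISO image: burncd on an ATAPI burner (assumed to be acd0).
            echo "burncd -f /dev/acd0 data $img fixate"
            ;;
        *)
            echo "unrecognised image type: $img" >&2
            return 1
            ;;
    esac
}
```

For example, `image_write_cmd diag.img` prints a dd command that can be reviewed and then run by hand as root once the device name has been confirmed.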