Hi all,

One of our CentOS 5.3 servers randomly reboots, at different times of the day, and I can't see why it's doing it.

I have looked through the logs, but don't see anything in there that shows me why it has rebooted. How can I debug this?

Here's a snippet from the log, around the time of the reboot:

Jun 2 14:59:59 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 2 15:00:06 usaxen02 kernel: kjournald starting. Commit interval 5 seconds
Jun 2 15:00:06 usaxen02 kernel: EXT3 FS on dm-8, internal journal
Jun 2 15:00:06 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 2 15:00:39 usaxen02 kernel: device vifvenu0 entered promiscuous mode
Jun 2 15:00:39 usaxen02 kernel: ADDRCONF(NETDEV_UP): vifvenu0: link is not ready
Jun 2 21:00:39 usaxen02 logger: /etc/xen/scripts/vif-bridge: iptables -A FORWARD -m physdev --physdev-in vifvenu0 -s 72.9.241.226 72.9.241.227 72.9.241.232 72.9.247.207 -j ACCEPT failed. If you are using iptables, this may affect networking for guest domains.
Jun 2 15:00:43 usaxen02 kernel: blkback: ring-ref 8, event-channel 6, protocol 1 (x86_64-abi)
Jun 2 15:00:43 usaxen02 kernel: blkback: ring-ref 9, event-channel 7, protocol 1 (x86_64-abi)
Jun 2 15:00:43 usaxen02 kernel: ADDRCONF(NETDEV_CHANGE): vifvenu0: link becomes ready
Jun 2 15:00:43 usaxen02 kernel: xenbr1: topology change detected, propagating
Jun 2 15:00:43 usaxen02 kernel: xenbr1: port 5(vifvenu0) entering forwarding state
Jun 2 17:30:22 usaxen02 syslogd 1.4.1: restart.
Jun 2 17:30:22 usaxen02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 2 17:30:22 usaxen02 kernel: Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol01 ide0=noprobe)
Jun 2 17:30:22 usaxen02 kernel: Linux version 2.6.18-128.1.10.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 11:07:18 EDT 2009
Jun 2 17:30:22 usaxen02 kernel: BIOS-provided physical RAM map:
Jun 2 17:30:22 usaxen02 kernel: Xen: 0000000000000000 - 00000001de804000 (usable)
Jun 2 17:30:22 usaxen02 kernel: DMI 2.4 present.
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Jun 2 17:30:22 usaxen02 kernel: ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Jun 2 17:30:22 usaxen02 kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Jun 2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Jun 2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Jun 2 17:30:22 usaxen02 kernel: Setting APIC routing to xen
Jun 2 17:30:22 usaxen02 kernel: Using ACPI (MADT) for SMP configuration information
Jun 2 17:30:22 usaxen02 kernel: Allocating PCI resources starting at d4000000 (gap: d0000000:2ff00000)

-- 
Kind Regards
Rudi Ahlers
CEO, SoftDux Hosting
Web: http://www.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
on 6-2-2009 2:30 PM Rudi Ahlers spake the following:
> Hi all,
>
> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
> and I can't see why it's doing it.
>
> I have looked through the logs, but don't see anything in there that
> shows me why it has rebooted. How can I debug this?
>
> Here's a snippet from the log, around the time of the reboot:
>
<snip>

Random reboots can happen fast enough that nothing gets into the logs. You can try setting up a console and having the system post its messages there. It sometimes catches things.

But until then I would do the obvious... Make sure the system is clean and not overheating from "dust bunnies" filling up the chassis.

Remove and re-seat all cards and RAM. Make sure all fans are working. Run memtest overnight if possible. Look back to when the reboots started and see if something was added or upgraded.
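The post above does not say which kind of console is meant. One possible option, offered here only as an assumption, is the kernel's netconsole module, which sends kernel messages over UDP to another machine. The sketch below only builds and prints the module option string so it can be reviewed first; it loads nothing. The IP addresses, interface name, and MAC address are hypothetical placeholders, and 6665/6666 are the module's usual default ports.

```shell
#!/bin/sh
# Build a netconsole option string (dry run: nothing is loaded here).
# Format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# All values passed in below are hypothetical, not taken from this thread.
netconsole_opts() {
    src_ip=$1    # address of the server that keeps rebooting
    dev=$2       # network interface to send from
    tgt_ip=$3    # machine that will receive the messages
    tgt_mac=$4   # MAC of the receiver, or of the gateway if it is off-LAN
    printf 'netconsole=6665@%s/%s,6666@%s/%s\n' "$src_ip" "$dev" "$tgt_ip" "$tgt_mac"
}

# Print the command for review; it would be run as root on the server.
echo "modprobe netconsole $(netconsole_opts 192.0.2.10 eth0 192.0.2.20 00:16:3e:00:00:01)"
```

The receiving machine needs something listening on UDP port 6666, for example a netcat listener or a syslog daemon that accepts remote messages. A sudden power loss or hardware reset will most likely still leave nothing behind, so an empty capture is itself a hint toward hardware.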
On 6/2/09, Scott Silva <ssilva at sgvwater.com> wrote:
> on 6-2-2009 2:30 PM Rudi Ahlers spake the following:
>> Hi all,
>>
>> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
>> and I can't see why it's doing it.
>>
>> I have looked through the logs, but don't see anything in there that
>> shows me why it has rebooted. How can I debug this?
>>
>> Here's a snippet from the log, around the time of the reboot:
>>
> <snip>
> Random reboots can happen fast enough that nothing gets into the logs. You
> can try setting up a console and have the system post there. It sometimes
> catches things.
>
> But until then I would do the obvious... Make sure the system is clean and
> not overheating from "dust bunnies" filling up the chassis.
>
> Remove and re-seat all cards and ram. Make sure all fans are working. Run
> memtest overnight if possible. Look back to when the reboots started and see
> if something was added or upgraded.
>

Hi Scott,

The server is in the USA, and I'm in ZA. I've been trying to get the IDC to look into the problem, but they're not very helpful and reckon I need to check my software. I know the "server" runs desktop hardware, so it could be a hardware problem, but they don't seem to think so.

So I'm trying to do everything I can from my side, via SSH, to see if I can figure it out.

-- 
Kind Regards
Rudi Ahlers
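One check that can be done over SSH alone is to see what was logged just before each boot. The sketch below assumes the stock sysklogd on CentOS 5, which normally writes an "exiting on signal 15" line during an orderly shutdown; that marker, and the function name `find_unclean_restarts`, are assumptions for illustration. In the excerpt from the original post, the last entry before the restart is a bridge message from 15:00:43, more than two hours before the boot at 17:30:22, with no shutdown messages in between.

```shell
#!/bin/sh
# For every kernel boot banner in a syslog file, report the last line that
# was logged before the preceding "syslogd ...: restart." entry.
# Assumption: an orderly shutdown leaves "exiting on signal 15" as that line.
find_unclean_restarts() {
    # $1: syslog-format file such as /var/log/messages
    awk '
        /syslogd [0-9.]+: restart\./ { before_restart = prev }
        /kernel: Linux version / {
            if (before_restart != "") {
                verdict = (before_restart ~ /exiting on signal 15/) ? "clean" : "UNCLEAN"
                print verdict ": " before_restart
            }
            before_restart = ""
        }
        { prev = $0 }
    ' "$1"
}

# Demo on a few lines taken from the log excerpt in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun 2 15:00:43 usaxen02 kernel: xenbr1: port 5(vifvenu0) entering forwarding state
Jun 2 17:30:22 usaxen02 syslogd 1.4.1: restart.
Jun 2 17:30:22 usaxen02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 2 17:30:22 usaxen02 kernel: Linux version 2.6.18-128.1.10.el5xen
EOF
find_unclean_restarts "$sample"
rm -f "$sample"
```

On the real machine the function would be pointed at /var/log/messages and its rotated copies. An "UNCLEAN" result only says the system went down without a normal shutdown; it does not distinguish a kernel panic from a power or hardware reset.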
Rudi Ahlers wrote:
> Hi all,
>
> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
> and I can't see why it's doing it.
>
> I have looked through the logs, but don't see anything in there that
> shows me why it has rebooted. How can I debug this?
>

Hi,

Try enabling kdump to get a kernel dump, in case this is a software-related issue.

http://download.swsoft.com/virtuozzo/virtuozzo4.0/docs/en/lin/VzLinuxUG/20027.htm
Using Kexec and Kdump For System Troubleshooting

    yum install kexec-tools

Edit /etc/grub.conf and append to the end of the kernel line:

    crashkernel=128M@16M

Then:

    chkconfig kdump on
    reboot

Also look at these:

http://kbase.redhat.com/faq/docs/DOC-6039
How do I configure kexec/kdump on Red Hat Enterprise Linux 5?

http://kbase.redhat.com/faq/docs/DOC-2119
How can I voluntarily crash my machine to test if netdump/diskdump/kdump I configured works?

http://kbase.redhat.com/faq/docs/DOC-5413
My server crashes once in awhile. How can I debug it?

http://kbase.redhat.com/faq/docs/DOC-1742
My system has started to hang randomly. What information does Red Hats technical support need to diagnose the problem?

http://kbase.redhat.com/faq/docs/DOC-10828
My Red Hat Enterprise Linux 2.1 system had a kernel panic, an oops message, or is freezing for no apparent reason. How can I find out what is causing this?

Next, I recommend you set up and run memtest86+:

    memtest86+.x86_64 : Stand-alone memory tester for x86 and x86-64 computers

Ask the data center's support staff to reboot the machine into memtest from the GRUB menu and leave it running overnight. If the DC has an IP KVM, ask for access to it.

Also, what network card is in your server? I have had trouble with no-name network cards.

-- 
Best wishes,
Sergej Kandyla
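The grub.conf step above can be sketched as a small filter that prints a modified copy of the file instead of editing it in place, so the result can be reviewed before it replaces the real one. The function name `add_crashkernel` is hypothetical, and the 128M@16M value is simply the one suggested in the post. On a Xen host like this one, the file usually has a `kernel` line for the hypervisor and `module` lines for the dom0 kernel; which of them should carry the parameter is a question for the Red Hat kexec/kdump article linked above, so treat this sketch's choice of the `kernel` line as an assumption.

```shell
#!/bin/sh
# Print a copy of a grub.conf-style file with a crashkernel= parameter
# appended to every "kernel" line that does not already have one.
# The input file is not modified.
add_crashkernel() {
    # $1: path to the grub.conf copy; $2: value such as 128M@16M
    awk -v param="crashkernel=$2" '
        /^[ \t]*kernel[ \t]/ && $0 !~ /crashkernel=/ {
            sub(/[ \t]+$/, "")      # drop trailing whitespace first
            $0 = $0 " " param
        }
        { print }
    ' "$1"
}
```

A cautious way to use it is `add_crashkernel /etc/grub.conf 128M@16M > /tmp/grub.conf.new`, followed by a `diff` against the original and a backup of the current file before copying the new one into place. Running the filter on its own output changes nothing, because lines that already contain `crashkernel=` are left alone.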