thr3ads.net - CentOS - [CentOS] how to debug random server reboots [Jun 2009]

Rudi Ahlers

2009-Jun-02 21:30 UTC

[CentOS] how to debug random server reboots

Hi all,

One of our CentOS 5.3 servers randomly reboots, at different times of the day,
and I can't see why it's doing it.

I have looked through the logs, but don't see anything in there that
shows me why it has rebooted. How can I debug this?

Here's a snippet from the log, around the time of the reboot:


Jun  2 14:59:59 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun  2 15:00:06 usaxen02 kernel: kjournald starting.  Commit interval 5 seconds
Jun  2 15:00:06 usaxen02 kernel: EXT3 FS on dm-8, internal journal
Jun  2 15:00:06 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun  2 15:00:39 usaxen02 kernel: device vifvenu0 entered promiscuous mode
Jun  2 15:00:39 usaxen02 kernel: ADDRCONF(NETDEV_UP): vifvenu0: link is not ready
Jun  2 21:00:39 usaxen02 logger: /etc/xen/scripts/vif-bridge: iptables -A FORWARD -m physdev --physdev-in vifvenu0 -s 72.9.241.226 72.9.241.227 72.9.241.232 72.9.247.207 -j ACCEPT failed. If you are using iptables, this may affect networking for guest domains.
Jun  2 15:00:43 usaxen02 kernel: blkback: ring-ref 8, event-channel 6, protocol 1 (x86_64-abi)
Jun  2 15:00:43 usaxen02 kernel: blkback: ring-ref 9, event-channel 7, protocol 1 (x86_64-abi)
Jun  2 15:00:43 usaxen02 kernel: ADDRCONF(NETDEV_CHANGE): vifvenu0: link becomes ready
Jun  2 15:00:43 usaxen02 kernel: xenbr1: topology change detected, propagating
Jun  2 15:00:43 usaxen02 kernel: xenbr1: port 5(vifvenu0) entering forwarding state
Jun  2 17:30:22 usaxen02 syslogd 1.4.1: restart.
Jun  2 17:30:22 usaxen02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun  2 17:30:22 usaxen02 kernel: Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol01 ide0=noprobe)
Jun  2 17:30:22 usaxen02 kernel: Linux version 2.6.18-128.1.10.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 11:07:18 EDT 2009
Jun  2 17:30:22 usaxen02 kernel: BIOS-provided physical RAM map:
Jun  2 17:30:22 usaxen02 kernel:  Xen: 0000000000000000 - 00000001de804000 (usable)
Jun  2 17:30:22 usaxen02 kernel: DMI 2.4 present.
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Jun  2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Jun  2 17:30:22 usaxen02 kernel: ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Jun  2 17:30:22 usaxen02 kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Jun  2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Jun  2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Jun  2 17:30:22 usaxen02 kernel: Setting APIC routing to xen
Jun  2 17:30:22 usaxen02 kernel: Using ACPI (MADT) for SMP configuration information
Jun  2 17:30:22 usaxen02 kernel: Allocating PCI resources starting at d4000000 (gap: d0000000:2ff00000)


-- 
Kind Regards
Rudi Ahlers
CEO, SoftDux Hosting
Web: http://www.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

Scott Silva

2009-Jun-02 21:38 UTC

on 6-2-2009 2:30 PM Rudi Ahlers spake the following:
> Hi all,
>
> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
> and I can't see why it's doing it.
>
> I have looked through the logs, but don't see anything in there that
> shows me why it has rebooted. How can I debug this?
>
> Here's a snippet from the log, around the time of the reboot:
>
> <snip>
Random reboots can happen fast enough that nothing gets into the logs. You can
try setting up a console and have the system post there. It sometimes catches
things.
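
One way to get such a console on a remote box is the kernel's netconsole module, which sends kernel messages over UDP to another machine. A minimal sketch, assuming a second host is available to listen; the addresses, interface name and MAC below are placeholders and not from this thread. Whether netconsole behaves well under the Xen dom0 kernel is untested here, and it would not capture a crash of the hypervisor itself:

```shell
# Build the parameter string for the netconsole module:
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values: replace with the real source and target details.
param=$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)
echo "$param"

# On the rebooting server (as root), load the module with that string:
#   modprobe netconsole "$param"
# On the receiving host, capture the UDP stream, for example with:
#   nc -u -l 6666 | tee netconsole.log     (some netcat builds need -p 6666)
```

Anything the kernel prints just before going down should then end up in the file on the receiving host, even if it never reaches the local disk.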

But until then I would do the obvious... Make sure the system is clean and not
overheating from "dust bunnies" filling up the chassis.

Remove and re-seat all cards and ram. Make sure all fans are working. Run
memtest overnight if possible. Look back to when the reboots started and see
if something was added or upgraded.

Rudi Ahlers

2009-Jun-02 21:46 UTC

On 6/2/09, Scott Silva <ssilva at sgvwater.com> wrote:
> on 6-2-2009 2:30 PM Rudi Ahlers spake the following:
>> Hi all,
>>
>> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
>> and I can't see why it's doing it.
>>
>> I have looked through the logs, but don't see anything in there that
>> shows me why it has rebooted. How can I debug this?
>>
>> Here's a snippet from the log, around the time of the reboot:
>>
> <snip>
> Random reboots can happen fast enough that nothing gets into the logs. You can
> try setting up a console and have the system post there. It sometimes catches
> things.
>
> But until then I would do the obvious... Make sure the system is clean and not
> overheating from "dust bunnies" filling up the chassis.
>
> Remove and re-seat all cards and ram. Make sure all fans are working. Run
> memtest overnight if possible. Look back to when the reboots started and see
> if something was added or upgraded.
>
Hi Scott, the server is in the USA, and I'm in ZA. I've been trying to
get the IDC to look into the problem, but they're not very helpful and
reckon I need to check my software. I know the "server" runs desktop
hardware, so it could be a hardware problem, but they don't seem to
think so.

So, I'm trying to do everything I can from my side, via SSH, to see if
I can figure it out.
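
Something that can be checked over SSH is whether each boot was preceded by a clean shutdown. In the snippet above the log jumps from 15:00:43 straight to the 17:30:22 boot messages, which suggests the machine went down without syslogd being stopped. A rough sketch, assuming syslogd writes an "exiting on signal" line on an orderly shutdown (the function name is made up for illustration):

```shell
# Print one line per kernel boot found in a syslog file, marked CLEAN if a
# syslogd "exiting on signal" line was seen since the previous boot and
# UNCLEAN otherwise. The first boot in a freshly rotated file may be
# misreported, because its shutdown line sits in the older file.
boot_report() {
    awk '
        /exiting on signal/ { clean = 1 }
        /kernel: Linux version/ {
            state = clean ? "CLEAN" : "UNCLEAN"
            print state, $1, $2, $3
            clean = 0
        }
    ' "$1"
}

# Usage (as root):        boot_report /var/log/messages
# Cross-check with wtmp:  last -x | grep -E 'reboot|shutdown'
```

A run of UNCLEAN entries with nothing logged beforehand points more towards power, hardware or the hypervisor than towards a userland process calling reboot.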

-- 
Kind Regards
Rudi Ahlers
CEO, SoftDux Hosting
Web: http://www.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

Sergej Kandyla

2009-Jun-03 09:15 UTC

Rudi Ahlers wrote:
> Hi all,
>
> One of our CentOS 5.3 servers randomly reboots, at different times of the day,
> and I can't see why it's doing it.
>
> I have looked through the logs, but don't see anything in there that
> shows me why it has rebooted. How can I debug this?
>
Hi,

Try enabling kdump to get a kernel dump, in case this is a software-related issue.

http://download.swsoft.com/virtuozzo/virtuozzo4.0/docs/en/lin/VzLinuxUG/20027.htm
Using Kexec and Kdump For System Troubleshooting

yum install kexec-tools
edit /etc/grub.conf and append to the end of the kernel line:
"crashkernel=128M@16M"
chkconfig kdump on
reboot
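
Before rebooting it may be worth confirming that the edit to grub.conf took effect. A small sketch, assuming the stock GRUB legacy layout; the helper name is invented here. On a Xen host like this one the "kernel" line in grub.conf is normally the hypervisor (xen.gz) and the dom0 kernel sits on a "module" line, and as far as I know the crashkernel= option belongs on the hypervisor line, so please check that against the Red Hat kdump article before relying on it:

```shell
# List the "kernel" lines of a GRUB legacy config that have no crashkernel=
# argument. Prints nothing when every kernel line already carries it.
kernel_lines_without_crashkernel() {
    grep -E '^[[:space:]]*kernel[[:space:]]' "$1" | grep -v 'crashkernel=' || true
}

# Usage (as root): kernel_lines_without_crashkernel /etc/grub.conf
# After the reboot, the service state can be checked with:
#   service kdump status
```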

Also have a look at these:

http://kbase.redhat.com/faq/docs/DOC-6039
How do I configure kexec/kdump on Red Hat Enterprise Linux 5?

http://kbase.redhat.com/faq/docs/DOC-2119
How can I voluntarily crash my machine to test if netdump/diskdump/kdump 
I configured works?

http://kbase.redhat.com/faq/docs/DOC-5413
My server crashes once in awhile. How can I debug it?

http://kbase.redhat.com/faq/docs/DOC-1742
My system has started to hang randomly. What information does Red Hat's
technical support need to diagnose the problem?

http://kbase.redhat.com/faq/docs/DOC-10828
My Red Hat Enterprise Linux 2.1 system had a kernel panic, an oops 
message, or is freezing for no apparent reason. How can I find out what 
is causing this?


Next, I recommend you set up and run memtest86+:
memtest86+.x86_64 : Stand-alone memory tester for x86 and x86-64 computers

You should ask support to reboot the machine overnight and choose the
memtest entry in the GRUB boot menu.
If the DC has an IP KVM, ask for access to it.
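
If someone at the data centre has to pick the entry by hand, it helps to tell them exactly which line of the GRUB menu to choose. A sketch, assuming the memtest86+ package is installed and its entry has been added to grub.conf (as far as I know the package ships a memtest-setup script for that, but I have not verified this on 5.3); the function name is only illustrative:

```shell
# Print the 0-based position of the first GRUB menu entry whose title
# matches a case-insensitive pattern; exit status 1 if none matches.
grub_entry_index() {
    awk -v pat="$2" '
        $1 == "title" {
            if (tolower($0) ~ tolower(pat)) { print n + 0; found = 1; exit }
            n++
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Usage: grub_entry_index /etc/grub.conf memtest
```

GRUB counts menu entries from 0, so a result of 1 means the second entry in the boot menu.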

Also, what network card is in your server?
I have had some trouble with non-brand network cards.



-- 
Best wishes, Sergej Kandyla
