Hi,

We have a server which locks up about once a week (for the past 3 weeks now) without any warning, and the only way to recover it is to reset the server. This causes unwanted downtime, and often software loss as well.

How do I debug the server, which runs CentOS 5.2, to see why it locks up? The CPU is an Intel Q9300 Core 2 Quad, with 8 GB RAM, on an Intel motherboard.

The last few log entries before the server froze are:

Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:59008
Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:59008
Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:47729
Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:47729
Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:47890
Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:47890
Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:50023
Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:50023
Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:58459
Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:58459
Nov 15 10:10:10 saturn syslogd 1.4.1: restart.
Nov 15 10:10:11 saturn kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 15 10:10:11 saturn kernel: Bootdata ok (command line is ro root=/dev/System/root)
Nov 15 10:10:11 saturn kernel: Linux version 2.6.18-92.1.17.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Tue Nov 4 14:13:09 EST 2008
Nov 15 10:10:11 saturn kernel: BIOS-provided physical RAM map:
Nov 15 10:10:11 saturn kernel: Xen: 0000000000000000 - 00000001ef958000 (usable)
Nov 15 10:10:11 saturn kernel: DMI 2.4 present.
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Nov 15 10:10:11 saturn kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Nov 15 10:10:11 saturn kernel: ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Nov 15 10:10:11 saturn kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Nov 15 10:10:11 saturn kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Nov 15 10:10:11 saturn kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)

--
Kind Regards
Rudi Ahlers

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
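The log excerpt above goes silent at 07:15:20 and resumes with a syslogd restart at 10:10:10. To check whether earlier lockups show the same pattern, a small script can scan the syslog file for long silences that end in a syslogd restart. The following is only a sketch under stated assumptions: it expects the classic "Mon DD HH:MM:SS" syslog prefix in an English locale, the year has to be supplied because that format omits it, and the names `find_outages` and `min_gap` are made up for this illustration. It needs Python 3, so it may have to run on a different machine than the CentOS 5.2 host.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2008):
    """Parse the 15-character 'Nov 15 07:15:20' prefix of a syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_outages(lines, year=2008, min_gap=timedelta(minutes=10)):
    """Return (last_entry, restart, gap) for each syslogd restart that
    follows a silence of at least min_gap.

    Short gaps are skipped because syslogd also logs 'restart' when it is
    signalled during routine log rotation.
    """
    outages = []
    prev = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            ts = parse_ts(line, year)
        except ValueError:
            continue  # not a timestamped syslog line
        is_restart = "syslogd" in line and "restart" in line
        if is_restart and prev is not None and ts - prev >= min_gap:
            outages.append((prev, ts, ts - prev))
        prev = ts
    return outages


def report(path, year=2008):
    """Print every suspected outage found in the given log file."""
    with open(path, errors="replace") as fh:
        for last, restart, gap in find_outages(fh, year):
            print(f"silent from {last} until restart at {restart} (gap {gap})")
```

Calling `report("/var/log/messages")` (the path is an assumption about where syslog writes on this host) lists each suspected freeze window. For the lines quoted above it would flag the silence from 07:15:20 to 10:10:10. This only shows when the machine stopped logging, not why.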
Rudi Ahlers wrote:
> Hi,
>
> We have a server which locks up about once a week (for the past 3
> weeks now) without any warning, and the only way to recover it is to
> reset the server. This causes unwanted downtime, and often software
> loss as well.
>
> How do I debug the server, which runs CentOS 5.2, to see why it locks
> up? The CPU is an Intel Q9300 Core 2 Quad, with 8 GB RAM, on an Intel
> motherboard.
>
> The last few log entries before the server froze are:
>
> Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:59008
> Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:59008
> Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:47729
> Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:47729
> Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:47890
> Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:47890
> Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:50023
> Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:50023
> Nov 15 07:15:20 saturn snmpd[2527]: Connection from UDP: [127.0.0.1]:58459
> Nov 15 07:15:20 saturn snmpd[2527]: Received SNMP packet(s) from UDP: [127.0.0.1]:58459
> Nov 15 10:10:10 saturn syslogd 1.4.1: restart.

Rudi,

I am just wondering what these SNMP packets are for. I know from some of my access points that they can be rebooted via SNMP.

Regards
--
stefan
> Hi,
>
> We have a server which locks up about once a week (for the past 3
> weeks now) without any warning, and the only way to recover it is to
> reset the server. This causes unwanted downtime, and often software
> loss as well.
>
> How do I debug the server, which runs CentOS 5.2, to see why it locks
> up? The CPU is an Intel Q9300 Core 2 Quad, with 8 GB RAM, on an Intel
> motherboard.

Those last few lines before the server froze are simply the last write it managed to commit to disk, so they may not relate to the crash.

If there is nothing useful on the screen, try setting up serial consoles, maybe one for Xen and one for the Linux kernel. That should capture the server's dying words, assuming there are any.

James
On Sat, Nov 15, 2008 at 10:33 AM, Stefan Bauer <stefan.bauer@cubewerk.de> wrote:
> Rudi,
>
> I am just wondering what these SNMP packets are for. I know from some of
> my access points that they can be rebooted via SNMP.
>
> Regards
>
> --
> stefan

The server has Cacti installed to monitor the bandwidth of the switches and the Xen VPSs, hence the 5-minute SNMP entries, but there's nothing configured that can shut it down. The server literally just freezes up, and we need to send someone to the datacentre to reboot it.
--
Kind Regards
Rudi Ahlers
On Sat, Nov 15, 2008 at 11:48 AM, James Harper <james.harper@bendigoit.com.au> wrote:
> Those last few lines before the server froze are simply the last
> write it managed to commit to disk, so they may not relate to the crash.
>
> If there is nothing useful on the screen, try setting up serial
> consoles, maybe one for Xen and one for the Linux kernel. That should
> capture the server's dying words, assuming there are any.
>
> James

Since the server doesn't have a serial port and is in a shared cabinet in a third-party datacentre, I can't log the output to another server or to a monitor. So how do I capture everything to a log file instead?

--
Kind Regards
Rudi Ahlers
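One partial way to get more into a local log file, given James's point that only data already committed to disk survives a freeze, is a heartbeat that forces each entry to disk as it is written. This is a rough sketch and not something proposed in the thread: the function names, the log path and the 10-second interval are all assumptions chosen for illustration, and it needs Python 3. It will not capture a kernel or Xen panic message. It can only narrow down the time of the freeze and show whether the load was climbing beforehand.

```python
import os
import time


def heartbeat_line():
    """Build one status line: local time plus the 1/5/15-minute load averages."""
    load1, load5, load15 = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f}\n"


def write_heartbeat(path, line):
    """Append a line and ask the OS to push it to disk right away.

    Without the fsync the line could still be sitting in the page cache
    when the machine locks up, and would be lost on reset.
    """
    with open(path, "a") as fh:
        fh.write(line)
        fh.flush()              # Python buffer -> kernel
        os.fsync(fh.fileno())   # kernel page cache -> disk


def run(path="/var/log/heartbeat.log", interval=10):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        write_heartbeat(path, heartbeat_line())
        time.sleep(interval)
```

Calling `run()` from a small script started at boot leaves a trail whose final line brackets the freeze to within one interval. A disk with a volatile write cache can still lose the last entry, so the final timestamp should be read as approximate.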