nan del bosc
2013-Apr-11 15:36 UTC
[CentOS] How to determine why a server is not responding
Hi to all!
We're running Plesk 11 on 64-bit CentOS 5.5.
This week we had the following problem 3 times:
Suddenly, the server stops responding on all services (SSH, Apache,
Postfix, ...) but ping still works!
Whether we wait a few minutes or, sometimes, 2 hours, the server stays
unresponsive until we reboot. After rebooting we searched /var/log/messages
but could not find any useful information:
Apr 11 14:56:05 s1 postfix/smtpd[8263]: SQL engine 'intentionally disabled' not supported
Apr 11 14:56:05 s1 postfix/smtpd[8263]: auxpropfunc error no mechanism available
Apr 11 14:56:42 s1 postfix/smtpd[8370]: SQL engine 'intentionally disabled' not supported
Apr 11 14:56:42 s1 postfix/smtpd[8370]: auxpropfunc error no mechanism available
Apr 11 14:56:47 s1 postfix/smtpd[8391]: SQL engine 'intentionally disabled' not supported
Apr 11 14:56:47 s1 postfix/smtpd[8391]: auxpropfunc error no mechanism available
Apr 11 14:56:47 s1 postfix/smtpd[8392]: SQL engine 'intentionally disabled' not supported
Apr 11 14:56:47 s1 postfix/smtpd[8392]: auxpropfunc error no mechanism available
Apr 11 16:55:42 s1 syslogd 1.4.1: restart.
Apr 11 16:55:42 s1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 11 16:55:42 s1 kernel: Bootdata ok (command line is ro root=/dev/xvda1
console=xvc0 console=hvc0 xencons=hvc)
Apr 11 16:55:42 s1 kernel: Linux version 2.6.18-194.26.1.el5xen (
mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-48)) #1 SMP Tue Nov 9 13:35:30 EST 2010
Apr 11 16:55:42 s1 kernel: BIOS-provided physical RAM map:
Apr 11 16:55:42 s1 kernel: Xen: 0000000000000000 - 0000000080000000
(usable)
Apr 11 16:55:42 s1 kernel: No mptable found.
Apr 11 16:55:42 s1 kernel: Built 1 zonelists. Total pages: 524288
Apr 11 16:55:42 s1 kernel: Kernel command line: ro root=/dev/xvda1
console=xvc0 console=hvc0 xencons=hvc
Apr 11 16:55:42 s1 kernel: Initializing CPU#0
Apr 11 16:55:42 s1 kernel: PID hash table entries: 4096 (order: 12, 32768
bytes)
Apr 11 16:55:42 s1 kernel: Xen reported: 2009.260 MHz processor.
Apr 11 16:55:42 s1 kernel: Console: colour dummy device 80x25
Apr 11 16:55:42 s1 kernel: Dentry cache hash table entries: 262144 (order:
9, 2097152 bytes)
Apr 11 16:55:42 s1 kernel: Inode-cache hash table entries: 131072 (order:
8, 1048576 bytes)
Apr 11 16:55:42 s1 kernel: Software IO TLB disabled
Apr 11 16:55:42 s1 kernel: Memory: 2043384k/2097152k available (2513k
kernel code, 53108k reserved, 1395k data, 184k init)
Apr 11 16:55:42 s1 kernel: Calibrating delay using timer specific routine..
5025.13 BogoMIPS (lpj=10050261)
Apr 11 16:55:42 s1 kernel: Security Framework v1.0.0 initialized
Apr 11 16:55:42 s1 kernel: SELinux: Initializing.
Apr 11 16:55:42 s1 kernel: selinux_register_security: Registering
secondary module capability
Apr 11 16:55:42 s1 kernel: Capability LSM initialized as secondary
Apr 11 16:55:42 s1 kernel: Mount-cache hash table entries: 256
Apr 11 16:55:42 s1 kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache
64K (64 bytes/line)
Apr 11 16:55:42 s1 kernel: CPU: L2 Cache: 512K (64 bytes/line)
Apr 11 16:55:42 s1 kernel: CPU: Physical Processor ID: 0
Apr 11 16:55:42 s1 kernel: CPU: Processor Core ID: 0
Apr 11 16:55:42 s1 kernel: (SMP-)alternatives turned off
Apr 11 16:55:42 s1 kernel: Brought up 1 CPUs
Apr 11 16:55:42 s1 kernel: checking if image is initramfs... it is
Apr 11 16:55:42 s1 kernel: Grant table initialized
Apr 11 16:55:42 s1 kernel: NET: Registered protocol family 16
Apr 11 16:55:42 s1 kernel: Brought up 1 CPUs
Apr 11 16:55:42 s1 kernel: PCI: setting up Xen PCI frontend stub
Apr 11 16:55:42 s1 kernel: ACPI: Interpreter disabled.
Apr 11 16:55:42 s1 kernel: Linux Plug and Play Support v0.97 (c) Adam Belay
Apr 11 16:55:42 s1 kernel: pnp: PnP ACPI: disabled
Apr 11 16:55:42 s1 kernel: xen_mem: Initialising balloon driver.
Apr 11 16:55:42 s1 kernel: usbcore: registered new driver usbfs
Apr 11 16:55:42 s1 kernel: usbcore: registered new driver hub
Apr 11 16:55:42 s1 kernel: PCI: System does not support PCI
Apr 11 16:55:42 s1 kernel: PCI: System does not support PCI
Apr 11 16:55:42 s1 kernel: NetLabel: Initializing
Apr 11 16:55:42 s1 kernel: NetLabel: domain hash size = 128
Apr 11 16:55:42 s1 kernel: NetLabel: protocols = UNLABELED CIPSOv4
Apr 11 16:55:42 s1 kernel: NetLabel: unlabeled traffic allowed by default
Apr 11 16:55:42 s1 kernel: NET: Registered protocol family 2
Apr 11 16:55:42 s1 kernel: IP route cache hash table entries: 65536 (order:
7, 524288 bytes)
Apr 11 16:55:42 s1 kernel: TCP established hash table entries: 262144
(order: 10, 4194304 bytes)
Apr 11 16:55:42 s1 kernel: TCP bind hash table entries: 65536 (order: 8,
1048576 bytes)
Apr 11 16:55:42 s1 kernel: TCP: Hash tables configured (established 262144
bind 65536)
Apr 11 16:55:42 s1 kernel: TCP reno registered
Apr 11 16:55:42 s1 kernel: audit: initializing netlink socket (disabled)
Apr 11 16:55:42 s1 kernel: type=2000 audit(1365692095.507:1): initialized
Apr 11 16:55:42 s1 kernel: VFS: Disk quotas dquot_6.5.1
Apr 11 16:55:42 s1 kernel: Dquot-cache hash table entries: 512 (order 0,
4096 bytes)
Apr 11 16:55:42 s1 kernel: Initializing Cryptographic API
Apr 11 16:55:42 s1 kernel: alg: No test for crc32c (crc32c-generic)
Apr 11 16:55:42 s1 kernel: ksign: Installing public key data
Apr 11 16:55:42 s1 kernel: Loading keyring
Apr 11 16:55:42 s1 kernel: - Added public key 12AFE3EA6A14161C
Apr 11 16:55:42 s1 kernel: - User ID: CentOS (Kernel Module GPG key)
Apr 11 16:55:42 s1 kernel: io scheduler noop registered
Apr 11 16:55:42 s1 kernel: io scheduler anticipatory registered
Apr 11 16:55:42 s1 kernel: io scheduler deadline registered
Apr 11 16:55:42 s1 kernel: io scheduler cfq registered (default)
Apr 11 16:55:42 s1 kernel: pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Apr 11 16:55:42 s1 kernel: rtc: IRQ 8 is not free.
Apr 11 16:55:42 s1 kernel: Non-volatile memory driver v1.2
Apr 11 16:55:42 s1 kernel: Linux agpgart interface v0.101 (c) Dave Jones
Apr 11 16:55:42 s1 kernel: brd: module loaded
Apr 11 16:55:42 s1 kernel: Xen virtual console successfully installed as
xvc0
Apr 11 16:55:42 s1 kernel: Event-channel device installed.
Apr 11 16:55:42 s1 kernel: Uniform Multi-Platform E-IDE driver Revision:
7.00alpha2
Apr 11 16:55:42 s1 kernel: ide: Assuming 50MHz system bus speed for PIO
modes; override with idebus=xx
Apr 11 16:55:42 s1 kernel: ide-floppy driver 0.99.newide
Apr 11 16:55:42 s1 kernel: usbcore: registered new driver hiddev
Apr 11 16:55:42 s1 kernel: usbcore: registered new driver usbhid
Apr 11 16:55:42 s1 kernel: drivers/usb/input/hid-core.c: v2.6:USB HID core
driver
Apr 11 16:55:42 s1 kernel: PNP: No PS/2 controller found. Probing ports
directly.
Apr 11 16:55:42 s1 kernel: i8042.c: No controller found.
Apr 11 16:55:42 s1 kernel: mice: PS/2 mouse device common for all mice
Apr 11 16:55:42 s1 kernel: md: md driver 0.90.3 MAX_MD_DEVS=256,
MD_SB_DISKS=27
Apr 11 16:55:42 s1 kernel: md: bitmap version 4.39
Apr 11 16:55:42 s1 kernel: TCP bic registered
Apr 11 16:55:42 s1 kernel: Initializing IPsec netlink socket
Apr 11 16:55:42 s1 kernel: NET: Registered protocol family 1
Apr 11 16:55:42 s1 kernel: NET: Registered protocol family 17
Apr 11 16:55:42 s1 kernel: XENBUS: Device with no driver: device/vbd/51712
Apr 11 16:55:42 s1 kernel: XENBUS: Device with no driver: device/vif/0
Apr 11 16:55:42 s1 kernel: Initalizing network drop monitor service
Apr 11 16:55:42 s1 kernel: Write protecting the kernel read-only data: 483k
Apr 11 16:55:42 s1 kernel: USB Universal Host Controller Interface driver
v3.0
Apr 11 16:55:42 s1 kernel: Fusion MPT base driver 3.04.13rh
Apr 11 16:55:42 s1 kernel: Copyright (c) 1999-2008 LSI Corporation
Apr 11 16:55:42 s1 kernel: SCSI subsystem initialized
Apr 11 16:55:42 s1 kernel: Fusion MPT SPI Host driver 3.04.13rh
Apr 11 16:55:42 s1 kernel: device-mapper: uevent: version 1.0.3
Apr 11 16:55:42 s1 kernel: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12)
initialised: dm-devel at redhat.com
Apr 11 16:55:42 s1 kernel: device-mapper: dm-raid45: initialized v0.2594l
Apr 11 16:55:42 s1 kernel: SGI XFS with ACLs, security attributes, large
block/inode numbers, no debug enabled
Apr 11 16:55:42 s1 kernel: SGI XFS Quota Management subsystem
Apr 11 16:55:42 s1 kernel: Registering block device major 202
Apr 11 16:55:42 s1 kernel: xvda: xvda1 xvda2 xvda3
Apr 11 16:55:42 s1 kernel: netfront: Initialising virtual ethernet driver.
Apr 11 16:55:42 s1 kernel: netfront: device eth0 has copying receive path.
Apr 11 16:55:42 s1 kernel: EXT3-fs: INFO: recovery required on readonly
filesystem.
Apr 11 16:55:42 s1 kernel: EXT3-fs: write access will be enabled during
recovery.
Apr 11 16:55:42 s1 kernel: kjournald starting. Commit interval 5 seconds
Apr 11 16:55:42 s1 kernel: EXT3-fs: recovery complete.
Apr 11 16:55:42 s1 kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Apr 11 16:55:42 s1 kernel: SELinux: Disabled at runtime.
Apr 11 16:55:42 s1 kernel: type=1404 audit(1365692123.036:2): selinux=0
auid=4294967295 ses=4294967295
Apr 11 16:55:42 s1 kernel: input: PC Speaker as /class/input/input0
Apr 11 16:55:42 s1 kernel: Floppy drive(s): fd0 is unknown type 15 (usb?),
fd1 is unknown type 15 (usb?)
Apr 11 16:55:42 s1 kernel: Failed to obtain physical IRQ 6
Apr 11 16:55:42 s1 kernel: floppy0: no floppy controllers found
Apr 11 16:55:42 s1 kernel: lp: driver loaded but no devices found
Apr 11 16:55:42 s1 kernel: md: Autodetecting RAID arrays.
Apr 11 16:55:42 s1 kernel: md: autorun ...
Apr 11 16:55:42 s1 kernel: md: ... autorun DONE.
Apr 11 16:55:42 s1 kernel: device-mapper: multipath: version 1.0.5 loaded
Apr 11 16:55:42 s1 kernel: EXT3 FS on xvda1, internal journal
Apr 11 16:55:42 s1 kernel: Filesystem "dm-0": Disabling barriers, trial barrier write failed
Apr 11 16:55:42 s1 kernel: XFS mounting filesystem dm-0
Apr 11 16:55:42 s1 kernel: Starting XFS recovery on filesystem: dm-0 (logdev: internal)
Apr 11 16:55:42 s1 kernel: Ending XFS recovery on filesystem: dm-0 (logdev: internal)
Apr 11 16:55:42 s1 kernel: Filesystem "dm-1": Disabling barriers, trial barrier write failed
Apr 11 16:55:42 s1 kernel: XFS mounting filesystem dm-1
Apr 11 16:55:42 s1 kernel: Starting XFS recovery on filesystem: dm-1 (logdev: internal)
Apr 11 16:55:42 s1 kernel: Ending XFS recovery on filesystem: dm-1 (logdev: internal)
Apr 11 16:55:42 s1 kernel: Filesystem "dm-2": Disabling barriers, trial barrier write failed
Apr 11 16:55:42 s1 kernel: XFS mounting filesystem dm-2
Apr 11 16:55:42 s1 kernel: Adding 1959920k swap on /dev/xvda2. Priority:-1 extents:1 across:1959920k
Apr 11 16:55:43 s1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Apr 11 16:55:43 s1 kernel: Netfilter messages via NETLINK v0.30.
Apr 11 16:55:43 s1 kernel: ip_conntrack version 2.4 (8192 buckets, 65536
max) - 304 bytes per conntrack
Apr 11 16:55:44 s1 kernel: NET: Registered protocol family 10
Apr 11 16:55:44 s1 kernel: lo: Disabled Privacy Extensions
Apr 11 16:55:44 s1 kernel: IPv6 over IPv4 tunneling driver
Apr 11 16:55:44 s1 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Apr 11 16:55:46 s1 xinetd[1426]: xinetd Version 2.3.14 started with libwrap
loadavg labeled-networking options compiled in.
Apr 11 16:55:46 s1 xinetd[1426]: Started working: 2 available services
What can we do? What can we test?
More information:
Kernel:
Linux 2.6.18-194.26.1.el5xen #1 SMP Tue Nov 9 13:35:30 EST 2010 x86_64
x86_64 x86_64 GNU/Linux
RAM (in MB):
             total       used       free     shared    buffers     cached
Mem:          2048       1662        385          0         13        506
-/+ buffers/cache:       1142        905
Swap:         1913          0       1913
Disk usage:
Filesystem             Size  Used Avail Use% Mounted on
/dev/xvda1             3.7G  1.9G  1.8G  52% /
/dev/mapper/vg00-usr    13G  1.4G   12G  11% /usr
/dev/mapper/vg00-var    66G   22G   45G  33% /var
/dev/mapper/vg00-home  4.0G  4.2M  4.0G   1% /home
none                   1.0G     0  1.0G   0% /tmp
CPU:
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 8
model name : Six-Core AMD Opteron(tm) Processor 2423 HE
stepping : 0
cpu MHz : 2009.260
cache size : 512 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu de tsc msr pae cx8 apic cmov pat clflush mmx fxsr sse
sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc pni cx16
popcnt lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse
bogomips : 5025.13
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc [6] [7] [8]
Thanks!
--
Cheers!
m.roth at 5-cent.us
2013-Apr-11 15:42 UTC
[CentOS] How to determine why a server is not responding
nan del bosc wrote:
> Hi to all!
>
> We're using CentOS 5.5 64bits for our Plesk 11.
>
> This week we had the following problem 3 times...
>
> Suddenly, the server stops responding in all services (SSH, Apache,
> Postfix, ...) but ping works!
>
> After wait a few minutes (or 2 hours some times) the server continues
> unresponsive until we reboot. After reboot we search on /var/log/messages
> but cannot find useful information...
<snip>

A quick Google search shows me that the Postfix messages are only informational, though you might want to fix the configuration so it stops asking for that. HOWEVER, the important thing is that the server appears to have just gone completely unresponsive. I've seen that happen to some servers here, and we've never found any clues. On the other hand, IIRC, they tended to be boxes that we've had other problems with, a number of which have been rebuilt under warranty (mostly Penguins; they all have Supermicro motherboards, and the problems I've had with them told me to NEVER buy a Supermicro motherboard).

The only thing I can suggest trying is to use ipmitool (assuming you don't want to bring the machines down and look in the BIOS) to read the SEL (system event log) and look for hardware errors.

mark
Dale Dellutri
2013-Apr-11 16:02 UTC
[CentOS] How to determine why a server is not responding
On Thu, Apr 11, 2013 at 10:36 AM, nan del bosc <nandelbosc at gmail.com> wrote:
> Hi to all!
>
> We're using CentOS 5.5 64bits for our Plesk 11.
>
> This week we had the following problem 3 times...
>
> Suddenly, the server stops responding in all services (SSH, Apache,
> Postfix, ...) but ping works!
>
> After wait a few minutes (or 2 hours some times) the server continues
> unresponsive until we reboot. After reboot we search on /var/log/messages
> but cannot find useful information...
> ...
> What can we do? what can we test?

Are you running sysstat / sar? Perhaps the sa / sar database that's left after the reboot can show whether some resource was over capacity.
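To decide which window of the sa / sar data to inspect, it helps to know when the machine stopped logging. What follows is only a sketch and not something posted in the thread: it scans syslog-style lines and reports the longest silence between two timestamped entries. The names `parse_ts` and `largest_gap` are made up for illustration, the year is passed in because syslog timestamps omit it, and Python 3 is assumed.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; None if absent."""
    try:
        return datetime.strptime("%d %s" % (year, line[:15]), "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # e.g. a wrapped continuation line without a timestamp


def largest_gap(lines, year=2013):
    """Return (gap, before, after) for the widest silence between entries."""
    best = (timedelta(0), None, None)
    prev = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        if prev is not None and ts - prev > best[0]:
            best = (ts - prev, prev, ts)
        prev = ts
    return best


# A few lines taken from the log excerpt in the original post.
sample = [
    "Apr 11 14:56:47 s1 postfix/smtpd[8392]: auxpropfunc error no mechanism",
    "available",
    "Apr 11 16:55:42 s1 syslogd 1.4.1: restart.",
    "Apr 11 16:55:43 s1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team",
]
gap, before, after = largest_gap(sample)
print("longest silence: %s (from %s to %s)" % (gap, before, after))
```

On the posted excerpt the longest silence runs from 14:56:47 to the syslogd restart at 16:55:42, so that is the window worth checking in the sar data, if any was collected.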
Александр Кириллов
2013-Apr-11 16:04 UTC
[CentOS] How to determine why a server is not responding
> We're using CentOS 5.5 64bits for our Plesk 11.
>
> This week we had the following problem 3 times...
>
> Suddenly, the server stops responding in all services (SSH, Apache,
> Postfix, ...) but ping works!
>
> After wait a few minutes (or 2 hours some times) the server continues
> unresponsive until we reboot. After reboot we search on
> /var/log/messages
> but cannot find useful information...
...
> What can we do? what can we test?

It could be something related to disk access or RAM, a runaway process, or whatever. Do you have any system monitoring tools installed, like munin, atop, or sysstat? Any kernel errors in the logs?
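If none of those tools is installed yet, even a crude snapshot logger leaves some evidence on disk for the next hang. The sketch below is an illustration only and no substitute for sysstat or atop: it appends the 1-minute load average plus free memory and swap to a file every few seconds and forces each line to disk. The helper names, the output path `/var/tmp/hang-snapshots.log`, and the 10-second interval are all assumptions chosen for the example, and Python 3 is assumed.

```python
import os
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot_line(loadavg_text, meminfo_text, now=None):
    """Format one log line from the raw text of /proc/loadavg and /proc/meminfo."""
    mem = parse_meminfo(meminfo_text)
    load1 = loadavg_text.split()[0]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%s MemFree=%skB SwapFree=%skB" % (
        stamp, load1, mem.get("MemFree", "?"), mem.get("SwapFree", "?"))


def run(path="/var/tmp/hang-snapshots.log", interval=10):
    """Append a snapshot every `interval` seconds; path and interval are hypothetical."""
    with open(path, "a") as out:
        while True:
            with open("/proc/loadavg") as f:
                load = f.read()
            with open("/proc/meminfo") as f:
                mem = f.read()
            out.write(snapshot_line(load, mem) + "\n")
            out.flush()
            os.fsync(out.fileno())  # make sure the line survives a hard reboot
            time.sleep(interval)
```

Calling `run()` from a root shell (or an init script) starts the loop; after the next hang, the last lines of the file show what load and memory looked like just before the machine went quiet.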
Alexander Dalloz
2013-Apr-11 16:38 UTC
[CentOS] How to determine why a server is not responding
On 11.04.2013 17:36, nan del bosc wrote:
> Hi to all!
>
> We're using CentOS 5.5 64bits for our Plesk 11.

That's insane! Why on earth do you run a 2.5-year-old unpatched public system? You are asking for trouble, and innocent third parties will be the victims of your hacked system.

> This week we had the following problem 3 times...
>
> Suddenly, the server stops responding in all services (SSH, Apache,
> Postfix, ...) but ping works!
>
> After wait a few minutes (or 2 hours some times) the server continues
> unresponsive until we reboot. After reboot we search on /var/log/messages
> but cannot find useful information...
>
> Apr 11 14:56:05 s1 postfix/smtpd[8263]: SQL engine 'intentionally disabled' not supported
> Apr 11 14:56:05 s1 postfix/smtpd[8263]: auxpropfunc error no mechanism available
> Apr 11 14:56:42 s1 postfix/smtpd[8370]: SQL engine 'intentionally disabled' not supported
> Apr 11 14:56:42 s1 postfix/smtpd[8370]: auxpropfunc error no mechanism available
> Apr 11 14:56:47 s1 postfix/smtpd[8391]: SQL engine 'intentionally disabled' not supported
> Apr 11 14:56:47 s1 postfix/smtpd[8391]: auxpropfunc error no mechanism available
> Apr 11 14:56:47 s1 postfix/smtpd[8392]: SQL engine 'intentionally disabled' not supported
> Apr 11 14:56:47 s1 postfix/smtpd[8392]: auxpropfunc error no mechanism available
> Apr 11 16:55:42 s1 syslogd 1.4.1: restart.
> Apr 11 16:55:42 s1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Apr 11 16:55:42 s1 kernel: Bootdata ok (command line is ro root=/dev/xvda1 console=xvc0 console=hvc0 xencons=hvc)
> Apr 11 16:55:42 s1 kernel: Linux version 2.6.18-194.26.1.el5xen (mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Nov 9 13:35:30 EST 2010

That's a Xen domain, so IPMI, as suggested by others, will not work.

> Apr 11 16:55:42 s1 kernel: BIOS-provided physical RAM map:
> Apr 11 16:55:42 s1 kernel: Xen: 0000000000000000 - 0000000080000000 (usable)
> Apr 11 16:55:42 s1 kernel: No mptable found.
> Apr 11 16:55:42 s1 kernel: Built 1 zonelists. Total pages: 524288
> Apr 11 16:55:42 s1 kernel: Kernel command line: ro root=/dev/xvda1 console=xvc0 console=hvc0 xencons=hvc
[ ... ]
> What can we do? what can we test?

First, update your system to the latest 5.9 + updates!

Talk to your hoster. If your Xen VM has issues, other guests on the same hardware may have them too. Or another VM on the host may be consuming so many resources that your VM no longer responds.

[ ... ]
> Thank's!
>
> --
> ---
> Salut!

Alexander