thr3ads.net - Xen users - [Xen-users] Why does my DomU keep going mad? [Jul 2010]


Lyle

2010-Jul-26 13:28 UTC

[Xen-users] Why does my DomU keep going mad?

Hi All,
   I've got a DomU that sometimes goes mad. I can't ssh or usually even
console to it. The one time I did manage to get a console, I got a load of
dumps about being out of memory and swap, but couldn't run any commands to
find out which process had gone mad :(
   From Dom0 I can see the DomU at 100% CPU and can only stop it with a
destroy. What can I do/check to find out why this happens? Sometimes
it'll be fine for weeks on end; at other times it'll go wrong almost
every day.
The server's average load is very low, around 0.1. I assume there is a
process that goes wild for whatever reason, but I have no idea where to
start tracking it down :(
   I'm running the latest CentOS. Any help much appreciated.


Lyle


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Steve Spencer

2010-Jul-26 13:36 UTC

Re: [Xen-users] Why does my DomU keep going mad?

Lyle wrote:
> Hi All,
>   I've got a DomU that sometimes goes mad. I can't ssh or usually even
> console to it. The time I did manage to console I got a load of dumps
> about being out of memory and swap, but couldn't run any commands to
> find out which process had gone mad :(
>   From Dom0 I can see the DomU at 100% CPU and can only stop it with a
> destroy. What can I do/check to find out why this happens? Sometimes
> it'll be fine for weeks on end, others it'll go wrong almost every day.
> The servers average load is very low, around 0.1. I assume there is a
> process that goes wild for whatever reason, but no idea where to start
> to track it down :(
>   I'm running the latest CentOS, any help much appreciated.
>
> Lyle

Lyle,

What services does this DomU run?  In other words is it a mail server,
web server, radius, etc?  What can you tell us about the DomU that would
be of help to us helping you?

-- 
--
Steven G. Spencer, Network Administrator
KSC Corporate - The Kelly Supply Family of Companies
Office 308-382-8764 Ext. 231
Mobile 308-380-7957


Adi Kriegisch

2010-Jul-26 13:44 UTC

Re: [Xen-users] Why does my DomU keep going mad?

Hi!
>   I've got a DomU that sometimes goes mad. I can't ssh or usually even
> console to it. The time I did manage to console I got a load of dumps
> about being out of memory and swap, but couldn't run any commands to
> find out which process had gone mad :(

You could monitor your services for memory consumption. Something like
  ps -e -orss=,args= | sort -b -k1,1n
or
  ps -auxf | sort -nr -k 4
maybe with
  ps -auxf | sort -nr -k 4 | head -10
shows memory consumption sorted by process. Or you might rather want to use
a monitoring tool like sar, nagios, or whatever to find out which process
causes this...
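
The same idea can be written with UNIX-style options only. The "-auxf"
form mixes BSD option letters with a leading dash, which procps ps may
complain about. What follows is a minimal sketch, not a tested recipe for
this particular server; the function name top_mem is made up for
illustration, and it assumes a ps that accepts the rss output field
(procps and the BSDs do).

```shell
#!/bin/sh
# top_mem [N]: print the N processes with the largest resident set size.
# Output columns: RSS in kilobytes, PID, user, command line. Headers are
# suppressed with the trailing "=" so that sort only sees data rows.
top_mem() {
    n="${1:-10}"
    ps -e -o rss= -o pid= -o user= -o args= | sort -rn | head -n "$n"
}

# Example call (commented out so that sourcing this file prints nothing):
# top_mem 10
```

Sorting on RSS rather than on the %MEM column avoids depending on the
column position in the "aux" layout.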

-- Adi


Lyle

2010-Jul-26 13:48 UTC

Re: [Xen-users] Why does my DomU keep going mad?

On 26/07/2010 14:36, Steve Spencer wrote:
> Lyle wrote:
>> Hi All,
>>    I've got a DomU that sometimes goes mad. I can't ssh or usually even
>> console to it. The time I did manage to console I got a load of dumps
>> about being out of memory and swap, but couldn't run any commands to
>> find out which process had gone mad :(
>>    From Dom0 I can see the DomU at 100% CPU and can only stop it with a
>> destroy. What can I do/check to find out why this happens? Sometimes
>> it'll be fine for weeks on end, others it'll go wrong almost every day.
>> The servers average load is very low, around 0.1. I assume there is a
>> process that goes wild for whatever reason, but no idea where to start
>> to track it down :(
>>    I'm running the latest CentOS, any help much appreciated.
>>
>> Lyle
>
> Lyle,
>
> What services does this DomU run?  In other words is it a mail server,
> web server, radius, etc?  What can you tell us about the DomU that would
> be of help to us helping you?

Here is an abridged ps aux, I cut out what look like duplicates. Is 
there a way of setting some kind of process logging to trigger once the 
CPU % goes over 90%?

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  10348   600 ?        Ss   06:14   0:00 init [3]
root         2  0.0  0.0      0     0 ?        S<   06:14   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   06:14   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<   06:14   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<   06:14   0:00 [events/0]
root         6  0.0  0.0      0     0 ?        S<   06:14   0:00 [khelper]
root         7  0.0  0.0      0     0 ?        S<   06:14   0:00 [kthread]
root         9  0.0  0.0      0     0 ?        S<   06:14   0:00 [xenwatch]
root        10  0.0  0.0      0     0 ?        S<   06:14   0:00 [xenbus]
root        14  0.0  0.0      0     0 ?        S<   06:14   0:00 [migration/1]
root        15  0.0  0.0      0     0 ?        SN   06:14   0:00 [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S<   06:14   0:00 [watchdog/1]
root        17  0.0  0.0      0     0 ?        S<   06:14   0:00 [events/1]
root        20  0.0  0.0      0     0 ?        S<   06:14   0:00 [kblockd/0]
root        21  0.0  0.0      0     0 ?        S<   06:14   0:00 [kblockd/1]
root        22  0.0  0.0      0     0 ?        S<   06:14   0:00 [cqueue/0]
root        23  0.0  0.0      0     0 ?        S<   06:14   0:00 [cqueue/1]
root        27  0.0  0.0      0     0 ?        S<   06:14   0:00 [khubd]
root        29  0.0  0.0      0     0 ?        S<   06:14   0:00 [kseriod]
root        94  0.0  0.0      0     0 ?        S    06:14   0:00 [khungtaskd]
root        95  0.0  0.0      0     0 ?        S    06:14   0:00 [pdflush]
root        96  0.0  0.0      0     0 ?        S    06:14   0:00 [pdflush]
root        97  0.0  0.0      0     0 ?        S<   06:14   0:01 [kswapd0]
root        98  0.0  0.0      0     0 ?        S<   06:14   0:00 [aio/0]
root        99  0.0  0.0      0     0 ?        S<   06:14   0:00 [aio/1]
root       229  0.0  0.0      0     0 ?        S<   06:14   0:00 [kpsmoused]
root       254  0.0  0.0      0     0 ?        S<   06:14   0:00 [kstriped]
root       267  0.0  0.0      0     0 ?        S<   06:14   0:00 [ksnapd]
root       282  0.0  0.0      0     0 ?        S<   06:14   0:00 [kjournald]
root       304  0.0  0.0      0     0 ?        S<   06:14   0:00 [kauditd]
root       332  0.0  0.0  12604   348 ?        S<s  06:14   0:00 /sbin/udevd -d
root       664  0.0  0.0      0     0 ?        S<   06:14   0:00 [kmpathd/0]
root       665  0.0  0.0      0     0 ?        S<   06:14   0:00 [kmpathd/1]
root       666  0.0  0.0      0     0 ?        S<   06:14   0:00 [kmpath_handle]
root       688  0.0  0.0      0     0 ?        S<   06:14   0:00 [kjournald]
root      1067  0.0  0.1  27348   696 ?        S<sl 06:15   0:00 auditd
root      1069  0.0  0.1  81800   760 ?        S<sl 06:15   0:00 /sbin/audispd
root      1089  0.0  0.1   5908   532 ?        Ss   06:15   0:00 syslogd -m 0
root      1092  0.0  0.0   3804   324 ?        Ss   06:15   0:00 klogd -x
root      1101  0.0  0.0  10760   316 ?        Ss   06:15   0:00 irqbalance
named     1138  0.0  1.2 166536  6728 ?        Ssl  06:15   0:01 /usr/sbin/named
rpc       1171  0.0  0.0   8052   408 ?        Ss   06:15   0:00 portmap
root      1215  0.0  0.0      0     0 ?        S<   06:15   0:00 [rpciod/0]
root      1216  0.0  0.0      0     0 ?        S<   06:15   0:00 [rpciod/1]
rpcuser   1223  0.0  0.1  10160   564 ?        Ss   06:15   0:00 rpc.statd
root      1245  0.0  0.0  55180   236 ?        Ss   06:15   0:00 rpc.idmapd
dbus      1258  0.0  0.1  21356   852 ?        Ss   06:15   0:00 dbus-daemon --s
root      1266  0.0  0.0  10432   376 ?        Ss   06:15   0:00 /usr/sbin/hcid
root      1272  0.0  0.0   5936   392 ?        Ss   06:15   0:00 /usr/sbin/sdpd
root      1294  0.0  0.0      0     0 ?        S<   06:15   0:00 [krfcommd]
root      1329  0.0  0.0  21040   524 ?        Ssl  06:15   0:00 pcscd
root      1347  0.0  0.0   8516   364 ?        Ss   06:15   0:00 /usr/bin/hidd -
root      1380  0.0  0.1  54396   836 ?        Ssl  06:15   0:00 automount
root      1399  0.0  0.1  63516   532 ?        Ss   06:15   0:00 /usr/sbin/sshd
root      1407  0.0  0.1 134096   952 ?        Ss   06:15   0:00 cupsd
root      1419  0.0  0.1  21644   540 ?        Ss   06:15   0:00 xinetd -stayali
root      1430  0.0  0.0  44268   188 ?        Ss   06:15   0:00 /usr/sbin/vsftp
root      1462  0.0  0.1  65980   996 ?        S    06:15   0:00 /bin/sh /usr/bi
mysql     1509  0.0  0.8 191260  4308 ?        Sl   06:15   0:00 /usr/libexec/my
postgres  1589  0.0  0.2 120740  1344 ?        S    06:15   0:00 /usr/bin/postma
root      1600  0.0  0.0   6060   500 ?        Ss   06:15   0:00 /usr/sbin/dovec
root      1608  0.0  0.2  62500  1300 ?        S    06:15   0:00 dovecot-auth
dovecot   1612  0.0  0.2  33892  1300 ?        S    06:15   0:00 imap-login
postgres  1615  0.0  0.0 109920   176 ?        S    06:15   0:00 postgres: logge
nobody    1622  0.0 31.0 212288 163000 ?       Ssl  06:15   0:06 clamd.virtualmi
postgrey  1632  0.0  1.0 111480  5380 ?        Ss   06:15   0:00 /usr/sbin/postg
root      1684  0.0  0.3  54144  1828 ?        Ss   06:15   0:00 /usr/libexec/po
postfix   1691  0.0  0.3  55160  1932 ?        S    06:15   0:00 qmgr -l -t fifo
root      1701  0.0  0.0   6452   256 ?        Ss   06:15   0:00 gpm -m /dev/inp
postfix   1733  0.0  0.3  54204  1868 ?        S    06:15   0:00 tlsmgr -l -t un
root      1743  0.0  0.6 319152  3244 ?        Ss   06:15   0:00 /usr/sbin/httpd
apache    1746  0.0  0.0 249564   444 ?        S    06:15   0:00 /usr/sbin/httpd
root      1752  0.0  0.1  74860   724 ?        Ss   06:15   0:00 crond
root      1763  0.0  0.0  49764   420 ?        Ss   06:15   0:00 squid -D
squid     1765  0.0  0.5  52236  3128 ?        S    06:15   0:00 (squid) -D
squid     1767  0.0  0.0   3644   184 ?        Ss   06:15   0:00 (unlinkd)
apache    1779  0.0  0.0 319064   424 ?        S    06:15   0:00 /usr/sbin/fcgi-
sympa     1780  0.0  4.3 258180 22672 ?        S    06:15   0:01 /usr/bin/perl -
xfs       1796  0.0  0.1  20260   568 ?        Ss   06:15   0:00 xfs -droppriv -
root      1811  0.0  0.0  18732   352 ?        Ss   06:15   0:00 /usr/sbin/atd
root      1819  0.0  0.0  46740   304 ?        Ss   06:15   0:00 /usr/sbin/sasla
sympa     1833  0.0  4.1 230640 21652 ?        S    06:15   0:01 /usr/bin/perl -
avahi     1840  0.0  0.1  24172  1032 ?        Ss   06:15   0:00 avahi-daemon: r
68        1849  0.0  0.1  30428   976 ?        Ss   06:15   0:00 hald
root      1850  0.0  0.1  21692   532 ?        S    06:15   0:00 hald-runner
mailman   1866  0.0  0.1 149556   692 ?        Ss   06:15   0:00 /usr/bin/python
root      1898  0.0  0.7 257084  3736 ?        SN   06:15   0:00 /usr/bin/python
root      1900  0.0  0.1  12916   852 ?        SN   06:15   0:00 /usr/libexec/ga
root      1927  0.0  0.0      0     0 ?        Z    06:16   0:00 [sen] <defunct>
root      2046  0.0  0.3 125808  1740 ?        Ss   06:16   0:00 /usr/libexec/we
root      2056  0.0  0.0  18416   240 ?        S    06:16   0:00 /usr/sbin/smart
root      2069  0.0  0.1  52108   892 ?        Ss   06:16   0:00 login -- root
postfix  13030  0.0  0.5  54348  2636 ?        S    08:12   0:00 trivial-rewrite
postfix  17743  0.0  0.4  54208  2240 ?        S    09:14   0:00 pickup -l -t fi
postfix  18474  0.0  0.4  54204  2288 ?        S    09:24   0:00 anvil -l -t uni
postfix  19504  0.0  0.5  54428  2660 ?        S    09:35   0:00 local -t unix
dovecot  19540  0.0  0.3  33884  1632 ?        S    09:35   0:00 pop3-login
postfix  19543  0.0  0.8  72672  4492 ?        S    09:35   0:00 smtpd -n smtp -
postfix  19660  0.0  0.5  54468  2684 ?        S    09:40   0:00 cleanup -z -t u
postfix  19947  0.0  0.8  72672  4484 ?        S    09:41   0:00 smtpd -n smtp -
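
On the question of triggering a log once CPU goes over 90%: one possible
approach, sketched here under assumptions, is to sample the aggregate
"cpu" line of /proc/stat twice and dump a process listing when the busy
share between the two samples crosses a threshold. The function names
(cpu_busy_pct, watch_cpu), the log path and the 10-second interval are
all made up for illustration; this is not an existing tool. It has to run
inside the DomU, and once the guest is truly out of memory the ps call
may itself fail, so it is more likely to capture the run-up than the hang.

```shell
#!/bin/sh
# cpu_busy_pct LINE1 LINE2
# Given two "cpu ..." lines from /proc/stat taken some time apart, print
# the integer percentage of non-idle CPU time between the two samples.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            idle[NR] = $5 + $6   # idle plus iowait (a missing field counts as 0)
        }
        END {
            dt = t[2] - t[1]
            di = idle[2] - idle[1]
            if (dt <= 0) print 0
            else printf "%d\n", 100 * (dt - di) / dt
        }'
}

# watch_cpu [THRESHOLD] [INTERVAL_SECONDS] [LOGFILE]
# Loop forever; append a timestamp and a process listing to LOGFILE each
# time the busy percentage reaches THRESHOLD.
watch_cpu() {
    threshold="${1:-90}"
    interval="${2:-10}"
    log="${3:-/var/log/cpuwatch.log}"   # hypothetical path
    while :; do
        a=$(grep '^cpu ' /proc/stat)
        sleep "$interval"
        b=$(grep '^cpu ' /proc/stat)
        if [ "$(cpu_busy_pct "$a" "$b")" -ge "$threshold" ]; then
            # Note: pcpu in ps is averaged over each process lifetime.
            { date; ps -e -o pid,user,pcpu,rss,args; } >> "$log"
        fi
    done
}
```

It could be started in the background with something like
"watch_cpu 90 10 /var/log/cpuwatch.log &".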



Lyle

2010-Jul-26 14:26 UTC

Re: [Xen-users] Why does my DomU keep going mad?

On 26/07/2010 14:44, Adi Kriegisch wrote:
> Hi!
>
>>    I've got a DomU that sometimes goes mad. I can't ssh or usually even
>> console to it. The time I did manage to console I got a load of dumps
>> about being out of memory and swap, but couldn't run any commands to
>> find out which process had gone mad :(
>
> You could monitor your services for memory consumption?
> Something like
>    ps -e -orss=,args= | sort -b -k1,1n
> or
>    ps -auxf | sort -nr -k 4
> maybe with
>    ps -auxf | sort -nr -k 4 | head -10
> shows sorted memory consumption by process or you might rather want to use
> a monitoring tool like sar, nagios, or whatever to find out which process
> causes this...

These are useful, thanks, although ps doesn't use - (just to be awkward,
everything else does).

I looked at nagios a few years ago. It looked great, but like I'd have
to take a week out to set it up. Is there anything lightweight I could
make? I guess I could write a Perl daemon that runs that ps command
every 10 seconds or something and logs the output to a file... Seems
like the sort of thing that should already be available though...
   Anyone else had an issue like this?


Lyle



Steve Spencer

2010-Jul-26 15:17 UTC

Re: [Xen-users] Why does my DomU keep going mad?

Lyle wrote:
> On 26/07/2010 14:44, Adi Kriegisch wrote:
>> Hi!
>>
>>>    I've got a DomU that sometimes goes mad. I can't ssh or usually even
>>> console to it. The time I did manage to console I got a load of dumps
>>> about being out of memory and swap, but couldn't run any commands to
>>> find out which process had gone mad :(
>>
>> You could monitor your services for memory consumption?
>> Something like
>>    ps -e -orss=,args= | sort -b -k1,1n
>> or
>>    ps -auxf | sort -nr -k 4
>> maybe with
>>    ps -auxf | sort -nr -k 4 | head -10
>> shows sorted memory consumption by process or you might rather want to
>> use a monitoring tool like sar, nagios, or whatever to find out which
>> process causes this...
>
> These are useful thanks, although ps doesn't use - (just to be awkward,
> everything else does).
>
> I looked at nagios a few years ago, it looked great, but like I'd have
> to take a week out to set it up. If there anything lightweight I could
> make? I guess I could write a Perl daemon that runs that ps command
> every 10 seconds or something and logs the output to a file... Seems
> like the sort of thing that should all ready be available though...
>   Anyone else had an issue like this?
>
> Lyle

I used psmon a few years ago for something similar.  Perhaps it would
work for you as well.  It looks as though this is a mail server
(postfix), so it could be something there that is causing your problem.

Here's the link to psmon if you want to give that a try:

http://www.psmon.com/

-- 
--
Steven G. Spencer, Network Administrator
KSC Corporate - The Kelly Supply Family of Companies
Office 308-382-8764 Ext. 231
Mobile 308-380-7957


Adi Kriegisch

2010-Jul-26 15:24 UTC

Re: [Xen-users] Why does my DomU keep going mad?

Hi!
> > You could monitor your services for memory consumption?
[SNIP]
> These are useful thanks, although ps doesn't use - (just to be awkward,
> everything else does).

man ps:
[SNIP]
  1   UNIX options, which may be grouped and must be preceded by a dash.
  2   BSD options, which may be grouped and must not be used with a dash.
  3   GNU long options, which are preceded by two dashes.
[SNAP]
> I looked at nagios a few years ago, it looked great, but like I'd have
> to take a week out to set it up. If there anything lightweight I could
> make? I guess I could write a Perl daemon that runs that ps command
> every 10 seconds or something and logs the output to a file... Seems
> like the sort of thing that should all ready be available though...

My suggestion was not about setting up nagios if you're not already using
it. You could start using sar[1] or just write a plain shell script doing
the monitoring for you:
while /bin/true; do
  WHATEVER_PS_COMMAND_YOU_LIKE_BEST > \
           /var/log/mymemlog/$(date +%Y-%m-%d_-_%H.%M.%S)
  sleep 10
done
...and you'll get memstats every 10 seconds saved in a log file for further
analysis.
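
One snapshot file every 10 seconds adds up to 8640 files a day, and the
loop above assumes that /var/log/mymemlog already exists. A slightly
fuller sketch of the same idea follows; the function names snapshot_mem
and prune_snapshots and the one-day retention are assumptions made for
illustration, and -mmin is a find extension (GNU and BSD) rather than
POSIX.

```shell
#!/bin/sh
# snapshot_mem [DIR]: write one memory snapshot (largest RSS first) into
# DIR, creating DIR if needed, and print the path of the file written.
snapshot_mem() {
    dir="${1:-/var/log/mymemlog}"
    mkdir -p "$dir" || return 1
    out="$dir/$(date +%Y-%m-%d_-_%H.%M.%S)"
    ps -e -o rss= -o pid= -o user= -o args= | sort -rn > "$out"
    printf '%s\n' "$out"
}

# prune_snapshots [DIR] [MINUTES]: delete snapshot files in DIR that were
# last modified more than MINUTES minutes ago (default: one day).
prune_snapshots() {
    dir="${1:-/var/log/mymemlog}"
    keep_min="${2:-1440}"
    find "$dir" -type f -mmin "+$keep_min" -exec rm -f {} +
}

# Example loop (commented out so that sourcing this file does not block):
# while true; do
#     snapshot_mem >/dev/null
#     prune_snapshots
#     sleep 10
# done
```

After a hang, the last few files written before the timestamps stop are
the ones worth reading first.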

Another option would be to check your already existing log files for
"oomkiller" messages. They could give hints on the processes eating up all
your memory.
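
As a sketch of that log check: the kernel's OOM messages usually contain
phrases such as "Out of memory", "oom-killer" or "Killed process", though
the exact wording varies between kernel versions. The function name
find_oom is made up, and /var/log/messages is assumed as the default
because that is where CentOS normally sends kernel messages.

```shell
#!/bin/sh
# find_oom [LOGFILE]: print lines that look like OOM-killer activity.
# The match is case-insensitive and deliberately broad.
find_oom() {
    grep -iE 'out of memory|oom-killer|killed process' "${1:-/var/log/messages}"
}

# Example calls (commented out so that sourcing this file prints nothing):
# find_oom
# find_oom /var/log/messages.1
```

The process named in a "Killed process" line is the victim the kernel
picked, which is not necessarily the process that caused the shortage.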

Furthermore, this is a general issue with Linux servers running out of
memory; it is not a Xen issue. You might as well want to have a look at
sites like serverfault[2], or you might want to do it the other way
around and limit memory for the available applications and users. Just have
a look at /etc/security/limits.conf for example. Then sit down and wait for
the first service dying... ;-)
Another option could be to add more swap space (as this is usually cheaper
than RAM). That way your problem might "disappear". On the other hand, you
should plan your (virtual) machines with expected memory consumption in
mind, so that swap space is not used at all (or only in an emergency, to
prevent the oomkiller from kicking in).
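
To illustrate the "limit memory" route: limits.conf is applied by
pam_limits to login sessions, so it may not reach daemons started from
init scripts. For a single command, the shell's own ulimit can set the
same kind of cap. A minimal sketch, with run_capped as a made-up helper
name and the limit given in kilobytes of virtual address space:

```shell
#!/bin/sh
# run_capped LIMIT_KB COMMAND [ARGS...]
# Run COMMAND with its virtual memory capped at LIMIT_KB kilobytes. The
# limit is set in a subshell, so the calling shell keeps its own limits.
run_capped() {
    limit_kb="$1"
    shift
    ( ulimit -v "$limit_kb" && exec "$@" )
}

# Example call (commented out; 262144 KB is 256 MB):
# run_capped 262144 some_command --its-options
```

A process that hits the cap gets allocation failures instead of dragging
the whole guest into swap, which makes the culprit easier to spot.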

-- Adi

[1] http://pagesperso-orange.fr/sebastien.godard/
[2] http://www.serverfault.com


Lyle

2010-Jul-26 18:15 UTC

Re: [Xen-users] Why does my DomU keep going mad?

On 26/07/2010 16:17, Steve Spencer wrote:
> I used psmon a few years ago for something similar.  Perhaps it would
> work for you as well.  It looks as though this is a mail server
> (postfix), so it could be something there that is causing your problem.
>
> Here's the link to psmon if you want to give that a try:
>
> http://www.psmon.com/

Thanks for the link, I'll take a look :)


Lyle




Lyle

2010-Jul-26 18:19 UTC

Re: [Xen-users] Why does my DomU keep going mad?

On 26/07/2010 16:24, Adi Kriegisch wrote:
> Hi!
>
>>> You could monitor your services for memory consumption?
> [SNIP]
>> These are useful thanks, although ps doesn't use - (just to be awkward,
>> everything else does).
> man ps:
> [SNIP]
>    1   UNIX options, which may be grouped and must be preceded by a dash.
>    2   BSD options, which may be grouped and must not be used with a dash.
>    3   GNU long options, which are preceded by two dashes.
> [SNAP]

It's just that I was getting the error message
"Warning: bad syntax, perhaps a bogus '-'? See
/usr/share/doc/procps-3.2.5/FAQ"
Taking the - off seemed to cure it.

>> I looked at nagios a few years ago, it looked great, but like I'd have
>> to take a week out to set it up. If there anything lightweight I could
>> make? I guess I could write a Perl daemon that runs that ps command
>> every 10 seconds or something and logs the output to a file... Seems
>> like the sort of thing that should all ready be available though...
> My suggestion was not about setting up nagios if you're not already using
> it. You could start using sar[1] or just write a plain shell script doing
> the monitoring for you:
> while /bin/true; do
>    WHATEVER_PS_COMMAND_YOU_LIKE_BEST > \
>             /var/log/mymemlog/$(date +%Y-%m-%d_-_%H.%M.%S)
>    sleep 10
> done

Beautiful, thank you :)

> ...and you'll get memstats every 10 seconds saved in a log file for
> further analysis.
>
> Another option would be to check your already existing log files for
> "oomkiller" messages. They could give hints on the processes eating up
> all your memory.

Will do.

> Further this is a general issue with Linux servers running out of memory
> and is not related to Xen or a Xen issue. You might as well want to have a
> look at sites serverfault[2] or you might want to do it the other way
> around and limit memory for the available applications and users. Just have
> a look at /etc/security/limits.conf for example. Then sit down and wait for
> the first service dying... ;-)

I wasn't sure if there was something common in Xen that I needed to
set up to stop this.

> Another option could be to add more swap space (as this is usually cheaper
> than ram). That way your problem might "disappear". On the other hand you
> should plan your (virtual) machines with expected memory consumption in
> mind so that using swap space will not happen at all (or just in case of
> emergency preventing the oomkiller to snap in).

I don't want to throw more memory at it; I'd rather figure out what's
going wrong and why.


Thanks for the detailed response :)


Lyle

