Rudi Ahlers
2010-Jan-28 11:30 UTC
[Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
Hi,

I would like to get some input from people who have used these options for mounting a remote server to a local server. Basically, I need to replicate / backup data from one server to another, but over the internet (i.e. insecure channels).

Currently we have been mounting an SMB share over SSH, but it's got its own set of problems, and I don't know if this is optimal or if I could set up something better. We don't have much control over the remote server, so I couldn't set up a VPN, or iSCSI, or anything else. My options were FTP & SMB.

But I want to move the backups in-house, to save bandwidth and have more control over what we do.

So, with a new CentOS server & 2x1TB HDDs in RAID1 configuration, I can do pretty much whatever I want. The backup server(s) will serve backups for multiple servers in different data centers (possibly in different countries as well, I still need to think about this), so my biggest concern is security.

We mainly use cPanel & DotNetPanel (Windows servers), but also Webmin & Virtualmin, so I need to stick with their native backup procedures and don't really want to use too technical a backup system.

The end users need access to the data 24/7, so having the remote share permanently mounted seems best for this; then our support staff don't need to SSH into the servers and download the backups. With the mount, I can also use rsync backups, so an end user could restore only a single file if need be.

NOW, the question is: which protocol would be best for this? I can only think of SMB, NFS & iSCSI.

The SMB mounts have worked well so far, but it's not as safe, and once the SMB share is mounted, I can't unmount it until the server reboots. This isn't necessarily a bad thing, but sometimes the backup script will mount the share again (I think this is a bug in cPanel) and we end up with 4 or 5 open connections to the remote server.

NFS - last time I looked at it was on V3, which was IMO rather slow & insecure.

iSCSI - this doesn't allow for more than one connection to the same share. Sometimes a user might want to download a backup directly from the backup server via FTP / SSH / a web interface, which I don't think will work. We also sometimes need to restore a backup on a different server (if for example the HDD on the initial server is too full), so this wouldn't be possible.

The remote shares also need to be mounted inside Xen domU's, or directly on CentOS / Windows servers.

What would be my best option for this?

-- 
Kind Regards
Rudi Ahlers
SoftDux
Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
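For context, the "SMB share over SSH" setup described above is usually done with an SSH port forward; a rough sketch only, with hostnames, share names and credentials as placeholders rather than anything from this thread:

  # Forward a local port to the remote server's SMB port over SSH
  ssh -f -N -L 1445:127.0.0.1:445 backupuser@backup.example.com

  # Mount the share through the tunnel (mount.cifs accepts a non-standard port)
  mount -t cifs //127.0.0.1/backups /mnt/backup \
      -o port=1445,username=backupuser,password=secret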
Francisco Javier Funes Nieto
2010-Jan-28 11:47 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
Maybe SSHFS? http://fuse.sourceforge.net/sshfs.html

I haven't used it, but it's there! ;-)

J.

2010/1/28 Rudi Ahlers <Rudi@softdux.com>:
> I would like to get some input from people who have used these options
> for mounting a remote server to a local server. Basically, I need to
> replicate / backup data from one server to another, but over the
> internet (i.e. insecure channels).
> [snip]

-- 
Francisco Javier Funes Nieto [esencia@gmail.com]
CANONIGOS
Servicios Informáticos para PYMES.
Cl. Cruz 2, 1º Oficina 7
Tlf: 958.536759 / 661134556
Fax: 958.521354
GRANADA - 18002
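For anyone wanting to try it, a minimal SSHFS session looks roughly like this (hostnames and paths are placeholders); it needs the FUSE kernel module plus the sshfs userspace tool:

  # Mount a remote directory over SSH
  sshfs backupuser@backup.example.com:/srv/backups /mnt/backup

  # ... use /mnt/backup like a local directory ...

  # Unmount when done
  fusermount -u /mnt/backup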
Rudi Ahlers
2010-Jan-28 11:59 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
On Thu, Jan 28, 2010 at 1:47 PM, Francisco Javier Funes Nieto <esencia@gmail.com> wrote:
> Maybe SSHFS? http://fuse.sourceforge.net/sshfs.html
>
> I haven't used it, but it's there! ;-)

Thanks, I did try it, and it needs extra kernel modules to be installed, which is sometimes a problem with the clients' domU's - i.e. when they rebuild the domU's, we need to manually add the extra kernel modules again, so it creates extra load on the support techs.

-- 
Kind Regards
Rudi Ahlers
SoftDux
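The "extra kernel module" in question is FUSE; a quick check and load inside a domU might look like the following sketch (assuming the module ships with the domU kernel; package names vary by distro and are only illustrative here):

  # Is the FUSE module available / loaded?
  lsmod | grep fuse
  modprobe fuse

  # Userspace tools, e.g. on CentOS with EPEL enabled
  yum install fuse fuse-sshfs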
Francisco Javier Funes Nieto
2010-Jan-28 12:06 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
You can use Bacula too (for a complete backup solution), with encrypted communication provided by TLS.

It's a great piece of software! I've worked with it since version 1.38.

http://www.bacula.org
http://bacula.org/5.0.x-manuals/en/main/main/Bacula_TLS_Communications.html

2010/1/28 Rudi Ahlers <Rudi@softdux.com>:
> Thanks, I did try it, and it needs extra kernel modules to be installed,
> which is sometimes a problem with the clients' domU's.
> [snip]

-- 
Francisco Javier Funes Nieto [esencia@gmail.com]
CANONIGOS
Simon Hobson
2010-Jan-28 12:09 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
For Linux machines you have to look hard to beat rsync - it's very efficient and supports encryption and rate limiting. Sadly it's not much help for Windows machines.

Regarding the options you've listed, iSCSI doesn't support mounting a volume from multiple places unless you use a shared filesystem. That means you'd need to either use a shared filesystem, use separate volumes for each client, or use some sort of locking mechanism to prevent multiple mounts. It also doesn't do encryption, so for remote sites you'd have to use a VPN - but I'd suggest doing that anyway, as having a unified network has many advantages.

If you do put a VPN in place, then for Windows stuff it might be worth looking at Microsoft's DPM (Data Protection Manager). I know nothing about it, but the guys at work who deal with the MS stuff have been raving about it like someone's invented sliced bread.

-- 
Simon Hobson

WANTED: "Software CD ROM Kit" for Canon CLBP 360-PS printer (Canon part no RH6-3612, or possibly RH6-3810, or RH6-3610 might do). I've a dead HD and need this CD so I can replace the disk and re-install the printer OS on it. If anyone knows where I might get hold of one I'd be grateful - requests to Canon drew a blank, it's been out of support for years. Alternatively, if anyone has one of these and would let me image their hard disk ...

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.
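A typical rsync-over-SSH invocation with the rate limiting mentioned above would look something like this (hosts and paths are placeholders):

  # Push /srv/data to the backup server over SSH, capped at ~2 MB/s
  rsync -az --delete --bwlimit=2048 -e ssh \
      /srv/data/ backupuser@backup.example.com:/srv/backups/server1/

--bwlimit takes KB/s, -a preserves Unix permissions and ownership, and -z compresses on the wire.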
Rudi Ahlers
2010-Jan-28 13:30 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
On Thu, Jan 28, 2010 at 2:06 PM, Francisco Javier Funes Nieto <esencia@gmail.com> wrote:
> You can use Bacula too (for a complete backup solution), with encrypted
> communication provided by TLS.
>
> It's a great piece of software! I've worked with it since version 1.38.

I don't want to replace the backup system I have on the clients. My question is rather about the type of remote system I back up to, in terms of speed / reliability / security. The server has RAID, and will be rsynced to a 2nd backup server, so that's not my concern either.

-- 
Kind Regards
Rudi Ahlers
SoftDux
Javier Guerra
2010-Jan-28 14:51 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
On Thu, Jan 28, 2010 at 6:30 AM, Rudi Ahlers <Rudi@softdux.com> wrote:
> Basically, I need to replicate / backup data from one server to another,
> but over the internet (i.e. insecure channels).

It would be _really_ hard to find anything better than rsync, both because of safety (it uses ssh by default) and efficiency (it copies only what's needed).

If you need point-in-time snapshots while the servers are running, the simplest way is to do an LVM snapshot, mount it (read-only) and rsync from this to the remote server. Afterwards, simply destroy the snapshot.

-- 
Javier
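As a sketch, the snapshot-then-rsync sequence described above (volume group names, sizes and paths are placeholders):

  # Create a point-in-time snapshot of the data volume
  lvcreate --snapshot --size 2G --name data-snap /dev/vg0/data

  # Mount it read-only and rsync from the frozen view
  mount -o ro /dev/vg0/data-snap /mnt/snap
  rsync -az /mnt/snap/ backupuser@backup.example.com:/srv/backups/server1/

  # Destroy the snapshot afterwards
  umount /mnt/snap
  lvremove -f /dev/vg0/data-snap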
Rudi Ahlers
2010-Jan-28 22:19 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
On Thu, Jan 28, 2010 at 4:51 PM, Javier Guerra <javier@guerrag.com> wrote:
> It would be _really_ hard to find anything better than rsync, both
> because of safety (it uses ssh by default) and efficiency (it copies
> only what's needed).
> [snip]

OK, forget about rsync. Forget about how I get the data onto the other server. WHICH filesystem would be best for this type of operation? SMB, NFS, or iSCSI?

-- 
Kind Regards
Rudi Ahlers
SoftDux
francisco javier funes nieto
2010-Jan-28 22:35 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
SMB, NFS and iSCSI are protocols, not filesystems.

2010/1/28 Rudi Ahlers <Rudi@softdux.com>:
> OK, forget about rsync. Forget about how I get the data onto the other
> server. WHICH filesystem would be best for this type of operation? SMB,
> NFS, or iSCSI?

-- 
Francisco Javier Funes Nieto [esencia@gmail.com]
CANONIGOS
Simon Hobson
2010-Jan-29 09:15 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
Rudi Ahlers wrote:
> OK, forget about rsync. Forget about how I get the data onto the other
> server. WHICH filesystem would be best for this type of operation? SMB,
> NFS, or iSCSI?

As already stated, iSCSI is **NOT** a filesystem or a share of a filesystem - it is a network block device. You CANNOT share an iSCSI volume between multiple guests without running a cluster filesystem. If you use iSCSI, then you need to do one of three things:

1) Use a cluster file system on all the guests
2) Use a separate volume for each guest
3) Come up with some form of locking mechanism to allow one guest at a time to mount the volume

Mounting a volume on two guests without a cluster file system is guaranteed to trash the filesystem on the volume.

As to SMB vs NFS, a lot depends on the filesystem semantics your backup process needs. SMB should support Windows file system semantics/metadata; NFS only supports Unix file system semantics/metadata. If that matters, then the decision is made for you - e.g. if the backup is storing Windows files natively on the backup filesystem, then you'll have to use SMB in order to retain the file metadata.

Also, when comparing (or asking about) file system performance, you need to specify the conditions. Performance is likely to be different between a setup storing individual files (i.e. lots of create, write, close, update-directory operations) and a single large archive setup (i.e. where the backup program creates a big file and streams the backup data into it). I don't personally have any data on this either way.

-- 
Simon Hobson
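To make the "network block device" point concrete, consuming an iSCSI volume with open-iscsi looks roughly like this (target name and portal address are placeholders); the client formats and mounts it like a local disk, which is exactly why two clients mounting it at once will corrupt it:

  # Discover and log in to the target
  iscsiadm -m discovery -t sendtargets -p 192.0.2.10
  iscsiadm -m node -T iqn.2010-01.com.example:backups -p 192.0.2.10 --login

  # The volume then appears as a local disk, e.g. /dev/sdb
  mkfs.ext3 /dev/sdb
  mount /dev/sdb /mnt/backup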
Rudi Ahlers
2010-Jan-29 09:57 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
On Fri, Jan 29, 2010 at 11:15 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> As already stated, iSCSI is **NOT** a filesystem or a share of a
> filesystem - it is a network block device. You CANNOT share an iSCSI
> volume between multiple guests without running a cluster filesystem.
> [snip]

Fair enough, but iSCSI is commonly used on NAS devices, which then export whatever filesystem is being used to the host. Which is why I am considering it.

> As to SMB vs NFS, a lot depends on the filesystem semantics your backup
> process needs. SMB should support Windows file system semantics/metadata;
> NFS only supports Unix file system semantics/metadata.

There is a mixture of Windows & Linux data, but would NFS give me better performance for the Linux hosts?

> Also, when comparing (or asking about) file system performance, you need
> to specify the conditions.
> [snip]

Sure, understandable, but this is almost a different subject :) The data that goes on there will be a mix of small files & large files.

-- 
Kind Regards
Rudi Ahlers
SoftDux
Simon Hobson
2010-Jan-29 11:55 UTC
Re: [Xen-users] NFS vs SMB vs iSCSI for remote backup mounts
Rudi Ahlers wrote:
> Fair enough, but iSCSI is commonly used on NAS devices, which then
> export whatever filesystem is being used to the host. Which is why I
> am considering it.

In which case it's not a case of iSCSI vs <something>, it's NAS+iSCSI vs <something>.

> There is a mixture of Windows & Linux data, but would NFS give me
> better performance for the Linux hosts?

Personally, I would choose NFS over SMB for storing Linux files - for the simple reason that the file system semantics are directly compatible with the files being stored. I've no idea how it compares with SMB though - and in any case, again, it's not a case of NFS vs SMB, it's <implementation of NFS> vs <implementation of SMB>. It's entirely possible that vendor A's box does SMB better while vendor B's box does NFS better.

I'd still choose rsync if available for a Linux client.

-- 
Simon Hobson
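For comparison, the two mounts under discussion look like this on a Linux client (server names, export/share names and credentials are placeholders):

  # NFS - keeps Unix permissions/ownership
  mount -t nfs backup.example.com:/srv/backups /mnt/backup

  # SMB/CIFS - keeps Windows metadata
  mount -t cifs //backup.example.com/backups /mnt/backup \
      -o username=backupuser,password=secret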
Dana Rawding
2010-Jan-31 16:23 UTC

Hi all,

I've been experiencing a rash of CPU lockups on a number of domU's recently. It's been happening on two different servers. About a year ago I had this problem every once in a while, but it was not frequent. I was running Ubuntu with Xen 3.1 and 2.6.24-18 back then. I'm now running Xen 3.3 and 2.6.24-26.

What I have noticed is that just prior to the lockups the domU's had high CPU loads. The domU that I have the most problems with is a Zimbra server. My guess is that a rash of spam comes through and CPU loads get high, then the CPUs lock up. Originally I had it running with 1 CPU but have since upped it to 2, then 3 CPUs.

I have been collecting the lockup messages and have posted a few below. Any ideas? Recommendations?

Thanks,
Dana

[138077.172283] ======================
[138075.147398] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:97]
[138075.147411]
[138075.147419] Pid: 97, comm: kswapd0 Tainted: G D (2.6.24-26-xen #1)
[138075.147426] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 0
[138075.147441] EIP is at _spin_lock+0x7/0x10
[138075.147447] EAX: c1da48ec EBX: 00000000 ECX: 220c7000 EDX: 00000000
[138075.147453] ESI: 8b804067 EDI: c1da48ec EBP: 00000f28 ESP: ed707dec
[138075.147459] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[138075.147471] CR0: 8005003b CR2: 080f0010 CR3: 2213b000 CR4: 00000660
[138075.147482] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[138075.147488] DR6: ffff0ff0 DR7: 00000400
[138075.147495] [<c01773cb>] page_check_address+0x1cb/0x3c0
[138075.147514] [<c0119868>] xen_invlpg_mask+0x38/0x40
[138075.147529] [<c017762e>] page_referenced_one+0x6e/0x190
[138075.147541] [<c017875c>] page_referenced+0xec/0x130
[138075.147552] [<c01671cf>] shrink_active_list+0x18f/0x5c0
[138075.147567] [<c016826d>] shrink_zone+0xdd/0x100
[138075.147578] [<c01688cc>] kswapd+0x44c/0x490
[138075.147589] [<c013bb00>] autoremove_wake_function+0x0/0x40
[138075.147603] [<c011e270>] complete+0x40/0x60
[138075.147614] [<c0168480>] kswapd+0x0/0x490
[138075.147625] [<c013b842>] kthread+0x42/0x70
[138075.147635] [<c013b800>] kthread+0x0/0x70
[138075.147646] [<c0105bb7>] kernel_thread_helper+0x7/0x10
[138075.147658] ======================
[138088.987826] BUG: soft lockup - CPU#1 stuck for 11s! [java:23215]
[138088.987841]
[138088.987846] Pid: 23215, comm: java Tainted: G D (2.6.24-26-xen #1)
[138088.987850] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 1
[138088.987862] EIP is at _spin_lock+0x7/0x10
[138088.987866] EAX: c1da48ec EBX: 00000000 ECX: c1da48e0 EDX: 00000ca8
[138088.987870] ESI: 8b804067 EDI: 00000000 EBP: e20c7ca8 ESP: e226be04
[138088.987873] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[138088.987883] CR0: 80050033 CR2: 940ef020 CR3: 2211f000 CR4: 00000660
[138088.987891] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[138088.987896] DR6: ffff0ff0 DR7: 00000400
[138088.987901] [<c016d88d>] unmap_vmas+0x43d/0xae0
[138088.987922] [<c011959c>] kmap_atomic+0x1c/0x30
[138088.987941] [<c01192fd>] kunmap_atomic+0x3d/0x60
[138088.987957] [<c0173ee8>] vma_adjust+0x1c8/0x440
[138088.987967] [<c0173765>] unmap_region+0x95/0x120
[138088.987975] [<c0174387>] do_munmap+0x147/0x1f0
[138088.987983] [<c0174c90>] mmap_region+0x70/0x450
[138088.987991] [<c01db3b7>] security_file_mmap+0x27/0x30
[138088.988001] [<c0175472>] do_mmap_pgoff+0x312/0x330
[138088.988008] [<c010a02b>] sys_mmap2+0xbb/0xd0
[138088.988016] [<c0105832>] syscall_call+0x7/0xb
[138088.988023] [<c0320000>] svc_accept+0x150/0x410
[138088.988032] ======================

[66916.451144] BUG: soft lockup - CPU#0 stuck for 11s! [java:2758]
[66928.193453] BUG: soft lockup - CPU#1 stuck for 11s! [java:3419]

[336990.703192] BUG: soft lockup - CPU#1 stuck for 11s! [ps:32586]
[336990.703206]
[336990.703214] Pid: 32586, comm: ps Tainted: G D (2.6.24-26-xen #1)
[336990.703221] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 1
[336990.703235] EIP is at _spin_lock+0x7/0x10
[336990.703241] EAX: c1dbc72c EBX: 00000000 ECX: c1dbc720 EDX: 00000007
[336990.703247] ESI: 57b51067 EDI: 00000001 EBP: e2cb93c8 ESP: e2033e4c
[336990.703253] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[336990.703266] CR0: 80050033 CR2: 08079004 CR3: 23651000 CR4: 00000660
[336990.703275] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[336990.703282] DR6: ffff0ff0 DR7: 00000400
[336990.703288] [<c0171646>] handle_mm_fault+0xae6/0x1360
[336990.703307] [<c020e057>] rb_insert_color+0x77/0xe0
[336990.703325] [<c032a27e>] do_page_fault+0x35e/0xe70
[336990.703337] [<c01745d4>] vma_merge+0x144/0x1d0
[336990.703349] [<c0174b75>] do_brk+0x195/0x240
[336990.703362] [<c0175126>] sys_brk+0xb6/0xf0
[336990.703374] [<c0329f20>] do_page_fault+0x0/0xe70
[336990.703384] [<c0328bc5>] error_code+0x35/0x40
[336990.703396] ======================
[337005.938292] BUG: soft lockup - CPU#2 stuck for 11s! [zmlocalconfig:11371]
[337005.938306]
[337005.938312] Pid: 11371, comm: zmlocalconfig Tainted: G D (2.6.24-26-xen #1)
[337005.938318] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 2
[337005.938330] EIP is at _spin_lock+0x7/0x10
[337005.938335] EAX: ec64a870 EBX: ec64a870 ECX: 00000002 EDX: ec64a871
[337005.938339] ESI: 00000000 EDI: c03fe800 EBP: c1261e38 ESP: c1261d7c
[337005.938343] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[337005.938357] CR0: 8005003b CR2: 08128000 CR3: 25d8e000 CR4: 00000660
[337005.938364] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[337005.938370] DR6: ffff0ff0 DR7: 00000400
[337005.938376] [<c01771f0>] page_lock_anon_vma+0x20/0x30
[337005.938391] [<c01786fd>] page_referenced+0x8d/0x130
[337005.938401] [<c01671cf>] shrink_active_list+0x18f/0x5c0
[337005.938411] [<c0164286>] get_dirty_limits+0x16/0x200
[337005.938421] [<ee04b38e>] mb_cache_shrink_fn+0x1e/0x100 [mbcache]
[337005.938435] [<c016826d>] shrink_zone+0xdd/0x100
[337005.938444] [<c0168d72>] try_to_free_pages+0x152/0x250
[337005.938453] [<c0162fcb>] __alloc_pages+0x14b/0x390
[337005.938463] [<c01855c5>] do_sync_read+0xd5/0x120
[337005.938475] [<c0163247>] __get_free_pages+0x37/0x50
[337005.938483] [<c0124496>] copy_process+0xa6/0x1210
[337005.938493] [<c0197c34>] d_alloc+0x114/0x1a0
[337005.938503] [<c0125830>] do_fork+0x40/0x260
[337005.938511] [<c0210f00>] copy_to_user+0x30/0x60
[337005.938523] [<c0103226>] sys_clone+0x36/0x40
[337005.938530] [<c0105832>] syscall_call+0x7/0xb
[337005.938542] ======================
[336990.803889] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:103]
[336990.803907]
[336990.803915] Pid: 103, comm: kswapd0 Tainted: G D (2.6.24-26-xen #1)
[336990.803922] EIP: 0061:[<c03286ea>] EFLAGS: 00000286 CPU: 0
[336990.803940] EIP is at _spin_lock+0xa/0x10
[336990.803948] EAX: c1dbc86c EBX: 00000000 ECX: 22cc3000 EDX: 00000000
[336990.803955] ESI: 57b47067 EDI: c1dbc86c EBP: 00000ff0 ESP: ed725dec
[336990.803961] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[336990.803976] CR0: 8005003b CR2: b791e6d9 CR3: 23e3b000 CR4: 00000660
[336990.803986] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[336990.803992] DR6: ffff0ff0 DR7: 00000400
[336990.804001] [<c01773cb>] page_check_address+0x1cb/0x3c0
[336990.804026] [<c017762e>] page_referenced_one+0x6e/0x190
[336990.804039] [<c017875c>] page_referenced+0xec/0x130
[336990.804049] [<c01671cf>] shrink_active_list+0x18f/0x5c0
[336990.804064] [<c0210556>] memmove+0x36/0x40
[336990.804079] [<c0164286>] get_dirty_limits+0x16/0x200
[336990.804089] [<c0139857>] call_rcu+0x97/0xa0
[336990.804102] [<ee04b38e>] mb_cache_shrink_fn+0x1e/0x100 [mbcache]
[336990.804120] [<c016826d>] shrink_zone+0xdd/0x100
[336990.804132] [<c01688cc>] kswapd+0x44c/0x490
[336990.804145] [<c013bb00>] autoremove_wake_function+0x0/0x40
[336990.804160] [<c011e270>] complete+0x40/0x60
[336990.804172] [<c0168480>] kswapd+0x0/0x490
[336990.804183] [<c013b842>] kthread+0x42/0x70
[336990.804194] [<c013b800>] kthread+0x0/0x70
[336990.804206] [<c0105bb7>] kernel_thread_helper+0x7/0x10
[336990.804218] ======================
On Sun, Jan 31, 2010 at 11:23 PM, Dana Rawding <dana@twc-inc.net> wrote:
> I've been experiencing a rash of CPU lockups on a number of domU's
> recently. It's been happening on two different servers. About a year ago
> I had this problem every once in a while, but it was not frequent. I was
> running Ubuntu with Xen 3.1 and 2.6.24-18 back then. I'm now running Xen
> 3.3 and 2.6.24-26.

Since you're using Ubuntu's kernel, the Ubuntu way would be to report this bug on bugs.ubuntu.com and wait until they come up with a fix :P

> I have been collecting the lockup messages and have posted a few below.
> Any ideas? Recommendations?

You might want to try using a newer kernel. Both the vanilla kernel and Suse's Xen kernel should work for domU. See http://wiki.xensource.com/xenwiki/XenDom0Kernels.

-- 
Fajar
On Sun, Jan 31, 2010 at 11:23:36AM -0500, Dana Rawding wrote:
> Hi all,
>
> I've been experiencing a rash of CPU lockups on a number of domU's
> recently. It's been happening on two different servers.
> [snip]
>
> I have been collecting the lockup messages and have posted a few below.
> Any ideas? Recommendations?

Please check this wiki page:
http://wiki.xensource.com/xenwiki/XenBestPractices

Are all those OK on your setup? After those I'd upgrade the dom0 kernel, since Ubuntu's 2.6.24 is known to be buggy.

-- 
Pasi

> [quoted soft-lockup traces snipped]
I have this problem too. Xen 3.3.1, Debian Lenny. The load average on the server goes up to 10-15, all domUs freeze, and I can't do anything. Please test - I fixed this problem with:

xm sched-credit -d 0 -w 512

[787717.425090] BUG: soft lockup - CPU#0 stuck for 61s! [watchdog/0:5]
[787717.425090] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables tun bridge ipv6 nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc loop joydev igb psmouse pcspkr i2c_i801 serio_raw button i2c_core evdev dca ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ata_generic usbhid hid ff_memless ata_piix libata dock sd_mod ide_pci_generic ide_core ehci_hcd uhci_hcd 3w_9xxx scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[787717.432148] CPU 0:
[787717.432148] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables tun bridge ipv6 nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc loop joydev igb psmouse pcspkr i2c_i801 serio_raw button i2c_core evdev dca ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ata_generic usbhid hid ff_memless ata_piix libata dock sd_mod ide_pci_generic ide_core ehci_hcd uhci_hcd 3w_9xxx scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[787717.436173] Pid: 5, comm: watchdog/0 Not tainted 2.6.26-1-xen-amd64 #1
[787717.436173] RIP: e030:[<ffffffff8025ed13>] [<ffffffff8025ed13>] watchdog+0xbe/0x1cf
[787717.436173] RSP: e02b:ffff880bce0d9ef0 EFLAGS: 00000207
[787717.436173] RAX: 0000000000000001 RBX: ffff880bcb4e5400 RCX: 0002cc64939f91fe
[787717.436173] RDX: ffff880081656000 RSI: ffffffff804fe460 RDI: ffffffff8053a000
[787717.436173] RBP: ffff880bcb4e5400 R08: ffff880001be3040 R09: ffff880bce0d9e30
[787717.436173] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000399
[787717.436173] R13: 00000000000b3192 R14: 0000000000000000 R15: 0000000000000000
[787717.436173] FS: 00007f0cfbb3e6e0(0000) GS:ffffffff80539000(0000) knlGS:0000000000000000
[787717.436173] CS: e033 DS: 0000 ES: 0000
[787717.436173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[787717.436173] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[787717.436173]
[787717.436173] Call Trace:
[787717.436173] [<ffffffff8025ec55>] ? watchdog+0x0/0x1cf
[787717.436173] [<ffffffff8023f56b>] ? kthread+0x47/0x74
[787717.436173] [<ffffffff8022839f>] ? schedule_tail+0x27/0x5c
[787717.436173] [<ffffffff8020be28>] ? child_rip+0xa/0x12
[787717.436173] [<ffffffff8023f524>] ? kthread+0x0/0x74
[787717.436173] [<ffffffff8020be1e>] ? child_rip+0x0/0x12
[787717.436173]

2010/1/31 Dana Rawding <dana@twc-inc.net>
> Hi all,
>
> I've been experiencing a rash of CPU lockups on a number of domU's
> recently. It's been happening on two different servers.
> [snip]
>
> I have been collecting the lockup messages and have posted a few below.
> Any ideas? Recommendations?
>
> [quoted soft-lockup traces snipped]

-- 
Best Regards,
alex.faq8@gmail.com
On Feb 1, 2010, at 2:27 AM, Pasi Kärkkäinen wrote:
> Please check this wiki page:
> http://wiki.xensource.com/xenwiki/XenBestPractices

I have 1.5 GB RAM dedicated to the dom0's. It's probably more RAM than necessary. Is there a suggestion as to what this number should be?

The sched-credit weight was the default 256. I have upped it to 512 per the best practices and Alex's suggestion. I'm hoping this calms things down. If not, I plan to try a different kernel.

Thanks to everyone for the suggestions.

Dana
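For reference, the two tweaks discussed in this thread end up looking roughly like this (a sketch only; the memory figure is the 1.5 GB mentioned above, and credit-scheduler weights are relative to the default of 256):

  # GRUB: dedicate memory to dom0 on the Xen boot line
  kernel /boot/xen.gz dom0_mem=1536M

  # Give dom0 a higher scheduler weight so it still gets CPU time under load
  xm sched-credit -d Domain-0 -w 512

  # Verify the current weight/cap
  xm sched-credit -d Domain-0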