Hi,

We have a very annoying issue we've been unable to iron out. I've googled and browsed through the complete xen-devel and xen-user archives, but it seems we're the only ones having this issue, although I somehow doubt we really are.

Once in a while xend simply crashes. When we don't touch anything it will keep running without problems. But sometimes, usually when we run 'xm create', xend crashes, leaving a 'Domain-Unnamed' behind, which we then have to destroy manually. Quite annoying, especially when your provisioning runs completely unattended.

We don't have any idea where to look. The xen logs are completely useless: they give no clue whatsoever about what could be wrong. A strace didn't provide much useful info, and there is nothing related in the output of 'xm dmesg' either.

I suspect the problem is caused by an I/O bottleneck. I noticed that ever since I moved the xenstore to a ramdisk (tmpfs), xend crashes less often, but it still happens. If our I/O bottleneck is indeed the problem, how can I verify that? And shouldn't xend be more resilient against these types of issues? Have there been any patches in xend related to such issues? Can I increase the verbosity of xend logging perhaps?

Some background info: We run Xen 3.3.2 on Fedora 11 with Linux 2.6.30.7 (Fedora 11 kernel with forward-ported Xen patches from OpenSUSE). Maybe it's an unusual setup, but apart from this issue it's actually perfectly stable. We didn't have much luck with Xen 3.4, so we decided to stick with 3.3.2 for now. The dom0 has 1 GB of memory, which should be enough (and most of it is unused).

Can anyone point us in the right direction on how to debug this issue? I don't have much knowledge about xen internals, so I'd appreciate any pointers. Thanks! :)

--
Dennis Krul <dweazle@gmail.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
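One rough way to check the I/O-bottleneck suspicion raised above is to watch how much CPU time dom0 spends in iowait around an 'xm create'. The sketch below is only an illustration, not something from the thread: it assumes a Linux dom0 with the usual /proc/stat layout (the fifth value on the aggregate "cpu" line is iowait), and the function names and the 5-second interval are made up for this example. A high iowait fraction is a hint, not proof, that xend or xenstored is stalling on disk.

```python
import time


def parse_cpu_times(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Fraction of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal [guest ...]
    # Sum only the first eight fields: guest time is already counted in user.
    total = sum(deltas[:8])
    return float(deltas[4]) / total if total > 0 else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as handle:
        before = parse_cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_times(handle.read())
    return iowait_fraction(before, after)
```

Calling sample_iowait() in one terminal while 'xm create' runs in another gives a number to compare against an idle baseline. Comparing runs with xenstore on disk and on tmpfs would show whether the ramdisk change actually moved the figure.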
On Mon, Nov 30, 2009 at 2:51 PM, Dennis Krul <dweazle@gmail.com> wrote:
> Once in a while xend simply crashes. When we don't touch anything it will
> keep running without problems. But sometimes, usually when we run 'xm
> create', xend crashes, leaving a 'Domain-Unnamed' behind, which we then have
> to destroy manually.
> [...]

The XenServer product uses a different xend, written in OCaml, which is supposed to be a lot more robust. I believe somebody rewrote the open source xend in OCaml for the same reasons, so you could try that.

I seem to recall threads about running xend in a stubdom, so that might be an option.

Andy
On Mon, Nov 30, 2009 at 03:32:28PM +0000, Andrew Lyon wrote:
> The Xenserver product uses a different xend which is written in ocaml
> and is supposed to be a lot more robust, I believe somebody rewrote
> the open source xend in ocaml for the same reasons so you could try
> that.
>
> I seem to recall threads about running xend in a stubdom so that might
> be a option.

And XenServer is open source now, so you could always try the OCaml xend.

Although I'm not sure if it's an easy/direct replacement.

-- Pasi
The observation that speeding up xenstore reduces the frequency of crashes is interesting. Perhaps the failure happens when a concurrent transaction causes an abort? Maybe you could provoke it by running 'xm create' in a loop while also writing somewhere in xenstore? IIRC (although I could be mistaken) the standard C xenstore considers all concurrent transactions to be conflicting even if they operate on disjoint parts of the tree so provoking an abort would be easy.

> And XenServer is opensource now.. so you could always try the ocaml
> xend.
>
> Althought I'm not sure if it's easy/direct replacement..

Sorry, couldn't resist:

<begin advert>

Feel free to give it a go. Although it's still in development (it's in a bit of a stabilization phase atm) mysterious toolstack crashes / segfaults are rare (famous last words?). The kind of bugs it's currently suffering from are mostly to do with the new functionality we've been integrating recently e.g. RBAC, ballooning etc. For more normal stuff it ought to be pretty good.

Caveats:

1. We don't have an 'xm'... instead there's a CLI called 'xe' which can do almost everything the API can do but the syntax is different to 'xm'. You'd either have to port your scripts ('xe vm-start' rather than 'xm create'?) or write some kind of wrapper.

2. It's much easier to install and use the whole integrated patched xen + patched qemu + dom0 + toolstack rather than transplant the toolstack onto another dom0. I'm sure it's possible but we've been focusing dev + test on the single environment.

http://www.xen.org/products/cloud_source.html

<end advert>

FWIW we also use an ocaml xenstore which handles concurrent transactions efficiently. There are some performance graphs here:

http://thomas.gazagnaire.com/pub/GH09.pdf

The reason we rewrote xenstored was because we used xenstore to report periodic guest performance stats to dom0. By doing this we accidentally created a horrible scalability bottleneck where, somewhere around 30 or 40 guests, every transaction aborted and the system livelocked. The new xenstored is smart enough to realize that these separate transactions are not conflicting and can be committed together.

Cheers,
Dave
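To make the difference between the two commit policies concrete, here is a toy model of them. It is not the real xenstored code (neither the C nor the OCaml implementation); the class names, method names and xenstore paths are invented for this illustration. The "coarse" store aborts a transaction if anything at all was committed since it started, which is how the C xenstored is recalled to behave above; the "path-aware" store aborts only if a path the transaction itself touched was changed in the meantime.

```python
class ToyStore:
    """Toy key/value store with optimistic transactions (illustration only)."""

    def __init__(self, path_aware):
        self.path_aware = path_aware
        self.data = {}
        self.generation = 0       # bumped on every successful commit
        self.last_changed = {}    # path -> generation of its last commit

    def begin(self):
        return ToyTransaction(self)

    def commit(self, txn):
        if self.path_aware:
            # Conflict only if a path this transaction read or wrote was
            # committed by someone else after the transaction started.
            touched = txn.reads | set(txn.writes)
            conflict = any(self.last_changed.get(p, 0) > txn.start for p in touched)
        else:
            # Coarse policy: any commit since the start counts as a conflict.
            conflict = self.generation > txn.start
        if conflict:
            return False          # the caller is expected to retry
        self.generation += 1
        for path, value in txn.writes.items():
            self.data[path] = value
            self.last_changed[path] = self.generation
        return True


class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.start = store.generation
        self.reads = set()
        self.writes = {}

    def read(self, path):
        self.reads.add(path)
        return self.writes.get(path, self.store.data.get(path))

    def write(self, path, value):
        self.writes[path] = value


def disjoint_commit_results(path_aware):
    """Commit two overlapping transactions that write to unrelated paths."""
    store = ToyStore(path_aware)
    stats, create = store.begin(), store.begin()
    stats.write("/local/domain/7/data/cpu", "12")       # made-up stats key
    create.write("/local/domain/8/name", "new-guest")   # made-up guest
    return store.commit(stats), store.commit(create)
```

In this model disjoint_commit_results(path_aware=False) returns (True, False): the second transaction is rejected even though it touched a different part of the tree. With path_aware=True both commits succeed. With enough guests writing stats, a coarse store would keep rejecting longer transactions such as domain creation, which is consistent with the livelock described above.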
> And XenServer is opensource now.. so you could always try the ocaml
> xend.

Really? Where can I find the parts that have recently been open sourced?
On Mon, Nov 30, 2009 at 09:13:52PM +0000, Andrew Lyon wrote:
> > And XenServer is opensource now.. so you could always try the ocaml
> > xend.
>
> really? where can i find the parts that have recently been open sourced?

Announcements:
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00112.html
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00117.html
http://blog.xen.org/index.php/2009/11/03/xen-org-announces-availability-of-xen-cloud-platform-0-1/
http://blog.xen.org/index.php/2009/11/03/xapi-toolstack-release-details/

Sources:
http://www.xen.org/products/cloud_source.html

-- Pasi
On Mon, Nov 30, 2009 at 10:02 PM, Dave Scott <Dave.Scott@eu.citrix.com> wrote:
> The observation that speeding up xenstore reduces the frequency of crashes
> is interesting. Perhaps the failure happens when a concurrent transaction
> causes an abort? Maybe you could provoke it by running 'xm create' in a loop
> while also writing somewhere in xenstore? IIRC (although I could be
> mistaken) the standard C xenstore considers all concurrent transactions to
> be conflicting even if they operate on disjoint parts of the tree so
> provoking an abort would be easy.

Hey Dave,

Thanks for responding! This actually sounds quite plausible.

> Caveats:
> 1. We don't have an 'xm'... instead there's a CLI called 'xe' which can do
> almost everything the API can do but the syntax is different to 'xm'. You'd
> either have to port your scripts ('xe vm-start' rather than 'xm create'?) or
> write some kind of wrapper.

That shouldn't be too difficult :)

> The reason we rewrote xenstored was because we used xenstore to report
> periodic guest performance stats to dom0. By doing this we accidentally
> created a horrible scalability bottleneck where, somewhere around 30 or 40
> guests, every transaction aborted and the system livelocked. The new
> xenstored is smart enough to realize that these separate transactions are
> not conflicting and can be committed together.

We also have a couple of scripts that periodically collect statistics from the xenstore. We haven't seen any livelocks, but perhaps the xend crashes are caused by the same limitation. The xend crashes don't seem to happen until we actually have some (20+?) domUs running.

I'd like to try to get the ocaml toolchain (xend/xenstore/xe) working with the community version of the hypervisor (preferably 3.3.2) and our custom dom0 kernel. Do you think I have any chance of succeeding? Or are they really incompatible and in need of heavy patching to make it work? (In the latter case I'll just try the XenServer stack instead.)

Final question. There also seems to be an open source version of XenServer published on the Citrix site here:
http://www.citrix.com/lang/English/lp/lp_1688623.asp

Are those the same ISOs as the ones on the xen site (at http://www.xen.org/products/cloud_source.html)?

Thanks again!

--
Dennis Krul <dweazle@gmail.com>
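The reproduction idea quoted above (run 'xm create' in a loop while something else keeps writing to xenstore) could be scripted roughly as follows. This is an untested sketch under stated assumptions: the config file path, the xenstore key and the helper names are placeholders invented for this example, and it assumes 'xm' and 'xenstore-write' are on the PATH in dom0. The commands are passed in as parameters so the loop can be dry-run with harmless commands first.

```python
import subprocess
import threading


def hammer(command, stop_event, counter):
    """Run `command` repeatedly until `stop_event` is set, counting runs."""
    while not stop_event.is_set():
        subprocess.call(command)
        counter.append(1)


def provoke(create_cmd, write_cmd, attempts):
    """Run `create_cmd` `attempts` times while `write_cmd` loops in the background.

    Returns (exit codes of create_cmd, number of background writes).
    """
    stop_event = threading.Event()
    writes = []
    writer = threading.Thread(target=hammer, args=(write_cmd, stop_event, writes))
    writer.start()
    try:
        exit_codes = [subprocess.call(create_cmd) for _ in range(attempts)]
    finally:
        # Always stop the background writer, even if a create call raises.
        stop_event.set()
        writer.join()
    return exit_codes, len(writes)


# Placeholder commands: adjust the config file and the key to your setup.
CREATE_CMD = ["xm", "create", "/etc/xen/stress-test.cfg"]
WRITE_CMD = ["xenstore-write", "/tool/stress-test/counter", "1"]
```

A run would look like `codes, writes = provoke(CREATE_CMD, WRITE_CMD, attempts=50)`. Non-zero exit codes, or xend disappearing from the process list, mark the iterations worth correlating with the logs. A real run would also need to destroy the test guest between iterations (or use distinct configs), which is left out here to keep the sketch short.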
Dennis wrote:
> I'd like to try to get the ocaml toolchain (xend/xenstore/xe) working with
> the community version of the hypervisor (preferably 3.3.2) and our custom
> dom0 kernel. Do you think I have any chance of succeeding? Or are they
> really incompatible and need heavy patching to make it work? (In the latter
> case I'll just try the XenServer stack instead.)

I think heavy patching would be involved. The ocaml 'xapi toolstack' currently depends on qemu-xen-3.4 and xen-3.4, both with patchqueues. It'd be much easier to use the complete set of bits:

http://www.xen.org/products/cloud_source.html

> Final question. There also seems to be an opensource version of XenServer
> published on the citrix site here:
> http://www.citrix.com/lang/English/lp/lp_1688623.asp
>
> Are those the same iso's as the ones on the xen site? (at http://www.xen.org/products/cloud_source.html)

They're different: those source ISOs are for the OSS components of a previous release of XenServer, from before the open-sourcing of the ocaml toolchain. They'll have all the dom0, xen and qemu bits on there, but they'll be missing xapi/xenstore/xe. Future versions of XenServer will be based on the OSS ocaml toolchain and their source ISOs will include all the ocaml stuff ISYWIM.

Cheers,
Dave