Here's what I'm using to fix it in 3.1. Obviously too risky for 3.2 now,
and anyway, there's a much more serious problem in 3.2 - entire /vm/
entries are leaked on domU reboot! I haven't got round to debugging that
one yet since it's not present in 3.1, but it really should be fixed

regards
john


# HG changeset patch
# User john.levon@sun.com
# Date 1200028523 28800
# Node ID 8a26f9a3f573a7c3b970f4349dd47fd4bf851df1
# Parent  c594cafb7e427e3b1f732a37c4b521f1e98dd08d
Cleanup xenstore after console device teardown

After the changes in 13616:b111908dd70b, DevController was leaking
xenstore entries every time we took down a console device, as there was
no equivalent to 'xenstore-rm -t' used in the hotplug scripts for "real"
devices. Implement the moral equivalent whenever removal is forced.

Signed-off-by: John Levon <john.levon@sun.com>

diff --git a/tools/python/xen/xend/server/DevController.py b/tools/python/xen/xend/server/DevController.py
--- a/tools/python/xen/xend/server/DevController.py
+++ b/tools/python/xen/xend/server/DevController.py
@@ -231,11 +231,21 @@ class DevController:
         self.writeBackend(dev, 'state', str(xenbusState['Closing']))
 
         if force:
-            frontpath = self.frontendPath(dev)
-            backpath = xstransact.Read(frontpath, "backend")
-            if backpath:
-                xstransact.Remove(backpath)
-            xstransact.Remove(frontpath)
+            try:
+                frontpath = self.frontendPath(dev)
+                t = xstransact()
+                backpath = t.read("%s/backend" % frontpath)
+                if backpath:
+                    t.remove(backpath)
+                    # tidy up empty directories
+                    while not t.list(backpath):
+                        t.remove(backpath)
+                        backpath = os.path.dirname(backpath)
+                t.remove(frontpath)
+                t.commit()
+            except:
+                t.abort()
+                raise
 
         self.vm._removeVm("device/%s/%d" % (self.deviceClass, dev))
Shouldn't the transaction be wrapped in a 'while True' with a break or
return on t.commit() returning True? Otherwise you are not correctly
handling transaction failure. Please fix and resubmit.

 Thanks,
 Keir

On 11/1/08 17:39, "John Levon" <levon@movementarian.org> wrote:

> Here's what I'm using to fix it in 3.1. Obviously too risky for 3.2 now,
> and anyway, there's a much more serious problem in 3.2 - entire /vm/
> entries are leaked on domU reboot! I haven't got round to debugging that
> one yet since it's not present in 3.1, but it really should be fixed
>
> regards
> john
>
> [...]
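As a minimal sketch of the retry pattern Keir is asking for, applied to the
force-removal hunk above - assuming the same xstransact interface the patch
already uses; the helper name force_remove is hypothetical and this is
illustrative rather than the resubmitted patch:

    import os
    from xen.xend.xenstore.xstransact import xstransact

    def force_remove(frontpath):
        # Retry until the transaction commits; commit() returns False when
        # xenstored asks for a retry (e.g. a conflicting concurrent write).
        while True:
            t = xstransact()
            try:
                backpath = t.read("%s/backend" % frontpath)
                if backpath:
                    t.remove(backpath)
                    # tidy up empty parent directories, as in the patch above
                    while not t.list(backpath):
                        t.remove(backpath)
                        backpath = os.path.dirname(backpath)
                t.remove(frontpath)
                if t.commit():
                    break
            except:
                t.abort()
                raise

On a failed commit the loop simply starts a fresh transaction and redoes the
reads and removes, which is the usual way to cope with xenstore transaction
conflicts.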
John Levon wrote:
> Here's what I'm using to fix it in 3.1. Obviously too risky for 3.2 now,
> and anyway, there's a much more serious problem in 3.2 - entire /vm/
> entries are leaked on domU reboot! I haven't got round to debugging that
> one yet since it's not present in 3.1, but it really should be fixed
>

I'm unable to reproduce the console leak on 3.2.  I have tried rebooting
managed PV domUs several times with no orphaned console entries - using
'xm reboot' and rebooting within the guest itself.

I do however see the entire /vm/<uuid> nodes leaking and agree that this
is a rather serious bug as it could eventually cause swap thrashing in
dom0 [1].  E.g. after 3 reboots of a PV domU xenstore contains

vm
 faa7647e-142b-74c7-dc72-efd57fe6d0ef-1 = ""
  image = "(linux (kernel ) (args 'mem=512M ') (device_model /usr/lib64/xen/bin/qemu\..."
  ostype = "linux"
  ...
 faa7647e-142b-74c7-dc72-efd57fe6d0ef = ""
  image = "(linux (kernel ) (args 'mem=512M ') (device_model /usr/lib64/xen/bin/qemu\..."
  ostype = "linux"
  ...
 faa7647e-142b-74c7-dc72-efd57fe6d0ef-2 = ""
  image = "(linux (kernel ) (args 'mem=512M ') (device_model /usr/lib64/xen/bin/qemu\..."
  ostype = "linux"
  ...

This leak is much worse than a single device leaking :-(.  I'll have a
look but may be a day or two before I can get to it.

Jim

[1] http://lists.xensource.com/archives/html/xen-devel/2007-05/msg00641.html
On Fri, Jan 11, 2008 at 12:48:20PM -0700, Jim Fehlig wrote:

> > Here's what I'm using to fix it in 3.1. Obviously too risky for 3.2 now,
> > and anyway, there's a much more serious problem in 3.2 - entire /vm/
> > entries are leaked on domU reboot! I haven't got round to debugging that
> > one yet since it's not present in 3.1, but it really should be fixed
>
> I'm unable to reproduce the console leak on 3.2.  I have tried rebooting
> managed PV domUs several times with no orphaned console entries - using
> 'xm reboot' and rebooting within the guest itself.

Hmm. Maybe this is a result of one of the other console changes since 3.1?
I don't see how 3.2 fixes this any more than 3.1 on a quick look.

Anyway, the patch (once I've fixed it, thanks Keir) seems an obvious
improvement even if it's not needed for the console any more.

> This leak is much worse than a single device leaking :-(.  I'll have a
> look but may be a day or two before I can get to it.

That'd be great...

cheers,
john
Jim Fehlig wrote:
> John Levon wrote:
>
>> Here's what I'm using to fix it in 3.1. Obviously too risky for 3.2 now,
>> and anyway, there's a much more serious problem in 3.2 - entire /vm/
>> entries are leaked on domU reboot! I haven't got round to debugging that
>> one yet since it's not present in 3.1, but it really should be fixed
>>
>
> I'm unable to reproduce the console leak on 3.2.  I have tried rebooting
> managed PV domUs several times with no orphaned console entries - using
> 'xm reboot' and rebooting within the guest itself.
>
> I do however see the entire /vm/<uuid> nodes leaking and agree that this
> is a rather serious bug as it could eventually cause swap thrashing in
> dom0 [1].
>
> [...]
>
> This leak is much worse than a single device leaking :-(.  I'll have a
> look but may be a day or two before I can get to it.

After some investigation I have found that the /vm/<uuid> leak is caused
by c/s 15957 (later modified by c/s 15967).  Reverting these changesets
removes the leak on 3.2 RC4.  Further, when reverting c/s 15957 I am not
seeing the issue it claims to fix - i.e. I am not losing the vif mac
address when doing localhost migrations of PV or HVM domUs.

Keir - under what circumstances was the vif mac address being lost?  I am
unable to reproduce that behavior on RC4 with 15967 and 15957 reverted.

Regards,
Jim
On Mon, Jan 14, 2008 at 12:27:10PM -0700, Jim Fehlig wrote:

> After some investigation I have found that the /vm/<uuid> leak is caused
> by c/s 15957 (later modified by c/s 15967).  Reverting these changesets
> removes the leak on 3.2 RC4.  Further, when reverting c/s 15957 I am not
> seeing the issue it claims to fix - i.e. I am not losing the vif mac
> address when doing localhost migrations of PV or HVM domUs.
>
> Keir - under what circumstances was the vif mac address being lost?  I am
> unable to reproduce that behavior on RC4 with 15967 and 15957 reverted.

I see the behaviour in 3.1 with HVM domains. If I live migrate twice,
then the MAC address is lost, and a new random one is generated:

$ grep spawning /tmp/xend.log | tail -3
[2008-01-10 23:55:01 101730] INFO (image:443) spawning device models:
/usr/lib/xen/bin/qemu-dm ['/usr/lib/xen/bin/qemu-dm', '-d', '31', '-vcpus',
'1', '-boot', 'c', '-usb', '-k', 'en-us', '-domain-name', 'winxp', '-net',
'nic,vlan=0,macaddr=00:16:3e:52:76:6d,model=rtl8139', '-net',
'tap,vlan=0,bridge=xenbr0', '-vnc', '127.0.0.1:0', '-vncunused']
[2008-01-10 23:58:25 101730] INFO (image:443) spawning device models:
/usr/lib/xen/bin/qemu-dm ['/usr/lib/xen/bin/qemu-dm', '-d', '32', '-vcpus',
'1', '-boot', 'c', '-usb', '-k', 'en-us', '-domain-name', 'winxp', '-net',
'nic,vlan=0,macaddr=00:16:3e:52:76:6d,model=rtl8139', '-net',
'tap,vlan=0,bridge=xenbr0', '-vnc', '127.0.0.1:0', '-vncunused', '-loadvm',
'/tmp/xen.qemu-dm.32']
[2008-01-11 00:00:43 101730] INFO (image:443) spawning device models:
/usr/lib/xen/bin/qemu-dm ['/usr/lib/xen/bin/qemu-dm', '-d', '33', '-vcpus',
'1', '-boot', 'c', '-usb', '-k', 'en-us', '-domain-name', 'winxp', '-net',
'nic,vlan=0,macaddr=00:16:3e:0d:93:13,model=rtl8139', '-net',
'tap,vlan=0,bridge=xenbr0', '-vnc', '127.0.0.1:0', '-vncunused', '-loadvm',
'/tmp/xen.qemu-dm.33']

I don't have to do anything "special", this happens every time on 3.1. I
presume that the troublesome changeset fixes this exact problem.

regards
john
On 14/1/08 20:08, "John Levon" <levon@movementarian.org> wrote:

> I see the behaviour in 3.1 with HVM domains. If I live migrate twice,
> then the MAC address is lost, and a new random one is generated:
>
> I don't have to do anything "special", this happens every time on 3.1. I
> presume that the troublesome changeset fixes this exact problem.

Yes, device controllers clean up by deleting /vm/<uuid>/path/to/device. This
aliases with the new domain's device information (because they're really the
same vm) and so when the old domain is cleaned up the new domain loses
information. Disambiguating with an extra level of indirection seemed the
simplest fix for this. I'm not sure why this leads to xenstore leaks.

When a domain is finally garbage collected in xend, perhaps we should delete
its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
catch-all.

 -- Keir
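For readers following along, a toy illustration of the aliasing Keir
describes and of the suffixed paths visible in Jim's xenstore dump above -
the helper names and exact suffix format here are assumptions for the
example, not xend's actual code:

    XS_VMROOT = "/vm/"

    def vm_path_old(uuid):
        # Pre-15957 behaviour: every incarnation of the same VM shares one
        # path, so on localhost migration the old and new domains alias each
        # other and the old domain's cleanup deletes the new one's device info.
        return XS_VMROOT + uuid                        # /vm/<uuid>

    def vm_path_new(uuid, incarnation):
        # Post-15957 behaviour: each incarnation gets its own suffix, e.g.
        # /vm/<uuid>-1, /vm/<uuid>-2, so cleanup no longer aliases - but
        # nothing removes the old suffixed path, which is the leak observed.
        return "%s%s-%d" % (XS_VMROOT, uuid, incarnation)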
On Mon, Jan 14, 2008 at 09:29:38PM +0000, Keir Fraser wrote:

> > I see the behaviour in 3.1 with HVM domains. If I live migrate twice,
> > then the MAC address is lost, and a new random one is generated:
> >
> > I don't have to do anything "special", this happens every time on 3.1. I
> > presume that the troublesome changeset fixes this exact problem.
>
> Yes, device controllers clean up by deleting /vm/<uuid>/path/to/device. This
> aliases with the new domain's device information (because they're really the
> same vm) and so when the old domain is cleaned up the new domain loses
> information. Disambiguating with an extra level of indirection seemed the
> simplest fix for this. I'm not sure why this leads to xenstore leaks.
>
> When a domain is finally garbage collected in xend, perhaps we should delete
> its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
> catch-all.

The lack of the latter explains the former - because each instance has a
unique /vm path, there's nothing that ever cleans up the older path
contents (image etc.). Right?

regards
john
On 14/1/08 21:48, "John Levon" <levon@movementarian.org> wrote:

>> Yes, device controllers clean up by deleting /vm/<uuid>/path/to/device. This
>> aliases with the new domain's device information (because they're really the
>> same vm) and so when the old domain is cleaned up the new domain loses
>> information. Disambiguating with an extra level of indirection seemed the
>> simplest fix for this. I'm not sure why this leads to xenstore leaks.
>>
>> When a domain is finally garbage collected in xend, perhaps we should delete
>> its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
>> catch-all.
>
> The lack of the latter explains the former - because each instance has a
> unique /vm path, there's nothing that ever cleans up the older path
> contents (image etc.). Right?

I don't understand what you mean. There's no code to delete the entire /vm
path, regardless of whether the path is /vm/<uuid>/ or /vm/<uuid>-<number>
(I'm pretty sure). And if there was, why would it be affected by the detail
of what happens to be the domain's /vm path anyway?

 -- Keir
On Mon, Jan 14, 2008 at 09:54:50PM +0000, Keir Fraser wrote:

> >> Yes, device controllers clean up by deleting /vm/<uuid>/path/to/device. This
> >> aliases with the new domain's device information (because they're really the
> >> same vm) and so when the old domain is cleaned up the new domain loses
> >> information. Disambiguating with an extra level of indirection seemed the
> >> simplest fix for this. I'm not sure why this leads to xenstore leaks.
> >>
> >> When a domain is finally garbage collected in xend, perhaps we should delete
> >> its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
> >> catch-all.
> >
> > The lack of the latter explains the former - because each instance has a
> > unique /vm path, there's nothing that ever cleans up the older path
> > contents (image etc.). Right?
>
> I don't understand what you mean. There's no code to delete the entire /vm
> path, regardless of whether the path is /vm/<uuid>/ or /vm/<uuid>-<number>
> (I'm pretty sure).

The old code re-used the same /vm/<uuid>/ path, so there was no leak.
The new code creates a completely new /vm/<uuid>-<number> path, leaking the
old one (/vm/<uuid>-<oldnumber>).

Am I missing something obvious here?

regards
john
On Mon, Jan 14, 2008 at 10:55:37PM +0000, John Levon wrote:

> On Mon, Jan 14, 2008 at 09:54:50PM +0000, Keir Fraser wrote:
> >
> > [...]
> >
> > I don't understand what you mean. There's no code to delete the entire /vm
> > path, regardless of whether the path is /vm/<uuid>/ or /vm/<uuid>-<number>
> > (I'm pretty sure).
>
> The old code re-used the same /vm/<uuid>/ path, so there was no leak.
> The new code creates a completely new /vm/<uuid>-<number> path, leaking the
> old one (/vm/<uuid>-<oldnumber>).

Yes, I noticed that too - it's rather peculiar - the idea of the /vm/
hierarchy was that it was persistent for any individual VM, across creation
attempts. If we append <number> on the path each time it ceases to be
persistent and we might as well just kill off /vm and use /local/<domid>
for everything.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|
Keir Fraser wrote:
> On 14/1/08 20:08, "John Levon" <levon@movementarian.org> wrote:
>
>> I see the behaviour in 3.1 with HVM domains. If I live migrate twice,
>> then the MAC address is lost, and a new random one is generated:
>>
>> I don't have to do anything "special", this happens every time on 3.1. I
>> presume that the troublesome changeset fixes this exact problem.
>>
>
> Yes, device controllers clean up by deleting /vm/<uuid>/path/to/device. This
> aliases with the new domain's device information (because they're really the
> same vm) and so when the old domain is cleaned up the new domain loses
> information. Disambiguating with an extra level of indirection seemed the
> simplest fix for this. I'm not sure why this leads to xenstore leaks.
>
> When a domain is finally garbage collected in xend, perhaps we should delete
> its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
> catch-all.
>

Reverting changesets 15967 and 15957 in addition to the attached patch
fixes the leak and allows multiple localhost migrations.  I'm not sure
what we get by nuking /vm/<uuid>/device/vif/<dev_num> anyway - other
than the problems we're seeing :-).  vif appears to be the only device
stored in the /vm/<uuid>/device path anyway.

I will continue testing with this setup ...

Jim
On 14/1/08 22:55, "John Levon" <levon@movementarian.org> wrote:

>> I don't understand what you mean. There's no code to delete the entire /vm
>> path, regardless of whether the path is /vm/<uuid>/ or /vm/<uuid>-<number>
>> (I'm pretty sure).
>
> The old code re-used the same /vm/<uuid>/ path, so there was no leak.
> The new code creates a completely new /vm/<uuid>-<number> path, leaking the
> old one (/vm/<uuid>-<oldnumber>).
>
> Am I missing something obvious here?

It depends on your test case. If you save/restore the same VM repeatedly
then in the old case you would see no leak. But you would still see a leak
if you created/destroyed lots of different VMs.

 -- Keir
On 14/1/08 22:59, "Daniel P. Berrange" <berrange@redhat.com> wrote:

>> The old code re-used the same /vm/<uuid>/ path, so there was no leak.
>> The new code creates a completely new /vm/<uuid>-<number> path, leaking the
>> old one (/vm/<uuid>-<oldnumber>).
>
> Yes, I noticed that too - it's rather peculiar - the idea of the /vm/
> hierarchy was that it was persistent for any individual VM, across creation
> attempts. If we append <number> on the path each time it ceases to be
> persistent and we might as well just kill off /vm and use /local/<domid>
> for everything.

Sure. Nothing in xenstore is now supposed to persist across domain
restarts/migrates/etc. xend stores info about managed domains in a
separate database. All xenstore info is ephemeral.

 -- Keir
On 14/1/08 23:49, "Jim Fehlig" <jfehlig@novell.com> wrote:

>> When a domain is finally garbage collected in xend, perhaps we should delete
>> its entire /vm/<uuid>/<unique-number>? That would seem a nice and reasonable
>> catch-all.
>
> Reverting changesets 15967 and 15957 in addition to the attached patch
> fixes the leak and allows multiple localhost migrations.  I'm not sure
> what we get by nuking /vm/<uuid>/device/vif/<dev_num> anyway - other
> than the problems we're seeing :-).  vif appears to be the only device
> stored in the /vm/<uuid>/device path anyway.
>
> I will continue testing with this setup ...

Isn't having two domains (even from the same vm) pointing at the same /vm/
path a recipe for further bugs? Most of the lowlevel xend code doesn't seem
to understand the concept that domains can map to the same vm, and could
hence tread on each other's toes via the /vm/ path.

If we revert the two patches, what happens when you create/destroy lots of
domains all with different uuids? I expect the leak will still exist in that
case.

 -- Keir
On Tue, Jan 15, 2008 at 08:06:58AM +0000, Keir Fraser wrote:

> >> I don't understand what you mean. There's no code to delete the entire /vm
> >> path, regardless of whether the path is /vm/<uuid>/ or /vm/<uuid>-<number>
> >> (I'm pretty sure).
> >
> > The old code re-used the same /vm/<uuid>/ path, so there was no leak.
> > The new code creates a completely new /vm/<uuid>-<number> path, leaking the
> > old one (/vm/<uuid>-<oldnumber>).
> >
> > Am I missing something obvious here?
>
> It depends on your test case. If you save/restore the same VM repeatedly
> then in the old case you would see no leak. But you would still see a leak
> if you created/destroyed lots of different VMs.

I don't think so - domain destruction already removes the /vm path (I'd
confused myself here). I just verified this with our 3.1 bits.

Your other email:

> Sure. Nothing in xenstore is now supposed to persist across domain
> restarts/migrates/etc. xend stores info about managed domains in a
> separate database. All xenstore info is ephemeral.

The VIF MAC is the perfect counter-example to this, is it not? It must
remain the same across a domain's existence, even if it's randomly
generated. How do you propose to fix the bug if you don't have /vm ?
(I'm not so sure that any of the rest of /vm is /really/ needed,
but that depends on how xend behaves on restart)

regards
john
On 15/1/08 13:42, "John Levon" <levon@movementarian.org> wrote:

>> Sure. Nothing in xenstore is now supposed to persist across domain
>> restarts/migrates/etc. xend stores info about managed domains in a
>> separate database. All xenstore info is ephemeral.
>
> The VIF MAC is the perfect counter-example to this, is it not? It must
> remain the same across a domain's existence, even if it's randomly
> generated. How do you propose to fix the bug if you don't have /vm ?
> (I'm not so sure that any of the rest of /vm is /really/ needed,
> but that depends on how xend behaves on restart)

Well, the VIF MAC is pickled on domain save or migration, and restored into
the new /vm/ path on restore or at the migration target. It just so happens,
in the localhost migration case on 3.1, that the restore path is the same as
the original path.

It must remain the same across a particular domain's existence, but not for
the VM's existence. In that sense it is ephemeral.

 -- Keir
On 15/1/08 13:42, "John Levon" <levon@movementarian.org> wrote:

>> It depends on your test case. If you save/restore the same VM repeatedly
>> then in the old case you would see no leak. But you would still see a leak
>> if you created/destroyed lots of different VMs.
>
> I don't think so - domain destruction already removes the /vm path (I'd
> confused myself here). I just verified this with our 3.1 bits.

We could officially make /vm/ have the same lifetime as /local/domain, in
which case it should die even on domain localhost migration, or we could
take the patch that Jim suggests, and revert the other two. I don't have a
particularly strong opinion either way.

 -- Keir
On 15/1/08 13:57, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:

>> I don't think so - domain destruction already removes the /vm path (I'd
>> confused myself here). I just verified this with our 3.1 bits.
>
> We could officially make /vm/ have the same lifetime as /local/domain, in
> which case it should die even on domain localhost migration, or we could
> take the patch that Jim suggests, and revert the other two. I don't have a
> particularly strong opinion either way.

Actually if we attach/detach lots of virtual devices to a single domain,
will that end up leaking xenstore nodes with Jim's patch?

 -- Keir
On Tue, Jan 15, 2008 at 01:59:53PM +0000, Keir Fraser wrote:

> >> I don't think so - domain destruction already removes the /vm path (I'd
> >> confused myself here). I just verified this with our 3.1 bits.
> >
> > We could officially make /vm/ have the same lifetime as /local/domain, in
> > which case it should die even on domain localhost migration, or we could
> > take the patch that Jim suggests, and revert the other two. I don't have a
> > particularly strong opinion either way.
>
> Actually if we attach/detach lots of virtual devices to a single domain,
> will that end up leaking xenstore nodes with Jim's patch?

I've become confused about exactly what changes Jim's patch entails, since
he's backing stuff out too, but it does seem like it.

Theoretically, if we made "xm create" domains have an XML file too, we
wouldn't ever need /vm right?

Anyway, seems like we need a simple working fix for the 3.1 series, even if
there's larger surgery done.

regards
john
On 15/1/08 14:31, "John Levon" <levon@movementarian.org> wrote:

> Anyway, seems like we need a simple working fix for the 3.1 series, even if
> there's larger surgery done.

I checked your patch in for 3.1, which is probably sufficient there.
Obviously 3.2 is in stable phase now also, so that's a bit of an issue.

 -- Keir
On Tue, Jan 15, 2008 at 02:37:11PM +0000, Keir Fraser wrote:

> > Anyway, seems like we need a simple working fix for the 3.1 series, even if
> > there's larger surgery done.
>
> I checked your patch in for 3.1, which is probably sufficient there.

The fix I was referring to was the one for the MAC-lossage issue.

> Obviously 3.2 is in stable phase now also, so that's a bit of an issue.

Well I would certainly not try to fix this for 3.2.0, it's way too risky,
and the current symptoms are not bad enough to warrant a stopper IMO.

regards,
john
On 15/1/08 14:38, "John Levon" <levon@movementarian.org> wrote:

>>> Anyway, seems like we need a simple working fix for the 3.1 series, even if
>>> there's larger surgery done.
>>
>> I checked your patch in for 3.1, which is probably sufficient there.
>
> The fix I was referring to was the one for the MAC-lossage issue.

I'd consider taking Jim's patch to remove the path deletion, and swallow the
(rather theoretical) xenstore leak across large numbers of device hotplugs.

 -- Keir
Keir Fraser wrote:
> On 15/1/08 14:38, "John Levon" <levon@movementarian.org> wrote:
>
>>>> Anyway, seems like we need a simple working fix for the 3.1 series, even if
>>>> there's larger surgery done.
>>>>
>>> I checked your patch in for 3.1, which is probably sufficient there.
>>>
>> The fix I was referring to was the one for the MAC-lossage issue.
>>
>
> I'd consider taking Jim's patch to remove the path deletion, and swallow the
> (rather theoretical) xenstore leak across large numbers of device hotplugs.
>

It's reality, not just theory :-/.  xm network-[attach|detach] does leak.
block does not, as nothing is written to /vm/uuid/device for block devices.
Not sure about other devices, e.g. vtpm, security labels, etc.  A quick grep
through the sources reveals netif as the only dev controller invoking
_writeVm().

Perhaps we should look at handling network and block devices in a similar
fashion - eliminating the need to write any device info to this path?

Jim
On Tue, Jan 15, 2008 at 08:59:49AM -0700, Jim Fehlig wrote:

> Perhaps we should look at handling network and block devices in a
> similar fashion - eliminating the need to write any device info to this
> path?

Seems like if anything else is needed for networking, it could be put in
/local/domain/0/backend, if it's not already?

regards
john
On 15/1/08 16:11, "John Levon" <levon@movementarian.org> wrote:

> On Tue, Jan 15, 2008 at 08:59:49AM -0700, Jim Fehlig wrote:
>
>> Perhaps we should look at handling network and block devices in a
>> similar fashion - eliminating the need to write any device info to this
>> path?
>
> Seems like if anything else is needed for networking, it could be put in
> /local/domain/0/backend, if it's not already?

Configuration values that are managed by xend should not go in the backend
or frontend /local/domain/ directories, unless they are shadow values for
use of frontend/backend drivers only and xend maintains its own internal
'true' value. This is because we are striving for a model (with isolated
driver domains) where xend does not have to trust frontend or backend
drivers to behave nicely. If we put config values like MAC address into the
backend directory then they could be overwritten arbitrarily by the backend
driver.

I suppose maybe it's a somewhat academic or theoretical problem, at least
right now, but since decisions of this type are visible to any users of the
xenbus driver interfaces, not just xend, it'd be nice to keep the interfaces
via xenstore as clean and sane as possible.

 -- Keir
Keir Fraser wrote:
>> Reverting changesets 15967 and 15957 in addition to the attached patch
>> fixes the leak and allows multiple localhost migrations.  I'm not sure
>> what we get by nuking /vm/<uuid>/device/vif/<dev_num> anyway - other
>> than the problems we're seeing :-).  vif appears to be the only device
>> stored in the /vm/<uuid>/device path anyway.
>>
>> I will continue testing with this setup ...
>
> Isn't having two domains (even from the same vm) pointing at the same /vm/
> path a recipe for further bugs? Most of the lowlevel xend code doesn't seem
> to understand the concept that domains can map to the same vm, and could
> hence tread on each other's toes via the /vm/ path.
>
> If we revert the two patches, what happens when you create/destroy lots of
> domains all with different uuids? I expect the leak will still exist in that
> case.
>

Sorry for the delay.  I'm not sure why you think that the leak would
still exist with those changesets reverted.  15957 (and subsequently
15967) introduced the leak by creating a whole new /vm/<uuid>-<num>
path, leaving the previous path orphaned.  But I certainly don't claim
to be an expert on this code so perhaps I'm not understanding your concern.

Nevertheless, I created/destroyed lots of domains on 3.2 with those
changesets reverted and do not see the leak.  However I wouldn't expect
to, since each domain has a different uuid and hence a different
/vm/<uuid> path, which is removed when the domain is destroyed.

BTW, with those changesets the /vm/<uuid> path is leaked on save/restore,
reboot, and localhost migration.  Perhaps the source domain in these
operations should be removing its /vm path on destruction?

Jim
On 18/1/08 23:14, "Jim Fehlig" <jfehlig@novell.com> wrote:

> Sorry for the delay.  I'm not sure why you think that the leak would
> still exist with those changesets reverted.  15957 (and subsequently
> 15967) introduced the leak by creating a whole new /vm/<uuid>-<num>
> path, leaving the previous path orphaned.  But I certainly don't claim
> to be an expert on this code so perhaps I'm not understanding your concern.
>
> Nevertheless, I created/destroyed lots of domains on 3.2 with those
> changesets reverted and do not see the leak.  However I wouldn't expect
> to, since each domain has a different uuid and hence a different
> /vm/<uuid> path, which is removed when the domain is destroyed.
>
> BTW, with those changesets the /vm/<uuid> path is leaked on save/restore,
> reboot, and localhost migration.  Perhaps the source domain in these
> operations should be removing its /vm path on destruction?

Okay, so with the two patches reverted plus your patch, there seem to be no
leaks, and MAC addresses are not lost across localhost relocations? I guess
that's the way to go, if so, and I'll commit to 3.3 and 3.2 trees. I suppose
your patch should also be applicable to 3.1?

 -- Keir
Keir Fraser wrote:
> On 18/1/08 23:14, "Jim Fehlig" <jfehlig@novell.com> wrote:
>
>> [...]
>
> Okay, so with the two patches reverted plus your patch, there seem to be no
> leaks, and MAC addresses are not lost across localhost relocations?

Correct.  But we do leak, as discussed earlier, /vm/<uuid>/device/vif when
attaching/detaching vifs, and I notice the /vm/<uuid> path remains after a
save operation.

> I guess that's the way to go, if so, and I'll commit to 3.3 and 3.2 trees.

Hmm, don't know about replacing one leak with another - although to be
fair it is much smaller :-).  What about my other suggestion of the source
domain of these operations nuking its /vm/<uuid> path on destruction?  I
get the feeling there is a race in the localhost migration path that
prevented such an approach, hence c/s 15957.  I could look into this
though if you like, but will be out of town for the next few days and
won't be able to investigate until Tuesday.

> I suppose your patch should also be applicable to 3.1?

It appears so by glancing at the code, but I don't have a 3.1 system handy
atm to verify.

Jim
On 18/1/08 23:47, "Jim Fehlig" <jfehlig@novell.com> wrote:

>>> I guess that's the way to go, if so, and I'll commit to 3.3 and 3.2 trees.
>>
>> Hmm, don't know about replacing one leak with another - although to be
>> fair it is much smaller :-).  What about my other suggestion of the source
>> domain of these operations nuking its /vm/<uuid> path on destruction?  I
>> get the feeling there is a race in the localhost migration path that
>> prevented such an approach, hence c/s 15957.  I could look into this
>> though if you like, but will be out of town for the next few days and
>> won't be able to investigate until Tuesday.

It would be nice to reference count, or otherwise track the vm<->domain
mapping, of /vm, so that we could somehow naturally avoid destroying the /vm
path on localhost migration yet we could destroy it on most saves and
relocates. I don't know how hard this would be.

Could we just make all saves and relocates destroy the /vm/<uuid>-<number>
path? I can imagine I missed adding some deletions of that path when I
introduced the <number> extra level of discrimination, but would it be hard
to find the one or two places it might be needed and add it? Compared with
the other options we have? I'm afraid I'm really not sure -- but I suspect
at this point you know xend a bit better than I do.

 -- Keir
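One way to read the reference-counting idea Keir floats above, as a rough
sketch rather than a proposal against the actual xend code - the dict and
helper names (vm_refcount, acquire_vmpath, release_vmpath) are hypothetical;
only xstransact.Remove comes from the existing code:

    from xen.xend.xenstore.xstransact import xstransact

    vm_refcount = {}

    def acquire_vmpath(vmpath):
        # A new domain incarnation (e.g. the target of a localhost migration)
        # takes a reference on the shared /vm/<uuid> path.
        vm_refcount[vmpath] = vm_refcount.get(vmpath, 0) + 1

    def release_vmpath(vmpath):
        # Dropping the last reference - no live domain still points at this
        # /vm path - is the point at which it is safe to remove it, so a
        # localhost migration keeps the path alive while save/relocate/destroy
        # of the final incarnation cleans it up.
        vm_refcount[vmpath] = vm_refcount.get(vmpath, 1) - 1
        if vm_refcount[vmpath] <= 0:
            del vm_refcount[vmpath]
            xstransact.Remove(vmpath)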
Keir Fraser wrote:
> On 18/1/08 23:47, "Jim Fehlig" <jfehlig@novell.com> wrote:
>
>> [...]
>
> It would be nice to reference count, or otherwise track the vm<->domain
> mapping, of /vm, so that we could somehow naturally avoid destroying the /vm
> path on localhost migration yet we could destroy it on most saves and
> relocates. I don't know how hard this would be.
>
> Could we just make all saves and relocates destroy the /vm/<uuid>-<number>
> path? I can imagine I missed adding some deletions of that path when I
> introduced the <number> extra level of discrimination, but would it be hard
> to find the one or two places it might be needed and add it? Compared with
> the other options we have?
>

Ok, I took this path as it seemed rather straightforward after fixing
other issues.

I have done the following testing with the attached patch and see no leaks.
Note: the 'crashed power state' and 'rename-restart' patches I submitted
earlier were applied on the test rig as well.

- Multiple reboots from within the guest and 'xm reboot'
- Multiple localhost migrations (followed by reboots)
- Multiple save/restore
- Multiple suspend/resume (managed domains)
- Multiple checkpoints
- Multiple shutdowns from within the guest and 'xm shutdown'
- Crash guest with 'on_crash=preserve', followed by 'xm dump-core' and
  'xm destroy' of preserved vm
- Crash guest with 'on_crash=rename-restart', followed by 'xm dump-core'
  and 'xm destroy' of renamed vm
- Crash guest with 'on_crash=restart'
- Multiple 'xm network-[attach|detach]'

All of these tests were performed on PV as well as HVM guests, where it
makes sense.

Cheers,
Jim
Which of your three xend patches ought to be ported to 3.2 (and 3.1?)
branches?

 -- Keir

On 25/1/08 02:00, "Jim Fehlig" <jfehlig@novell.com> wrote:

> Keir Fraser wrote:
> > [...]
> >
> > Could we just make all saves and relocates destroy the /vm/<uuid>-<number>
> > path? I can imagine I missed adding some deletions of that path when I
> > introduced the <number> extra level of discrimination, but would it be hard
> > to find the one or two places it might be needed and add it? Compared with
> > the other options we have?
>
> Ok, I took this path as it seemed rather straightforward after fixing
> other issues.
>
> I have done the following testing with the attached patch and see no leaks.
> Note: the 'crashed power state' and 'rename-restart' patches I submitted
> earlier were applied on the test rig as well.
>
> [...]
>
> All of these tests were performed on PV as well as HVM guests, where it
> makes sense.
>
> Cheers,
> Jim
>
> # HG changeset patch
> # User Jim Fehlig <jfehlig@novell.com>
> # Date 1201226101 25200
> # Node ID ee07e51eefb8b75f8084d2e8b5c2bff04b95ae5e
> # Parent  605d470326da347552e17e9b1cc0466219f16dc2
> Fix leaking of /vm/<uuid> path in xenstore on various VM lifecycle events.
>
> Signed-off-by: Jim Fehlig <jfehlig@novell.com>
>
> diff -r 605d470326da -r ee07e51eefb8 tools/python/xen/xend/XendCheckpoint.py
> --- a/tools/python/xen/xend/XendCheckpoint.py  Thu Jan 24 16:33:42 2008 -0700
> +++ b/tools/python/xen/xend/XendCheckpoint.py  Thu Jan 24 18:55:01 2008 -0700
> @@ -125,10 +125,10 @@ def save(fd, dominfo, network, live, dst
>          if checkpoint:
>              dominfo.resumeDomain()
>          else:
> -            dominfo.destroyDomain()
> +            dominfo.destroy()
>              dominfo.testDeviceComplete()
>          try:
> -            dominfo.setName(domain_name)
> +            dominfo.setName(domain_name, False)
>          except VmError:
>              # Ignore this.  The name conflict (hopefully) arises because we
>              # are doing localhost migration; if we are doing a suspend of a
> diff -r 605d470326da -r ee07e51eefb8 tools/python/xen/xend/XendDomainInfo.py
> --- a/tools/python/xen/xend/XendDomainInfo.py  Thu Jan 24 16:33:42 2008 -0700
> +++ b/tools/python/xen/xend/XendDomainInfo.py  Thu Jan 24 18:55:01 2008 -0700
> @@ -1125,10 +1125,11 @@ class XendDomainInfo:
>      def getDomid(self):
>          return self.domid
>
> -    def setName(self, name):
> +    def setName(self, name, to_store = True):
>          self._checkName(name)
>          self.info['name_label'] = name
> -        self.storeVm("name", name)
> +        if to_store:
> +            self.storeVm("name", name)
>
>      def getName(self):
>          return self.info['name_label']
> @@ -1399,7 +1400,7 @@ class XendDomainInfo:
>              new_dom_info = self._preserveForRestart()
>          else:
>              self._unwatchVm()
> -            self.destroyDomain()
> +            self.destroy()
>
>          # new_dom's VM will be the same as this domain's VM, except where
>          # the rename flag has instructed us to call preserveForRestart.
> @@ -1413,9 +1414,6 @@ class XendDomainInfo:
>                                                new_dom_info)
>                  new_dom.waitForDevices()
>                  new_dom.unpause()
> -                rst_cnt = self._readVm('xend/restart_count')
> -                rst_cnt = int(rst_cnt) + 1
> -                self._writeVm('xend/restart_count', str(rst_cnt))
>                  new_dom._removeVm(RESTART_IN_PROGRESS)
>              except:
>                  if new_dom:
> @@ -1441,13 +1439,19 @@ class XendDomainInfo:
>                    new_name, new_uuid)
>          self._unwatchVm()
>          self._releaseDevices()
> +        # Remove existing vm node in xenstore
> +        self._removeVm()
>          new_dom_info = self.info.copy()
>          new_dom_info['name_label'] = self.info['name_label']
>          new_dom_info['uuid'] = self.info['uuid']
>          self.info['name_label'] = new_name
>          self.info['uuid'] = new_uuid
>          self.vmpath = XS_VMROOT + new_uuid
> +        # Write out new vm node to xenstore
>          self._storeVmDetails()
> +        rst_cnt = self._readVm('xend/restart_count')
> +        rst_cnt = int(rst_cnt) + 1
> +        self._writeVm('xend/restart_count', str(rst_cnt))
>          self._preserve()
>          return new_dom_info
On Fri, Jan 25, 2008 at 08:28:46AM -0700, Jim Fehlig wrote:

> > Which of your three xend patches ought to be ported to 3.2 (and 3.1?)
> > branches?
>
> All three for 3.2.  In fact, all of the testing was done on 3.2 branch
> (they apply cleanly there).  Not sure about 3.1 as I haven't played with
> it for several months.

I've lost track a bit. 3.1 /does/ need to fix the "MAC address is lost"
problem but currently /doesn't/ have the "/vm/blah is leaked" problems.

If someone can translate that into a patch for me to backport I can try
it out on our bits.

regards
john
John Levon wrote:
> On Fri, Jan 25, 2008 at 08:28:46AM -0700, Jim Fehlig wrote:
>
>>> Which of your three xend patches ought to be ported to 3.2 (and 3.1?)
>>> branches?
>>>
>> All three for 3.2.  In fact, all of the testing was done on 3.2 branch
>> (they apply cleanly there).  Not sure about 3.1 as I haven't played with
>> it for several months.
>>
>
> I've lost track a bit. 3.1 /does/ need to fix the "MAC address is lost"
> problem but currently /doesn't/ have the "/vm/blah is leaked" problems.
>
> If someone can translate that into a patch for me to backport I can try
> it out on our bits.
>

Yuk.  Changesets 15957 (and subsequent adjustment via c/s 15967) fix the
"MAC address is lost" problem but introduce the /vm/blah leak.  You could
try the last patch I submitted on this thread to fix that.  So 15957,
15967, and my proposed patch should do the trick.

Jim