After a hvm+shadow domain dies (either clean shutdown or merciless
destroy), the domain is left in a zombie state with 1 (one) page left
dangling with a single reference.

(XEN) General information for domain 1:
(XEN) refcnt=1 dying=2 pause_count=1
(XEN) nr_pages=1 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=524544
(XEN) handle=deadbeef-dead-beef-dead-beef00000001 vm_assist=00000000
(XEN) paging assistance: shadow refcounts translate external
(XEN) Rangesets belonging to domain 1:
(XEN) I/O Ports { }
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN) DomPage 000000000010698e: caf=00000001, taf=7400000000000000
(XEN) PoD entries=0 cachesize=0
(XEN) VCPU information and callbacks for domain 1:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-3}
(XEN) pause_count=1 pause_flags=0
(XEN) paging assistance: shadowed 4-on-4
(XEN) No periodic timer

If I add a considerable amount of synchronous printk's, sometimes the
domain is not left zombie. There seems to be a race going on here.

Judging by the type information of the page, I believe this is a page
that has been shadowed with a writable map. I verified that the page is
not any of the helper rings (qemu, buffered qemu, store, console) that
may get external writable references.

This happens on a win7 guest with or without pv drivers. It happens with
or without shadow optimizations (SHOPT defines). It happens with or
without synchronized p2m lookups (patches just posted).

Hopefully the shadow masters have a better understanding of how to
proceed from here.

Thanks,
Andres

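For anyone who wants to check a console log for the same symptom, here is a minimal sketch that scans a dump in the format above and reports domains that are dead but still own pages. It assumes that dying=2 marks a fully dead domain and that each section starts with the "General information for domain N:" line. The function name find_zombies and the regular expressions are hypothetical helpers and not part of any Xen tool.

```python
import re
from typing import Dict, List

# Header line that opens each per-domain section of the dump.
DOMAIN_HEADER = re.compile(r"General information for domain (\d+):")
# The key=value tokens this sketch cares about.
FIELD = re.compile(r"\b(refcnt|dying|nr_pages)=(\d+)")


def find_zombies(dump: str) -> List[Dict[str, int]]:
    """Return domains that look dead (dying=2, an assumption) yet still own pages."""
    domains: Dict[int, Dict[str, int]] = {}
    current = None
    for line in dump.splitlines():
        header = DOMAIN_HEADER.search(line)
        if header:
            domid = int(header.group(1))
            current = domains.setdefault(domid, {"domid": domid})
            continue
        if current is None:
            continue  # text before the first domain section
        for key, value in FIELD.findall(line):
            # Keep the first value seen for each key within a section.
            current.setdefault(key, int(value))
    return [
        d for d in domains.values()
        if d.get("dying") == 2 and d.get("nr_pages", 0) > 0
    ]


SAMPLE = """\
(XEN) General information for domain 1:
(XEN) refcnt=1 dying=2 pause_count=1
(XEN) nr_pages=1 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=524544
"""

if __name__ == "__main__":
    for dom in find_zombies(SAMPLE):
        print("zombie domain", dom["domid"], "still owns", dom["nr_pages"], "page(s)")
```

The sketch only reads the three fields quoted in the report, so it says nothing about why the page is still referenced.
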
On Fri, Apr 13, 2012 at 9:19 AM, Andres Lagar-Cavilla
<andres@lagarcavilla.org> wrote:
> After a hvm+shadow domain dies (either clean shutdown or merciless
> destroy), the domain is left in a zombie state with 1 (one) page left
> dangling with a single reference.

[...]

> (XEN) paging assistance: shadowed 4-on-4

[...]

> This happens on win7 guest with or without pv drivers. It happens with or
> without shadow optimizations (SHOPT defines). It happens with or without
> synchronized p2m lookups (patches just posted).

Does it happen only in 64bit guests?

Thanks,
Gianluca

> On Fri, Apr 13, 2012 at 9:19 AM, Andres Lagar-Cavilla
> <andres@lagarcavilla.org> wrote:
>> After a hvm+shadow domain dies (either clean shutdown or merciless
>> destroy), the domain is left in a zombie state with 1 (one) page left
>> dangling with a single reference.
>
> [...]
>
>> (XEN) paging assistance: shadowed 4-on-4
>
> [...]
>
>> This happens on win7 guest with or without pv drivers. It happens with
>> or without shadow optimizations (SHOPT defines). It happens with or
>> without synchronized p2m lookups (patches just posted).
>
> Does it happen only in 64bit guests?

Haven't tried 32 bit guests (with or without PAE).

Andres

>
> Thanks,
> Gianluca
>

At 09:19 -0700 on 13 Apr (1334308772), Andres Lagar-Cavilla wrote:
> After a hvm+shadow domain dies (either clean shutdown or merciless
> destroy), the domain is left in a zombie state with 1 (one) page left
> dangling with a single reference.

The reference is to the top-level pagetable that was pointed to by CR3
when the domain was killed. This bug came in with:

changeset: 23142:f5e8d152a565
user: Jan Beulich <jbeulich@novell.com>
date: Tue Apr 05 13:01:25 2011 +0100
description: x86: split struct vcpu

where HVM domains no longer have vcpu_destroy_pagetables(v) called on
their VCPUs as they die. Proposed fix attached.

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

> At 09:19 -0700 on 13 Apr (1334308772), Andres Lagar-Cavilla wrote:
>> After a hvm+shadow domain dies (either clean shutdown or merciless
>> destroy), the domain is left in a zombie state with 1 (one) page left
>> dangling with a single reference.
>
> The reference is to the top-level pagetable that was pointed to by CR3
> when the domain was killed. This bug came in with:
>
> changeset: 23142:f5e8d152a565
> user: Jan Beulich <jbeulich@novell.com>
> date: Tue Apr 05 13:01:25 2011 +0100
> description: x86: split struct vcpu
>
> where HVM domains no longer have vcpu_destroy_pagetables(v) called on
> their VCPUs as they die. Proposed fix attached.
>
> Cheers,

Looks good. Thanks for tracing that down. Ack from my end.

Thanks
Andres

>
> Tim.
>

>>> On 19.04.12 at 19:08, Tim Deegan <tim@xen.org> wrote:
> At 09:19 -0700 on 13 Apr (1334308772), Andres Lagar-Cavilla wrote:
>> After a hvm+shadow domain dies (either clean shutdown or merciless
>> destroy), the domain is left in a zombie state with 1 (one) page left
>> dangling with a single reference.
>
> The reference is to the top-level pagetable that was pointed to by CR3
> when the domain was killed. This bug came in with:
>
> changeset: 23142:f5e8d152a565
> user: Jan Beulich <jbeulich@novell.com>
> date: Tue Apr 05 13:01:25 2011 +0100
> description: x86: split struct vcpu
>
> where HVM domains no longer have vcpu_destroy_pagetables(v) called on
> their VCPUs as they die. Proposed fix attached.

Acked-by: Jan Beulich <jbeulich@suse.com>

I'm sorry for that. Given that this had been quite some time back, I
can only guess that I got misguided by the fact that
arch_vcpu_reset() calls this for PV only (legitimately, i.e. not causing
any leak).

Jan

At 08:59 +0100 on 20 Apr (1334912349), Jan Beulich wrote:
> >>> On 19.04.12 at 19:08, Tim Deegan <tim@xen.org> wrote:
> > At 09:19 -0700 on 13 Apr (1334308772), Andres Lagar-Cavilla wrote:
> >> After a hvm+shadow domain dies (either clean shutdown or merciless
> >> destroy), the domain is left in a zombie state with 1 (one) page left
> >> dangling with a single reference.
> >
> > The reference is to the top-level pagetable that was pointed to by CR3
> > when the domain was killed. This bug came in with:
> >
> > changeset: 23142:f5e8d152a565
> > user: Jan Beulich <jbeulich@novell.com>
> > date: Tue Apr 05 13:01:25 2011 +0100
> > description: x86: split struct vcpu
> >
> > where HVM domains no longer have vcpu_destroy_pagetables(v) called on
> > their VCPUs as they die. Proposed fix attached.
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> I'm sorry for that. Given that this had been quite some time back, I
> can only guess that I got misguided by the fact that
> arch_vcpu_reset() calls this for PV only (legitimately, i.e. not causing
> any leak).

Yeah, it's not exactly clear. Maybe after 4.2 I'll look at making it
more uniform.

Cheers,

Tim.

On Thu, 2012-04-19 at 18:08 +0100, Tim Deegan wrote:
> At 09:19 -0700 on 13 Apr (1334308772), Andres Lagar-Cavilla wrote:
> > After a hvm+shadow domain dies (either clean shutdown or merciless
> > destroy), the domain is left in a zombie state with 1 (one) page left
> > dangling with a single reference.
>
> The reference is to the top-level pagetable that was pointed to by CR3
> when the domain was killed. This bug came in with:
>
> changeset: 23142:f5e8d152a565
> user: Jan Beulich <jbeulich@novell.com>
> date: Tue Apr 05 13:01:25 2011 +0100
> description: x86: split struct vcpu
>
> where HVM domains no longer have vcpu_destroy_pagetables(v) called on
> their VCPUs as they die. Proposed fix attached.

FTR this fixes the zombie domain issue I'd been seeing, AFAICT.

Thanks!
Ian.