Tim Deegan
2010-Jul-23 13:49 UTC
[Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
There are a few places in Xen where we walk a domain's page lists
without holding the page_alloc lock.  They race with updates to the page
lists, which are normally rare but can be quite common under PoD when
the domain is close to its memory limit and the PoD reclaimer is busy.
This patch protects those places by taking the page_alloc lock.

I think this is OK for the two debug-key printouts - they don't run from
irq context and look deadlock-free.  The tboot change seems safe too
unless tboot shutdown functions are called from irq context or with the
page_alloc lock held.  The p2m one is the scariest but there are already
code paths in PoD that take the page_alloc lock with the p2m lock held
so it's no worse than existing code.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r e8dbc1262f52 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c     Wed Jul 21 09:02:10 2010 +0100
+++ b/xen/arch/x86/domain.c     Fri Jul 23 14:33:22 2010 +0100
@@ -139,12 +139,14 @@ void dump_pageframe_info(struct domain *
     }
     else
     {
+        spin_lock(&d->page_alloc_lock);
         page_list_for_each ( page, &d->page_list )
         {
             printk("    DomPage %p: caf=%08lx, taf=%" PRtype_info "\n",
                    _p(page_to_mfn(page)),
                    page->count_info, page->u.inuse.type_info);
         }
+        spin_unlock(&d->page_alloc_lock);
     }
 
     if ( is_hvm_domain(d) )
@@ -152,12 +154,14 @@ void dump_pageframe_info(struct domain *
         p2m_pod_dump_data(d);
     }
 
+    spin_lock(&d->page_alloc_lock);
     page_list_for_each ( page, &d->xenpage_list )
     {
         printk("    XenPage %p: caf=%08lx, taf=%" PRtype_info "\n",
                _p(page_to_mfn(page)),
                page->count_info, page->u.inuse.type_info);
     }
+    spin_unlock(&d->page_alloc_lock);
 }
 
 struct domain *alloc_domain_struct(void)
diff -r e8dbc1262f52 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c     Wed Jul 21 09:02:10 2010 +0100
+++ b/xen/arch/x86/mm/p2m.c     Fri Jul 23 14:33:22 2010 +0100
@@ -1833,6 +1833,7 @@ int p2m_alloc_table(struct domain *d,
         goto error;
 
     /* Copy all existing mappings from the page list and m2p */
+    spin_lock(&d->page_alloc_lock);
     page_list_for_each(page, &d->page_list)
     {
         mfn = page_to_mfn(page);
@@ -1848,13 +1849,16 @@ int p2m_alloc_table(struct domain *d,
 #endif
              && gfn != INVALID_M2P_ENTRY
              && !set_p2m_entry(d, gfn, mfn, 0, p2m_ram_rw) )
-            goto error;
+            goto error_unlock;
     }
+    spin_unlock(&d->page_alloc_lock);
 
     P2M_PRINTK("p2m table initialised (%u pages)\n", page_count);
     p2m_unlock(p2m);
     return 0;
 
+error_unlock:
+    spin_unlock(&d->page_alloc_lock);
 error:
     P2M_PRINTK("failed to initialize p2m table, gfn=%05lx, mfn=%"
                PRI_mfn "\n", gfn, mfn_x(mfn));
diff -r e8dbc1262f52 xen/arch/x86/numa.c
--- a/xen/arch/x86/numa.c       Wed Jul 21 09:02:10 2010 +0100
+++ b/xen/arch/x86/numa.c       Fri Jul 23 14:33:22 2010 +0100
@@ -385,11 +385,13 @@ static void dump_numa(unsigned char key)
         for_each_online_node(i)
                 page_num_node[i] = 0;
 
+        spin_lock(&d->page_alloc_lock);
         page_list_for_each(page, &d->page_list)
         {
                 i = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
                 page_num_node[i]++;
         }
+        spin_unlock(&d->page_alloc_lock);
 
         for_each_online_node(i)
                 printk("    Node %u: %u\n", i, page_num_node[i]);
diff -r e8dbc1262f52 xen/arch/x86/tboot.c
--- a/xen/arch/x86/tboot.c      Wed Jul 21 09:02:10 2010 +0100
+++ b/xen/arch/x86/tboot.c      Fri Jul 23 14:33:22 2010 +0100
@@ -211,12 +211,14 @@ static void tboot_gen_domain_integrity(c
             continue;
         printk("MACing Domain %u\n", d->domain_id);
 
+        spin_lock(&d->page_alloc_lock);
         page_list_for_each(page, &d->page_list)
         {
             void *pg = __map_domain_page(page);
             vmac_update(pg, PAGE_SIZE, &ctx);
             unmap_domain_page(pg);
         }
+        spin_unlock(&d->page_alloc_lock);
 
         if ( !is_idle_domain(d) )
         {

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2010-Jul-23 13:55 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
At 14:49 +0100 on 23 Jul (1279896553), Tim Deegan wrote:
> There are a few places in Xen where we walk a domain's page lists
> without holding the page_alloc lock.  They race with updates to the page
> lists, which are normally rare but can be quite common under PoD when
> the domain is close to its memory limit and the PoD reclaimer is busy.
> This patch protects those places by taking the page_alloc lock.

I should say that the other place I found is in construct_dom0(), which
I left because it (a) can't really race with allocations and (b) calls
process_pending_softirqs() within the page_list_for_each().

Tim.

> I think this is OK for the two debug-key printouts - they don't run from
> irq context and look deadlock-free.  The tboot change seems safe too
> unless tboot shutdown functions are called from irq context or with the
> page_alloc lock held.  The p2m one is the scariest but there are already
> code paths in PoD that take the page_alloc lock with the p2m lock held
> so it's no worse than existing code.
> [...]

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Jan Beulich
2010-Aug-12 15:09 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
>>> On 23.07.10 at 15:49, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> There are a few places in Xen where we walk a domain's page lists
> without holding the page_alloc lock.  They race with updates to the page
> lists, which are normally rare but can be quite common under PoD when
> the domain is close to its memory limit and the PoD reclaimer is busy.
> This patch protects those places by taking the page_alloc lock.
>
> I think this is OK for the two debug-key printouts - they don't run from
> irq context and look deadlock-free.  The tboot change seems safe too

While the comment says the patch would leave debug key printouts
alone, ...

> unless tboot shutdown functions are called from irq context or with the
> page_alloc lock held.  The p2m one is the scariest but there are already
> code paths in PoD that take the page_alloc lock with the p2m lock held
> so it's no worse than existing code.
>
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
>
> diff -r e8dbc1262f52 xen/arch/x86/domain.c
> --- a/xen/arch/x86/domain.c     Wed Jul 21 09:02:10 2010 +0100
> +++ b/xen/arch/x86/domain.c     Fri Jul 23 14:33:22 2010 +0100
> @@ -139,12 +139,14 @@ void dump_pageframe_info(struct domain *

... the actual patch still touches a respective function.  It would seem
to me that this part ought to be reverted.

>      }
>      else
>      {
> +        spin_lock(&d->page_alloc_lock);
>          page_list_for_each ( page, &d->page_list )
>          {
>              printk("    DomPage %p: caf=%08lx, taf=%" PRtype_info "\n",
>                     _p(page_to_mfn(page)),
>                     page->count_info, page->u.inuse.type_info);
>          }
> +        spin_unlock(&d->page_alloc_lock);
>      }
>
>      if ( is_hvm_domain(d) )
> @@ -152,12 +154,14 @@ void dump_pageframe_info(struct domain *
>          p2m_pod_dump_data(d);
>      }
>
> +    spin_lock(&d->page_alloc_lock);
>      page_list_for_each ( page, &d->xenpage_list )
>      {
>          printk("    XenPage %p: caf=%08lx, taf=%" PRtype_info "\n",
>                 _p(page_to_mfn(page)),
>                 page->count_info, page->u.inuse.type_info);
>      }
> +    spin_unlock(&d->page_alloc_lock);
>  }
>
>  struct domain *alloc_domain_struct(void)

Sorry for not noticing this earlier.

Jan
Tim Deegan
2010-Aug-12 16:37 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
At 16:09 +0100 on 12 Aug (1281629364), Jan Beulich wrote:
> >>> On 23.07.10 at 15:49, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> > There are a few places in Xen where we walk a domain's page lists
> > without holding the page_alloc lock.  They race with updates to the page
> > lists, which are normally rare but can be quite common under PoD when
> > the domain is close to its memory limit and the PoD reclaimer is busy.
> > This patch protects those places by taking the page_alloc lock.
> >
> > I think this is OK for the two debug-key printouts - they don't run from
> > irq context and look deadlock-free.  The tboot change seems safe too
>
> While the comment says the patch would leave debug key printouts
> alone, ...

Sorry, my intention was to say that changes to the debug-key printouts
are safe, not that they didn't require changes.

The debug-key printouts (in particular the NUMA one) are where I
actually hit this bug on a running system.

Tim.

> > unless tboot shutdown functions are called from irq context or with the
> > page_alloc lock held.  The p2m one is the scariest but there are already
> > code paths in PoD that take the page_alloc lock with the p2m lock held
> > so it's no worse than existing code.
> >
> > Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
> >
> > diff -r e8dbc1262f52 xen/arch/x86/domain.c
> > --- a/xen/arch/x86/domain.c     Wed Jul 21 09:02:10 2010 +0100
> > +++ b/xen/arch/x86/domain.c     Fri Jul 23 14:33:22 2010 +0100
> > @@ -139,12 +139,14 @@ void dump_pageframe_info(struct domain *
>
> ... the actual patch still touches a respective function.  It would seem
> to me that this part ought to be reverted.
> > [...]
>
> Sorry for not noticing this earlier.
>
> Jan

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Jan Beulich
2010-Aug-13 06:40 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
>>> On 12.08.10 at 18:37, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 16:09 +0100 on 12 Aug (1281629364), Jan Beulich wrote:
>> >>> On 23.07.10 at 15:49, Tim Deegan <Tim.Deegan@citrix.com> wrote:
>> > There are a few places in Xen where we walk a domain's page lists
>> > without holding the page_alloc lock.  They race with updates to the page
>> > lists, which are normally rare but can be quite common under PoD when
>> > the domain is close to its memory limit and the PoD reclaimer is busy.
>> > This patch protects those places by taking the page_alloc lock.
>> >
>> > I think this is OK for the two debug-key printouts - they don't run from
>> > irq context and look deadlock-free.  The tboot change seems safe too
>>
>> While the comment says the patch would leave debug key printouts
>> alone, ...
>
> Sorry, my intention was to say that changes to the debug-key printouts
> are safe, not that they didn't require changes.
>
> The debug-key printouts (in particular the NUMA one) are where I
> actually hit this bug on a running system.

But then, to avoid a hanging system, these should be trylock-s
rather than plain locks, shouldn't they?

Jan
Keir Fraser
2010-Aug-13 06:46 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
On 13/08/2010 07:40, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Sorry, my intention was to say that changes to the debug-key printouts
>> are safe, not that they didn't require changes.
>>
>> The debug-key printouts (in particular the NUMA one) are where I
>> actually hit this bug on a running system.
>
> But then, to avoid a hanging system, these should be trylock-s
> rather than plain locks, shouldn't they?

Why? The handler is called in softirq context. It should be safe to spin.

 -- Keir
Jan Beulich
2010-Aug-13 07:06 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
>>> On 13.08.10 at 08:46, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 13/08/2010 07:40, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>> Sorry, my intention was to say that changes to the debug-key printouts
>>> are safe, not that they didn't require changes.
>>>
>>> The debug-key printouts (in particular the NUMA one) are where I
>>> actually hit this bug on a running system.
>>
>> But then, to avoid a hanging system, these should be trylock-s
>> rather than plain locks, shouldn't they?
>
> Why? The handler is called in softirq context. It should be safe to spin.

Hmm, indeed. I was looking at others, and at least
domain_dump_evtchn_info() also uses a trylock - apparently for
no good reason.

Jan
Keir Fraser
2010-Aug-13 07:10 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
On 13/08/2010 08:06, "Jan Beulich" <JBeulich@novell.com> wrote:
>>> But then, to avoid a hanging system, these should be trylock-s
>>> rather than plain locks, shouldn't they?
>>
>> Why? The handler is called in softirq context. It should be safe to spin.
>
> Hmm, indeed. I was looking at others, and at least
> domain_dump_evtchn_info() also uses a trylock - apparently for
> no good reason.

Well, since you wrote that function, would you like me to switch
domain_dump_evtchn_info() to do a proper spin_lock()?

 -- Keir
Jan Beulich
2010-Aug-13 07:20 UTC
Re: [Xen-devel] [RFC][PATCH] walking the page lists needs the page_alloc lock
>>> On 13.08.10 at 09:10, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 13/08/2010 08:06, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>>> But then, to avoid a hanging system, these should be trylock-s
>>>> rather than plain locks, shouldn't they?
>>>
>>> Why? The handler is called in softirq context. It should be safe to spin.
>>
>> Hmm, indeed. I was looking at others, and at least
>> domain_dump_evtchn_info() also uses a trylock - apparently for
>> no good reason.
>
> Well, since you wrote that function, would you like me to switch
> domain_dump_evtchn_info() to do a proper spin_lock()?

Yes. I'd want to do a little cleanup to the initial printk()-s at once,
like in the patch below.

Jan

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- 2010-08-12.orig/xen/common/event_channel.c
+++ 2010-08-12/xen/common/event_channel.c
@@ -1123,14 +1123,11 @@ static void domain_dump_evtchn_info(stru
     bitmap_scnlistprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
                          d->poll_mask, d->max_vcpus);
-    printk("Domain %d polling vCPUs: {%s}\n",
-           d->domain_id, keyhandler_scratch);
-
-    if ( !spin_trylock(&d->event_lock) )
-        return;
-
     printk("Event channel information for domain %d:\n"
-           "    port [p/m]\n", d->domain_id);
+           "Polling vCPUs: {%s}\n"
+           "    port [p/m]\n", d->domain_id, keyhandler_scratch);
+
+    spin_lock(&d->event_lock);
 
     for ( port = 1; port < MAX_EVTCHNS(d); ++port )
     {