Tim, Andres, while looking at how "X86/vMCE: handle broken page occurred before migration" uses the p2m code, I wondered whether the comment * N.B. get_gfn_query() is the _only_ one guaranteed not to take the * p2m lock; none of the others can be called with the p2m or paging * lock held. */ isn''t stale - afaict with get_gfn_type_access() passing "1" for the "locked" parameter of __get_gfn_type_access(), there''s no way to avoid the locking without using __get_gfn_type_access() directly. And then again, with the p2m lock being recursive these days, I don''t think there''s any harm calling the other methods here with that lock held. Jan
At 09:23 +0000 on 30 Oct (1351588996), Jan Beulich wrote:> while looking at how "X86/vMCE: handle broken page occurred > before migration" uses the p2m code, I wondered whether the > comment > > * N.B. get_gfn_query() is the _only_ one guaranteed not to take the > * p2m lock; none of the others can be called with the p2m or paging > * lock held. */ > > isn''t staleYes, it seems to be stale. I think that even with the originally proposed fine-grained p2m locking it would have been possible for this to take the p2m lock. :(> - afaict with get_gfn_type_access() passing "1" for the > "locked" parameter of __get_gfn_type_access(), there''s no way > to avoid the locking without using __get_gfn_type_access() > directly.get_gfn_query_unlocked() is the way to do lookups without the lock, but of course if you do that then you''ve no guarantee that the mfn it returns won''t get freed under your feet.> And then again, with the p2m lock being recursive these > days, I don''t think there''s any harm calling the other methods > here with that lock held.True, but it wouldn''t be safe to call it with the paging lock held. OTOH since we''re not seeing any crashes from the lock-ordering constraints maybe we don''t do that. Andres, what do you think? Can we just drop/amend that comment? Tim.
On Oct 30, 2012, at 5:36 AM, Tim Deegan <tim@xen.org> wrote:> At 09:23 +0000 on 30 Oct (1351588996), Jan Beulich wrote: >> while looking at how "X86/vMCE: handle broken page occurred >> before migration" uses the p2m code, I wondered whether the >> comment >> >> * N.B. get_gfn_query() is the _only_ one guaranteed not to take the >> * p2m lock; none of the others can be called with the p2m or paging >> * lock held. */ >> >> isn''t stale > > Yes, it seems to be stale. I think that even with the originally > proposed fine-grained p2m locking it would have been possible for this > to take the p2m lock. :(Stale comment indeed.> >> - afaict with get_gfn_type_access() passing "1" for the >> "locked" parameter of __get_gfn_type_access(), there''s no way >> to avoid the locking without using __get_gfn_type_access() >> directly. > > get_gfn_query_unlocked() is the way to do lookups without the lock, but > of course if you do that then you''ve no guarantee that the mfn it > returns won''t get freed under your feet.get_gfn_query_unlocked() is really only meant for debugging, domain is toast, etc. As Tim says, no guarantee that you are getting a live mfn and/or p2m type. Having said that, we have a big XOR for domains that use IOMMU -- no paging or sharing. From an mm p.o.v. they are fairly vanilla domains and under most circumstances one could perform unlocked queries. Still there are dangerous situations in which a p2m entry might change. I do not think we have an interlock in place for PoD, and if the domain were to run a balloon driver you''d have to trouble. I say this because a similar interlock may apply for vMCE? Is there an expectation for a domain with vMCE turned on to be "land-locked" memory-wise?> >> And then again, with the p2m lock being recursive these >> days, I don''t think there''s any harm calling the other methods >> here with that lock held.Is the patch you refer to http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in question the following? + get_gfn_query(d, pfn, &pt); + p2m_change_type(d, pfn, pt, p2m_ram_broken); + put_gfn(d, pfn); There really is no way to get rid of that p2m lock-protected critical section if the domain allows for paging etc. You might want to introduce a syntactically cleaner unconditional p2m_change_type variant that doesn''t cmpxchg with the previous type -- that is effectively what goes on here. Should be a tiny amount of refactoring and the code will be cleaner, no need for query or put.> > True, but it wouldn''t be safe to call it with the paging lock held. > OTOH since we''re not seeing any crashes from the lock-ordering > constraints maybe we don''t do that. > > Andres, what do you think? Can we just drop/amend that comment? >If you refer to removing the ordering constraints, my opinion is to NACK that. I think once you remove the ordering constraints it''s only a matter of time before deadlock-inducing code creeps in. These mm locks are loads of fun to get right (that kind of fun). The constraints ensure that any developer new to the mm code will get straightened out immediately. Note that the ordering constraints allow you to, beyond recursively taking the p2m lock, to take p2m -> paging -> p2m locks. But they does *not* allow you to do get_page_from_gfn -> paging lock -> p2m lock. Obviously we need more documentation in the .h files. I''ll get to that next week. Jan, ping me if I haven''t submitted something to the 4.2.1 tree by the close.> Tim._______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
At 10:53 -0400 on 30 Oct (1351594401), Andres Lagar-Cavilla wrote:> >> And then again, with the p2m lock being recursive these > >> days, I don''t think there''s any harm calling the other methods > >> here with that lock held. > > Is the patch you refer to http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in question the following? > + get_gfn_query(d, pfn, &pt); > + p2m_change_type(d, pfn, pt, p2m_ram_broken); > + put_gfn(d, pfn); > > There really is no way to get rid of that p2m lock-protected critical > section if the domain allows for paging etc. You might want to > introduce a syntactically cleaner unconditional p2m_change_type > variant that doesn''t cmpxchg with the previous type -- that is > effectively what goes on here. Should be a tiny amount of refactoring > and the code will be cleaner, no need for query or put.I don''t think that change-type is even what''s wanted here. You want to use some more raw form of set_p2m_entry(), since keeping the MFN is not important. How about: guest_physmap_add_entry(d, pfn, MFN_INVALID, p2m_ram_broken); ?> > > > True, but it wouldn''t be safe to call it with the paging lock held. > > OTOH since we''re not seeing any crashes from the lock-ordering > > constraints maybe we don''t do that. > > > > Andres, what do you think? Can we just drop/amend that comment? > > > > If you refer to removing the ordering constraintsNoooooooooooooo! :) I think we should drop the comment that says it''s OK to call get_gfn_query() with the paging lock held (and audit to check that we don''t do that anywhere). Tim.
>>> On 30.10.12 at 15:53, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote: > On Oct 30, 2012, at 5:36 AM, Tim Deegan <tim@xen.org> wrote: > I say this because a similar interlock may apply for vMCE? Is there an > expectation for a domain with vMCE turned on to be "land-locked" memory-wise?I don''t think there''s any dependency here, vMCE should be transparent in that respect.>>> And then again, with the p2m lock being recursive these >>> days, I don''t think there''s any harm calling the other methods >>> here with that lock held. > > Is the patch you refer to > http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in > question the following? > + get_gfn_query(d, pfn, &pt); > + p2m_change_type(d, pfn, pt, p2m_ram_broken); > + put_gfn(d, pfn);Yes.> There really is no way to get rid of that p2m lock-protected critical section > if the domain allows for paging etc.I wasn''t questioning the locking here. It was merely that code (and the lack of error handling therein) that made me look at the definition of the used p2m constructs.> You might want to introduce a > syntactically cleaner unconditional p2m_change_type variant that doesn''t > cmpxchg with the previous type -- that is effectively what goes on here. Should > be a tiny amount of refactoring and the code will be cleaner, no need for > query or put.That might help here, yes. Jan
On Oct 30, 2012, at 11:06 AM, Tim Deegan <tim@xen.org> wrote:> At 10:53 -0400 on 30 Oct (1351594401), Andres Lagar-Cavilla wrote: >>>> And then again, with the p2m lock being recursive these >>>> days, I don''t think there''s any harm calling the other methods >>>> here with that lock held. >> >> Is the patch you refer to http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in question the following? >> + get_gfn_query(d, pfn, &pt); >> + p2m_change_type(d, pfn, pt, p2m_ram_broken); >> + put_gfn(d, pfn); >> >> There really is no way to get rid of that p2m lock-protected critical >> section if the domain allows for paging etc. You might want to >> introduce a syntactically cleaner unconditional p2m_change_type >> variant that doesn''t cmpxchg with the previous type -- that is >> effectively what goes on here. Should be a tiny amount of refactoring >> and the code will be cleaner, no need for query or put. > > I don''t think that change-type is even what''s wanted here. You want to > use some more raw form of set_p2m_entry(), since keeping the MFN is not > important. How about: > > guest_physmap_add_entry(d, pfn, MFN_INVALID, p2m_ram_broken); >Has the side effect of leaking the current mfn (if any) in the p2m entry for the remainder of the lifetime of the domain. Digression: is this something we want to just go ahead and fix? Or is that semantic intended?> ? > >>> >>> True, but it wouldn''t be safe to call it with the paging lock held. >>> OTOH since we''re not seeing any crashes from the lock-ordering >>> constraints maybe we don''t do that. >>> >>> Andres, what do you think? Can we just drop/amend that comment? >>> >> >> If you refer to removing the ordering constraints > > Noooooooooooooo! :) > > I think we should drop the comment that says it''s OK to call > get_gfn_query() with the paging lock held (and audit to check that we > don''t do that anywhere).Good, same page then, phew Andres> > Tim.
On Oct 30, 2012, at 11:09 AM, Jan Beulich <jbeulich@suse.com> wrote:>>>> On 30.10.12 at 15:53, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote: >> On Oct 30, 2012, at 5:36 AM, Tim Deegan <tim@xen.org> wrote: >> I say this because a similar interlock may apply for vMCE? Is there an >> expectation for a domain with vMCE turned on to be "land-locked" memory-wise? > > I don''t think there''s any dependency here, vMCE should be > transparent in that respect. > >>>> And then again, with the p2m lock being recursive these >>>> days, I don''t think there''s any harm calling the other methods >>>> here with that lock held. >> >> Is the patch you refer to >> http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in >> question the following? >> + get_gfn_query(d, pfn, &pt); >> + p2m_change_type(d, pfn, pt, p2m_ram_broken); >> + put_gfn(d, pfn); > > Yes. > >> There really is no way to get rid of that p2m lock-protected critical section >> if the domain allows for paging etc. > > I wasn''t questioning the locking here. It was merely that code (and > the lack of error handling therein) that made me look at the definition > of the used p2m constructs.Remind if I don''t get around to adding this to a cleanup Andres> >> You might want to introduce a >> syntactically cleaner unconditional p2m_change_type variant that doesn''t >> cmpxchg with the previous type -- that is effectively what goes on here. Should >> be a tiny amount of refactoring and the code will be cleaner, no need for >> query or put. > > That might help here, yes. > > Jan >
At 13:04 -0400 on 31 Oct (1351688673), Andres Lagar-Cavilla wrote:> On Oct 30, 2012, at 11:06 AM, Tim Deegan <tim@xen.org> wrote: > > > At 10:53 -0400 on 30 Oct (1351594401), Andres Lagar-Cavilla wrote: > >>>> And then again, with the p2m lock being recursive these > >>>> days, I don''t think there''s any harm calling the other methods > >>>> here with that lock held. > >> > >> Is the patch you refer to http://www.gossamer-threads.com/lists/xen/devel/261025 and the hunk in question the following? > >> + get_gfn_query(d, pfn, &pt); > >> + p2m_change_type(d, pfn, pt, p2m_ram_broken); > >> + put_gfn(d, pfn); > >> > >> There really is no way to get rid of that p2m lock-protected critical > >> section if the domain allows for paging etc. You might want to > >> introduce a syntactically cleaner unconditional p2m_change_type > >> variant that doesn''t cmpxchg with the previous type -- that is > >> effectively what goes on here. Should be a tiny amount of refactoring > >> and the code will be cleaner, no need for query or put. > > > > I don''t think that change-type is even what''s wanted here. You want to > > use some more raw form of set_p2m_entry(), since keeping the MFN is not > > important. How about: > > > > guest_physmap_add_entry(d, pfn, MFN_INVALID, p2m_ram_broken); > > > Has the side effect of leaking the current mfn (if any) in the p2m entry for the remainder of the lifetime of the domain. >Since this call is made whren the underlying MFN is broken and must never be touched, that''s actually a good thing. :)> Digression: is this something we want to just go ahead and fix? Or is > that semantic intended?The xenmem_add_to_physmap() code is basically a wrapper around this that DTRT with the thing that was there before. It Would Be Nice to audit all the various users of that code an make them consistent, though. Goes on the list... Tim.