Hi, Tim,

Please take a look at whether this optimization is feasible. :-)

----

Further release shadow lock overhead.

emulate_map_dest() doesn't require holding the shadow lock: the only shadow-related operation possibly involved is removing a shadow, which is infrequent and can take the lock internally. The rest is either a guest table walk or per-vcpu monitor table manipulation.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>

Thanks,
Kevin
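As a rough illustration of the locking described above (the attached patch itself is not reproduced in the thread), here is a minimal C sketch; apart from shadow_lock()/shadow_unlock(), every helper name is a placeholder invented for the example rather than the real Xen API:

```c
/* Illustrative sketch only: placeholder helpers, not the actual patch. */
static void *emulate_map_dest_sketch(struct vcpu *v, unsigned long vaddr)
{
    /* Guest table walk: reads guest memory, touches no shadow state,
     * so no shadow lock is needed here. */
    mfn_t mfn = walk_guest_tables_to_mfn(v, vaddr);
    if ( !mfn_valid(mfn) )
        return NULL;

    /* The one shadow-related operation: removing shadows of the target
     * frame.  It is infrequent, so the lock is taken only around it. */
    if ( frame_has_shadows(mfn) )
    {
        shadow_lock(v->domain);
        remove_shadows_of_frame(v, mfn);
        shadow_unlock(v->domain);
    }

    /* Per-vcpu monitor table manipulation: per-vcpu state, so again no
     * domain-wide shadow lock is required. */
    return map_dest_via_monitor_table(v, mfn, vaddr);
}
```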
Tim Deegan
2008-Jan-22 09:36 UTC
[Xen-devel] Re: [PATCH]Further release shadow lock overhead
Hi,

At 14:10 +0800 on 22 Jan (1201011038), Tian, Kevin wrote:
> Please take a look whether this optimization is
> feasible. :-)

It goes just a little bit too far. :) The shadow lock must be taken before the guest pagetable is modified and not released until after the shadow has been modified to match it.

It might be possible to delay taking the lock until after the guest walk and the mappings are done, but I'm not sure of it.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
Tian, Kevin
2008-Jan-22 13:23 UTC
[Xen-devel] RE: [PATCH]Further release shadow lock overhead
>From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
>Sent: 22 January 2008 17:37
>
>Hi,
>
>At 14:10 +0800 on 22 Jan (1201011038), Tian, Kevin wrote:
>> Please take a look whether this optimization is
>> feasible. :-)
>
>It goes just a little bit too far. :) The shadow lock must be taken
>before the guest pagetable is modified and not released until after the
>shadow has been modified to match it.
>
>It might be possible to delay taking the lock until after the guest walk
>and the mappings are done, but I'm not sure of it.
>
>Cheers,
>
>Tim.

When I started on this idea I had the same concern. But after more thought I believe it should be safe, since we are emulating the behavior of a local processor. A sane guest won't issue concurrent updates to the same entry, or change an entry while it is being walked by another processor. For architectural correctness, though, the shadow code must still handle insane guests, which is the natural reason for the current lock range.

However, if we look at those insane cases and consider how a real processor behaves, this lock-release patch doesn't actually make things different. As stated above, there are two kinds of insane page-table races (take a 2-cpu environment):

#1 - Updates to the same entry are issued on both cpus.
#2 - One cpu updates an entry while the other cpu is walking it.

Here is how physical processors behave:

#1 - The order of the two updates is undefined; either one can end up as the last update. But the processor ensures the update is atomic, i.e. no partial content ever takes effect.
#2 - The cpu walking the page table may observe either the old or the new mapping, depending on when the walk happens relative to the update from the other cpu. Atomic update is still honored.

Now look at the current shadow write-emulation logic:

a) acquire the shadow lock
b) walk the guest page table and map the destination
c) update the guest entry and validate its content to sync the shadow
d) release the shadow lock

#1 - Still, the order of the two writes is undefined. The lock is required to protect the shadow sync, since the shadow is domain-wide shared data.
#2 - Say cpu0 runs a1-b1-c1-d1 to emulate a write to va, while cpu1 runs a2-b2-c2-d2 to emulate a write to the gl1e mapping that va. In the current logic there are only two possibilities:

1) a1-b1-c1-d1-a2-b2-c2-d2, where the old mapping for va is used and the write is emulated to the old frame;
2) a2-b2-c2-d2-a1-b1-c1-d1, where the new mapping for va is used and the write is emulated to the new frame.

So even without the attached patch, the current logic already allows two possible results, using either the new or the old mapping, just like physical processors. With the attached patch we may get more interleavings, like b1-b2-a1-c1-d1-a2-c2-d2, ..., but the net effect doesn't change as long as atomic access is honored.

Anyway, my points are: the shadow lock is just the lock for the shadow page tables; it can't be used to prevent races on guest page tables. Even with the guest table walk inside the shadow lock, there is no determined result for insane guest behavior. As long as atomic access is ensured, the shadow code can simply behave like a physical processor and release the lock...

I know such a change needs care; as you said, shadow bugs are the most difficult to root-cause. But I do want to share the above thoughts for your input. :-)

Thanks,
Kevin
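To make the a)-d) steps concrete, a rough C sketch of the current ordering follows; it is illustrative only, and every helper name apart from shadow_lock()/shadow_unlock() is a placeholder rather than the real Xen function:

```c
/* Sketch of the current write-emulation ordering a)-d); placeholder helpers. */
static int emulate_write_current(struct vcpu *v, unsigned long va,
                                 unsigned long new_gpte)
{
    shadow_lock(v->domain);                                  /* a) acquire shadow lock */

    unsigned long *gpte_p = walk_and_map_guest_entry(v, va); /* b) guest walk + map    */
    if ( gpte_p == NULL )
    {
        shadow_unlock(v->domain);
        return -1;
    }

    *gpte_p = new_gpte;                                      /* c) update guest entry  */
    validate_guest_entry_and_sync_shadow(v, gpte_p);         /* c) sync the shadow     */

    shadow_unlock(v->domain);                                /* d) release shadow lock */
    unmap_guest_entry(gpte_p);
    return 0;
}
```

The patch being discussed moves work from inside this locked region to before the lock is taken.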
Tim Deegan
2008-Jan-22 14:52 UTC
[Xen-devel] Re: [PATCH]Further release shadow lock overhead
At 21:23 +0800 on 22 Jan (1201037017), Tian, Kevin wrote:
> Anyway, my points are: the shadow lock is just the lock for the shadow
> page tables; it can't be used to prevent races on guest page tables.
> Even with the guest table walk inside the shadow lock, there is no
> determined result for insane guest behavior.

Agreed. I'm willing to believe that the walks can be done without the lock held (I just didn't want to assert it was safe without finding time to look more closely).

But the current semantics of the shadow lock also include maintaining the invariant that a shadow entry is always a valid shadow of the guest entry it represents (or not present).

This is relied on for level-2 and higher entries in the pagefault handler -- if there is a shadow pagetable present mapping a particular VA range, it is assumed that it is the shadow of the guest pagetable. For example, using a 2-level system for simplicity:

- Guest l2e in slot 0 is 0x00001067 (GFN 0x1 is the gl1 table for VA 0x0).
  Shadow l2e in slot 0 is 0x0000a067 (MFN 0xA is the sl1 table for VA 0x0).

- Guest vcpu 1 writes 0x00002067 into l2e slot 0. This write is emulated, and the shadow lock is not held when the guest pagetable is written to in sh_x86_emulate_write(). Just before the lock is taken...

- Guest vcpu 2 takes a pagefault to VA 0x0 (and is using the same pagetable as vcpu 1 -- unlikely but possible, and less crazy scenarios have the same effect on 4-level pagetables).

- The shadow fault handler sees 0x00002067 and maps GFN 0x2 as the guest l1 table. It sees that there is already a shadow l1 table present, so it propagates the contents of an l1 from GFN 0x2 into MFN 0xA, which is the wrong shadow.

- Guest vcpu 1's emulation now acquires the shadow lock and updates the shadow l2e to 0x0000b067, but it's too late - the shadows are already corrupted, and no amount of TLB flushing will make it better.

Sorry if that's a bit long-winded; I needed to make sure it was still true after the recent rework of the shadow PT walker.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
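The same race written out as a timeline may be easier to follow; this is just the scenario above restated, with the entry values and vcpu numbering taken from Tim's example:

```c
/*
 * Interleaving that breaks the shadow invariant (per the example above).
 *
 *   vcpu 1 (emulated write)              vcpu 2 (pagefault at VA 0x0)
 *   -----------------------              ----------------------------
 *   write gl2e[0] = 0x00002067
 *   in sh_x86_emulate_write()
 *   (shadow lock NOT yet held)
 *                                        take shadow lock
 *                                        read gl2e[0] -> 0x00002067
 *                                        sl2e[0] already present -> sl1 at MFN 0xA
 *                                        propagate l1es from GFN 0x2 into MFN 0xA
 *                                        release shadow lock
 *   take shadow lock
 *   set sl2e[0] = 0x0000b067 (new sl1)
 *   release shadow lock
 *
 * MFN 0xA now contains shadows of GFN 0x2's entries even though it is the
 * shadow of GFN 0x1's l1 table, so a present shadow entry no longer matches
 * the guest entry it represents; TLB flushes cannot repair this.
 */
```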
Tian, Kevin
2008-Jan-22 15:26 UTC
RE: [Xen-devel] Re: [PATCH]Further release shadow lock overhead
>From: Tim Deegan
>Sent: 22 January 2008 22:52
>
>At 21:23 +0800 on 22 Jan (1201037017), Tian, Kevin wrote:
>> Anyway, my points are: the shadow lock is just the lock for the shadow
>> page tables; it can't be used to prevent races on guest page tables.
>> Even with the guest table walk inside the shadow lock, there is no
>> determined result for insane guest behavior.
>
>Agreed. I'm willing to believe that the walks can be done without the
>lock held (I just didn't want to assert it was safe without finding
>time to look more closely).

Fair enough. :-)

>But the current semantics of the shadow lock also include maintaining
>the invariant that a shadow entry is always a valid shadow of the guest
>entry it represents (or not present).

That's an important point I may have overlooked before...

>This is relied on for level-2 and higher entries in the pagefault
>handler -- if there is a shadow pagetable present mapping a particular
>VA range, it is assumed that it is the shadow of the guest pagetable.
>For example, using a 2-level system for simplicity:
>
>- Guest l2e in slot 0 is 0x00001067 (GFN 0x1 is the gl1 table for VA 0x0).
>  Shadow l2e in slot 0 is 0x0000a067 (MFN 0xA is the sl1 table for VA 0x0).
>
>- Guest vcpu 1 writes 0x00002067 into l2e slot 0. This write is
>  emulated, and the shadow lock is not held when the guest pagetable is
>  written to in sh_x86_emulate_write(). Just before the lock is taken...
>
>- Guest vcpu 2 takes a pagefault to VA 0x0 (and is using the same
>  pagetable as vcpu 1 -- unlikely but possible, and less crazy scenarios
>  have the same effect on 4-level pagetables).
>
>- The shadow fault handler sees 0x00002067 and maps GFN 0x2 as the
>  guest l1 table. It sees that there is already a shadow l1 table
>  present, so it propagates the contents of an l1 from GFN 0x2 into MFN
>  0xA, which is the wrong shadow.
>
>- Guest vcpu 1's emulation now acquires the shadow lock and updates the
>  shadow l2e to 0x0000b067, but it's too late - the shadows are already
>  corrupted, and no amount of TLB flushing will make it better.

Yes, this one is not recoverable, unlike a physical processor, which can recover as long as the TLB is flushed later. It's a good lesson to me about the invariant relationship between a shadow entry and its guest entry.

>Sorry if that's a bit long-winded; I needed to make sure it was still
>true after the recent rework of the shadow PT walker.

So I admit my patch is a bit too aggressive: the guest table update must always be coupled with the shadow sync logic inside the lock. Breaking that invariant isn't worth it, and could introduce more overhead on the normal path.

At this point it's natural to ask how about just leaving the lock released until the point of updating the guest entry. But no, I'll hold off this time and think more, in case of similar oversights on other paths, though my mind tempts me to believe it should be safe. :-P I just really want to remove unnecessary lock contention, which gets worse as we scale up...

Thanks,
Kevin
Tim Deegan
2008-Jan-23 09:59 UTC
Re: [Xen-devel] Re: [PATCH]Further release shadow lock overhead
At 23:26 +0800 on 22 Jan (1201044364), Tian, Kevin wrote:
> At this point it's natural to ask how about just leaving the lock
> released until the point of updating the guest entry.

If you mean taking the lock just before the memcpy()/cmpxchg(), it sounds plausible.

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
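A hedged sketch of what that narrowed locking might look like; as before, only shadow_lock()/shadow_unlock() are real names and the other helpers are placeholders, so this is the shape of the idea rather than the eventual patch:

```c
/* Sketch: lock only around the guest-entry update and the shadow sync. */
static int emulate_write_narrow_lock(struct vcpu *v, unsigned long va,
                                     unsigned long new_gpte)
{
    /* Guest walk and destination mapping touch no shadow state,
     * so they can run without the shadow lock. */
    unsigned long *gpte_p = walk_and_map_guest_entry(v, va);
    if ( gpte_p == NULL )
        return -1;

    /* Take the lock just before the memcpy()/cmpxchg() on the guest
     * entry, and keep it until the shadow has been synced, so a present
     * shadow entry always matches the guest entry it shadows. */
    shadow_lock(v->domain);
    *gpte_p = new_gpte;                               /* the write being emulated */
    validate_guest_entry_and_sync_shadow(v, gpte_p);  /* keep shadow consistent   */
    shadow_unlock(v->domain);

    unmap_guest_entry(gpte_p);
    return 0;
}
```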
Tian, Kevin
2008-Jan-23 10:04 UTC
RE: [Xen-devel] Re: [PATCH]Further release shadow lock overhead
>From: Tim Deegan [mailto:Tim.Deegan@citrix.com]
>Sent: 23 January 2008 18:00
>
>At 23:26 +0800 on 22 Jan (1201044364), Tian, Kevin wrote:
>> At this point it's natural to ask how about just leaving the lock
>> released until the point of updating the guest entry.
>
>If you mean taking the lock just before the memcpy()/cmpxchg(), it
>sounds plausible.
>
>Cheers,
>
>Tim.

Yes, exactly. I'll resend the patch after testing. :-)

Thanks,
Kevin