George Dunlap
2009-Sep-17 17:51 UTC
[Xen-devel] [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
ept_sync_domain() is called whenever the p2m changes. The current code calls sync on all cpus; this is extremely wasteful, especially for a single-vcpu VM on a 16-way (2x4x2) box.

This patch will only call sync on cpus where there may be dirty cpu state. I've done a basic test, but I'd appreciate someone from Intel verifying that there shouldn't be any problems.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Sep-18 07:43 UTC
[Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
This is definitely not safe. See Intel SDM Vol 3B Section 24.3.3. EPTP-tagged cached mappings (that is, partial mappings from guest-phys to host-phys addresses, tagged by the EPT base pointer value) only need be flushed when a suitable INVEPT instruction is executed. So an HVM VCPU can leave EPTP-tagged droppings lying around in other TLBs as it migrates from CPU to CPU -- the domain_dirty_cpumask does not track this!

The way to fix this is to 'if ( hap_enabled ) __invept(1, d->arch.hvm_domain.vmx.ept_control.eptp, 0)' in vmx_ctxt_switch_from(). That then makes your patch correct.

Care to test this and spin another patch?

 -- Keir

On 17/09/2009 18:51, "George Dunlap" <dunlapg@umich.edu> wrote:

> ept_sync_domain() is called whenever the p2m changes. The current
> code calls sync on all cpus; this is extremely wasteful, especially
> for a single-vcpu VM on a 16-way (2x4x2) box.
>
> This patch will only call sync on cpus where there may be dirty cpu
> state. I've done a basic test, but I'd appreciate someone from Intel
> verifying that there shouldn't be any problems.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
George Dunlap
2009-Sep-18 09:09 UTC
Re: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
Just making sure I understand the failure mode... the risk is that if we change the p2m for this domain while a vcpu is not running on that core, and it's scheduled there again, it may have stale mappings that get used (tagged by the EPT base pointer).

Hmm... any idea the performance implications of flushing the EPT mappings when switching out? The main performance problem I need to fix is the ballooning case. Without this patch, on my 2x4x2 Nehalem box, a single-vcpu guest ballooning from 16GiB down to 8GiB for an initial boot takes 12-15 minutes; with the patch, it takes around 40 seconds.

Perhaps we could start with the flush-ept-on-context-switch-out, and do some performance testing later to see if it's worth maintaining a "might-need-an-ept-flush" mask.

 -George

On Fri, Sep 18, 2009 at 8:43 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> This is definitely not safe. See Intel SDM Vol 3B Section 24.3.3. EPTP-tagged
> cached mappings (that is, partial mappings from guest-phys to host-phys
> addresses, tagged by the EPT base pointer value) only need be flushed when a
> suitable INVEPT instruction is executed. So an HVM VCPU can leave EPTP-tagged
> droppings lying around in other TLBs as it migrates from CPU to CPU -- the
> domain_dirty_cpumask does not track this!
>
> The way to fix this is to 'if ( hap_enabled ) __invept(1,
> d->arch.hvm_domain.vmx.ept_control.eptp, 0)' in vmx_ctxt_switch_from(). That
> then makes your patch correct.
>
> Care to test this and spin another patch?
>
> -- Keir
Keir Fraser
2009-Sep-18 13:21 UTC
Re: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
On 18/09/2009 10:09, "George Dunlap" <dunlapg@umich.edu> wrote:

> Just making sure I understand the failure mode... the risk is that if
> we change the p2m for this domain while a vcpu is not running on that
> core, and it's scheduled there again, it may have stale mappings that
> get used (tagged by the EPT base pointer).

Correct.

> Hmm... any idea the performance implications of flushing the EPT
> mappings when switching out? The main performance problem I need to
> fix is the ballooning case. Without this patch, on my 2x4x2 Nehalem
> box, a single-vcpu guest ballooning from 16GiB down to 8GiB for an
> initial boot takes 12-15 minutes; with the patch, it takes around 40
> seconds.

How bad is it if you don't flush at all? 40s is still not great and maybe we could batch the INVEPT invocations? That might also greatly reduce the cost of the current naïve flush-all implementation, to a point where it is actually acceptable.

> Perhaps we could start with the flush-ept-on-context-switch-out, and
> do some performance testing later to see if it's worth maintaining a
> "might-need-an-ept-flush" mask.

Well, that's a possible alternative, yes. Just needs a bit of care in implementation but it wouldn't need much extra code.

 -- Keir
George Dunlap
2009-Sep-18 14:55 UTC
Re: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
On Fri, Sep 18, 2009 at 2:21 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> How bad is it if you don't flush at all? 40s is still not great and maybe we
> could batch the INVEPT invocations? That might also greatly reduce the cost
> of the current naïve flush-all implementation, to a point where it is
> actually acceptable.

In theory, the memory handed back by the balloon driver shouldn't be touched by the OS. I think it would be OK if guest accesses to that gfn space didn't fail for the guest giving up the pages; however, we can't give the memory to another guest until we know for sure that the first guest can't possibly access it anymore. I think we should be able to modify the balloon driver to "batch" some number of updates; say, 1024 at a time. Paul, any thoughts on this?

The other thing is the level of indirection -- we have to add a parameter to set_p2m_entry() that says, "Don't sync this right away", and then add another function that says, "Now commit all the changes I just made". That may take some thought to do correctly.

I think avoiding flush-all is still a good idea, as we have other things like populate-on-demand and zero-page sweep doing lots of modifications of the p2m table during boot that can't be batched this way. For example, the zero sweep *must* remove the page from the p2m before doing a final scan to make sure it's still zero. I suppose we could scan a list of pages, remove them all from the p2m (taking the flush-all), and then scan them again... but it seems like now we're starting to get a lot more complicated than just keeping a mask or two around. Thoughts?
I think starting with a "flush-on-switch-to" would be good; it should be fairly straightforward to make that flush happen only when:
 * the domain has run on that cpu before, and
 * the domain has had p2m changes since the last time it ran

 -George
Paul Durrant
2009-Sep-18 16:51 UTC
Re: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
George Dunlap wrote:

> In theory, the memory handed back by the balloon driver shouldn't be
> touched by the OS. I think it would be OK if guest accesses to that
> gfn space didn't fail for the guest giving up the pages; however, we
> can't give the memory to another guest until we know for sure that the
> first guest can't possibly access it anymore. I think we should be
> able to modify the balloon driver to "batch" some number of updates;
> say, 1024 at a time. Paul, any thoughts on this?

The decrease_reservation op is called with however many PFNs can fit into the Windows MDL structure (8184 for a 32-bit VM) so we already do things in pretty big batches.

  Paul

--
==============================
Paul Durrant, Software Engineer
Citrix Systems (R&D) Ltd.
First Floor, Building 101
Cambridge Science Park
Milton Road
Cambridge CB4 0FY
United Kingdom
TEL: x35957 (+44 1223 225957)
==============================
Dan Magenheimer
2009-Sep-18 17:06 UTC
RE: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
> In theory, the memory handed back by the balloon driver shouldn't be
> touched by the OS. I think it would be OK if guest accesses to that
> gfn space didn't fail for the guest giving up the pages; however, we
> can't give the memory to another guest until we know for sure that the
> first guest can't possibly access it anymore. I think we should be
> able to modify the balloon driver to "batch" some number of updates;
> say, 1024 at a time. Paul, any thoughts on this?

Not sure if this is relevant, but tmem can and will grab memory handed back by the balloon driver for one guest and immediately use it for another guest's data. So I hope the ballooning guest can't possibly access it anymore!

Dan
Keir Fraser
2009-Sep-19 10:34 UTC
Re: [Xen-devel] Re: [PATCH] EPT: Only sync pcpus on which a domain's vcpus might be running
On 18/09/2009 15:55, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:

> I think starting with a "flush-on-switch-to" would be good; it should
> be fairly straightforward to make that flush happen only when:
> * the domain has run on that cpu before, and
> * the domain has had p2m changes since the last time it ran

Well, I'm okay with that, but I will wait for a tested patch from you.

 -- Keir