Christoph Egger
2010-Aug-05 15:05 UTC
[Xen-devel] [PATCH 14/14] Nested Virtualization: hap-on-hap
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2010-Aug-09 13:18 UTC
Re: [Xen-devel] [PATCH 14/14] Nested Virtualization: hap-on-hap
Hi, This looks a lot nicer than the last version I reviewed. I''m still concerned about TLB and p2m flushes, though. - I can''t see how writes to the ''host'' p2m table cause the ''shadow'' p2m tables to be flushed. I might just have missed it. - The p2m_flush operations don''t look safe against other vpcus. Mostly they''re called with v==current, which looks OK, but what if two vcpus are running on the same p2m? Also when p2m_get_nestedp2m() flushes the domain''s LRU p2m, there''s no shootdown if that p2m is in use on another pcpu. That could happen if the VM has more vcpus than MAX_NESTEDP2M. (Actually, that case is probably pretty broken generally.) Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, XenServer Engineering Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2010-Aug-19 15:55 UTC
Re: [Xen-devel] [PATCH 14/14] Nested Virtualization: hap-on-hap
On Monday 09 August 2010 15:18:22 Tim Deegan wrote:> Hi, > > This looks a lot nicer than the last version I reviewed. I''m still > concerned about TLB and p2m flushes, though. > > - I can''t see how writes to the ''host'' p2m table cause the ''shadow'' p2m > tables to be flushed. I might just have missed it.The ''shadow'' p2m is flushed when - the l1 guest runs an instruction like INVLPGA (e.g. Windows 7 does so) - the l1 guest sets up a VMCB where * the tlb_control is set * the asid changed * the nested cr3 changed (and there is no free nestedp2m slot)> - The p2m_flush operations don''t look safe against other vpcus. Mostly > they''re called with v==current, which looks OK, but what if two vcpus > are running on the same p2m? Also when p2m_get_nestedp2m() flushes > the domain''s LRU p2m, there''s no shootdown if that p2m is in use on > another pcpu. That could happen if the VM has more vcpus than > MAX_NESTEDP2M. (Actually, that case is probably pretty broken > generally.)Yes, this is indeed an issue that needs to be fixed. How do I do a TLB shootdown across physical cpus which schedule vcpus bound to the l1 guest ? The physical cpu must leave the l1/l2 guest on the tlb shootdown. An optimization is to limit the tlb shootdown to those physical cpus where "their" vcpus are in guestmode if this is possible to implement. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2010-Aug-19 16:33 UTC
Re: [Xen-devel] [PATCH 14/14] Nested Virtualization: hap-on-hap
At 16:55 +0100 on 19 Aug (1282236902), Christoph Egger wrote:> On Monday 09 August 2010 15:18:22 Tim Deegan wrote: > > - I can''t see how writes to the ''host'' p2m table cause the ''shadow'' p2m > > tables to be flushed. I might just have missed it. > > The ''shadow'' p2m is flushed when > - the l1 guest runs an instruction like INVLPGA (e.g. Windows 7 does so) > - the l1 guest sets up a VMCB where > * the tlb_control is set > * the asid changed > * the nested cr3 changed (and there is no free nestedp2m slot)OK, so the case I''m worried about is: if the L1''s p2m is changed (by ballooning, or the mem-event code, or by a page being marked as broken) then the shadow p2ms need to be updated or discarded because they might contain mappings made before the change. The equivalent problem exists in the normal shadow pagetables, which is why the p2m code has to use a callback (shadow_write_p2m_entry()) to write its entries. The shadow code writes the entry and removes all offending shadow PTEs at the same time. You''ll need something equivalent here for safety. BTW, it won''t be enough to just declare that these are unsupported combinations - if a nested-mode guest can give up a page of memory that its l2 guest still has mappings of then it breaks the basic memory safety of Xen.> > - The p2m_flush operations don''t look safe against other vpcus. Mostly > > they''re called with v==current, which looks OK, but what if two vcpus > > are running on the same p2m? Also when p2m_get_nestedp2m() flushes > > the domain''s LRU p2m, there''s no shootdown if that p2m is in use on > > another pcpu. That could happen if the VM has more vcpus than > > MAX_NESTEDP2M. (Actually, that case is probably pretty broken > > generally.) > > Yes, this is indeed an issue that needs to be fixed. How do I do > a TLB shootdown across physical cpus which schedule > vcpus bound to the l1 guest ?You can call on_selected_cpus() with the vcpu''s v->vcpu_dirty_cpumask (remembering to take smp_processor_id() out of the mask first). If all you need is a TLB shootdown you can call flush_tlb_mask(). That will cause a VMEXIT on the target CPUs so if you make it so that they can''t VMENTER again with the old p2m settings that might be all you need.> The physical cpu must leave the l1/l2 guest on the tlb shootdown. > An optimization is to limit the tlb shootdown to those physical cpus > where "their" vcpus are in guestmode if this is possible to implement.You''d have to add another cpumask to express that, but that''s doable. Cheers, Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, XenServer Engineering Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2010-Sep-01 14:27 UTC
Re: [Xen-devel] [PATCH 14/14] Nested Virtualization: hap-on-hap
On Thursday 19 August 2010 18:33:16 Tim Deegan wrote:> At 16:55 +0100 on 19 Aug (1282236902), Christoph Egger wrote: > > On Monday 09 August 2010 15:18:22 Tim Deegan wrote: > > > - I can''t see how writes to the ''host'' p2m table cause the ''shadow'' p2m > > > tables to be flushed. I might just have missed it. > > > > The ''shadow'' p2m is flushed when > > - the l1 guest runs an instruction like INVLPGA (e.g. Windows 7 does so) > > - the l1 guest sets up a VMCB where > > * the tlb_control is set > > * the asid changed > > * the nested cr3 changed (and there is no free nestedp2m slot) > > OK, so the case I''m worried about is: if the L1''s p2m is changed (by > ballooning, or the mem-event code, or by a page being marked as broken) > then the shadow p2ms need to be updated or discarded because they might > contain mappings made before the change. > > The equivalent problem exists in the normal shadow pagetables, which is > why the p2m code has to use a callback (shadow_write_p2m_entry()) to > write its entries. The shadow code writes the entry and removes all > offending shadow PTEs at the same time. You''ll need something > equivalent here for safety. > > BTW, it won''t be enough to just declare that these are unsupported > combinations - if a nested-mode guest can give up a page of memory that > its l2 guest still has mappings of then it breaks the basic memory > safety of Xen.I see.> > > - The p2m_flush operations don''t look safe against other vpcus. Mostly > > > they''re called with v==current, which looks OK, but what if two vcpus > > > are running on the same p2m? Also when p2m_get_nestedp2m() flushes > > > the domain''s LRU p2m, there''s no shootdown if that p2m is in use on > > > another pcpu. That could happen if the VM has more vcpus than > > > MAX_NESTEDP2M. (Actually, that case is probably pretty broken > > > generally.) > > > > Yes, this is indeed an issue that needs to be fixed. How do I do > > a TLB shootdown across physical cpus which schedule > > vcpus bound to the l1 guest ? > > You can call on_selected_cpus() with the vcpu''s v->vcpu_dirty_cpumask > (remembering to take smp_processor_id() out of the mask first). > If all you need is a TLB shootdown you can call flush_tlb_mask(). > That will cause a VMEXIT on the target CPUs so if you make it so that > they can''t VMENTER again with the old p2m settings that might be all you > need. > > > The physical cpu must leave the l1/l2 guest on the tlb shootdown. > > An optimization is to limit the tlb shootdown to those physical cpus > > where "their" vcpus are in guestmode if this is possible to implement. > > You''d have to add another cpumask to express that, but that''s doable.Thanks. I implemented the tlb shootdown per p2m and do it on those physical cpus whose vcpus are in guestmode. I have implemented and fixed all feedback I got so far, except for the "flush nestedp2m on hostp2m write" we talked above. It is on my todo list. I will send the fourth patch series for feedback, the "flush nestedp2m on hostp2m write" will be part of the over next patch series. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel