Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 07:49 UTC
[RFC] Overview of work required to implement mem_access for PV guests
The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:

1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
mem_access_tracker[gmfn] = access_type
If there is a place where I can stash this in an existing structure, please point me at it. This will be enabled using xc_shadow_control() before attempting to enable mem_access on a PV guest.

4. xc_mem_access_enable/disable(): Change the flow to allow mem_access for PV guests running with PG_mem_access shadow mode.

5. xc_domain_set_access_required(): No change required.

6. xc_(hvm)_set_mem_access(): This API has two modes. If the start pfn/gmfn is ~0ull, it is taken as a request to set the default access; here we will call shadow_blow_tables() after recording the default access type for the domain. In the mode where it is setting the mem_access type for individual gmfns, we will call a function that will drop the shadow for that individual gmfn. I am not sure which function to call. Will sh_remove_all_mappings(gmfn) do the trick? Please advise.

The other issue here is that in the HVM case we could use xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn to gfn+nr would be set. This won't be possible in the PV case as we are actually dealing with mfns, and mfn to mfn+nr need not belong to the same guest. But given that setting *all* page access permissions is done implicitly when setting default access, I think we can live with setting page permissions one at a time as they are faulted in.

7. xc_(hvm)_get_mem_access(): This will return the access type for a gmfn from the mem_access_tracker table.

8. In sh_page_fault(), perform access checks similar to ept_handle_violation() / hvm_hap_nested_page_fault().

9. Hook into _sh_propagate() and set up the L1 entries based on access permissions. This will be similar to ept_p2m_type_to_flags(). I think I might also have to hook into the code that emulates page table writes to ensure access permissions are honored there too.

Please give feedback on the above.

Thanks,
Aravindh
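[As a rough illustration of points 3 and 8 above, here is a minimal sketch of what the proposed per-gmfn tracker and the fault-time check might look like. All names here (mem_access_check, the access enum, the tracker layout) are hypothetical; a real implementation would live in the shadow code and use the access types defined by the mem_access interface.]

/* Hypothetical sketch of the proposed mem_access_tracker (point 3) and the
 * kind of check sh_page_fault() would perform (point 8). */

typedef enum {
    ACCESS_N,    /* no access permitted   */
    ACCESS_R,    /* read only             */
    ACCESS_RW,   /* read/write            */
    ACCESS_RX,   /* read/execute          */
    ACCESS_RWX,  /* full access (default) */
} access_type_t;

struct mem_access_tracker {
    unsigned long max_gmfn;
    access_type_t *access;          /* one entry per gmfn */
    access_type_t default_access;
};

/* Returns 1 if the faulting access is permitted and the shadow L1 entry can
 * be propagated as usual, 0 if a mem_event should be sent to the listener
 * instead (mirroring what ept_handle_violation() does for HVM/EPT). */
static int mem_access_check(const struct mem_access_tracker *t,
                            unsigned long gmfn, int write, int exec)
{
    access_type_t a = (gmfn <= t->max_gmfn) ? t->access[gmfn]
                                            : t->default_access;

    if ( write && (a == ACCESS_N || a == ACCESS_R || a == ACCESS_RX) )
        return 0;
    if ( exec && (a == ACCESS_N || a == ACCESS_R || a == ACCESS_RW) )
        return 0;
    if ( !write && !exec && a == ACCESS_N )
        return 0;

    return 1;
}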
Andrew Cooper
2013-Nov-25 10:47 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
On 25/11/13 07:49, Aravindh Puthiyaparambil (aravindp) wrote:
> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:

Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.

> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.

> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

You cannot remove the hvmops. That would break the hypervisor ABI.

You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.

~Andrew
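[A minimal sketch of the "alias over a new generic op" approach described here, with the old hvmop entry point re-expressed in terms of a new common handler. All structures and function names below are hypothetical, not existing Xen code.]

/* Sketch only: the old HVM-only entry point is kept as an ABI-compatible
 * alias over a new handler that both HVM and PV callers share. */

#include <stdint.h>

struct mem_access_op {          /* argument layout of the hypothetical generic op */
    uint64_t first_pfn;         /* ~0ull means "set the default access"           */
    uint64_t nr;                /* number of frames                               */
    uint16_t access;            /* requested access type                          */
    uint16_t domid;
};

/* New common implementation, reached from the generic hypercall. */
static int mem_access_set(const struct mem_access_op *op)
{
    /* ... record the default access or update the per-gmfn tracker,
     * then blow/drop the relevant shadows ... */
    (void)op;
    return 0;
}

/* Old HVM-only entry point, preserved unchanged at the ABI level and simply
 * forwarded to the new handler. */
static int hvmop_set_mem_access_compat(uint16_t domid, uint64_t first_pfn,
                                       uint64_t nr, uint16_t access)
{
    struct mem_access_op op = {
        .first_pfn = first_pfn,
        .nr        = nr,
        .access    = access,
        .domid     = domid,
    };
    return mem_access_set(&op);
}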
Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 19:39 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
> On 25/11/13 07:49, Aravindh Puthiyaparambil (aravindp) wrote:
>> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>
> Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.

Yes, mem_access is gated on EPT being available.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/mm/mem_event.c;h=d00e4041b2bd099b850644db86449c8a235f0f5a;hb=HEAD#l586

However, I think it is possible to implement this for NPT also.

>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>
> Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.

This page will be similar to the console, xenstore and start_info pages.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_dom_x86.c;h=e034d62373c7a080864d1aefaa6a06412653c9af;hb=HEAD#l452

I can definitely make it depend on a domain creation flag; however, on the HVM side pages for all mem_events including mem_access are created by default.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_hvm_build_x86.c;h=77bd3650c6486b4180101b5944a93ab6aaceca15;hb=HEAD#l487

So is it OK to have a domain creation flag just for mem_access for PV guests?

>> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.
>
> You cannot remove the hvmops. That would break the hypervisor ABI.
>
> You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.

Sorry, I should have been more explicit in the above paragraph. I was planning on doing exactly what you have said. I will be adding a new hypercall interface for the PV guests; we can then use that for HVM also and keep the old hvm_op hypercall interface as an alias.

I would do something similar on the tool stack side: create xc_domain_*_access() or xc_*_access() and make them wrappers that call xc_hvm_*_access() or vice-versa, then move the functions to xc_domain.c or xc_mem_access.c. This way I am hoping the existing libxc APIs will still work.

Thanks,
Aravindh
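[On the libxc side, the wrapper idea might look roughly like the sketch below. xc_set_mem_access() is a hypothetical name for the generic entry point, and the xc_hvm_set_mem_access() prototype is approximated from memory; it should be checked against the libxenctrl headers of the tree being patched.]

#include <stdint.h>
#include <xenctrl.h>

/* Hypothetical generic wrapper: initially it just forwards to the existing
 * HVM call; later it would switch to the new generic hypercall so the same
 * entry point works for PV domains as well. */
static int xc_set_mem_access(xc_interface *xch, domid_t domid,
                             hvmmem_access_t access,
                             uint64_t first_pfn, uint64_t nr)
{
    return xc_hvm_set_mem_access(xch, domid, access, first_pfn, nr);
}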
Andrew Cooper
2013-Nov-25 20:18 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
On 25/11/13 19:39, Aravindh Puthiyaparambil (aravindp) wrote:
>> Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.
>
> Yes, mem_access is gated on EPT being available.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/mm/mem_event.c;h=d00e4041b2bd099b850644db86449c8a235f0f5a;hb=HEAD#l586
>
> However, I think it is possible to implement this for NPT also.

So it is - I missed that.

>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>
>> Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.
>
> This page will be similar to the console, xenstore and start_info pages.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_dom_x86.c;h=e034d62373c7a080864d1aefaa6a06412653c9af;hb=HEAD#l452
>
> I can definitely make it depend on a domain creation flag; however, on the HVM side pages for all mem_events including mem_access are created by default.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_hvm_build_x86.c;h=77bd3650c6486b4180101b5944a93ab6aaceca15;hb=HEAD#l487
>
> So is it OK to have a domain creation flag just for mem_access for PV guests?

The start_info and xenstore pages are critical for a PV guest to boot, and the console is fairly useful (although not essential). These pages belong to the guest, and the guest has full read/write access and control over the pages.

For HVM guests, the special pfns are hidden in the MMIO region and have no access by default. HVM domains need to use add_to_physmap to get access to a subset of the magic pages.

I do not think it is reasonable for a guest to be able to access its own mem_access page, and I am not sure how best to prevent PV guests from getting at it.

>>> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.
>>
>> You cannot remove the hvmops. That would break the hypervisor ABI.
>>
>> You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.
>
> Sorry, I should have been more explicit in the above paragraph. I was planning on doing exactly what you have said. I will be adding a new hypercall interface for the PV guests; we can then use that for HVM also and keep the old hvm_op hypercall interface as an alias.
> I would do something similar on the tool stack side: create xc_domain_*_access() or xc_*_access() and make them wrappers that call xc_hvm_*_access() or vice-versa, then move the functions to xc_domain.c or xc_mem_access.c. This way I am hoping the existing libxc APIs will still work.

Ah ok - that looks sensible overall.

~Andrew
Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 20:29 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
> On 25/11/13 19:39, Aravindh Puthiyaparambil (aravindp) wrote:
>> So is it OK to have a domain creation flag just for mem_access for PV guests?
>
> The start_info and xenstore pages are critical for a PV guest to boot, and the console is fairly useful (although not essential). These pages belong to the guest, and the guest has full read/write access and control over the pages.
>
> For HVM guests, the special pfns are hidden in the MMIO region and have no access by default. HVM domains need to use add_to_physmap to get access to a subset of the magic pages.
>
> I do not think it is reasonable for a guest to be able to access its own mem_access page, and I am not sure how best to prevent PV guests from getting at it.

In the mem_access listener for HVM guests, what happens is that the page is mapped in and then removed from the physmap of the guest.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

I was hoping to do the same for PV guests. Will that not work?

Thanks,
Aravindh
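[For reference, a condensed sketch of the map-then-remove sequence the linked xen-access.c performs for HVM guests; error handling and the way ring_pfn is obtained are omitted, and the libxc prototypes should be checked against the tree in use.]

#include <sys/mman.h>
#include <xenctrl.h>

static void *map_and_hide_ring_page(xc_interface *xch, domid_t domid,
                                    xen_pfn_t ring_pfn)
{
    xen_pfn_t mmap_pfn = ring_pfn;

    /* Map the mem_event ring page into the listener. */
    void *ring_page = xc_map_foreign_batch(xch, domid,
                                           PROT_READ | PROT_WRITE,
                                           &mmap_pfn, 1);
    if ( ring_page == NULL )
        return NULL;

    /* Remove the page from the guest's physmap so the guest can no longer
     * reach it; the listener's mapping keeps the frame alive. */
    xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &ring_pfn);

    return ring_page;
}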
Tim Deegan
2013-Nov-26 10:01 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
Hi,

At 07:49 +0000 on 25 Nov (1385362167), Aravindh Puthiyaparambil (aravindp) wrote:
> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>
> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?

> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

Sure.

> 3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
> mem_access_tracker[gmfn] = access_type
> If there is a place where I can stash this in an existing structure, please point me at it.

My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.

> 6. xc_(hvm)_set_mem_access(): This API has two modes. If the start pfn/gmfn is ~0ull, it is taken as a request to set the default access; here we will call shadow_blow_tables() after recording the default access type for the domain. In the mode where it is setting the mem_access type for individual gmfns, we will call a function that will drop the shadow for that individual gmfn. I am not sure which function to call. Will sh_remove_all_mappings(gmfn) do the trick?

Yes, sh_remove_all_mappings() is the one you want.

> The other issue here is that in the HVM case we could use xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn to gfn+nr would be set. This won't be possible in the PV case as we are actually dealing with mfns, and mfn to mfn+nr need not belong to the same guest. But given that setting *all* page access permissions is done implicitly when setting default access, I think we can live with setting page permissions one at a time as they are faulted in.

Seems OK to me.

> 8. In sh_page_fault(), perform access checks similar to ept_handle_violation() / hvm_hap_nested_page_fault().

Yep.

> 9. Hook into _sh_propagate() and set up the L1 entries based on access permissions. This will be similar to ept_p2m_type_to_flags(). I think I might also have to hook into the code that emulates page table writes to ensure access permissions are honored there too.

I guess you might; again, the p2m interface will help here, and probably the existing tidy-up code in emulate_gva_to_mfn will be the place to hook.

Cheers,

Tim.
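[To make the p2m.h-backend suggestion a bit more concrete, here is a standalone sketch of an access table that is just a flat array with p2m-style get/set accessors rather than a pagetable. All names are hypothetical, and in Xen the allocation would come from the domain's shadow pool rather than from malloc().]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t p2m_access_t;        /* stand-in for Xen's real p2m access enum */

struct pv_access_table {
    unsigned long nr_frames;         /* PV physmap is bounded, so a flat array works */
    p2m_access_t *entries;           /* one byte of access state per frame           */
    p2m_access_t default_access;
};

static int pv_access_table_init(struct pv_access_table *t,
                                unsigned long nr_frames,
                                p2m_access_t default_access)
{
    t->entries = malloc(nr_frames * sizeof(*t->entries));
    if ( t->entries == NULL )
        return -1;
    memset(t->entries, default_access, nr_frames);   /* everything starts at the default */
    t->nr_frames = nr_frames;
    t->default_access = default_access;
    return 0;
}

/* Shaped like the p2m get_entry/set_entry hooks as far as access is concerned. */
static p2m_access_t pv_get_access(const struct pv_access_table *t,
                                  unsigned long frame)
{
    return (frame < t->nr_frames) ? t->entries[frame] : t->default_access;
}

static void pv_set_access(struct pv_access_table *t, unsigned long frame,
                          p2m_access_t a)
{
    if ( frame < t->nr_frames )
        t->entries[frame] = a;
}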
Aravindh Puthiyaparambil (aravindp)
2013-Nov-26 18:19 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>>
>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>
> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?

Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
Or were you meaning something else?

I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

From my reading of xc_domain_decrease_reservation_exact(), I think it will also work for PV guests. Or am I missing something here?

>> 3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
>> mem_access_tracker[gmfn] = access_type
>> If there is a place where I can stash this in an existing structure, please point me at it.
>
> My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.

OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?

Thanks so much for the feedback.
Aravindh
Andres Lagar-Cavilla
2013-Nov-26 18:41 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>
>> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?
>
> Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
> Or were you meaning something else?
>
> I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

Once the page is removed from the physmap, an HVM guest has no way of indexing that page and thus mapping it -- even though it's a page that belongs to it, and that is threaded on its list of owned pages.

With PV, you have an additional means of indexing, which is the raw MFN. The PV guest will be able to get at the page because it owns it, if it knows the MFN. No PFN/GFN required. This is how, for example, things like the grant table are mapped in classic PV domains.

I don't know how realistic the concern about the domain guessing the MFN for the page is. But if it can, and it maps it and mucks with the ring, the thing to evaluate is: can the guest throw dom0/the host into a tailspin? The answer is likely "no", because guests can't reasonably do this with other rings they have access to, like PV driver backends. But a flaw on the consumer side of mem events could yield a vector for DoS.

If, instead, the page is a Xen-owned page (alloc_xenheap_pages), then there is no way for the PV domain to map it.

>> My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.
>
> OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?

Well, the page/MFN could conceivably be mapped by many domains. There are ample bits to play with in the type flag, for example. But as long as you don't care about mem_event on pages shared across two or more PV domains, then that should be fine. I wouldn't blame you if you didn't care :)

OTOH, all you need is a byte per pfn, and the great thing is that in PV domains, the physmap is bounded and contiguous. Unlike HVM and its PCI holes, etc., which demand the sparse tree structure. So you can allocate an easily indexable array, notwithstanding super page concerns (I think/hope).

Andres
Aravindh Puthiyaparambil (aravindp)
2013-Nov-26 19:46 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>>
>>> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?
>>
>> Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
>> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
>> Or were you meaning something else?
>>
>> I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333
>
> Once the page is removed from the physmap, an HVM guest has no way of indexing that page and thus mapping it -- even though it's a page that belongs to it, and that is threaded on its list of owned pages.
>
> With PV, you have an additional means of indexing, which is the raw MFN. The PV guest will be able to get at the page because it owns it, if it knows the MFN. No PFN/GFN required. This is how, for example, things like the grant table are mapped in classic PV domains.
>
> I don't know how realistic the concern about the domain guessing the MFN for the page is. But if it can, and it maps it and mucks with the ring, the thing to evaluate is: can the guest throw dom0/the host into a tailspin? The answer is likely "no", because guests can't reasonably do this with other rings they have access to, like PV driver backends. But a flaw on the consumer side of mem events could yield a vector for DoS.
>
> If, instead, the page is a Xen-owned page (alloc_xenheap_pages), then there is no way for the PV domain to map it.

Thanks so much for the explanation. I will use alloc_xenheap_pages.

>> OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?
>
> Well, the page/MFN could conceivably be mapped by many domains. There are ample bits to play with in the type flag, for example. But as long as you don't care about mem_event on pages shared across two or more PV domains, then that should be fine. I wouldn't blame you if you didn't care :)

Yup, I don't care :-)

> OTOH, all you need is a byte per pfn, and the great thing is that in PV domains, the physmap is bounded and contiguous. Unlike HVM and its PCI holes, etc., which demand the sparse tree structure. So you can allocate an easily indexable array, notwithstanding super page concerns (I think/hope).

I did not realize that the physmap is bounded and contiguous. I will go with an indexable array.

Thanks,
Aravindh
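[A minimal hypervisor-side sketch of the direction settled on here: the ring page comes from the Xen heap, so a PV guest cannot map it by MFN, and the tracker is a flat byte-per-pfn array. It follows common Xen idioms (alloc_xenheap_pages(), xmalloc_array()) but is not taken from an actual patch, leaves out how the page is later shared with the privileged listener, and is only meant as a sketch.]

/* Sketch only: allocate a Xen-owned ring page plus a flat access array
 * sized to the (bounded) PV physmap.  Not compilable outside the
 * hypervisor tree; error/cleanup handling is minimal. */

struct pv_mem_access_state {
    void *ring_page;           /* Xen-heap page, never guest-mappable by MFN */
    uint8_t *access;           /* one access byte per guest pfn              */
    unsigned long nr_pfns;     /* bounded PV physmap size                    */
};

static int pv_mem_access_init(struct pv_mem_access_state *s,
                              unsigned long nr_pfns)
{
    s->ring_page = alloc_xenheap_pages(0, 0);      /* order 0, no memflags */
    if ( s->ring_page == NULL )
        return -ENOMEM;
    clear_page(s->ring_page);

    s->access = xmalloc_array(uint8_t, nr_pfns);   /* the indexable array */
    if ( s->access == NULL )
    {
        free_xenheap_pages(s->ring_page, 0);
        return -ENOMEM;
    }
    memset(s->access, 0, nr_pfns);                 /* 0 == default access here */
    s->nr_pfns = nr_pfns;

    return 0;
}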