Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 07:49 UTC
[RFC] Overview of work required to implement mem_access for PV guests
The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:

1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
mem_access_tracker[gmfn] = access_type
If there is a place where I can stash this in an existing structure, please point me at it. This will be enabled using xc_shadow_control() before attempting to enable mem_access on a PV guest.

4. xc_mem_access_enable/disable(): Change the flow to allow mem_access for PV guests running with PG_mem_access shadow mode.

5. xc_domain_set_access_required(): No change required.

6. xc_(hvm)_set_mem_access(): This API has two modes. If the start pfn/gmfn is ~0ull, it is taken as a request to set the default access; here we will call shadow_blow_tables() after recording the default access type for the domain. In the mode where it is setting the mem_access type for individual gmfns, we will call a function that will drop the shadow for that individual gmfn. I am not sure which function to call. Will sh_remove_all_mappings(gmfn) do the trick? Please advise.

The other issue here is that in the HVM case we could use xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn to gfn+nr would be set. This won't be possible in the PV case as we are actually dealing with mfns, and mfn to mfn+nr need not belong to the same guest. But given that setting *all* page access permissions is done implicitly when setting default access, I think we can live with setting page permissions one at a time as they are faulted in.

7. xc_(hvm)_get_mem_access(): This will return the access type for a gmfn from the mem_access_tracker table.

8. In sh_page_fault(), perform access checks similar to ept_handle_violation() / hvm_hap_nested_page_fault().

9. Hook into _sh_propagate() and set up the L1 entries based on access permissions. This will be similar to ept_p2m_type_to_flags(). I think I might also have to hook into the code that emulates page table writes to ensure access permissions are honored there too.

Please give feedback on the above.

Thanks,
Aravindh
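[As a rough illustration of points 3 and 8 above, here is a minimal sketch of what the proposed per-gmfn tracker and the fault-time check might look like. All names here (mem_access_check, the access enum, the tracker layout) are hypothetical; a real implementation would live in the shadow code and use the access types defined by the mem_access interface.]

/* Hypothetical sketch of the proposed mem_access_tracker (point 3) and the
 * kind of check sh_page_fault() would perform (point 8). */

typedef enum {
    ACCESS_N,    /* no access permitted   */
    ACCESS_R,    /* read only             */
    ACCESS_RW,   /* read/write            */
    ACCESS_RX,   /* read/execute          */
    ACCESS_RWX,  /* full access (default) */
} access_type_t;

struct mem_access_tracker {
    unsigned long max_gmfn;
    access_type_t *access;          /* one entry per gmfn */
    access_type_t default_access;
};

/* Returns 1 if the faulting access is permitted and the shadow L1 entry can
 * be propagated as usual, 0 if a mem_event should be sent to the listener
 * instead (mirroring what ept_handle_violation() does for HVM/EPT). */
static int mem_access_check(const struct mem_access_tracker *t,
                            unsigned long gmfn, int write, int exec)
{
    access_type_t a = (gmfn <= t->max_gmfn) ? t->access[gmfn]
                                            : t->default_access;

    if ( write && (a == ACCESS_N || a == ACCESS_R || a == ACCESS_RX) )
        return 0;
    if ( exec && (a == ACCESS_N || a == ACCESS_R || a == ACCESS_RW) )
        return 0;
    if ( !write && !exec && a == ACCESS_N )
        return 0;

    return 1;
}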
Andrew Cooper
2013-Nov-25 10:47 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
On 25/11/13 07:49, Aravindh Puthiyaparambil (aravindp) wrote:
> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:

Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.

> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.

> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

You cannot remove the hvmops. That would break the hypervisor ABI.

You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.

~Andrew
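[A minimal sketch of the "alias over a new generic op" approach described here, with the old hvmop entry point re-expressed in terms of a new common handler. All structures and function names below are hypothetical, not existing Xen code.]

/* Sketch only: the old HVM-only entry point is kept as an ABI-compatible
 * alias over a new handler that both HVM and PV callers share. */

#include <stdint.h>

struct mem_access_op {          /* argument layout of the hypothetical generic op */
    uint64_t first_pfn;         /* ~0ull means "set the default access"           */
    uint64_t nr;                /* number of frames                               */
    uint16_t access;            /* requested access type                          */
    uint16_t domid;
};

/* New common implementation, reached from the generic hypercall. */
static int mem_access_set(const struct mem_access_op *op)
{
    /* ... record the default access or update the per-gmfn tracker,
     * then blow/drop the relevant shadows ... */
    (void)op;
    return 0;
}

/* Old HVM-only entry point, preserved unchanged at the ABI level and simply
 * forwarded to the new handler. */
static int hvmop_set_mem_access_compat(uint16_t domid, uint64_t first_pfn,
                                       uint64_t nr, uint16_t access)
{
    struct mem_access_op op = {
        .first_pfn = first_pfn,
        .nr        = nr,
        .access    = access,
        .domid     = domid,
    };
    return mem_access_set(&op);
}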
Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 19:39 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
> On 25/11/13 07:49, Aravindh Puthiyaparambil (aravindp) wrote:
>> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>
> Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.

Yes, mem_access is gated on EPT being available.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/mm/mem_event.c;h=d00e4041b2bd099b850644db86449c8a235f0f5a;hb=HEAD#l586

However, I think it is possible to implement this for NPT also.

>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>
> Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.

This page will be similar to the console, xenstore and start_info pages.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_dom_x86.c;h=e034d62373c7a080864d1aefaa6a06412653c9af;hb=HEAD#l452

I can definitely make it depend on a domain creation flag; however, on the HVM side pages for all mem_events including mem_access are created by default.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_hvm_build_x86.c;h=77bd3650c6486b4180101b5944a93ab6aaceca15;hb=HEAD#l487

So is it OK to have a domain creation flag just for mem_access for PV guests?

>> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.
>
> You cannot remove the hvmops. That would break the hypervisor ABI.
>
> You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.

Sorry, I should have been more explicit in the above paragraph. I was planning on doing exactly what you have said. I will be adding a new hypercall interface for the PV guests; we can then use that for HVM also and keep the old hvm_op hypercall interface as an alias.

I would do something similar on the tool stack side: create xc_domain_*_access() or xc_*_access() and make them wrappers that call xc_hvm_*_access() or vice-versa, then move the functions to xc_domain.c or xc_mem_access.c. This way I am hoping the existing libxc APIs will still work.

Thanks,
Aravindh
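[On the libxc side, the wrapper idea might look roughly like the sketch below. xc_set_mem_access() is a hypothetical name for the generic entry point, and the xc_hvm_set_mem_access() prototype is approximated from memory; it should be checked against the libxenctrl headers of the tree being patched.]

#include <stdint.h>
#include <xenctrl.h>

/* Hypothetical generic wrapper: initially it just forwards to the existing
 * HVM call; later it would switch to the new generic hypercall so the same
 * entry point works for PV domains as well. */
static int xc_set_mem_access(xc_interface *xch, domid_t domid,
                             hvmmem_access_t access,
                             uint64_t first_pfn, uint64_t nr)
{
    return xc_hvm_set_mem_access(xch, domid, access, first_pfn, nr);
}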
Andrew Cooper
2013-Nov-25 20:18 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
On 25/11/13 19:39, Aravindh Puthiyaparambil (aravindp) wrote:
>> Are you sure that this is only Intel with EPT? It looks to be a HAP feature, which includes AMD with NPT support.
>
> Yes, mem_access is gated on EPT being available.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/mm/mem_event.c;h=d00e4041b2bd099b850644db86449c8a235f0f5a;hb=HEAD#l586
>
> However, I think it is possible to implement this for NPT also.

So it is - I missed that.

>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>
>> Where is this magic page being created from? This will likely have to be at the behest of the domain creation flags to avoid making it for the vast majority of domains which won't want the extra overhead.
>
> This page will be similar to the console, xenstore and start_info pages.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_dom_x86.c;h=e034d62373c7a080864d1aefaa6a06412653c9af;hb=HEAD#l452
>
> I can definitely make it depend on a domain creation flag; however, on the HVM side pages for all mem_events including mem_access are created by default.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_hvm_build_x86.c;h=77bd3650c6486b4180101b5944a93ab6aaceca15;hb=HEAD#l487
>
> So is it OK to have a domain creation flag just for mem_access for PV guests?

The start_info and xenstore pages are critical for a PV guest to boot, and the console is fairly useful (although not essential). These pages belong to the guest, and the guest has full read/write access and control over the pages.

For HVM guests, the special pfns are hidden in the MMIO region and have no access by default. HVM domains need to use add_to_physmap to get access to a subset of the magic pages.

I do not think it is reasonable for a guest to be able to access its own mem_access page, and I am not sure how best to prevent PV guests from getting at it.

>>> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.
>>
>> You cannot remove the hvmops. That would break the hypervisor ABI.
>>
>> You can certainly introduce new (more generic) hypercalls, implement the hvmop ones in terms of the new ones and mark the hvmop ones as deprecated in the documentation.
>
> Sorry, I should have been more explicit in the above paragraph. I was planning on doing exactly what you have said. I will be adding a new hypercall interface for the PV guests; we can then use that for HVM also and keep the old hvm_op hypercall interface as an alias.
> I would do something similar on the tool stack side: create xc_domain_*_access() or xc_*_access() and make them wrappers that call xc_hvm_*_access() or vice-versa, then move the functions to xc_domain.c or xc_mem_access.c. This way I am hoping the existing libxc APIs will still work.

Ah ok - that looks sensible overall.

~Andrew
Aravindh Puthiyaparambil (aravindp)
2013-Nov-25 20:29 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
> On 25/11/13 19:39, Aravindh Puthiyaparambil (aravindp) wrote:
>> So is it OK to have a domain creation flag just for mem_access for PV guests?
>
> The start_info and xenstore pages are critical for a PV guest to boot, and the console is fairly useful (although not essential). These pages belong to the guest, and the guest has full read/write access and control over the pages.
>
> For HVM guests, the special pfns are hidden in the MMIO region and have no access by default. HVM domains need to use add_to_physmap to get access to a subset of the magic pages.
>
> I do not think it is reasonable for a guest to be able to access its own mem_access page, and I am not sure how best to prevent PV guests from getting at it.

In the mem_access listener for HVM guests, what happens is that the page is mapped in and then removed from the physmap of the guest.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

I was hoping to do the same for PV guests. Will that not work?

Thanks,
Aravindh
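[For reference, a condensed sketch of the map-then-remove sequence the linked xen-access.c performs for HVM guests; error handling and the way ring_pfn is obtained are omitted, and the libxc prototypes should be checked against the tree in use.]

#include <sys/mman.h>
#include <xenctrl.h>

static void *map_and_hide_ring_page(xc_interface *xch, domid_t domid,
                                    xen_pfn_t ring_pfn)
{
    xen_pfn_t mmap_pfn = ring_pfn;

    /* Map the mem_event ring page into the listener. */
    void *ring_page = xc_map_foreign_batch(xch, domid,
                                           PROT_READ | PROT_WRITE,
                                           &mmap_pfn, 1);
    if ( ring_page == NULL )
        return NULL;

    /* Remove the page from the guest's physmap so the guest can no longer
     * reach it; the listener's mapping keeps the frame alive. */
    xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &ring_pfn);

    return ring_page;
}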
Tim Deegan
2013-Nov-26 10:01 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
Hi,

At 07:49 +0000 on 25 Nov (1385362167), Aravindh Puthiyaparambil (aravindp) wrote:
> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>
> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.

As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?

> 2. Most of the mem_event / mem_access functions and variable names are HVM specific. Given that I am enabling it for PV, I will change the names to something more generic. This also holds for the mem_access hypercalls, which fall under HVM ops and do_hvm_op(). My plan is to make them a memory op or a domctl.

Sure.

> 3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
> mem_access_tracker[gmfn] = access_type
> If there is a place where I can stash this in an existing structure, please point me at it.

My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.

> 6. xc_(hvm)_set_mem_access(): This API has two modes. If the start pfn/gmfn is ~0ull, it is taken as a request to set the default access; here we will call shadow_blow_tables() after recording the default access type for the domain. In the mode where it is setting the mem_access type for individual gmfns, we will call a function that will drop the shadow for that individual gmfn. I am not sure which function to call. Will sh_remove_all_mappings(gmfn) do the trick?

Yes, sh_remove_all_mappings() is the one you want.

> The other issue here is that in the HVM case we could use xc_hvm_set_mem_access(gfn, nr) and the permissions for the range gfn to gfn+nr would be set. This won't be possible in the PV case as we are actually dealing with mfns, and mfn to mfn+nr need not belong to the same guest. But given that setting *all* page access permissions is done implicitly when setting default access, I think we can live with setting page permissions one at a time as they are faulted in.

Seems OK to me.

> 8. In sh_page_fault(), perform access checks similar to ept_handle_violation() / hvm_hap_nested_page_fault().

Yep.

> 9. Hook into _sh_propagate() and set up the L1 entries based on access permissions. This will be similar to ept_p2m_type_to_flags(). I think I might also have to hook into the code that emulates page table writes to ensure access permissions are honored there too.

I guess you might; again, the p2m interface will help here, and probably the existing tidy-up code in emulate_gva_to_mfn will be the place to hook.

Cheers,

Tim.
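[To make the p2m.h-backend suggestion a bit more concrete, here is a standalone sketch of an access table that is just a flat array with p2m-style get/set accessors rather than a pagetable. All names are hypothetical, and in Xen the allocation would come from the domain's shadow pool rather than from malloc().]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t p2m_access_t;        /* stand-in for Xen's real p2m access enum */

struct pv_access_table {
    unsigned long nr_frames;         /* PV physmap is bounded, so a flat array works */
    p2m_access_t *entries;           /* one byte of access state per frame           */
    p2m_access_t default_access;
};

static int pv_access_table_init(struct pv_access_table *t,
                                unsigned long nr_frames,
                                p2m_access_t default_access)
{
    t->entries = malloc(nr_frames * sizeof(*t->entries));
    if ( t->entries == NULL )
        return -1;
    memset(t->entries, default_access, nr_frames);   /* everything starts at the default */
    t->nr_frames = nr_frames;
    t->default_access = default_access;
    return 0;
}

/* Shaped like the p2m get_entry/set_entry hooks as far as access is concerned. */
static p2m_access_t pv_get_access(const struct pv_access_table *t,
                                  unsigned long frame)
{
    return (frame < t->nr_frames) ? t->entries[frame] : t->default_access;
}

static void pv_set_access(struct pv_access_table *t, unsigned long frame,
                          p2m_access_t a)
{
    if ( frame < t->nr_frames )
        t->entries[frame] = a;
}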
Aravindh Puthiyaparambil (aravindp)
2013-Nov-26 18:19 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>> The mem_access APIs only work with HVM guests that run on Intel hardware with EPT support. This effort is to enable it for PV guests that run with shadow page tables. To facilitate this, the following will be done:
>>
>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>
> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?

Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
Or were you meaning something else?

I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

From my reading of xc_domain_decrease_reservation_exact(), I think it will also work for PV guests. Or am I missing something here?

>> 3. A new shadow option will be added called PG_mem_access. This mode is basic shadow mode with the addition of a table that will track the access permissions of each page in the guest:
>> mem_access_tracker[gmfn] = access_type
>> If there is a place where I can stash this in an existing structure, please point me at it.
>
> My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.

OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?

Thanks so much for the feedback.
Aravindh
Andres Lagar-Cavilla
2013-Nov-26 18:41 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>
>> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?
>
> Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
> Or were you meaning something else?
>
> I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333

Once the page is removed from the physmap, an HVM guest has no way of indexing that page and thus mapping it -- even though it's a page that belongs to it, and that is threaded on its list of owned pages.

With PV, you have an additional means of indexing, which is the raw MFN. The PV guest will be able to get at the page because it owns it, if it knows the MFN. No PFN/GFN required. This is how, for example, things like the grant table are mapped in classic PV domains.

I don't know how realistic the concern about the domain guessing the MFN for the page is. But if it can, and it maps it and mucks with the ring, the thing to evaluate is: can the guest throw dom0/the host into a tailspin? The answer is likely "no", because guests can't reasonably do this with other rings they have access to, like PV driver backends. But a flaw on the consumer side of mem events could yield a vector for DoS.

If, instead, the page is a Xen-owned page (alloc_xenheap_pages), then there is no way for the PV domain to map it.

>> My suggestion was that you should make another implementation of the p2m.h interface, which is already called in all the right places. You might want to borrow the tree-building code from the existing p2m-pt.c, though there's no reason why your table should be structured as a pagetable. The important detail is that you should be using memory from the shadow pool to hold this datastructure.
>
> OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?

Well, the page/MFN could conceivably be mapped by many domains. There are ample bits to play with in the type flag, for example. But as long as you don't care about mem_event on pages shared across two or more PV domains, then that should be fine. I wouldn't blame you if you didn't care :)

OTOH, all you need is a byte per pfn, and the great thing is that in PV domains, the physmap is bounded and contiguous. Unlike HVM and its PCI holes, etc., which demand the sparse tree structure. So you can allocate an easily indexable array, notwithstanding super page concerns (I think/hope).

Andres
Aravindh Puthiyaparambil (aravindp)
2013-Nov-26 19:46 UTC
Re: [RFC] Overview of work required to implement mem_access for PV guests
>>>> 1. A magic page will be created for the mem_access (mem_event) ring buffer during PV domain creation.
>>>
>>> As Andrew pointed out, you might have to be careful about this -- if the page is owned by the domain itself, and it can find out (or guess) its MFN, it can map and write to it. You might need to allocate an anonymous page for this?
>>
>> Do you mean allocate an anonymous page in dom0 and use that? Won't we run into the problem Andres was mentioning a while back?
>> http://xen.markmail.org/thread/kbrz7vo3oyrvgsnc
>> Or were you meaning something else?
>>
>> I was planning on doing exactly what we do in the mem_access listener for HVM guests. The magic page is mapped in and then removed from the physmap of the guest.
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/xen-access/xen-access.c;h=b00c05aa4890ee694e8101b77cca582fff420c7b;hb=HEAD#l333
>
> Once the page is removed from the physmap, an HVM guest has no way of indexing that page and thus mapping it -- even though it's a page that belongs to it, and that is threaded on its list of owned pages.
>
> With PV, you have an additional means of indexing, which is the raw MFN. The PV guest will be able to get at the page because it owns it, if it knows the MFN. No PFN/GFN required. This is how, for example, things like the grant table are mapped in classic PV domains.
>
> I don't know how realistic the concern about the domain guessing the MFN for the page is. But if it can, and it maps it and mucks with the ring, the thing to evaluate is: can the guest throw dom0/the host into a tailspin? The answer is likely "no", because guests can't reasonably do this with other rings they have access to, like PV driver backends. But a flaw on the consumer side of mem events could yield a vector for DoS.
>
> If, instead, the page is a Xen-owned page (alloc_xenheap_pages), then there is no way for the PV domain to map it.

Thanks so much for the explanation. I will use alloc_xenheap_pages.

>> OK, I will go down that path. I agree that my table needn't be structured as a pagetable. The other thing I was thinking about is stashing the access information in the per-mfn page_info structures. Or is that memory overhead too much of an overkill?
>
> Well, the page/MFN could conceivably be mapped by many domains. There are ample bits to play with in the type flag, for example. But as long as you don't care about mem_event on pages shared across two or more PV domains, then that should be fine. I wouldn't blame you if you didn't care :)

Yup, I don't care :-)

> OTOH, all you need is a byte per pfn, and the great thing is that in PV domains, the physmap is bounded and contiguous. Unlike HVM and its PCI holes, etc., which demand the sparse tree structure. So you can allocate an easily indexable array, notwithstanding super page concerns (I think/hope).

I did not realize that the physmap is bounded and contiguous. I will go with an indexable array.

Thanks,
Aravindh
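[A minimal hypervisor-side sketch of the direction settled on here: the ring page comes from the Xen heap, so a PV guest cannot map it by MFN, and the tracker is a flat byte-per-pfn array. It follows common Xen idioms (alloc_xenheap_pages(), xmalloc_array()) but is not taken from an actual patch, leaves out how the page is later shared with the privileged listener, and is only meant as a sketch.]

/* Sketch only: allocate a Xen-owned ring page plus a flat access array
 * sized to the (bounded) PV physmap.  Not compilable outside the
 * hypervisor tree; error/cleanup handling is minimal. */

struct pv_mem_access_state {
    void *ring_page;           /* Xen-heap page, never guest-mappable by MFN */
    uint8_t *access;           /* one access byte per guest pfn              */
    unsigned long nr_pfns;     /* bounded PV physmap size                    */
};

static int pv_mem_access_init(struct pv_mem_access_state *s,
                              unsigned long nr_pfns)
{
    s->ring_page = alloc_xenheap_pages(0, 0);      /* order 0, no memflags */
    if ( s->ring_page == NULL )
        return -ENOMEM;
    clear_page(s->ring_page);

    s->access = xmalloc_array(uint8_t, nr_pfns);   /* the indexable array */
    if ( s->access == NULL )
    {
        free_xenheap_pages(s->ring_page, 0);
        return -ENOMEM;
    }
    memset(s->access, 0, nr_pfns);                 /* 0 == default access here */
    s->nr_pfns = nr_pfns;

    return 0;
}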