thr3ads.net - Xen devel - [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault [Jan 2013]

Razvan Cojocaru

2013-Jan-17 14:02 UTC

[PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

This patch makes it possible to emulate an instruction that triggered
a page fault (received via the mem_event API). This is done by setting
the MEM_EVENT_FLAG_EMULATE in mem_event_response_t.flags. The purpose
of this is to be able to receive several distinct page fault mem_events
for the same address, and choose which ones are allowed to go through
from dom0 userspace.

Signed-off-by: Razvan Cojocaru <rzvncj@gmail.com>

diff -r b6195e277da5 -r c5db0882bfcf xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c	Wed Jan 16 14:15:44 2013 +0000
+++ b/xen/arch/x86/mm/p2m.c	Thu Jan 17 16:01:11 2013 +0200
@@ -1309,6 +1309,17 @@ bool_t p2m_mem_access_check(paddr_t gpa,
         }
     }
 
+    if ( v->arch.hvm_vmx.mem_event_emulate )
+    {
+        struct hvm_emulate_ctxt ctx[1] = {};
+
+        v->arch.hvm_vmx.mem_event_emulate = 0;
+        hvm_emulate_prepare(ctx, guest_cpu_user_regs());
+        hvm_emulate_one(ctx);
+
+        return 1;
+    }
+
     *req_ptr = NULL;
     req = xzalloc(mem_event_request_t);
     if ( req )
@@ -1347,8 +1358,15 @@ void p2m_mem_access_resume(struct domain
     /* Pull all responses off the ring */
     while( mem_event_get_response(d, &d->mem_event->access, &rsp) )
     {
+        d->vcpu[rsp.vcpu_id]->arch.hvm_vmx.mem_event_emulate = 0;
+
         if ( rsp.flags & MEM_EVENT_FLAG_DUMMY )
             continue;
+
+        /* Mark vcpu for skipping one instruction upon rescheduling */
+        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
+            d->vcpu[rsp.vcpu_id]->arch.hvm_vmx.mem_event_emulate = 1;
+
         /* Unpause domain */
         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
             vcpu_unpause(d->vcpu[rsp.vcpu_id]);
diff -r b6195e277da5 -r c5db0882bfcf xen/include/asm-x86/hvm/vmx/vmcs.h
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h	Wed Jan 16 14:15:44 2013 +0000
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h	Thu Jan 17 16:01:11 2013 +0200
@@ -125,6 +125,8 @@ struct arch_vmx_struct {
     /* Remember EFLAGS while in virtual 8086 mode */
     uint32_t             vm86_saved_eflags;
     int                  hostenv_migrated;
+    /* Should we emulate the first instruction on VCPU resume after a mem_event? */
+    uint8_t              mem_event_emulate;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff -r b6195e277da5 -r c5db0882bfcf xen/include/public/mem_event.h
--- a/xen/include/public/mem_event.h	Wed Jan 16 14:15:44 2013 +0000
+++ b/xen/include/public/mem_event.h	Thu Jan 17 16:01:11 2013 +0200
@@ -36,6 +36,7 @@
 #define MEM_EVENT_FLAG_EVICT_FAIL   (1 << 2)
 #define MEM_EVENT_FLAG_FOREIGN      (1 << 3)
 #define MEM_EVENT_FLAG_DUMMY        (1 << 4)
+#define MEM_EVENT_FLAG_EMULATE      (1 << 5) /* Emulate the instruction that caused the current mem_event */
 
 /* Reasons for the memory event request */
 #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
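
For illustration, here is a minimal sketch of how a dom0 userspace listener might fill in the response flags under this patch. It is not taken from the patch or from any Xen tool: `struct demo_response` and `demo_build_response` are hypothetical stand-ins, the value of MEM_EVENT_FLAG_EMULATE is the one the patch defines, and bit 0 for MEM_EVENT_FLAG_VCPU_PAUSED is an assumption about the header being patched. Leaving the flags at zero for a denied access is likewise only one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 5 comes from the patch above. Bit 0 for VCPU_PAUSED is an assumption
 * about the public header this patch applies to. */
#define MEM_EVENT_FLAG_VCPU_PAUSED  (1u << 0)
#define MEM_EVENT_FLAG_EMULATE      (1u << 5)

/* Hypothetical, cut-down stand-in for mem_event_response_t. */
struct demo_response {
    uint32_t flags;
    uint32_t vcpu_id;
};

/*
 * Build the response for one access event.
 * req_flags: flags copied from the request (only VCPU_PAUSED is echoed).
 * allow:     the listener's policy decision for this particular access.
 */
static struct demo_response demo_build_response(uint32_t vcpu_id,
                                                uint32_t req_flags,
                                                bool allow)
{
    struct demo_response rsp = { .flags = 0, .vcpu_id = vcpu_id };

    /* The two flags must be independent bits for the logic below. */
    assert((MEM_EVENT_FLAG_VCPU_PAUSED & MEM_EVENT_FLAG_EMULATE) == 0);

    if (allow) {
        /* Ask for the vCPU to be unpaused and the instruction emulated. */
        rsp.flags |= req_flags & MEM_EVENT_FLAG_VCPU_PAUSED;
        rsp.flags |= MEM_EVENT_FLAG_EMULATE;
    }
    /* Denied: flags stay 0, so in this sketch the vCPU is not unpaused. */
    return rsp;
}
```

With these assumed values, an allowed access on a paused vCPU yields flags 0x21, while a denied one yields 0 and the page permissions stay as they were.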

Andres Lagar-Cavilla

2013-Jan-17 15:38 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> This patch makes it possible to emulate an instruction that triggered
> a page fault (received via the mem_event API). This is done by setting
> the MEM_EVENT_FLAG_EMULATE in mem_event_response_t.flags. The purpose
> of this is to be able to receive several distinct page fault mem_events
> for the same address, and choose which ones are allowed to go through
> from dom0 userspace.
> 
> Signed-off-by: Razvan Cojocaru <rzvncj@gmail.com>
> 
> diff -r b6195e277da5 -r c5db0882bfcf xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c	Wed Jan 16 14:15:44 2013 +0000
> +++ b/xen/arch/x86/mm/p2m.c	Thu Jan 17 16:01:11 2013 +0200
> @@ -1309,6 +1309,17 @@ bool_t p2m_mem_access_check(paddr_t gpa,
>         }
>     }
> 
> +    if ( v->arch.hvm_vmx.mem_event_emulate )
Lack of mem event support for AMD processors will be fixed as soon as someone
with interest (and time!) gets going. I don't recall anything
fundamental standing in the way.

So it's best if you lift this field into the generic hvm level.
> +    {
> +        struct hvm_emulate_ctxt ctx[1] = {};
> +
> +        v->arch.hvm_vmx.mem_event_emulate = 0;
> +        hvm_emulate_prepare(ctx, guest_cpu_user_regs());

Tim's point is that you won't get all the mem events, because
the instruction can easily touch multiple pages. It's a question that
addresses the need for this patch in the first place.

One potential (hairy) fix is to have all get_page_from_gfn check for and emit a
mem event. It's a bit of a rat hole, because we'll need to pass
the intended permissions down the stack, check against mem event status, etc
etc. It will help extend mem event to catch all hypervisor-based accesses that
currently it mostly can't, as well as foreign mappings. It's
certainly not for the faint of heart.
> +        hvm_emulate_one(ctx);
> +
> +        return 1;
> +    }
> +
>     *req_ptr = NULL;
>     req = xzalloc(mem_event_request_t);
>     if ( req )
> @@ -1347,8 +1358,15 @@ void p2m_mem_access_resume(struct domain
>     /* Pull all responses off the ring */
>     while( mem_event_get_response(d, &d->mem_event->access, &rsp) )
>     {
> +        d->vcpu[rsp.vcpu_id]->arch.hvm_vmx.mem_event_emulate = 0;
> +
>         if ( rsp.flags & MEM_EVENT_FLAG_DUMMY )
>             continue;
> +
> +        /* Mark vcpu for skipping one instruction upon rescheduling */
> +        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
> +            d->vcpu[rsp.vcpu_id]->arch.hvm_vmx.mem_event_emulate = 1;
> +
>         /* Unpause domain */
>         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
>             vcpu_unpause(d->vcpu[rsp.vcpu_id]);
> diff -r b6195e277da5 -r c5db0882bfcf xen/include/asm-x86/hvm/vmx/vmcs.h
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h	Wed Jan 16 14:15:44 2013 +0000
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h	Thu Jan 17 16:01:11 2013 +0200
> @@ -125,6 +125,8 @@ struct arch_vmx_struct {
>     /* Remember EFLAGS while in virtual 8086 mode */
>     uint32_t             vm86_saved_eflags;
>     int                  hostenv_migrated;
> +    /* Should we emulate the first instruction on VCPU resume after a mem_event? */
> +    uint8_t              mem_event_emulate;
> };
> 
> int vmx_create_vmcs(struct vcpu *v);
> diff -r b6195e277da5 -r c5db0882bfcf xen/include/public/mem_event.h
> --- a/xen/include/public/mem_event.h	Wed Jan 16 14:15:44 2013 +0000
> +++ b/xen/include/public/mem_event.h	Thu Jan 17 16:01:11 2013 +0200
> @@ -36,6 +36,7 @@
> #define MEM_EVENT_FLAG_EVICT_FAIL   (1 << 2)
> #define MEM_EVENT_FLAG_FOREIGN      (1 << 3)
> #define MEM_EVENT_FLAG_DUMMY        (1 << 4)
> +#define MEM_EVENT_FLAG_EMULATE      (1 << 5) /* Emulate the instruction that caused the current mem_event */

Line overflow, better stack the comment on top.

Thanks
Andres

>
> /* Reasons for the memory event request */
> #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
> 
> 
>

Razvan Cojocaru

2013-Jan-17 15:50 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

Hello, and thanks for your comments.

>> +    if ( v->arch.hvm_vmx.mem_event_emulate )
>
> Lack of mem event support for AMD processors will be fixed as soon as
> someone with interest (and time!) gets going. I don't recall anything
> fundamental standing in the way.
>
> So it's best if you lift this field into the generic hvm level.

That would make it v->arch.mem_event_emulate, correct?

> Tim's point is that you won't get all the mem events, because the
> instruction can easily touch multiple pages. It's a question that
> addresses the need for this patch in the first place.

Yes, I did get Tim's point. Sorry if I haven't been able to make that as
clear as I should have in my reply to Tim's email.

> One potential (hairy) fix is to have all get_page_from_gfn check for and
> emit a mem event. It's a bit of a rat hole, because we'll need to pass the
> intended permissions down the stack, check against mem event status, etc
> etc. It will help extend mem event to catch all hypervisor-based accesses
> that currently it mostly can't, as well as foreign mappings. It's
> certainly not for the faint of heart.

Fortunately, for the purposes of my patch, the simple solution is enough.

>> +#define MEM_EVENT_FLAG_EMULATE      (1 << 5) /* Emulate the instruction that caused the current mem_event */
>
> Line overflow, better stack the comment on top

No problem.

Thank you,
Razvan Cojocaru

Razvan Cojocaru

2013-Jan-21 23:13 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

Hello,

> Tim's point is that you won't get all the mem events, because the
> instruction can easily touch multiple pages. It's a question that
> addresses the need for this patch in the first place.

what would be an example of such an instruction?

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-22 12:31 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

At 01:13 +0200 on 22 Jan (1358817224), Razvan Cojocaru wrote:
> Hello,
> 
> > Tim's point is that you won't get all the mem events, because the
> > instruction can easily touch multiple pages. It's a question that
> > addresses the need for this patch in the first place.
> 
> what would be an example of such an instruction?

x86 instructions aren't aligned, so any instruction that encodes as more
than one byte can touch two pages just in instruction fetch, if it
overlaps the end of a page.  The same goes for explicit memory operands,
so an instruction like MOVSW that has two memory operands can touch six
pages - two for the instruction fetch and two for each operand.
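
The page-straddling argument above can be checked with a few lines of C. This is only an illustrative sketch assuming 4 KiB pages; `pages_spanned` is a hypothetical helper, not a Xen function, and the addresses mentioned afterwards are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Number of distinct pages covered by the byte range [addr, addr + len). */
static unsigned pages_spanned(uint64_t addr, uint64_t len)
{
    assert(len > 0);

    uint64_t first = addr >> DEMO_PAGE_SHIFT;
    uint64_t last  = (addr + len - 1) >> DEMO_PAGE_SHIFT;

    return (unsigned)(last - first + 1);
}
```

A two-byte access starting at the last byte of a page (for example address 0x1fff) spans two pages. Three such accesses, one for the instruction fetch and one for each of the two word operands, give the six pages mentioned above.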

Each of those accesses is to a virtual address; in 64-bit mode a TLB
miss can add four more memory accesses to walk the pagetables, so we're
looking at a worst case of 25 pages of memory that might be
touched by a successful MOVSW (the top-level pagetable only counts once).

But what if the last access caused a page fault?  After up to 24
accesses, the CPU now needs to access the IDT to figure out what to do
with #PF (+4 = 28), and the stack to push an exception (+8 = 36), and if
the OS is using a task gate then it needs the old and new TSSes (+8 = 44).
And if the stack faults, we need to do the whole thing again for
#DF (-1, +12 = 55).  Now that's a pretty unlikely scenario (and I may
have got some of the details wrong) but the upshot is: a single x86
instruction can access enormous amounts of memory, so turning off
protection and single-stepping, especially if you don't trust the OS, is
exposing a lot more than the single frame you took the first fault on.
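
The running totals in this message can be reproduced as a small worked calculation. The sketch below is one reading of the arithmetic, not something stated in the thread: it assumes each access costs its own page plus three non-shared pagetable levels, that the top-level pagetable is counted once, that the IDT is one such access and the stack and the TSSes are two each. The final "-1, +12" step for #DF is taken as given. All names are hypothetical.

```c
#include <assert.h>

/* Pages for one access: the page itself plus the non-shared pagetable levels. */
static int pages_per_access(int pt_levels)
{
    assert(pt_levels >= 1);
    return 1 + (pt_levels - 1);
}

/* Worst case for an instruction whose accesses all succeed. */
static int success_pages(int accesses, int pt_levels)
{
    assert(accesses >= 1);
    /* +1 for the top-level pagetable, which only counts once. */
    return accesses * pages_per_access(pt_levels) + 1;
}

struct fault_tally {
    int before_fault;  /* last access faulted instead of completing */
    int after_idt;     /* IDT lookup for #PF */
    int after_stack;   /* exception frame pushed, stack straddles two pages */
    int after_tss;     /* old and new TSS when a task gate is used */
    int after_df;      /* stack faulted, #DF delivered */
};

static struct fault_tally fault_pages(int accesses, int pt_levels)
{
    struct fault_tally t;
    int per = pages_per_access(pt_levels);

    t.before_fault = success_pages(accesses, pt_levels) - 1;
    t.after_idt    = t.before_fault + per;      /* +4 with 4 levels */
    t.after_stack  = t.after_idt + 2 * per;     /* +8 */
    t.after_tss    = t.after_stack + 2 * per;   /* +8 */
    t.after_df     = t.after_tss - 1 + 12;      /* as stated in the message */
    return t;
}
```

With six accesses and four pagetable levels this gives 25 pages for the successful case and 24, 28, 36, 44 and 55 for the successive fault-delivery steps, matching the figures quoted above.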

Cheers,

Tim.

Razvan Cojocaru

2013-Jan-22 12:53 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> #DF (-1, +12 = 55).  Now that's a pretty unlikely scenario (and I may
> have got some of the details wrong) but the upshot is: a single x86
> instruction can access enormous amounts of memory, so turning off
> protection and single-stepping, especially if you don't trust the OS, is
> exposing a lot more than the single frame you took the first fault on.

Thank you, Tim, for clearing that up. Now, 'touching' a page is quite
different from 'writing to' a page, and I'm really only interested in
the latter. So, in a scenario where reads are permitted by default and
we're only interested in writes, are we still talking about these
limitations? A MOVSW, for example, only needs to write to a single page,
even though it does touch more pages in read mode.

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-22 13:20 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

At 14:53 +0200 on 22 Jan (1358866429), Razvan Cojocaru wrote:
> >#DF (-1, +12 = 55).  Now that's a pretty unlikely scenario (and I may
> >have got some of the details wrong) but the upshot is: a single x86
> >instruction can access enormous amounts of memory, so turning off
> >protection and single-stepping, especially if you don't trust the OS, is
> >exposing a lot more than the single frame you took the first fault on.
> 
> Thank you, Tim, for clearing that up. Now, 'touching' a page is quite
> different from 'writing to' a page, and I'm really only interested in
> the latter. So, in a scenario where reads are permitted by default and
> we're only interested in writes, are we still talking about these
> limitations? A MOVSW, for example, only needs to write to a single page,
> even though it does touch more pages in read mode.

Ok, talking only about writes, we have the destination operand, plus all
the pagetables (for setting Accessed bits) plus any stacks and TSSes
needed in delivering faults; something like 32 pages for the full
double-fault scenario.

Tim

Razvan Cojocaru

2013-Jan-22 13:47 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> Ok, talking only about writes, we have the destination operand, plus all
> the pagetables (for setting Accessed bits) plus any stacks and TSSes
> needed in delivering faults; something like 32 pages for the full
> double-fault scenario.

I see, but then, even setting aside Andres' argument that having all
possible events sent to userspace is far from trivial, doing so would
completely cripple the monitored domain speed-wise. Imagine userspace
having to decide if it allows a write 32 times per one instruction.

Even with its limitations, this patch at least gives us a shot at
looking at what the domain does with memory - the existing model
(releasing the page completely to allow the write) is not fit for
anything like this at all.

So far, between the requirements of reasonable demands on the domU, and
satisfactory levels of control provided by the received mem_events, at
least as far as writes go, the patch's done its job quite nicely.

Thanks,
Razvan Cojocaru

Andres Lagar-Cavilla

2013-Jan-22 14:02 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

On Jan 22, 2013, at 8:47 AM, Razvan Cojocaru <rzvncj@gmail.com> wrote:

>> Ok, talking only about writes, we have the destination operand, plus all
>> the pagetables (for setting Accessed bits) plus any stacks and TSSes
>> needed in delivering faults; something like 32 pages for the full
>> double-fault scenario.
> 
> I see, but then, even setting aside Andres' argument that having all
> possible events sent to userspace is far from trivial, doing so would
> completely cripple the monitored domain speed-wise. Imagine userspace
> having to decide if it allows a write 32 times per one instruction.

Razvan, have you tried the n2rwx mode? It'll cause an event for the
first access to each page. It'll automatically patch the access (hence
2 rwx) to allow execution to continue unhindered.

So user-space doesn't decide anything, but gets informed about
everything (within the constraints of mem event, i.e. no foreign mappings, no
Xen mappings).

If you want to know about every write and decide on every write… you
can't have your cake and eat it too, right?

Andres

> Even with its limitations, this patch at least gives us a shot at looking
> at what the domain does with memory - the existing model (releasing the page
> completely to allow the write) is not fit for anything like this at all.
> 
> So far, between the requirements of reasonable demands on the domU, and
> satisfactory levels of control provided by the received mem_events, at least
> as far as writes go, the patch's done its job quite nicely.
> 
> Thanks,
> Razvan Cojocaru

Razvan Cojocaru

2013-Jan-22 14:22 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> Razvan, have you tried the n2rwx mode? It'll cause an event for the
> first access to each page. It'll automatically patch the access (hence
> 2 rwx) to allow execution to continue unhindered.
>
> So user-space doesn't decide anything, but gets informed about
> everything (within the constraints of mem event, i.e. no foreign
> mappings, no Xen mappings).

Hello, yes, I did look at that. Unfortunately, my userspace application
needs to be notified about subsequent writes to the same page, and once
the page goes to rwx mode there are no further events. So that's not a
very useful mechanism.
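
The difference between the two behaviours discussed here can be shown with a toy model. The sketch below is not Xen code: the `demo_*` names are hypothetical, and it only models what the thread describes, namely that an n2rwx page reports its first access and then becomes rwx, whereas a page kept read/execute-only reports every write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy access types, named after the modes mentioned in the thread. */
enum demo_access { DEMO_ACCESS_N2RWX, DEMO_ACCESS_RWX, DEMO_ACCESS_RX };

#define DEMO_NR_PAGES 4

struct demo_p2m {
    enum demo_access access[DEMO_NR_PAGES];
    unsigned events;  /* number of events delivered to the listener */
};

/* Simulate one guest write; returns true if the listener got an event. */
static bool demo_guest_write(struct demo_p2m *p2m, size_t page)
{
    assert(page < DEMO_NR_PAGES);

    switch (p2m->access[page]) {
    case DEMO_ACCESS_N2RWX:
        /* First access is reported, then the page is opened up for good. */
        p2m->access[page] = DEMO_ACCESS_RWX;
        p2m->events++;
        return true;
    case DEMO_ACCESS_RX:
        /* Page stays protected, so every write is reported. */
        p2m->events++;
        return true;
    case DEMO_ACCESS_RWX:
    default:
        return false;
    }
}
```

In this model three writes to an n2rwx page produce one event, while three writes to a page kept rx produce three, which is the behaviour the patch relies on when the listener answers with MEM_EVENT_FLAG_EMULATE.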
> If you want to know about every write and decide on every write… you
> can't have your cake and eat it too, right?

Right.

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-22 14:26 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

At 15:47 +0200 on 22 Jan (1358869654), Razvan Cojocaru wrote:
> >Ok, talking only about writes, we have the destination operand, plus all
> >the pagetables (for setting Accessed bits) plus any stacks and TSSes
> >needed in delivering faults; something like 32 pages for the full
> >double-fault scenario.
> 
> I see, but then, even setting aside Andres' argument that having all
> possible events sent to userspace is far from trivial, doing so would
> completely cripple the monitored domain speed-wise. Imagine userspace
> having to decide if it allows a write 32 times per one instruction.

Well, the _common_ case is only to have one write.  Even rarer cases
like needing an Accessed bit or crossing a page will only be two writes.

But sending all the events to userspace doesn't necessarily solve the
problem.  The guest could be overwriting the instruction stream from
another vcpu, or by DMA from disk, so that by the time you've returned
to Xen the instruction you emulate isn't even the one that faulted.

The only properly safe way to allow exactly one exception to your rules
is to emulate the instruction in user-space.  (Well, that or somehow
move your policy into Xen and do the emulation there, but I'm quite
strongly opposed to that).
> Even with its limitations, this patch at least gives us a shot at
> looking at what the domain does with memory - the existing model
> (releasing the page completely to allow the write) is not fit for
> anything like this at all.

If you're just using this to gather statistics about how often a page
gets written, you could use sampling; you don't need to see _every_
write.

> So far, between the requirements of reasonable demands on the domU, and
> satisfactory levels of control provided by the received mem_events, at
> least as far as writes go, the patch's done its job quite nicely.

It might be helpful if you could give us a clear description of exactly
what problem you're trying to solve.

Tim.

Razvan Cojocaru

2013-Jan-22 14:45 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> The only properly safe way to allow exactly one exception to your rules
> is to emulate the instruction in user-space.  (Well, that or somehow
> move your policy into Xen and do the emulation there, but I'm quite
> strongly opposed to that).

Is there an example of that somewhere in the Xen source code tree?

> If you're just using this to gather statistics about how often a page
> gets written, you could use sampling; you don't need to see _every_
> write.

I'm not gathering statistics.

> It might be helpful if you could give us a clear description of exactly
> what problem you're trying to solve.

I'm watching for suspicious activity on the domU. If any occurs, the
domU should be paused (at least the VCPU in question). A dom0 userspace
application should decide what constitutes suspicious activity, with (1)
the least possible slowing down of the domU, and (2) as few
"false positive" writes allowed as possible (ideally zero, if there's a
way that doesn't go against requirement (1)).

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-24 11:05 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

At 16:45 +0200 on 22 Jan (1358873134), Razvan Cojocaru wrote:
> >The only properly safe way to allow exactly one exception to your rules
> >is to emulate the instruction in user-space.  (Well, that or somehow
> >move your policy into Xen and do the emulation there, but I'm quite
> >strongly opposed to that).
> 
> Is there an example of that somewhere in the Xen source code tree?

I don't think so.

It occurs to me that if you're willing to rely on the Xen x86_emulate()
emulator, the model that we use for emulated MMIO might be better.
There, Xen emulates the instruction directly in the fault handler and
sends individual memory accesses to qemu for emulation.  qemu receives
them as a series of ioreqs (basically, address/size/data tuples).

So you could, for example: 
 - Invent a new p2m type (probably based very closely on p2m_ram_ro,
   maybe you could even just use p2m_ram_ro).
 - Use the HVMOP_set_mem_type to mark the pages you want readonly.
 - Use Julien Grall's new ioreq interfaces to register your helper
   as the handler for the pages you care about.

Then your user-space helper will get told about each actual write,
rather than each faulting instruction.  If the write is OK, the helper
will map the target address and do the write.
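
As a rough sketch of what such a helper's decision step could look like: the code below treats each write as an address/size/data tuple, applies a policy and, if the write is allowed, performs it on a buffer standing in for the mapped guest page. Everything here is hypothetical (`struct demo_write_req`, `struct demo_region`, `demo_handle_write`); it does not use the real ioreq structures or any Xen mapping API, and it assumes a little-endian host when copying fewer than eight bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one forwarded write: address, size, data. */
struct demo_write_req {
    uint64_t addr;
    uint32_t size;   /* 1..8 bytes */
    uint64_t data;
};

/* Hypothetical stand-in for a mapped guest range plus a write policy. */
struct demo_region {
    uint64_t base;        /* guest address of mem[0] */
    uint8_t *mem;         /* local buffer playing the role of the mapping */
    size_t   len;
    uint64_t deny_start;  /* writes overlapping [deny_start, deny_end) fail */
    uint64_t deny_end;
};

/* Returns true if the write was allowed and carried out. */
static bool demo_handle_write(struct demo_region *r,
                              const struct demo_write_req *req)
{
    assert(req->size >= 1 && req->size <= sizeof req->data);

    /* Reject anything outside the mapped range. */
    if (req->addr < r->base)
        return false;
    uint64_t off = req->addr - r->base;
    if (r->len < req->size || off > r->len - req->size)
        return false;

    /* Policy: refuse writes that touch the protected sub-range. */
    if (req->addr < r->deny_end && req->addr + req->size > r->deny_start)
        return false;

    /* Allowed: do the write on the guest's behalf. */
    memcpy(r->mem + off, &req->data, req->size);
    return true;
}
```

The point of the design is that the policy sees each individual write, so a page-crossing or multi-operand instruction results in several small decisions rather than one decision per faulting instruction.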

Have a look at, e.g.
http://lists.xen.org/archives/html/xen-devel/2012-08/msg01767.html
for Julien's multiple-ioreq-handlers code;  I'm not sure what the
current state of that is, except that it doesn't seem to be checked in
yet.

Cheers,

Tim.

Razvan Cojocaru

2013-Jan-24 11:34 UTC

Re: [PATCH V2] mem_event: Allow emulating an instruction that caused a page fault

> Have a look at, e.g.
> http://lists.xen.org/archives/html/xen-devel/2012-08/msg01767.html
> for Julien's multiple-ioreq-handlers code;  I'm not sure what the
> current state of that is, except that it doesn't seem to be checked in
> yet.

Thank you for your suggestion, I'll take a look at that.

Thanks,
Razvan Cojocaru
