thr3ads.net - Xen devel - hvm_emulate

Razvan Cojocaru

2012-Dec-28 14:34 UTC

hvm_emulate_one() usage

Hello,

I have a dom0 userspace application that receives mem_events. Mem_events 
are being received if a page fault occurs, and until I clear the page 
access rights I keep receiving the event in a loop. If I do clear the 
page access rights, I will no longer receive mem_events for said page.

What I thought I'd do was to add a new flag to the mem_event response 
(MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in 
p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:

mem_event_get_response(d, &rsp);

if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
{
     struct hvm_emulate_ctxt ctx[1] = {};
     struct vcpu *current_vcpu = current;

     set_current(d->vcpu[rsp.vcpu_id]);

     hvm_emulate_prepare(ctx, guest_cpu_user_regs());
     hvm_emulate_one(ctx);

     set_current(current_vcpu);
}

The code is supposed to go past the write instruction (without lifting 
the page access restrictions). What it does seem to achieve is this:

(XEN) ----[ Xen-4.1.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    6
(XEN) RIP:    e008:[<ffff82c4801bf4ea>] vmx_get_interrupt_shadow+0xa/0x10
(XEN) RFLAGS: 0000000000010203   CONTEXT: hypervisor
(XEN) rax: 0000000000004824   rbx: ffff83013c0c7ba0   rcx: 0000000000000008
(XEN) rdx: 0000000000000005   rsi: ffff83013c0c7f18   rdi: ffff8300bfca8000
(XEN) rbp: ffff83013c0c7f18   rsp: ffff83013c0c7b50   r8:  0000000000000002
(XEN) r9:  0000000000000002   r10: ffff82c48020af40   r11: 0000000000000282
(XEN) r12: ffff8300bfff2000   r13: ffff88012b478b18   r14: 00007fffd669c4c0
(XEN) r15: ffff83013c0c7e48   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 000000005d6c4000   cr2: 000000000221e538
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83013c0c7b50:
(XEN)    ffff82c4801a1a91 ffff83013f986000 ffff83013f986000 ffff83013c0c7f18
(XEN)    ffff82c4801ce0e1 0000000500050000 000000000003f31a 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff83013f986000 ffff8300bfff2000 ffff83013c0c7e48
(XEN)    ffff82c4801d1d81 ffff8300bfcac000 ffff82c4801d05c5 ffff83013c0c7f18
(XEN)    ffff82c4801a1447 ffff8300bfcac000 0000000000d7e004 0000000000d7e004
(XEN)    ffff83013c0c7e48 ffff88012b478b18 00007fffd669c4c0 ffff83013c0c7e48
(XEN)    ffff82c48014eb79 0000000000000000 000000000005d6f9 ffff82f600badf20
(XEN)    0000000000000000 4000000000000000 ffff82f600badf20 0000000000000000
(XEN)    ffff88012fc0b928 0000000000000001 ffff82c48016bc4b ffff82f600badf20
(XEN)    ffff82c48016c0b8 ffff83013c0ac000 ffff83013c0ac000 ffff82f600bb1940
(XEN)    000000000000000f ffff83013c0c7f18 ffff83013c0ac000 ffff82f600bb1940
(XEN)    fffffffffffffff3 0000000000d7e004 ffff83013c0c7e48 ffff88012b478b18
(XEN) Xen call trace:
(XEN)    [<ffff82c4801bf4ea>] vmx_get_interrupt_shadow+0xa/0x10
(XEN)    [<ffff82c4801a1a91>] hvm_emulate_prepare+0x31/0x80
(XEN)    [<ffff82c4801ce0e1>] p2m_mem_access_resume+0xe1/0x120
(XEN)    [<ffff82c4801d1d81>] mem_access_domctl+0x21/0x30
(XEN)    [<ffff82c4801d05c5>] mem_event_domctl+0x295/0x3b0
(XEN)    [<ffff82c4801a1447>] hvmemul_do_pio+0x27/0x30
(XEN)    [<ffff82c48014eb79>] arch_do_domctl+0x2e9/0x28a0
(XEN)    [<ffff82c48016bc4b>] get_page_type+0xb/0x20
(XEN)    [<ffff82c48016c0b8>] get_page_and_type_from_pagenr+0x78/0xe0
(XEN)    [<ffff82c4801025bb>] do_domctl+0xfb/0x10b0
(XEN)    [<ffff82c4801f2fa6>] ept_get_entry+0x136/0x250
(XEN)    [<ffff82c480180965>] copy_to_user+0x25/0x70
(XEN)    [<ffff82c4801f8778>] syscall_enter+0x88/0x8d
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 6:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN) ****************************************

I could find no documentation on either the hvm_*(), or the cpu-related 
functions. Obviously the hvm_emulate_prepare() call crashes the 
hypervisor, most likely because of the guest_cpu_user_regs() parameter, 
but "regs" is not being passed to p2m_mem_access_resume() (like it is 
being passed to p2m_mem_access_check()). I would appreciate your help in 
figuring out how to implement this.

Thanks, and happy holidays,
Razvan Cojocaru

Andrew Cooper

2012-Dec-28 22:27 UTC

Re: hvm_emulate_one() usage

On 28/12/2012 14:34, Razvan Cojocaru wrote:
> Hello,
>
> I have a dom0 userspace application that receives mem_events. Mem_events 
> are being received if a page fault occurs, and until I clear the page 
> access rights I keep receiving the event in a loop. If I do clear the 
> page access rights, I will no longer receive mem_events for said page.
>
> What I thought I'd do was to add a new flag to the mem_event response
> (MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in 
> p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:
>
> mem_event_get_response(d, &rsp);
>
> if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
> {
>      struct hvm_emulate_ctxt ctx[1] = {};
>      struct vcpu *current_vcpu = current;
>
>      set_current(d->vcpu[rsp.vcpu_id]);
Not that I can help you with your problem specifically, but
set_current() here ...
>
>      hvm_emulate_prepare(ctx, guest_cpu_user_regs());
>      hvm_emulate_one(ctx);
>
>      set_current(current_vcpu);
and here are absolutely wrong and will cause bad things to happen (as
demonstrated by the crash in your report).

set_current() is only for use with scheduling, and sets which vcpu is
"current" on this pcpu.  As the code currently stands, there is a
thundering great race condition where this particular vcpu might be
current on 2 pcpus at once.

Other than the above, which will certainly break the scheduling code,
"current" is used everywhere in the Xen code, so your call to
hvm_emulate_prepare is using the real "current" vcpu's registers, with
information from the wrong "current" cpu, including cs and ss segment
registers, which is then going to be interpreted incorrectly as they
will be used in the wrong vmcs/gdt.

By this point, bets are certainly on that stuff will break.


Can you describe exactly what behaviour you are attempting to achieve
with this?  It seems to me that you are wanting to step a paused HVM
vcpu on by one instruction based off a hypercall from dom0 ?

~Andrew
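
To illustrate the point about "current", here is a toy model in C. It is
only a sketch under stated assumptions: the toy_* types and functions are
invented for illustration and are not Xen code. It assumes that each pcpu
tracks a "current" pointer separately from the vcpu state the hardware
actually has loaded, and that only a full context switch moves both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vcpu; not Xen's struct vcpu. */
struct toy_vcpu { int id; };

/* One physical CPU: the bookkeeping pointer and the state actually loaded. */
struct toy_pcpu {
    struct toy_vcpu *current;   /* what "current" would report */
    struct toy_vcpu *hw_loaded; /* whose register/VMCS state the hardware holds */
};

/* Models set_current(): updates the pointer only, nothing else moves. */
static void toy_set_current(struct toy_pcpu *p, struct toy_vcpu *v)
{
    p->current = v;
}

/* Models a full context switch: pointer and hardware state move together. */
static void toy_context_switch(struct toy_pcpu *p, struct toy_vcpu *v)
{
    p->hw_loaded = v;
    p->current = v;
}

/* True when code relying on "current" would read the matching hardware state. */
static bool toy_state_consistent(const struct toy_pcpu *p)
{
    assert(p->current && p->hw_loaded);
    return p->current == p->hw_loaded;
}
```

In this model, calling toy_set_current() with another vcpu leaves
toy_state_consistent() false until a toy_context_switch() happens, which
is the mismatch described above.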

Razvan Cojocaru

2012-Dec-28 23:29 UTC

Re: hvm_emulate_one() usage

Hello, thanks for the reply!
> Not that I can help you with your problem specifically, but
> set_current() here ...
> 
>>
>>      hvm_emulate_prepare(ctx, guest_cpu_user_regs());
>>      hvm_emulate_one(ctx);
>>
>>      set_current(current_vcpu);
> 
> and here are absolutely wrong and will cause bad things to happen. (As
> demonstrated by the crash below)
Right.
> "current" is used everywhere in the Xen code, so your call to
> hvm_emulate_prepare is using the real "current" vcpu's registers, with
> information from the wrong "current" cpu, including cs and ss segment
> registers, which is then going to be interpreted incorrectly as they
> will be used in the wrong vmcs/gdt.
I see, that's what I was trying to avoid with the set_current() call - I
had hoped that it would tell guest_cpu_user_regs() what vcpu to use.

That was my only hope, as in the context of p2m_mem_access_resume() I
don't have the "struct cpu_user_regs *regs" parameter that I have access
to in p2m_mem_access_check().
> Can you describe exactly what behaviour you are attempting to achieve
> with this?  It seems to me that you are wanting to step a paused HVM
> vcpu on by one instruction based off a hypercall from dom0 ?
That's basically it, yes. In the hypervisor, tell dom0 that a mem_event
happened (a write attempt happened on a rx page), and let dom0 decide if
the write should happen or not (without dom0 setting the page to rwx and
losing future events on that same page). If dom0 decides that the write
should go ahead, it should signal this with a special flag in the
response it puts in the mem_event ring buffer, and the hypervisor should
then step the paused vcpu by one instruction (the write instruction).

This does work if I step in p2m_mem_access_check() (where I have access
to the "regs" parameter), before putting the mem_event request in the
ring buffer (and without any set_current() funny business), but that's
not acceptable behaviour because then dom0 gets notified _after_ the
write, and it's important for the notification to occur before the write
(so that dom0 could stop the write from happening if it needs to).

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-10 13:16 UTC

Re: hvm_emulate_one() usage

Hi, 

At 16:34 +0200 on 28 Dec (1356712445), Razvan Cojocaru wrote:
> Hello,
> 
> I have a dom0 userspace application that receives mem_events. Mem_events 
> are being received if a page fault occurs, and until I clear the page 
> access rights I keep receiving the event in a loop. If I do clear the 
> page access rights, I will no longer receive mem_events for said page.
> 
> What I thought I'd do was to add a new flag to the mem_event response
> (MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in 
> p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:
> 
> mem_event_get_response(d, &rsp);
> 
> if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
> {
>     struct hvm_emulate_ctxt ctx[1] = {};
>     struct vcpu *current_vcpu = current;
> 
>     set_current(d->vcpu[rsp.vcpu_id]);
This won't work -- as Andrew pointed out, set_current() can only happen
safely as part of a proper context switch.  If you want to cause the
vcpu to single-step, I think it's better to follow the existing debugger
code, which marks the vcpu for single-stepping and then schedules it as
usual.

How about (from your user-space tool):
 - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
 - respond to the mem_event that you're handling, causing the vcpu to be
   unpaused.
Then when the vcpu is scheduled, it will single-step in its own context,
and you'll get another mem_event (assuming you've set
HVM_PARAM_MEMORY_EVENT_SINGLE_STEP).  Once that happens:
 - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
 - respond to that event to unpause the vcpu.

I guess if you're trying to have some special case in the single-step
handler that allows it to write to a page that it normally couldn't, you
might need to add an interface for controlling that.

Cheers,

Tim.
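
The tool-side sequence described above can be sketched as a small state
machine in C. This is a minimal illustration, not working Xen tooling:
the event kinds, the vcpu_track struct and handle_event() are
hypothetical names, and the flag merely models the effect of
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON/OFF rather than issuing any domctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds a listener might see; not Xen's definitions. */
enum event_kind { EV_ACCESS_VIOLATION, EV_SINGLE_STEP };

/* Hypothetical per-vcpu bookkeeping kept by the user-space tool. */
struct vcpu_track {
    bool single_step_on; /* models SINGLE_STEP_ON / SINGLE_STEP_OFF */
    bool paused;         /* the vcpu stays paused while an event is pending */
};

/* Handle one event, then "respond" to it, which unpauses the vcpu. */
static void handle_event(struct vcpu_track *v, enum event_kind ev)
{
    assert(v->paused); /* events are only handled for a paused vcpu */

    if (ev == EV_ACCESS_VIOLATION)
        v->single_step_on = true;  /* enable single-stepping first */
    else
        v->single_step_on = false; /* the step has happened: disable it */

    v->paused = false;             /* respond to the event, vcpu resumes */
}
```

A caller would mark the vcpu paused when each event arrives and call
handle_event() once for the access violation and once for the
single-step event that follows it.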

Razvan Cojocaru

2013-Jan-10 14:10 UTC

Re: hvm_emulate_one() usage

> Hi,
Hello Tim, thank you for your answer!
> How about (from your user-space tool):
>   - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
>   - respond to the mem_event that you're handling, causing the vcpu to be
>     unpaused.
> Then when the vcpu is scheduled, it will single-step in its own context,
> and you'll get another mem_event (assuming you've set
> HVM_PARAM_MEMORY_EVENT_SINGLE_STEP).  Once that happens:
>   - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
>   - respond to that event to unpause the vcpu.
>
> I guess if you're trying to have some special case in the single-step
> handler that allows it to write to a page that it normally couldn't, you
> might need to add an interface for controlling that.
Thanks, looks like something like that is the only way this would
theoretically work. The problem is, for each allowed (emulated) write -
which is the 'normal' case - there would be 3 dom0 <-> hypervisor
roundtrips (2 fault mem_events and 1 single step mem_event). Since
writes that need to be allowed do happen quite a lot, the domU would
become very slow.

I was very much hoping to be able to do this with only one (page fault) 
mem_event per emulated write instruction.

Thanks,
Razvan Cojocaru

Tim Deegan

2013-Jan-10 14:23 UTC

Re: hvm_emulate_one() usage

At 16:10 +0200 on 10 Jan (1357834233), Razvan Cojocaru wrote:
> >Hi,
>
> Hello Tim, thank you for your answer!
>
> >How about (from your user-space tool):
> >  - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
> >  - respond to the mem_event that you're handling, causing the vcpu to be
> >    unpaused.
> >Then when the vcpu is scheduled, it will single-step in its own context,
> >and you'll get another mem_event (assuming you've set
> >HVM_PARAM_MEMORY_EVENT_SINGLE_STEP).  Once that happens:
> >  - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
> >  - respond to that event to unpause the vcpu.
> >
> >I guess if you're trying to have some special case in the single-step
> >handler that allows it to write to a page that it normally couldn't, you
> >might need to add an interface for controlling that.
>
> Thanks, looks like something like that is the only way this would
> theoretically work. The problem is, for each allowed (emulated) write -
> which is the 'normal' case - there would be 3 dom0 <-> hypervisor
> roundtrips (2 fault mem_events and 1 single step mem_event). Since
> writes that need to be allowed do happen quite a lot, the domU would
> become very slow.
>
> I was very much hoping to be able to do this with only one (page fault)
> mem_event per emulated write instruction.
I'm sure that can be done.  The trick is to make sure the emulation
happens in the guest context (i.e. when the guest is scheduled).  You
could do that by (e.g.) defining a new mem_access type 'single-step
writes' where a write fault triggers a single-step emulation in the
fault handler as well as an asynchronous mem-event.

Tim.

Razvan Cojocaru

2013-Jan-10 14:31 UTC

Re: hvm_emulate_one() usage

>> I was very much hoping to be able to do this with only one (page fault)
>> mem_event per emulated write instruction.
>
> I'm sure that can be done.  The trick is to make sure the emulation
> happens in the guest context (i.e. when the guest is scheduled).  You
> could do that by (e.g.) defining a new mem_access type 'single-step
> writes' where a write fault triggers a single-step emulation in the
> fault handler as well as an asynchronous mem-event.
That's what I'm doing now (albeit with the plain
MEM_EVENT_REASON_VIOLATION) - I'm emulating the write in
p2m_mem_access_check(), where I'm in the guest context, just before
putting the mem_event in the ring buffer.

The problem is, I don't want to do that. :) I want to stop certain
writes _before_ they happen, and emulating the write instruction there
first performs the write, and then notifies dom0 userspace about it.
The ideal sequence would be: 1. notify userspace about a would-be write,
2. get the reply from userspace, 3. only write if userspace said OK. The
point is that I don't know if the write should be allowed to happen or
not until userspace replies.

Thanks,
Razvan Cojocaru
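
The three-step ideal sequence above can be written down as a small C
sketch. It is an illustration only: guest_page, write_policy_fn,
mediated_write() and deny_gfn_42() are hypothetical names, and the policy
callback is called synchronously here, whereas the real decision would
arrive asynchronously from dom0 userspace through the mem_event ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dom0 user-space decision. */
typedef bool (*write_policy_fn)(uint64_t gfn, uint8_t value);

/* Hypothetical one-byte "page", identified by its guest frame number. */
struct guest_page {
    uint64_t gfn;
    uint8_t byte;
};

/* Perform the write only after the policy has approved it. */
static bool mediated_write(struct guest_page *page, uint8_t value,
                           write_policy_fn policy)
{
    assert(page != NULL && policy != NULL);

    /* Steps 1 and 2: notify about the would-be write and get the reply. */
    bool allowed = policy(page->gfn, value);

    /* Step 3: only write if the reply said OK. */
    if (allowed)
        page->byte = value;

    return allowed;
}

/* Example policy: refuse any write to frame 42, allow everything else. */
static bool deny_gfn_42(uint64_t gfn, uint8_t value)
{
    (void)value;
    return gfn != 42;
}
```

The property that matters is the ordering: the page is never modified
before the policy has answered.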

Tim Deegan

2013-Jan-10 14:58 UTC

Re: hvm_emulate_one() usage

At 16:31 +0200 on 10 Jan (1357835505), Razvan Cojocaru wrote:
> >>I was very much hoping to be able to do this with only one (page fault)
> >>mem_event per emulated write instruction.
> >
> >I'm sure that can be done.  The trick is to make sure the emulation
> >happens in the guest context (i.e. when the guest is scheduled).  You
> >could do that by (e.g.) defining a new mem_access type 'single-step
> >writes' where a write fault triggers a single-step emulation in the
> >fault handler as well as an asynchronous mem-event.
>
> That's what I'm doing now (albeit with the plain
> MEM_EVENT_REASON_VIOLATION) - I'm emulating the write in
> p2m_mem_access_check(), where I'm in the guest context, just before
> putting the mem_event in the ring buffer.
>
> The problem is, I don't want to do that. :) I want to stop certain
> writes _before_ they happen, and emulating the write instruction there
> first performs the write, and then notifies dom0 userspace about it.
> The ideal sequence would be: 1. notify userspace about a would-be write,
> 2. get the reply from userspace, 3. only write if userspace said OK. The
> point is that I don't know if the write should be allowed to happen or
> not until userspace replies.
In that case, in your resume handler you could set a flag in the vcpu
struct to say it should be single-stepped once the next time it's
scheduled, and make your call to hvm_emulate_one() from the vmentry path
somewhere.  Presumably you'd also want to save some state saying exactly
which access permissions the vcpu should be ignoring during the
emulation.

VMX already has similar code for the cases where it has to emulate
real-mode instructions; it sets arch.hvm_vcpu.u.vmx.vmx_emulate and
detects it in vmx/entry.S.  You could do something similar for other
cases.  But anything you add to the vmenter path would have to be
extremely careful not to add any overhead to the normal case.

Tim.
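
The deferred-emulation idea above can be sketched in C as a flag that the
resume handler sets and the entry path consumes. This is a toy model
under stated assumptions, not a patch: step_vcpu, step_resume() and
step_vmentry() are hypothetical names, and the counter increment stands
in for the real hvm_emulate_prepare() and hvm_emulate_one() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the vcpu struct. */
struct step_vcpu {
    bool emulate_pending;      /* set by the resume handler */
    bool ignore_write_protect; /* saved state: which permission to ignore */
    unsigned int emulated;     /* instructions emulated so far */
};

/* Resume handler: runs in dom0's hypercall context, only records the request. */
static void step_resume(struct step_vcpu *v, bool emulate_write)
{
    if (emulate_write) {
        v->emulate_pending = true;
        v->ignore_write_protect = true;
    }
}

/* Entry path: runs when the vcpu itself is scheduled, so it really is current. */
static void step_vmentry(struct step_vcpu *v)
{
    if (!v->emulate_pending) /* normal case: a single cheap test */
        return;

    /* A pending request must carry the saved permission state with it. */
    assert(v->ignore_write_protect);

    v->emulate_pending = false;
    v->emulated++; /* stands in for emulating exactly one instruction */
    v->ignore_write_protect = false;
}
```

The point of the split is that step_resume() never touches guest state
itself, so nothing depends on which vcpu is current when dom0's response
is processed.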
