Hello,

I have a dom0 userspace application that receives mem_events. A
mem_event is delivered when a page fault occurs, and until I clear the
page access rights I keep receiving the same event in a loop. If I do
clear the page access rights, I no longer receive mem_events for that
page.

What I thought I'd do was to add a new flag to the mem_event response
(MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in
p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:

    mem_event_get_response(d, &rsp);

    if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
    {
        struct hvm_emulate_ctxt ctx[1] = {};
        struct vcpu *current_vcpu = current;

        set_current(d->vcpu[rsp.vcpu_id]);

        hvm_emulate_prepare(ctx, guest_cpu_user_regs());
        hvm_emulate_one(ctx);

        set_current(current_vcpu);
    }

The code is supposed to go past the write instruction (without lifting
the page access restrictions). What it does seem to achieve is this:

(XEN) ----[ Xen-4.1.2 x86_64 debug=n Not tainted ]----
(XEN) CPU: 6
(XEN) RIP: e008:[<ffff82c4801bf4ea>] vmx_get_interrupt_shadow+0xa/0x10
(XEN) RFLAGS: 0000000000010203 CONTEXT: hypervisor
(XEN) rax: 0000000000004824 rbx: ffff83013c0c7ba0 rcx: 0000000000000008
(XEN) rdx: 0000000000000005 rsi: ffff83013c0c7f18 rdi: ffff8300bfca8000
(XEN) rbp: ffff83013c0c7f18 rsp: ffff83013c0c7b50 r8: 0000000000000002
(XEN) r9: 0000000000000002 r10: ffff82c48020af40 r11: 0000000000000282
(XEN) r12: ffff8300bfff2000 r13: ffff88012b478b18 r14: 00007fffd669c4c0
(XEN) r15: ffff83013c0c7e48 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 000000005d6c4000 cr2: 000000000221e538
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff83013c0c7b50:
(XEN) ffff82c4801a1a91 ffff83013f986000 ffff83013f986000 ffff83013c0c7f18
(XEN) ffff82c4801ce0e1 0000000500050000 000000000003f31a 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff83013f986000 ffff8300bfff2000 ffff83013c0c7e48
(XEN) ffff82c4801d1d81 ffff8300bfcac000 ffff82c4801d05c5 ffff83013c0c7f18
(XEN) ffff82c4801a1447 ffff8300bfcac000 0000000000d7e004 0000000000d7e004
(XEN) ffff83013c0c7e48 ffff88012b478b18 00007fffd669c4c0 ffff83013c0c7e48
(XEN) ffff82c48014eb79 0000000000000000 000000000005d6f9 ffff82f600badf20
(XEN) 0000000000000000 4000000000000000 ffff82f600badf20 0000000000000000
(XEN) ffff88012fc0b928 0000000000000001 ffff82c48016bc4b ffff82f600badf20
(XEN) ffff82c48016c0b8 ffff83013c0ac000 ffff83013c0ac000 ffff82f600bb1940
(XEN) 000000000000000f ffff83013c0c7f18 ffff83013c0ac000 ffff82f600bb1940
(XEN) fffffffffffffff3 0000000000d7e004 ffff83013c0c7e48 ffff88012b478b18
(XEN) Xen call trace:
(XEN) [<ffff82c4801bf4ea>] vmx_get_interrupt_shadow+0xa/0x10
(XEN) [<ffff82c4801a1a91>] hvm_emulate_prepare+0x31/0x80
(XEN) [<ffff82c4801ce0e1>] p2m_mem_access_resume+0xe1/0x120
(XEN) [<ffff82c4801d1d81>] mem_access_domctl+0x21/0x30
(XEN) [<ffff82c4801d05c5>] mem_event_domctl+0x295/0x3b0
(XEN) [<ffff82c4801a1447>] hvmemul_do_pio+0x27/0x30
(XEN) [<ffff82c48014eb79>] arch_do_domctl+0x2e9/0x28a0
(XEN) [<ffff82c48016bc4b>] get_page_type+0xb/0x20
(XEN) [<ffff82c48016c0b8>] get_page_and_type_from_pagenr+0x78/0xe0
(XEN) [<ffff82c4801025bb>] do_domctl+0xfb/0x10b0
(XEN) [<ffff82c4801f2fa6>] ept_get_entry+0x136/0x250
(XEN) [<ffff82c480180965>] copy_to_user+0x25/0x70
(XEN) [<ffff82c4801f8778>] syscall_enter+0x88/0x8d
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 6:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN) ****************************************

I could find no documentation on either the hvm_*() or the cpu-related
functions. Obviously the hvm_emulate_prepare() call crashes the
hypervisor, most likely because of the guest_cpu_user_regs() parameter,
but "regs" is not being passed to p2m_mem_access_resume() (like it is
being passed to p2m_mem_access_check()). I would appreciate your help in
figuring out how to implement this.

Thanks, and happy holidays,
Razvan Cojocaru
On 28/12/2012 14:34, Razvan Cojocaru wrote:
> Hello,
>
> I have a dom0 userspace application that receives mem_events. Mem_events
> are being received if a page fault occurs, and until I clear the page
> access rights I keep receiving the event in a loop. If I do clear the
> page access rights, I will no longer receive mem_events for said page.
>
> What I thought I'd do was to add a new flag to the mem_event response
> (MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in
> p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:
>
>     mem_event_get_response(d, &rsp);
>
>     if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
>     {
>         struct hvm_emulate_ctxt ctx[1] = {};
>         struct vcpu *current_vcpu = current;
>
>         set_current(d->vcpu[rsp.vcpu_id]);

Not that I can help you with your problem specifically, but
set_current() here ...

>
>         hvm_emulate_prepare(ctx, guest_cpu_user_regs());
>         hvm_emulate_one(ctx);
>
>         set_current(current_vcpu);

and here are absolutely wrong and will cause bad things to happen. (As
demonstrated by the crash below)

set_current() is only for use with scheduling, and sets which vcpu is
"current" on this pcpu. As the code currently stands, there is a
thundering great race condition where this particular vcpu might be
current on 2 pcpus at once.

Other than the above, which will certainly break the scheduling code,
"current" is used everywhere in the Xen code, so your call to
hvm_emulate_prepare is using the real "current" vcpu's registers, with
information from the wrong "current" vcpu, including the cs and ss
segment registers, which are then going to be interpreted incorrectly
as they will be used with the wrong vmcs/gdt. By this point, bets are
certainly on that stuff will break.

Can you describe exactly what behaviour you are attempting to achieve
with this? It seems to me that you are wanting to step a paused HVM
vcpu on by one instruction based off a hypercall from dom0?
~Andrew

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
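Andrew's point about "current" can be pictured with a small toy model in plain C. This is only a sketch under stated assumptions: the names toy_vcpu, toy_pcpu, toy_context_switch() and toy_set_current_only() are hypothetical and are not Xen's real structures or functions. The model assumes that a physical CPU tracks two things: the pointer that "current" returns, and the vcpu whose state the hardware actually holds. A proper context switch moves both together; changing only the pointer leaves them disagreeing, which is the situation any code reading guest state through "current" would then run in.

```c
#include <stdbool.h>

/* Toy stand-ins, NOT Xen's real structures. */
struct toy_vcpu {
    int id;
};

struct toy_pcpu {
    struct toy_vcpu *current;    /* what a "current" lookup would return   */
    struct toy_vcpu *hw_loaded;  /* whose state the hardware really holds  */
};

/* A proper context switch updates the hardware state and the pointer. */
static void toy_context_switch(struct toy_pcpu *p, struct toy_vcpu *next)
{
    p->hw_loaded = next;
    p->current = next;
}

/* What the patch effectively did: change the pointer and nothing else. */
static void toy_set_current_only(struct toy_pcpu *p, struct toy_vcpu *v)
{
    p->current = v;
}

/* Code that reaches guest state via "current" is only meaningful
 * when the pointer and the loaded hardware state agree. */
static bool toy_state_is_consistent(const struct toy_pcpu *p)
{
    return p->current == p->hw_loaded;
}
```

In this model, calling toy_set_current_only() with a different vcpu makes toy_state_is_consistent() return false until the pointer is restored, while toy_context_switch() never does.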
Hello, thanks for the reply!

> Not that I can help you with your problem specifically, but
> set_current() here ...
>
>>
>> hvm_emulate_prepare(ctx, guest_cpu_user_regs());
>> hvm_emulate_one(ctx);
>>
>> set_current(current_vcpu);
>
> and here are absolutely wrong and will cause bad things to happen. (As
> demonstrated by the crash below)

Right.

> "current" is used everywhere in the Xen code, so your call to
> hvm_emulate_prepare is using the real "current" vcpu's registers, with
> information from the wrong "current" vcpu, including the cs and ss
> segment registers, which are then going to be interpreted incorrectly
> as they will be used with the wrong vmcs/gdt.

I see, that's what I was trying to avoid with the set_current() call - I
had hoped that it would tell guest_cpu_user_regs() what vcpu to use.
That was my only hope, as in the context of p2m_mem_access_resume() I
don't have the "struct cpu_user_regs *regs" parameter that I have access
to in p2m_mem_access_check().

> Can you describe exactly what behaviour you are attempting to achieve
> with this? It seems to me that you are wanting to step a paused HVM
> vcpu on by one instruction based off a hypercall from dom0?

That's basically it, yes. In the hypervisor, tell dom0 that a mem_event
happened (a write attempt happened on an rx page), and let dom0 decide
if the write should happen or not (without dom0 setting the page to rwx
and losing future events on that same page). If dom0 decides that the
write should go ahead, it should signal this with a special flag in the
response it puts in the mem_event ring buffer, and the hypervisor should
then step the paused vcpu by one instruction (the write instruction).

This does work if I step in p2m_mem_access_check() (where I have access
to the "regs" parameter), before putting the mem_event request in the
ring buffer (and without any set_current() funny business), but that's
not acceptable behaviour because then dom0 gets notified _after_ the
write, and it's important for the notification to occur before the write
(so that dom0 can stop the write from happening if it needs to).

Thanks,
Razvan Cojocaru
Hi,

At 16:34 +0200 on 28 Dec (1356712445), Razvan Cojocaru wrote:
> Hello,
>
> I have a dom0 userspace application that receives mem_events. Mem_events
> are being received if a page fault occurs, and until I clear the page
> access rights I keep receiving the event in a loop. If I do clear the
> page access rights, I will no longer receive mem_events for said page.
>
> What I thought I'd do was to add a new flag to the mem_event response
> (MEM_EVENT_FLAG_EMULATE_WRITE), and have this code execute in
> p2m_mem_access_resume() in xen/arch/x86/mm/p2m.c:
>
>     mem_event_get_response(d, &rsp);
>
>     if ( rsp.flags & MEM_EVENT_FLAG_EMULATE_WRITE )
>     {
>         struct hvm_emulate_ctxt ctx[1] = {};
>         struct vcpu *current_vcpu = current;
>
>         set_current(d->vcpu[rsp.vcpu_id]);

This won't work -- as Andrew pointed out, set_current() can only happen
safely as part of a proper context switch.

If you want to cause the vcpu to single-step, I think it's better to
follow the existing debugger code, which marks the vcpu for
single-stepping and then schedules it as usual.

How about (from your user-space tool):
 - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
 - respond to the mem_event that you're handling, causing the vcpu to be
   unpaused.
Then when the vcpu is scheduled, it will single-step in its own context,
and you'll get another mem_event (assuming you've set
HVM_PARAM_MEMORY_EVENT_SINGLE_STEP). Once that happens:
 - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
 - respond to that event to unpause the vcpu.

I guess if you're trying to have some special case in the single-step
handler that allows it to write to a page that it normally couldn't, you
might need to add an interface for controlling that.

Cheers,

Tim.
> Hi,

Hello Tim, thank you for your answer!

> How about (from your user-space tool):
>  - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
>  - respond to the mem_event that you're handling, causing the vcpu to be
>    unpaused.
> Then when the vcpu is scheduled, it will single-step in its own context,
> and you'll get another mem_event (assuming you've set
> HVM_PARAM_MEMORY_EVENT_SINGLE_STEP). Once that happens:
>  - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
>  - respond to that event to unpause the vcpu.
>
> I guess if you're trying to have some special case in the single-step
> handler that allows it to write to a page that it normally couldn't, you
> might need to add an interface for controlling that.

Thanks, looks like something like that is the only way this would
theoretically work. The problem is, for each allowed (emulated) write -
which is the 'normal' case - there would be 3 dom0 <-> hypervisor
roundtrips (2 fault mem_events and 1 single step mem_event). Since
writes that need to be allowed do happen quite a lot, the domU would
become very slow.

I was very much hoping to be able to do this with only one (page fault)
mem_event per emulated write instruction.

Thanks,
Razvan Cojocaru
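The cost argument above is simple arithmetic, and a few lines of C make the scaling explicit. This is only an illustration: the constants restate the figures from the message (3 roundtrips per allowed write with the single-step sequence, 1 hoped for), the names are hypothetical, and nothing here measures real latency.

```c
#include <stdint.h>

/* Figures taken from the message above; the names are illustrative only. */
enum {
    TOY_ROUNDTRIPS_SINGLE_STEP = 3, /* 2 fault mem_events + 1 single-step mem_event */
    TOY_ROUNDTRIPS_DESIRED     = 1  /* 1 page-fault mem_event per emulated write    */
};

/* Total dom0 <-> hypervisor roundtrips for a number of allowed writes. */
static uint64_t toy_total_roundtrips(uint64_t allowed_writes,
                                     unsigned int roundtrips_per_write)
{
    return allowed_writes * roundtrips_per_write;
}
```

For example, one million allowed writes would cost three million roundtrips with the single-step sequence against one million with a single mem_event per write, so the overhead stays at a constant factor of three however many writes occur.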
At 16:10 +0200 on 10 Jan (1357834233), Razvan Cojocaru wrote:
> >Hi,
>
> Hello Tim, thank you for your answer!
>
> >How about (from your user-space tool):
> > - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON to enable single-stepping.
> > - respond to the mem_event that you're handling, causing the vcpu to be
> >   unpaused.
> >Then when the vcpu is scheduled, it will single-step in its own context,
> >and you'll get another mem_event (assuming you've set
> >HVM_PARAM_MEMORY_EVENT_SINGLE_STEP). Once that happens:
> > - use XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF to disable single-stepping.
> > - respond to that event to unpause the vcpu.
> >
> >I guess if you're trying to have some special case in the single-step
> >handler that allows it to write to a page that it normally couldn't, you
> >might need to add an interface for controlling that.
>
> Thanks, looks like something like that is the only way this would
> theoretically work. The problem is, for each allowed (emulated) write -
> which is the 'normal' case - there would be 3 dom0 <-> hypervisor
> roundtrips (2 fault mem_events and 1 single step mem_event). Since
> writes that need to be allowed do happen quite a lot, the domU would
> become very slow.
>
> I was very much hoping to be able to do this with only one (page fault)
> mem_event per emulated write instruction.

I'm sure that can be done. The trick is to make sure the emulation
happens in the guest context (i.e. when the guest is scheduled). You
could do that by (e.g.) defining a new mem_access type 'single-step
writes' where a write fault triggers a single-step emulation in the
fault handler as well as an asynchronous mem-event.

Tim.
>> I was very much hoping to be able to do this with only one (page fault)
>> mem_event per emulated write instruction.
>
> I'm sure that can be done. The trick is to make sure the emulation
> happens in the guest context (i.e. when the guest is scheduled). You
> could do that by (e.g.) defining a new mem_access type 'single-step
> writes' where a write fault triggers a single-step emulation in the
> fault handler as well as an asynchronous mem-event.

That's what I'm doing now (albeit with the plain
MEM_EVENT_REASON_VIOLATION) - I'm emulating the write in
p2m_mem_access_check(), where I'm in the guest context, just before
putting the mem_event in the ring buffer.

The problem is, I don't want to do that. :) I want to stop certain
writes _before_ they happen, and emulating the write instruction there
first performs the write, and then notifies dom0 userspace about it.
The ideal sequence would be: 1. notify userspace about a would-be write,
2. get the reply from userspace, 3. only write if userspace said OK. The
point is that I don't know if the write should be allowed to happen or
not until userspace replies.

Thanks,
Razvan Cojocaru
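The three-step sequence described above (notify, wait for the reply, write only on approval) can be sketched as a self-contained toy in plain C. Everything here is an assumption made for illustration: toy_page, toy_monitor_fn, toy_guarded_write() and the two sample monitors are hypothetical names, the "monitor" is an ordinary function call standing in for the dom0 userspace tool, and no Xen interface is used or implied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in Xen. */
enum toy_verdict { TOY_DENY, TOY_ALLOW };

/* Stand-in for the dom0 monitor: sees the would-be write, returns a verdict. */
typedef enum toy_verdict (*toy_monitor_fn)(uint64_t gfn, uint8_t value);

struct toy_page {
    uint64_t gfn;
    uint8_t  data;
    bool     writable;   /* false models an r-x page */
};

/* Two trivial sample monitors: approve everything / refuse everything. */
static enum toy_verdict toy_monitor_allow(uint64_t gfn, uint8_t value)
{
    (void)gfn; (void)value;
    return TOY_ALLOW;
}

static enum toy_verdict toy_monitor_deny(uint64_t gfn, uint8_t value)
{
    (void)gfn; (void)value;
    return TOY_DENY;
}

/* Returns true if the write was performed. */
static bool toy_guarded_write(struct toy_page *pg, uint8_t value,
                              toy_monitor_fn monitor)
{
    if (pg->writable) {              /* unrestricted page: plain write, no event */
        pg->data = value;
        return true;
    }

    /* Steps 1 and 2: notify the monitor and wait for its reply. */
    if (monitor(pg->gfn, value) != TOY_ALLOW)
        return false;                /* denied: memory is left untouched */

    /* Step 3: perform this one write. The page stays non-writable,
     * so the next write attempt raises an event again. */
    pg->data = value;
    return true;
}
```

The property the thread cares about is visible in the ordering: the monitor is consulted before pg->data changes, and pg->writable is never flipped, so later writes to the same page are still reported.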
At 16:31 +0200 on 10 Jan (1357835505), Razvan Cojocaru wrote:
> >>I was very much hoping to be able to do this with only one (page fault)
> >>mem_event per emulated write instruction.
> >
> >I'm sure that can be done. The trick is to make sure the emulation
> >happens in the guest context (i.e. when the guest is scheduled). You
> >could do that by (e.g.) defining a new mem_access type 'single-step
> >writes' where a write fault triggers a single-step emulation in the
> >fault handler as well as an asynchronous mem-event.
>
> That's what I'm doing now (albeit with the plain
> MEM_EVENT_REASON_VIOLATION) - I'm emulating the write in
> p2m_mem_access_check(), where I'm in the guest context, just before
> putting the mem_event in the ring buffer.
>
> The problem is, I don't want to do that. :) I want to stop certain
> writes _before_ they happen, and emulating the write instruction there
> first performs the write, and then notifies dom0 userspace about it.
> The ideal sequence would be: 1. notify userspace about a would-be write,
> 2. get the reply from userspace, 3. only write if userspace said OK. The
> point is that I don't know if the write should be allowed to happen or
> not until userspace replies.

In that case, in your resume handler you could set a flag in the vcpu
struct to say it should be single-stepped once the next time it's
scheduled, and make your call to hvm_emulate_one() from the vmentry path
somewhere. Presumably you'd also want to save some state saying exactly
which access permissions the vcpu should be ignoring during the
emulation.

VMX already has similar code for the cases where it has to emulate
real-mode instructions; it sets arch.hvm_vcpu.u.vmx.vmx_emulate and
detects it in vmx/entry.S. You could do something similar for other
cases. But anything you add to the vmenter path would have to be
extremely careful not to add any overhead to the normal case.

Tim.
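Tim's suggestion splits the work in two: the resume handler (running on behalf of dom0) only records that one emulation is wanted, and the emulation itself runs later on the vcpu's own entry path. The toy below sketches that shape in plain C. It is a model under stated assumptions, not a patch: the fields emulate_once and ignore_gfn, and the functions toy_resume_handler() and toy_vmentry_path(), are hypothetical, and the counter merely marks the spot where a real implementation would call hvm_emulate_one().

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy vcpu with hypothetical fields; not Xen's struct vcpu. */
struct toy_vcpu {
    bool         emulate_once;  /* set by the resume handler, consumed on entry   */
    uint64_t     ignore_gfn;    /* page whose write restriction is to be ignored  */
    unsigned int emulated;      /* how many deferred emulations have been run     */
};

/* Runs in the dom0 hypercall context: records intent, touches no guest state. */
static void toy_resume_handler(struct toy_vcpu *v, bool monitor_allows,
                               uint64_t gfn)
{
    if (monitor_allows) {
        v->emulate_once = true;
        v->ignore_gfn = gfn;
    }
}

/* Runs on the vcpu's own entry path, i.e. when it really is "current".
 * Returns true if a deferred emulation was performed. */
static bool toy_vmentry_path(struct toy_vcpu *v)
{
    if (!v->emulate_once)       /* normal case: a single flag test, nothing else */
        return false;

    v->emulate_once = false;    /* one-shot: step exactly one instruction */
    v->emulated++;              /* stand-in for the real emulation call   */
    return true;
}
```

The design point is that the normal entry path pays only for one flag test, which matches the warning about not adding overhead there, and that the flag is cleared before the stand-in emulation so a single approval can never step the vcpu twice.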