For my research, I need to run an SMP HVM guest in log-dirty mode. After the first log-dirty fault on a page, instead of making the page r/w right away, I need to log the next 128 reads and writes by the vCPUs. Once that many accesses have been logged, I set the page to r/w, as in the usual log-dirty mode. In other words, the page access changes to p2m_access_n after the first log-dirty fault and is reverted to p2m->default_access after 128 accesses.
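The per-page policy described above can be modeled as a small state machine. This is a user-space sketch under my own assumptions, not Xen code: the state names, struct page_track, track_access() and TRACK_LIMIT are all hypothetical, and a "trapped access" stands in for an EPT violation on the page.

```c
#include <stdbool.h>

/* Hypothetical per-page tracking states (not Xen names). */
enum track_state {
    TRACK_LOGDIRTY,  /* read-only log-dirty page: only writes fault */
    TRACK_COUNTING,  /* all access revoked (cf. p2m_access_n): every access faults */
    TRACK_RW         /* back to normal r/w (cf. p2m->default_access) */
};

/* Number of accesses to log after the first log-dirty fault. */
#define TRACK_LIMIT 128u

struct page_track {
    enum track_state state;
    unsigned int logged;  /* accesses logged while in TRACK_COUNTING */
};

/*
 * Model of one guest access to the page. Returns true if the access
 * would trap and be logged, false if it proceeds without a fault.
 */
static bool track_access(struct page_track *t, bool is_write)
{
    switch (t->state) {
    case TRACK_LOGDIRTY:
        if (!is_write)
            return false;            /* reads do not fault on a log-dirty page */
        t->state = TRACK_COUNTING;   /* first write: mark dirty, revoke access */
        t->logged = 0;
        return true;
    case TRACK_COUNTING:
        t->logged++;                 /* read or write: log it */
        if (t->logged == TRACK_LIMIT)
            t->state = TRACK_RW;     /* limit reached: restore default access */
        return true;
    case TRACK_RW:
    default:
        return false;                /* page no longer traps */
    }
}
```

Reads are ignored in the initial state because a log-dirty page is mapped read-only, so only writes fault on it; once access is revoked, reads and writes both trap until the limit is reached.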
Log-dirty mode only lets me log one write access per page. To log multiple read/write accesses, I resorted to *emulating* the instructions that cause the page fault. (I guess I could also play around with the trap flag and single-step the guest, but that's a last resort.)
My initial attempts to do this with shadow paging proved too painful and cumbersome, so I switched to HAP and am using an Intel Xeon machine with EPT support. The guest is 32-bit Debian with a 2.6.24 kernel; dom0 is a 64-bit 2.6.32 pvops kernel, running on xen-unstable.
I can see that vmx_vmexit_handler() emulates some operations itself (e.g., MSR accesses). So I assume that when the guest exits with EXIT_REASON_EPT_VIOLATION and ends up in hvm.c:hvm_hap_nested_page_fault(), the cause is MMIO, PoD or log-dirty.
Is it right to assume that when the EPT_VIOLATION fault occurs, the faulting instruction only does simple reads/writes to the page? No MSR accesses, RDTSCs, CR3 switches, etc., since those are caught and emulated in the vmexit handler?
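One way to test that assumption is to look at the EPT-violation exit qualification itself. The sketch below decodes the bits documented in the Intel SDM: bits 0-2 give the access type, bits 3-5 the permissions of the EPT entry, bit 7 says whether the guest-linear-address field is valid, and bit 8 (when bit 7 is set) distinguishes an access to the final translation from an access to a guest paging-structure entry. The struct and function names are made up for illustration; as far as I can tell, Xen derives flags such as access_x and access_valid from these same bits before calling hvm_hap_nested_page_fault().

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit qualification bits (Intel SDM Vol. 3). */
#define EPT_QUAL_READ       (1ull << 0)  /* access was a data read */
#define EPT_QUAL_WRITE      (1ull << 1)  /* access was a data write */
#define EPT_QUAL_FETCH      (1ull << 2)  /* access was an instruction fetch */
#define EPT_QUAL_PERM_SHIFT 3            /* bits 3-5: R, W, X of the EPT entry */
#define EPT_QUAL_GLA_VALID  (1ull << 7)  /* guest linear address is valid */
#define EPT_QUAL_GLA_FINAL  (1ull << 8)  /* access to the final translation */

/* Hypothetical decoded form of the qualification. */
struct ept_fault {
    bool read, write, fetch;
    unsigned int ept_perms;  /* 0 for a no-access (p2m_access_n-like) entry */
    bool gla_valid;
    bool in_walk;            /* fault came from a guest page-table walk or A/D update */
};

static struct ept_fault decode_ept_qual(uint64_t q)
{
    struct ept_fault f;

    f.read      = (q & EPT_QUAL_READ) != 0;
    f.write     = (q & EPT_QUAL_WRITE) != 0;
    f.fetch     = (q & EPT_QUAL_FETCH) != 0;
    f.ept_perms = (unsigned int)((q >> EPT_QUAL_PERM_SHIFT) & 0x7u);
    f.gla_valid = (q & EPT_QUAL_GLA_VALID) != 0;
    /* Bit 8 is only meaningful when bit 7 is set. */
    f.in_walk   = f.gla_valid && (q & EPT_QUAL_GLA_FINAL) == 0;
    return f;
}
```

If bit 7 is set and bit 8 is clear, the violation was raised by the processor's own page walk (including accessed/dirty-bit updates) rather than by the instruction's data access, so a fault on a guest page-table page need not correspond to a simple read or write by the instruction at RIP.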
I added the following code to xen/arch/x86/hvm/hvm.c:hvm_hap_nested_page_fault(), where most of the log-dirty bits get set:
    /* Spurious fault? PoD and log-dirty also take this path. */
    if ( p2m_is_ram(p2mt) )
    {
        if ( (p2ma != p2m_access_rx2rw) && (p2mt == p2m_ram_logdirty) &&
             access_valid && (mfn_x(mfn) != INVALID_MFN) && !access_x )
        {
            if ( pg->emul_count > 127 )
            {
 emul_end:
                /* Set the page r/w in the EPT. This also gives the page
                 * rwx access (p2m->default_access), since the earlier
                 * access was no-access (hack). */
                p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
                paging_mark_dirty(v->domain, mfn_x(mfn));
            }
            else
            {
                struct hvm_emulate_ctxt ctxt;
                struct cpu_user_regs *regs = guest_cpu_user_regs();
                int rc;

                /* Emulate the faulting instruction. */
                hvm_emulate_prepare(&ctxt, regs);
                rc = hvm_emulate_one(&ctxt);
                hvm_emulate_writeback(&ctxt);

                /* If emulation failed, give the page r/w access and
                 * don't tinker with it again. */
                if ( rc != X86EMUL_OKAY )
                    goto emul_end;

                /* Revoke all access to the page so that we trap on the next
                 * access. This function is exactly the same as
                 * p2m_change_type(), except that it takes the access type as
                 * a parameter instead of using p2m->default_access. */
                p2m_change_type_access(v->domain, gfn, p2m_ram_logdirty,
                                       p2m_ram_logdirty, p2m_access_n);

                /* Set the bit in this vCPU's log-dirty bitmap. */
                vcpu_mark_dirty(v, mfn);
                pg->emul_count++;

                /* Seems to make no difference, with or without this call. */
                ept_sync_domain(v->domain);
            }
            return 1;
        }
        else
        {
            paging_mark_dirty(v->domain, mfn_x(mfn));
            p2m_change_type(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
            return 1;
        }
    }
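The body of vcpu_mark_dirty() is not shown above. Below is a minimal sketch of the kind of per-vCPU bitmap it could maintain, assuming one bit per guest frame number (the same indexing as Xen's log-dirty bitmap, which paging_mark_dirty() derives from the MFN). struct vcpu_dirty_log and the vcpu_log_* helpers are hypothetical user-space stand-ins, not Xen APIs.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-vCPU dirty log: one bit per guest frame number. */
struct vcpu_dirty_log {
    unsigned long *bits;
    size_t nr_frames;
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 on allocation failure. */
static int vcpu_log_init(struct vcpu_dirty_log *log, size_t nr_frames)
{
    size_t words = (nr_frames + BITS_PER_WORD - 1) / BITS_PER_WORD;

    log->bits = calloc(words ? words : 1, sizeof(*log->bits));
    log->nr_frames = nr_frames;
    return log->bits ? 0 : -1;
}

/* Set the bit for gfn; out-of-range frames are ignored. */
static void vcpu_log_mark(struct vcpu_dirty_log *log, size_t gfn)
{
    if (gfn < log->nr_frames)
        log->bits[gfn / BITS_PER_WORD] |= 1ul << (gfn % BITS_PER_WORD);
}

/* True if gfn has been marked in this vCPU's log. */
static bool vcpu_log_test(const struct vcpu_dirty_log *log, size_t gfn)
{
    return gfn < log->nr_frames &&
           ((log->bits[gfn / BITS_PER_WORD] >> (gfn % BITS_PER_WORD)) & 1ul);
}

/* Number of distinct frames marked so far. */
static size_t vcpu_log_count(const struct vcpu_dirty_log *log)
{
    size_t n = 0;

    for (size_t gfn = 0; gfn < log->nr_frames; gfn++)
        n += vcpu_log_test(log, gfn);
    return n;
}

static void vcpu_log_free(struct vcpu_dirty_log *log)
{
    free(log->bits);
    log->bits = NULL;
    log->nr_frames = 0;
}
```

Keeping one bitmap per vCPU means faults handled concurrently on different vCPUs never update the same bitmap, so the marking path needs no lock shared across vCPUs; the logs could be OR-ed together when the tool collects them.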
When I enable log-dirty mode, I see a number of emulation failures with return code X86EMUL_UNHANDLEABLE, followed by a VM-entry failure due to invalid guest state:
(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x000000008005003b, shadow=0x000000008005003b, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x00000000000026d0, shadow=0x0000000000000690, gh_mask=ffffffffffffffff
(XEN) CR3: actual=0x0000000037c90000, target_count=0
(XEN) target0=0000000000000000, target1=0000000000000000
(XEN) target2=0000000000000000, target3=0000000000000000
(XEN) RSP = 0x00000000f7c7ff84 (0x00000000f7c7ff84) RIP 0x00000000c03123e3 (0x00000000c03123e3)
(XEN) RFLAGS=0x0000000000000086 (0x0000000000000086) DR7 = 0x0000000000000400
(XEN) Sysenter RSP=00000000c1fb1300 CS:RIP=0060:00000000c0104330
(XEN) CS: sel=0x0060, attr=0x0c09b, limit=0xffffffff, base=0x0000000000000000
(XEN) DS: sel=0x007b, attr=0x0c0f3, limit=0xffffffff, base=0x0000000000000000
(XEN) SS: sel=0x0068, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
(XEN) ES: sel=0x007b, attr=0x0c0f3, limit=0xffffffff, base=0x0000000000000000
(XEN) FS: sel=0x00d8, attr=0x08093, limit=0xffffffff, base=0x0000000001b63000
(XEN) GS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
(XEN) GDTR: limit=0x000000ff, base=0x00000000c1fac000
(XEN) LDTR: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
(XEN) IDTR: limit=0x000007ff, base=0x00000000c03f8000
(XEN) TR: sel=0x0080, attr=0x0008b, limit=0x00002073, base=0x00000000c1faf100
(XEN) Guest PAT = 0x0007040600070406
(XEN) TSC Offset = fffffe96f2a9e5c0
(XEN) DebugCtl=0000000000000000 DebugExceptions=0000000000000000
(XEN) Interruptibility=0000 ActivityState=0000
(XEN) *** Host State ***
(XEN) RSP = 0xffff83082636ff90 RIP = 0xffff82c4801d0d40
(XEN) CS=e008 DS=0000 ES=0000 FS=0000 GS=0000 SS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff8308263f5b00
(XEN) GDTBase=ffff830826359000 IDTBase=ffff830826365000
(XEN) CR0=000000008005003b CR3=000000083f7f0000 CR4=00000000000026f0
(XEN) Sysenter RSP=ffff83082636ffc0 CS:RIP=e008:ffff82c480218670
(XEN) Host PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6a065fa SecondaryExec=0000006b
(XEN) EntryControls=000051ff ExitControls=000fefff
(XEN) ExceptionBitmap=00040040
(XEN) VMEntry: intr_info=800000ef errcode=00000000 ilen=00000000
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
(XEN) reason=80000021 qualification=00000000
(XEN) IDTVectoring: info=800000ef errcode=00000000
(XEN) TPR Threshold = 0x00
(XEN) EPT pointer = 0x000000083f7fe01e
(XEN) Virtual processor ID = 0x0042
(XEN) **************************************
(XEN) domain_crash called from vmx.c:2161
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-4.2-unstable-crew x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: 0060:[<00000000c03123e3>]
(XEN) RFLAGS: 0000000000000086 CONTEXT: hvm guest
(XEN) rax: 00000000f7c0de80 rbx: 00000000f7c0de8c rcx: 00000000f7cba700
(XEN) rdx: 00000000f7c0de80 rsi: 00000000f7c0de80 rdi: 00000000f7c0de80
(XEN) rbp: 00000000f7c0de84 rsp: 00000000f7c7ff84 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 0000000000000690
(XEN) cr3: 0000000037c90000 cr2: 00000000b7f69dcc
(XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 cs: 0060
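To read the dump: bit 31 of the exit reason flags a VM-entry failure, and the basic reason 0x21 (33) is "invalid guest state". The VM-entry interruption-information field (intr_info=800000ef) encodes a valid event (bit 31), type 0 (external interrupt, bits 10:8) and vector 0xef (bits 7:0). One of the SDM's VM-entry checks on guest RFLAGS requires IF (bit 9) to be 1 when an external interrupt is being injected. The sketch below encodes only that single check, not every check the processor performs; the macro and function names are hypothetical rather than Xen's own VMX definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values follow the Intel SDM field layouts. */
#define EXIT_REASON_ENTRY_FAILURE (1u << 31)  /* exit reason bit 31 */
#define EXIT_BASIC_INVALID_GUEST  33u         /* basic exit reason 0x21 */
#define INTR_INFO_VALID           (1u << 31)  /* event is valid */
#define INTR_TYPE_EXTERNAL        0u          /* bits 10:8 == 0 */
#define RFLAGS_IF                 (1u << 9)   /* interrupt-enable flag */

struct event_info {
    bool valid;
    unsigned int type;    /* bits 10:8 */
    unsigned int vector;  /* bits 7:0 */
};

static bool is_entry_failure(uint32_t reason)
{
    return (reason & EXIT_REASON_ENTRY_FAILURE) != 0;
}

static unsigned int basic_exit_reason(uint32_t reason)
{
    return reason & 0xffffu;
}

static struct event_info decode_event_info(uint32_t info)
{
    struct event_info e;

    e.valid  = (info & INTR_INFO_VALID) != 0;
    e.type   = (info >> 8) & 0x7u;
    e.vector = info & 0xffu;
    return e;
}

/* SDM VM-entry check: injecting an external interrupt requires RFLAGS.IF = 1. */
static bool ext_intr_injection_ok(uint32_t entry_intr_info, uint64_t rflags)
{
    struct event_info e = decode_event_info(entry_intr_info);

    if (!e.valid || e.type != INTR_TYPE_EXTERNAL)
        return true;  /* check does not apply */
    return (rflags & RFLAGS_IF) != 0;
}
```

With the values from this dump (intr_info=800000ef, RFLAGS=0x86 with IF clear), ext_intr_injection_ok() returns false, so this particular check may be worth ruling in or out.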
Any pointers on how to resolve this issue?
Shriram