Dear all,

I don't understand why 64-bit Mini-OS doesn't check for nested Xen events (in entry.S). In the 32-bit code, a critical section is defined and re-entrance is checked: if a nested event arrives, a fix-up routine coalesces the stack frames, because there is a window in which nested events can occur after event delivery is re-enabled and before the direct iret.

In 64-bit Mini-OS, however, a critical section is defined but never used. My understanding is that, ideally, one could either a) unmask events and do a direct iret, but check for nested events and fix up the stack frames if one happens; or b) leave events masked and return with a hypercall iret (no stack-frame fix-up needed). 64-bit Mini-OS seems to mix the two (in HYPERVISOR_IRET): since Mini-OS has no userspace, unless an NMI happened it always performs the interrupt/exception return with the machine instruction iret, without checking for nested events. This looks wrong to me. Am I missing something here? Please advise.

Thank you,
Xu
--
xu
:q!
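
P.S. To make the two options concrete, here is a rough sketch of what I mean (illustrative pseudo-assembly, not the actual entry.S code; XEN_UNBLOCK_EVENTS/XEN_BLOCK_EVENTS, FRAME_SIZE and the labels are placeholders for whatever the port really uses):

        # Option (a), as in the 32-bit code: re-open event delivery, then
        # iret directly. A callback arriving anywhere in [scrit, ecrit)
        # lands on top of the frame we are popping, so a fixup routine
        # must coalesce the two frames afterwards.
scrit:
        XEN_UNBLOCK_EVENTS(%rsi)        # events can fire from here on
        movq    RBX(%rsp), %rbx         # restore registers (abridged)
        addq    $FRAME_SIZE, %rsp       # pop our part of the frame
        iretq                           # direct return
ecrit:

        # Option (b): keep events masked and let Xen do the return; there
        # is no window, hence no fixup table.
        XEN_BLOCK_EVENTS(%rsi)
        jmp     hypercall_page + (__HYPERVISOR_iret * 32)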
Xu Zhang, le Tue 23 Oct 2012 17:43:28 -0500, a écrit :
> 64-bit Mini-OS seems to mix the two (in HYPERVISOR_IRET): since Mini-OS
> has no userspace, unless an NMI happened it always performs the
> interrupt/exception return with the machine instruction iret, without
> checking for nested events. This looks wrong to me. Am I missing
> something here?

I don't think you are missing anything. Cc-ing Grzegorz, who actually wrote the code.

Samuel
Hello,

Xu Zhang, le Mon 05 Nov 2012 23:56:04 -0600, a écrit :
> I haven't seen any updates on this matter, so I tried to come up with a fix.

Thanks!

> Generally speaking, I want to mimic the 32-bit Mini-OS behaviour,

Ok. Probably a good thing :)

> The "git diff" output is attached to this email. I only did some naive
> "tests" by running it inside gdb.

Do you mean that you actually ran a stack merge, even if only in gdb? I would trust that well enough.

> I am wondering, is there a test suite for Mini-OS?

You can simply run it, giving it a blk dev (or also a fb dev); it will run app_main(), which keeps reading stuff and draws squares on the fb.

Ideally you should manage to trigger fixups by stuffing a wait loop in the critical section.

> A follow-up question: given this fix, do you still need hypercall iret?

Actually it's worse than that, see below.

> +        movq    %rbx,5*8(%rsp)
...
> +
> +        # check against re-entrance
> +        movq    RIP(%rsp),%rbx
> +        cmpq    $scrit,%rbx
> +        jb      10f
> +        cmpq    $ecrit,%rbx
> +        jb      critical_region_fixup
> +
> +10:     movq    RBX(%rsp),%rbx          # restore rbx

Do we really need to restore %rbx? I don't think do_hypervisor_callback uses it.

> +        .byte 0x78,0x78,0x78,0x78,0x78  # jmp hypercall_page + (__HYPERVISOR_iret * 32)

Here we would also need a fixup table for the code at hypercall_page! A nicer fix would be to inline the hypercall code here.

That said, I have to say I don't know any detail about the NMI flag: I don't see it defined in the Intel manual, and I don't see it being set in the Xen hypervisor. Unless somebody knows more about it, I would assume that it currently never happens, and simply stuff a

        ud2     /* TODO: fix re-entrance fixup for hypercall in NMI case (when does that happen, actually?) */

before the jmp, to catch it if it ever does happen.

Also, it would be good to check against a critical-section size change, in case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO. For instance, stuff right after the table:

        .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
        .error "The critical section has changed, the fixup table needs updating"
        .endif

More generally, some cleanup could go along with the patch, but I'd say keep it as you have done it, focused on only the fixup code, so it appears as such in the repository history; we can clean some things up afterwards.

Samuel
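
P.S. For reference, the table-driven scheme the 32-bit code uses looks roughly like this (written from memory, not copied from any tree; the register use, labels and byte values are only placeholders):

critical_region_fixup:
        # %rbx holds the RIP at which the nested event interrupted us,
        # already known to lie inside [scrit, ecrit).
        subq    $scrit, %rbx
        movzbq  critical_fixup_table(%rbx), %rax  # bytes already popped there
        # use %rax to slide the nested frame down over the partially popped
        # one, then resume the restore code on the merged frame

critical_fixup_table:
        # one byte per byte of [scrit, ecrit): how much of the original
        # frame had already been popped when the nested event hit
        .byte 0x00, 0x00, 0x00                    # placeholder values
        .byte 0x00, 0x00, 0x00, 0x00
critical_fixup_table_end: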
Samuel,

Thanks for the reply.

On 11/10/2012 07:52 AM, Samuel Thibault wrote:
> You can simply run it, giving it a blk dev (or also a fb dev); it will
> run app_main(), which keeps reading stuff and draws squares on the fb.
>
> Ideally you should manage to trigger fixups by stuffing a wait loop in
> the critical section.

I'll try this in the following week. Thank you for pointing it out.

>> A follow-up question: given this fix, do you still need hypercall iret?
>
> Actually it's worse than that, see below.
>
>> +        movq    %rbx,5*8(%rsp)
> ...
>> +
>> +        # check against re-entrance
>> +        movq    RIP(%rsp),%rbx
>> +        cmpq    $scrit,%rbx
>> +        jb      10f
>> +        cmpq    $ecrit,%rbx
>> +        jb      critical_region_fixup
>> +
>> +10:     movq    RBX(%rsp),%rbx          # restore rbx
>
> Do we really need to restore %rbx? I don't think do_hypervisor_callback
> uses it.

No, I don't think so. I was just being cautious. :-)

>> +        .byte 0x78,0x78,0x78,0x78,0x78  # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>
> Here we would also need a fixup table for the code at hypercall_page!

Nice catch!

> A nicer fix would be to inline the hypercall code here. That said, I
> have to say I don't know any detail about the NMI flag: I don't see it
> defined in the Intel manual, and I don't see it being set in the Xen
> hypervisor. Unless somebody knows more about it, I would assume that it
> currently never happens, and simply stuff a
>
>         ud2     /* TODO: fix re-entrance fixup for hypercall in NMI case (when does that happen, actually?) */
>
> before the jmp, to catch it if it ever does happen.

I don't know that either. I tried to raise an NMI in the guest domain with "xl trigger", but it seems that only works for HVM guests.

> Also, it would be good to check against a critical-section size change, in
> case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO.
> For instance, stuff right after the table:
>
>         .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
>         .error "The critical section has changed, the fixup table needs updating"
>         .endif

Totally agree. And it somewhat saves the obligation of checking, from the disassembly, that the table size matches the critical section's. :-)

> More generally, some cleanup could go along with the patch, but I'd say
> keep it as you have done it, focused on only the fixup code, so it
> appears as such in the repository history; we can clean some things up
> afterwards.
>
> Samuel

I really appreciate your feedback. Thanks again, Samuel!

Xu
>>> +        .byte 0x78,0x78,0x78,0x78,0x78  # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>>
>> Here we would also need a fixup table for the code at hypercall_page!
>
> Nice catch!
>
>> A nicer fix would be to inline the hypercall code here.

On second thought, it seems to me that such an "expansion" of hypercall_page into the fixup table is not necessary (and wrong). Instead, we can, and in fact need to, mask events before doing a hypercall iret. I think it is safe (and required, see below) to do this because:

1. if events are already disabled, it doesn't hurt to mask them again;
2. if events are enabled, we disable them and jump to hypercall_page to make a hypercall iret, which eventually calls do_iret.

In do_iret, line 309:

        /* Restore upcall mask from supplied EFLAGS.IF. */
        vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);

Whatever the current value of the upcall mask is, do_iret always restores it to the negation of the rflags.IF saved on the stack. In other words, it is safe to set the upcall mask before the hypercall iret.

In fact, I am afraid we must make it mandatory: one can guard against nested events in the guest OS, but never inside the hypercall page (because that code belongs to Xen). We should never jump to the hypercall_page iret with events enabled, so we have to set the upcall mask before the jump. I guess that is one of the reasons why Xen always restores the upcall mask from the stack.

>> Also, it would be good to check against a critical-section size change, in
>> case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO.
>> For instance, stuff right after the table:
>>
>>         .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
>>         .error "The critical section has changed, the fixup table needs updating"
>>         .endif
>
> Totally agree. And it somewhat saves the obligation of checking, from the
> disassembly, that the table size matches the critical section's. :-)

Correct me if I am wrong, but I think hypercall_page is mapped into the guest OS at runtime by Xen; it is not actually part of the guest's critical section, at least not at compile time. So having a fixup table for the code at hypercall_page would make such a check fail (mismatch). Following the discussion above, we can easily avoid such a fixup table by masking events.

I don't know if this makes sense at all. :-)
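
Concretely, what I have in mind for the tail of the return path is something like this (only a sketch, not my actual patch; XEN_BLOCK_EVENTS is assumed to be the macro that sets evtchn_upcall_mask, and the pushed flag value is just a placeholder):

use_hypercall_iret:
        XEN_BLOCK_EVENTS(%rsi)  # mask events: safe, since do_iret recomputes
                                # the upcall mask from the saved rflags.IF,
                                # and required, since re-entrance cannot be
                                # fixed up once we run inside hypercall_page
        pushq   $0              # HYPERVISOR_iret flags argument (placeholder)
        jmp     hypercall_page + (__HYPERVISOR_iret * 32)

The nice property is that the slow path becomes self-contained: nothing after XEN_BLOCK_EVENTS needs to appear in the fixup table.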
Xu Zhang, le Tue 13 Nov 2012 19:49:27 -0600, a écrit :
> 1. if events are already disabled, it doesn't hurt to mask them again;
> 2. if events are enabled, we disable them and jump to hypercall_page to
>    make a hypercall iret, which eventually calls do_iret.
>
> In do_iret, line 309:
>
>         /* Restore upcall mask from supplied EFLAGS.IF. */
>         vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);

Ah, right. Disabling events just before the jmp seems all right to me then.

> Correct me if I am wrong, but I think hypercall_page is mapped into the
> guest OS at runtime by Xen; it is not actually part of the guest's
> critical section, at least not at compile time.

Sure. I meant it'd mean a second fixup table; but who knows what code is there, it could be tampering with the stack.

> Following the discussion above, we can easily avoid such a fixup table
> by masking events.

Completely.

Samuel
On Sat, 2012-11-10 at 13:52 +0000, Samuel Thibault wrote:
> > +        .byte 0x78,0x78,0x78,0x78,0x78  # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>
> Here we would also need a fixup table for the code at hypercall_page!
> A nicer fix would be to inline the hypercall code here. That said, I
> have to say I don't know any detail about the NMI flag: I don't see it
> defined in the Intel manual, and I don't see it being set in the Xen
> hypervisor.

It's a "software defined" bit (i.e. stolen out of eflags :-() used in the classic-Xen Linux guest to indicate that the return must go via HYPERVISOR_iret (rather than a direct iret), I think in order to let the hypervisor implement the correct semantics w.r.t. NMI masking. The hypervisor itself doesn't actually know anything about the bit, and the guest could implement this some other way if it wanted (doesn't 64-bit Mini-OS *always* use HYPERVISOR_iret?).

> Unless somebody knows more about it, I would assume that it
> currently never happens,

That's probably a pretty good assumption: you would expect NMIs only in dom0 (real NMIs getting forwarded) or in a guest which you explicitly sent an NMI to via the toolstack, which I doubt you would do to an unsuspecting Mini-OS domain (not in the expectation of anything good happening, at least).

Ian.
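
P.S. From memory, the classic-Xen Linux macro does roughly this (paraphrased, not copied from any particular tree; NMI_MASK and the 2*8(%rsp) offset of the saved rflags are assumptions about the frame layout):

.macro HYPERVISOR_IRET flag
        testl   $NMI_MASK, 2*8(%rsp)    # software bit stashed in the saved rflags
        jnz     2f                      # set => must return via Xen
1:      iretq                           # fast path: plain iret
2:      andl    $~NMI_MASK, 2*8(%rsp)   # clear the software bit
        pushq   $\flag
        jmp     hypercall_page + (__HYPERVISOR_iret * 32)
.endm

So the bit only tells the guest's own exit path which way to go; Xen itself never looks at it.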
On Sun, 2012-11-18 at 17:43 +0000, Samuel Thibault wrote:
> > Correct me if I am wrong, but I think hypercall_page is mapped into the
> > guest OS at runtime by Xen; it is not actually part of the guest's
> > critical section, at least not at compile time.
>
> Sure. I meant it'd mean a second fixup table; but who knows what code is
> there, it could be tampering with the stack.

In practice it isn't. Since the hypercall page is a black box to the guest, I don't think it is really reasonable for it to be messing with the stack without making its own arrangements for correctness of nesting etc.

Ian.