John Levon
2007-Oct-15 02:17 UTC
[Xen-devel] 3.1.1 breaks 64-bit backwards compatibility in guest syscall handler
No 64-bit process works under Solaris dom0 and Xen 3.1. The problem is an ABI
breakage, though as it's not documented as far as I can find out, it's somewhat
of a grey area.

In 3.0.4, the syscall trampoline did:

    /* pushq $FLAT_KERNEL_CS32 */
    stack[16] = 0x68;
    *(u32 *)&stack[17] = FLAT_KERNEL_CS32;

    /* jmp syscall_enter */
    stack[21] = 0xe9;
    *(u32 *)&stack[22] = (char *)syscall_enter - &stack[26];

    ENTRY(syscall_enter)
            sti
            movl  $FLAT_KERNEL_SS,24(%rsp)
            pushq %rcx
            pushq $0
            movl  $TRAP_syscall,4(%rsp)
            SAVE_ALL

Thus %rcx and %r11 (the user %rip/%rflags as per syscall insn) were saved in
UREGS_rcx/r11, and appeared as such in the guest's syscall context. We were
relying on being able to just pop the stack into %rcx/%r11 and get syscall-like
values. This was broken by:

    changeset:   15095:a06a28ebad5d
    user:        kfraser@localhost.localdomain

    ...
    +    /* movq $syscall_enter,%r11 */
    +    stack[21] = 0x49;
    +    stack[22] = 0xbb;
    +    *(void **)&stack[23] = (void *)syscall_enter;
    +
    +    /* jmpq *%r11 */
    +    stack[31] = 0x41;
    +    stack[32] = 0xff;
    +    stack[33] = 0xe3;
    ...

In particular it corrupted the user %rflags such that X86_EFLAGS_AC was getting
set, and breaking the world. (Actually Solaris still booted, which is quite
surprising.)

So, why the register indirect jmp?

An obvious fix is for us to snarf %rip and %rflags out of the syscall stack
instead of using the registers, but this does mean another arbitrary difference
between metal and Xen. Then again, the syscall entry point is already radically
different, and we've probably discovered this early enough for us to just fix
all our guests, presuming nobody else was making this mistake.

(Yes, it would have been nice if we'd managed to test 3.1.1, but we just
didn't find the resources by the deadline.)

regards
john

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
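To make the difference between the two trampoline tails concrete, here is a minimal user-space sketch that writes the same bytes quoted above into a plain buffer. It is an illustration only, not the Xen source: the function names, the `stack_va` and `target_va` parameters, and the use of plain integers in place of real addresses are assumptions. The point it shows is that `jmp rel32` (opcode 0xe9) names no general-purpose register, whereas the newer sequence has to load the target into %r11 first, which overwrites the user %rflags that the syscall instruction left there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Old-style tail (as quoted from 3.0.4): "jmp rel32" at stack[21].
 * The displacement is relative to the byte after the instruction,
 * i.e. &stack[26]. No general-purpose register is modified.
 * stack_va is the (hypothetical) address the buffer would run at.
 */
size_t emit_direct_jmp(uint8_t *stack, uint64_t stack_va, uint64_t target_va)
{
    int64_t rel = (int64_t)(target_va - (stack_va + 26));
    int32_t rel32;

    /* A rel32 jump can only reach targets within +/- 2GB. */
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    rel32 = (int32_t)rel;

    stack[21] = 0xe9;
    memcpy(&stack[22], &rel32, sizeof rel32);
    return 26; /* offset of the first byte after the jump */
}

/*
 * New-style tail (as quoted from changeset 15095):
 * "movq $target,%r11" (49 bb imm64) then "jmpq *%r11" (41 ff e3).
 * It can reach any 64-bit address, but it clobbers %r11.
 */
size_t emit_indirect_jmp(uint8_t *stack, uint64_t target_va)
{
    stack[21] = 0x49;
    stack[22] = 0xbb;
    memcpy(&stack[23], &target_va, sizeof target_va);

    stack[31] = 0x41;
    stack[32] = 0xff;
    stack[33] = 0xe3;
    return 34; /* offset of the first byte after the jump */
}
```

The direct form is 5 bytes and the indirect form is 13, so the trade is range against register preservation: the indirect jump works wherever the trampoline sits relative to syscall_enter, at the cost of %r11.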
Keir Fraser
2007-Oct-15 07:30 UTC
Re: [Xen-devel] 3.1.1 breaks 64-bit backwards compatibility in guest syscall handler
As you say, this aspect of the interface was undocumented and unused by Linux,
hence it was not picked up in our own testing. However, the change was
unintentional and the old behaviour was rather more sensible -- we shouldn't
clobber registers for no good reason. Also the fix is a one-liner which will
have no measurable impact on syscall performance.

The hassle is that if we want to define rcx/r11 conservation in the ABI then
we really need to roll 3.1.2. Which is annoying because it was totally
avoidable -- the change has been in xen-unstable for five months, and was part
of the original 3.1.1 release candidate a month ago. I'm going to hassle you
to arrange regression testing for your guests every time we roll a release
from now on!

 -- Keir
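For reference, the guest-side workaround mentioned in the original report (taking the user %rip and %rflags from the saved frame instead of from the %rcx/%r11 slots) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the struct name, the helper names, and in particular the frame layout (rcx, r11, then rip, cs, rflags, rsp, ss), which should be checked against the Xen public headers before relying on it. The only hard fact used is that the AC flag is bit 18 of %rflags (0x40000).

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_AC 0x00040000ULL /* alignment-check flag, bit 18 */

/*
 * ASSUMED layout of what the guest finds on its stack at syscall
 * entry: the %rcx/%r11 slots first, then an iret-style frame.
 */
struct guest_syscall_frame {
    uint64_t rcx;    /* user %rip on bare metal; only valid if Xen preserves it */
    uint64_t r11;    /* user %rflags on bare metal; clobbered in 3.1.1 */
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

static_assert(sizeof(struct guest_syscall_frame) == 7 * sizeof(uint64_t),
              "frame is seven 64-bit slots with no padding");

struct user_state {
    uint64_t rip;
    uint64_t rflags;
};

/* Read the return state from the iret-style part of the frame only. */
struct user_state user_state_from_frame(const struct guest_syscall_frame *f)
{
    struct user_state s = { f->rip, f->rflags };
    return s;
}

/* True if the given value, interpreted as %rflags, has AC set. */
int rflags_has_ac(uint64_t rflags)
{
    return (rflags & X86_EFLAGS_AC) != 0;
}
```

If the r11 slot holds a code address rather than real flags, whether AC appears set depends only on bit 18 of that address. A guest that restores %rflags from that slot inherits whatever bits the address happens to contain, while one that reads the frame's rflags field does not depend on the register being preserved.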