Chris Smowton
2013-Oct-22 00:18 UTC
[LLVMdev] System call miscompilation using the fast register allocator
Hi,

Apologies, this is a bit lengthy.

TL;DR: I'm using Dragonegg + LLVM 3.2 and uClibc, and am finding that using the fast register allocator (i.e. -optimize-regalloc=0) causes miscompilation of setsockopt calls (5-argument system calls). The problem doesn't happen with the default register allocation path. It can be worked around by manually simplifying the system call setup sequence. I'm looking to find out whether this is a bug in the fast register allocator, or whether the Linux headers' description of the system call setup sequence (or gcc/Dragonegg's interpretation of it) is faulty and the allocator just happens to expose the bug.

Now the long version. I'm building a simple test program that makes a 5-argument system call, using LLVM 3.2, like:

    #include <sys/socket.h>
    #include <netinet/tcp.h>

    int main(int argc, char** argv) {
        int val = 1;
        socklen_t len = 4;
        return setsockopt(-1, SOL_SOCKET, TCP_CORK, &val, len);
    }

setsockopt is provided by uClibc and is available as LLVM IR, leading to optimised LLVM code like:

    define i32 @main(i32 %argc, i8** nocapture %argv) unnamed_addr nounwind uwtable {
    entry:
      %val = alloca i32, align 4
      store i32 1, i32* %val, align 4
      %0 = ptrtoint i32* %val to i64
      call void asm sideeffect "", "{r8}"(i64 4) nounwind
      call void asm sideeffect "", "{r10}"(i64 %0) nounwind
      call void asm sideeffect "", "{rdx}"(i64 3) nounwind
      call void asm sideeffect "", "{rsi}"(i64 1) nounwind
      call void asm sideeffect "", "{rdi}"(i64 -1) nounwind
      %1 = call i64 asm sideeffect "", "={rdi}"() nounwind
      %2 = call i64 asm sideeffect "", "={rsi}"() nounwind
      %3 = call i64 asm sideeffect "", "={rdx}"() nounwind
      %4 = call i64 asm sideeffect "", "={r10}"() nounwind
      %5 = call i64 asm sideeffect "", "={r8}"() nounwind
      %asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5) nounwind, !srcloc !0

SOL_SOCKET = 1 and TCP_CORK = 3, so far, so good.
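Note that the test passes TCP_CORK, a TCP-level option, at the SOL_SOCKET level, so the kernel interprets optname 3 as the socket-level option with the same value, SO_TYPE (which is how strace decodes the correctly compiled call). A minimal C11 sketch that pins down the constants quoted above at compile time, assuming Linux x86-64 headers (these values are not portable to other platforms):

```c
#include <assert.h>       /* static_assert (C11), assert */
#include <netinet/tcp.h>  /* TCP_CORK */
#include <sys/socket.h>   /* SOL_SOCKET, SO_TYPE */

/* Assumption: Linux x86-64 glibc/uClibc-style headers. */
static_assert(SOL_SOCKET == 1, "SOL_SOCKET is 1 on Linux x86-64");
static_assert(TCP_CORK == 3, "TCP_CORK is 3 on Linux");

/* At the SOL_SOCKET level, optname 3 means SO_TYPE, so the test program
 * effectively calls setsockopt(-1, SOL_SOCKET, SO_TYPE, &val, 4). */
static_assert(SO_TYPE == TCP_CORK, "SO_TYPE and TCP_CORK share the value 3");
```

If any of these assertions fails, the program does not compile, which makes a header mismatch obvious before any register allocation question arises.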
The inline asm seems a bit odd, though: these statements are derived from bits/syscalls.h, and whilst it's clear what they're *trying* to do, I'm not sure about this construction:

    call void asm sideeffect "", "{somereg}"(i64 X) nounwind
    %1 = call i64 asm sideeffect "", "={somereg}"() nounwind

Intuitively, although this instructs the compiler to put X in somereg and later to read somereg, it doesn't say that somereg must keep its value in the meantime!

Lowering to x86 with -optimize-regalloc=0, and therefore the fast register allocator, then leads to code like:

    400190: c7 44 24 fc 01 00 00    movl   $0x1,-0x4(%rsp)
    400197: 00
    400198: 41 b8 04 00 00 00       mov    $0x4,%r8d
    40019e: 4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
    4001a3: ba 03 00 00 00          mov    $0x3,%edx                ; Sets rdx/edx to the correct 3rd arg (3 == TCP_CORK)
    4001a8: be 01 00 00 00          mov    $0x1,%esi
    4001ad: 48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx ; Clobbers the 3rd arg!
    4001b4: 48 89 d7                mov    %rdx,%rdi                ; Uses the clobbering value to set up the 1st arg
    4001b7: b8 36 00 00 00          mov    $0x36,%eax               ; Syscall number
    4001bc: 48 89 54 24 f0          mov    %rdx,-0x10(%rsp)
    4001c1: 0f 05                   syscall                         ; RDX (= arg 3) still clobbered

Here there is trouble: the x86-64 Linux ABI puts the syscall number in eax/rax and the arguments, left to right, in [rdi, rsi, rdx, r10, r8]. However, as noted inline, this code clobbers RDX after it has been set up for the call! Checking with strace, we indeed see:

    setsockopt(-1, SOL_SOCKET, 0xffffffff /* SO_??? */, [1], 4)

To compare, a version built without -optimize-regalloc=0 produces:

    400198: 41 b8 04 00 00 00       mov    $0x4,%r8d
    40019e: 4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
    4001a3: ba 03 00 00 00          mov    $0x3,%edx
    4001a8: be 01 00 00 00          mov    $0x1,%esi
    4001ad: 49 c7 c1 ff ff ff ff    mov    $0xffffffffffffffff,%r9
    4001b4: 48 c7 c7 ff ff ff ff    mov    $0xffffffffffffffff,%rdi
    4001bb: b8 36 00 00 00          mov    $0x36,%eax
    4001c0: 0f 05                   syscall

Interestingly, this sets R9, which would hold the 6th argument of a 6-argument call. Since R9 is not mentioned anywhere in the IR for the call, this suggests the syscall sequence is being special-cased. This version works as expected, and strace shows the correct call:

    setsockopt(-1, SOL_SOCKET, SO_TYPE, [1], 4)

(TCP_CORK == SO_TYPE.)

Above, I said I suspected that the list of "asm sideeffect" calls doesn't actually express the right constraints. Is that true? Given that the line that actually makes the system call already specifies register constraints, is there any need for the lines that write individual values to registers and then read them back for no apparent purpose? In short, is this whole problem down to bits/syscalls.h making unwarranted assumptions about the compiler, with us just getting lucky with the default (greedy) register allocator?

If I'm wrong, and the IR *does* correctly express "put these values in these registers and syscall", where should I start figuring out how and why the allocator feels free to clobber RDX when it should be set up for the call?
I tried running the final IR->x86 lowering with -print-after-all, and everything appears fine after the 'Two-Address instruction pass':

    MOV32mi <fi#0>, 1, %noreg, 0, %noreg, 1; mem:ST4[%val]
    %vreg3<def> = MOV64ri64i32 4; GR64:%vreg3
    %R8<def> = COPY %vreg3; GR64:%vreg3
    INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8
    %vreg4<def> = LEA64r <fi#0>, 1, %noreg, 0, %noreg; GR64:%vreg4
    %R10<def> = COPY %vreg4; GR64:%vreg4
    INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10
    %vreg5<def> = MOV64ri64i32 3; GR64:%vreg5
    %RDX<def> = COPY %vreg5; GR64:%vreg5
    INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX
    %vreg6<def> = MOV64ri64i32 1; GR64:%vreg6
    %RSI<def> = COPY %vreg6; GR64:%vreg6
    INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI
    %vreg7<def> = MOV64ri32 -1; GR64:%vreg7
    %RDI<def> = COPY %vreg7; GR64:%vreg7
    INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI
    INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
    %vreg8<def> = COPY %RDI; GR64:%vreg8
    %vreg2<def> = MOV64ri64i32 54; GR64:%vreg2
    INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
    %vreg9<def> = COPY %RSI; GR64:%vreg9
    INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
    %vreg10<def> = COPY %RDX; GR64:%vreg10
    INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
    %vreg11<def> = COPY %R10; GR64:%vreg11
    INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
    %vreg12<def> = COPY %R8; GR64:%vreg12
    %RDI<def> = COPY %vreg8; GR64:%vreg8
    %RSI<def> = COPY %vreg9; GR64:%vreg9
    %RDX<def> = COPY %vreg10; GR64:%vreg10
    %R10<def> = COPY %vreg11; GR64:%vreg11
    %R8<def> = COPY %vreg12; GR64:%vreg12
    INLINEASM <es:syscall > [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>,
        $1:[reguse tiedto:$0], %vreg2<tied3>, $2:[reguse], %RDI, $3:[reguse], %RSI,
        $4:[reguse], %RDX, $5:[reguse], %R10, $6:[reguse], %R8,
        $7:[clobber], %EFLAGS<earlyclobber,imp-def>, $8:[clobber], %CX<earlyclobber,imp-def>,
        $9:[clobber], %R11<earlyclobber,imp-def>, <<badref>>; GR64:%vreg2

But after "Prologue/Epilogue Insertion & Frame Finalization" there is trouble:

    BB#0: derived from LLVM BB %entry
        MOV32mi %RSP, 1, %noreg, -4, %noreg, 1; mem:ST4[%val]
        %R8<def> = MOV64ri64i32 4
        INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R8<kill>
        %R10<def> = LEA64r %RSP, 1, %noreg, -4, %noreg
        INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %R10<kill>
        %RDX<def> = MOV64ri64i32 3
        INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDX<kill>
        %RSI<def> = MOV64ri64i32 1
        INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RSI<kill>
        %RDX<def> = MOV64ri32 -1
        %RDI<def> = COPY %RDX
        INLINEASM <es:> [sideeffect] [attdialect], $0:[reguse], %RDI<kill>
        INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDI<imp-def>
        %RAX<def> = MOV64ri64i32 54
        INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RSI<imp-def>
        MOV64mr %RSP, 1, %noreg, -16, %noreg, %RDX<kill>; mem:ST8[FixedStack1]
        INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %RDX<imp-def>
        INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R10<imp-def>
        INLINEASM <es:> [sideeffect] [attdialect], $0:[regdef], %R8<imp-def>
        INLINEASM <es:syscall > [sideeffect] [attdialect], $0:[regdef], %RAX<imp-def,tied5>,
            $1:[reguse tiedto:$0], %RAX<kill,tied3>, $2:[reguse], %RDI<kill>, $3:[reguse], %RSI<kill>,
            $4:[reguse], %RDX<kill>, $5:[reguse], %R10<kill>, $6:[reguse], %R8<kill>,
            $7:[clobber], %EFLAGS<earlyclobber,imp-def>, $8:[clobber], %CX<earlyclobber,imp-def>,
            $9:[clobber], %R11<earlyclobber,imp-def>, <<badref>>

Here you can see the clobber happening: RDX is assigned a second time (%RDX<def> = MOV64ri32 -1) while it should still hold the third argument. Note that the first INLINEASM use of RDX is marked <kill>, so from the allocator's point of view the value 3 is dead after that statement.
Finally, I tried manually editing the apparently superfluous asm statements out of the LLVM IR, giving a simpler program like this:

      %val = alloca i32, align 4
      store i32 1, i32* %val, align 4
      %0 = ptrtoint i32* %val to i64
      %asmtmp.i = call i64 asm sideeffect "syscall\0A\09", "={ax},0,{rdi},{rsi},{rdx},{r10},{r8},~{fpsr},~{flags},~{cx},~{r11},~{cc},~{memory}"(i64 54, i64 -1, i64 1, i64 3, i64 %0, i64 4) nounwind, !srcloc !0

As you can see, the arguments are now specified directly; the constraints remain the same. This compiles correctly; the corresponding x86 code is:

    400190: c7 44 24 fc 01 00 00    movl   $0x1,-0x4(%rsp)
    400197: 00
    400198: b8 36 00 00 00          mov    $0x36,%eax
    40019d: 48 c7 c1 ff ff ff ff    mov    $0xffffffffffffffff,%rcx
    4001a4: be 01 00 00 00          mov    $0x1,%esi
    4001a9: ba 03 00 00 00          mov    $0x3,%edx
    4001ae: 4c 8d 54 24 fc          lea    -0x4(%rsp),%r10
    4001b3: 41 b8 04 00 00 00       mov    $0x4,%r8d
    4001b9: 48 89 cf                mov    %rcx,%rdi
    4001bc: 48 89 4c 24 f0          mov    %rcx,-0x10(%rsp)
    4001c1: 0f 05                   syscall

Note the use of RCX rather than RDX as a temporary, which avoids clobbering RDX. This suggests to me that the allocator is correctly preserving registers and that the old IR is too loose, so the real question is probably how Dragonegg should compile the syscall C / inline asm code to LLVM IR. However, I'd really appreciate anyone confirming or denying my suspicions, as I'm kind of learning as I go here!

Chris
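At the C level, the equivalent of the hand-simplified IR is a single asm statement in which every argument is an operand of the statement that executes syscall. The sketch below is modelled on the pattern used by musl's x86-64 syscall wrappers, not on uClibc's macros; `raw_syscall5` and `so_type_via_raw_getsockopt` are hypothetical names for illustration, and the code assumes x86-64 Linux with GCC or Clang.

```c
#include <assert.h>
#include <errno.h>        /* EBADF */
#include <sys/socket.h>   /* socket, socklen_t, AF_UNIX, SOCK_STREAM, SOL_SOCKET, SO_TYPE */
#include <sys/syscall.h>  /* SYS_getsockopt, SYS_setsockopt */
#include <unistd.h>       /* close */

/* Raw 5-argument system call, x86-64 Linux only (sketch).
 * Syscall number in rax; arguments in rdi, rsi, rdx, r10, r8.
 * On failure the kernel returns -errno in rax. */
static long raw_syscall5(long n, long a1, long a2, long a3, long a4, long a5)
{
    unsigned long ret;
    /* r10 and r8 have no single-letter constraints, so bind them with
     * explicit register variables. GCC only guarantees such a binding
     * when the variable is used as an operand of an asm statement, so
     * they are passed to the same asm that executes syscall. */
    register long r10 __asm__("r10") = a4;
    register long r8 __asm__("r8") = a5;
    __asm__ __volatile__("syscall"
                         : "=a"(ret)
                         : "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8)
                         : "rcx", "r11", "memory"); /* syscall clobbers rcx and r11 */
    return (long)ret;
}

/* Exercises all five argument registers via
 * getsockopt(fd, level, optname, optval, optlen*).
 * Returns the SO_TYPE of a fresh AF_UNIX stream socket,
 * or a negative value on failure. */
static long so_type_via_raw_getsockopt(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int type = -1;
    socklen_t len = sizeof type;
    long r = raw_syscall5(SYS_getsockopt, fd, SOL_SOCKET, SO_TYPE,
                          (long)&type, (long)&len);
    close(fd);
    if (r < 0)
        return r;
    return (len == sizeof type) ? type : -1;
}
```

Because each value only has to be in its register at the one asm statement that uses it, the allocator has no separate "write register" and "read register" statements between which it could legitimately reuse rdx. Whether Dragonegg's translation of uClibc's bits/syscalls.h should produce IR of this shape remains the open question above.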
Chris Smowton
2013-Oct-22 01:19 UTC
A small correction to my last message: I had thought the syscall macros used by uClibc were borrowed from the Linux kernel includes, but in fact they're packaged with uClibc, and can be seen here: git.uclibc.org/uClibc/tree/libc/sysdeps/linux/x86_64/bits/syscalls.h

Chris