Bharathi Seshadri via llvm-dev
2018-Dec-04 06:22 UTC
[llvm-dev] Incorrect placement of an instruction after PostRAScheduler pass
Hi,

I'm facing a crash (building with --target=arm-linux-gnueabi
-march=armv8-a+crc -mfloat-abi=hard), and while debugging it I found that an
intended branch is not taken due to bad code generation after the Post RA
Scheduler pass. A CMPri instruction that follows an INLINEASM block (which in
turn contains a cmp/bne pair) is incorrectly moved before the INLINEASM block,
resulting in two consecutive bne instructions. I do not have a small,
convenient test case and there are several inline functions involved, but I
hope the relevant code snippets below explain the problem. Any suggestions on
what or where to look within LLVM would be very useful.

In the generated code, the two consecutive bne instructions appear to be the
problem: the cmp instruction at offset 198 is actually associated with the bne
at offset 1bc, while the instructions from 1ac to 1b8 come from an inline asm
block. After the "Post RA top-down list latency scheduler" pass, the cmp has
been moved before the inline asm, causing the issue.

     198: e35c0001   cmp     ip, #1               <<<<<<<<<<<<<<<<<<
     19c: e023700e   eor     r7, r3, lr
     1a0: e0022003   and     r2, r2, r3
     1a4: e0087007   and     r7, r8, r7
     1a8: e1820007   orr     r0, r2, r7
     1ac: e1b42e9f   ldaexd  r2, r3, [r4]
     1b0: e1a46f90   strexd  r6, r0, [r4]
     1b4: e3560000   cmp     r6, #0
     1b8: 1afffffb   bne     1ac <xxxxx+0x1ac>
     1bc: 1a000002   bne     1cc <xxxx+0x1cc>     <<<<<<<<<<<<<<<<<<<
     1c0: e1a00005   mov     r0, r5
     1c4: e3a01001   mov     r1, #1

This is the relevant C code:

    // lock_flag is a function argument, an enum with values 0, 1, 2.
    if (__builtin_expect((lock_flag == LOCK), 1)) {   // First comparison using lock_flag
        lock = foo_lock(vaddr);
    }
    old = func(flags, vaddr, data);
    if (__builtin_expect((lock_flag == LOCK), 1)) {   // Second comparison using lock_flag
        foo_unlock(lock);
    }

The inline asm comes from the function below, which is called within func():
static inline void write64(volatile void *p, uint64_t val)
{
    uint64_t tmp = 0, prev = 0;
    asm volatile(
        "3: ldaexd %[prev], %H[prev], [%[p]];"
        "   strexd %[tmp], %[val], %H[val], [%[p]];"
        "   cmp    %[tmp], #0;"
        "   bne    3b;"
        : [prev] "=&r" (prev), [tmp] "=&r" (tmp)
        : [p] "r" (p), [val] "r" (val)
        : "memory");
}

------------------------
# *** IR Dump Before Post RA top-down list latency scheduler ***:

BB#6: derived from LLVM BB %if.else
    Live Ins: %LR %R2 %R3 %R4 %R7 %R12
    Predecessors according to CFG: BB#0
        CMPri %R12, 1, pred:14, pred:%noreg, %CPSR<imp-def>   <<<<<<<<<<< First comparison using lock_flag; R12 holds lock_flag
        %R5<def> = IMPLICIT_DEF
        Bcc <BB#21>, pred:1, pred:%CPSR<kill>   <<<<<<<<<<<
    Successors according to CFG: BB#7(0x40000000 / 0x80000000 = 50.00%) BB#21(0x40000000 / 0x80000000 = 50.00%)
......
BB#21: derived from LLVM BB %if.end
    Live Ins: %LR %R2 %R3 %R4 %R5 %R7 %R12
    Predecessors according to CFG: BB#6 BB#20 BB#9 BB#14
        INLINEASM <es:ldaexd $0, ${0:H}, [$1];> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:GPRPair], %R8_R9<earlyclobber,def>, $1:[reguse:GPR], %R4, <!3>
        %R0<def> = ANDri %R7, 1, pred:14, pred:%noreg, opt:%CPSR<def>
        %R0<def> = MOVr %LR, pred:1, pred:%CPSR<kill>, opt:%noreg
        %R1<def> = EORrr %R0, %LR, pred:14, pred:%noreg, opt:%noreg
        %R1<def> = ANDrr %R9, %R1<kill>, pred:14, pred:%noreg, opt:%noreg
        %R2<def> = ADDrr %R8, %R2<kill>, pred:14, pred:%noreg, opt:%CPSR<def>
        %R3<def> = ADCrr %R9, %R3<kill>, pred:14, pred:%noreg, opt:%noreg, %CPSR<imp-use,kill>
        %R0<def> = ANDrr %R3<kill>, %R0<kill>, pred:14, pred:%noreg, opt:%noreg
        %R1<def> = ORRrr %R0<kill>, %R1<kill>, pred:14, pred:%noreg, opt:%noreg, %R0_R1<imp-def>
        %R3<def> = ANDri %R7<kill>, 2, pred:14, pred:%noreg, opt:%CPSR<def>
        %R3<def> = MVNi 0, pred:1, pred:%CPSR<kill>, opt:%noreg
        %R7<def> = EORrr %R3, %LR<kill>, pred:14, pred:%noreg, opt:%noreg
        %R7<def> = ANDrr %R8, %R7<kill>, pred:14, pred:%noreg, opt:%noreg
        %R2<def> = ANDrr %R2<kill>, %R3<kill>, pred:14, pred:%noreg, opt:%noreg
        %R0<def> = ORRrr %R2<kill>, %R7<kill>, pred:14, pred:%noreg, opt:%noreg, %R0_R1<imp-use,kill>, %R0_R1<imp-def>   <<<<<<<<<<<<<<<<<<<<<
        INLINEASM <es:3: ldaexd $0, ${0:H}, [$2]; strexd $1, $3, ${3:H}, [$2]; cmp $1,#0; bne 3b;> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:GPRPair], %R2_R3<earlyclobber,def,dead>, $1:[regdef-ec:GPRPair], %R6_R7<earlyclobber,def,dead>, $2:[reguse:GPR], %R4<kill>, $3:[reguse:GPRPair], %R0_R1<kill>, <!11>
        CMPri %R12<kill>, 1, pred:14, pred:%noreg, %CPSR<imp-def>   // Second comparison with lock_flag; R12 holds lock_flag
        Bcc <BB#23>, pred:1, pred:%CPSR<kill>   <<<<<<<<<<<<<<<<<<<<<<<
    Successors according to CFG: BB#22(0x40000000 / 0x80000000 = 50.00%) BB#23(0x40000000 / 0x80000000 = 50.00%)

-------------------------------
# *** IR Dump After Post RA top-down list latency scheduler ***:

BB#21: derived from LLVM BB %if.end
    Live Ins: %LR %R2 %R3 %R4 %R5 %R7 %R12
    Predecessors according to CFG: BB#6 BB#20 BB#9 BB#14
        INLINEASM <es:ldaexd $0, ${0:H}, [$1];> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:GPRPair], %R8_R9<earlyclobber,def>, $1:[reguse:GPR], %R4, <!3>
        %R0<def> = ANDri %R7, 1, pred:14, pred:%noreg, opt:%CPSR<def>
        %R0<def> = MOVr %LR, pred:1, pred:%CPSR<kill>, opt:%noreg
        %R1<def> = EORrr %R0, %LR, pred:14, pred:%noreg, opt:%noreg
        %R2<def> = ADDrr %R8, %R2<kill>, pred:14, pred:%noreg, opt:%CPSR<def>
        %R1<def> = ANDrr %R9, %R1<kill>, pred:14, pred:%noreg, opt:%noreg
        %R3<def> = ADCrr %R9, %R3<kill>, pred:14, pred:%noreg, opt:%noreg, %CPSR<imp-use,kill>
        %R0<def> = ANDrr %R3<kill>, %R0<kill>, pred:14, pred:%noreg, opt:%noreg
        %R3<def> = ANDri %R7<kill>, 2, pred:14, pred:%noreg, opt:%CPSR<def>
        %R3<def> = MVNi 0, pred:1, pred:%CPSR<kill>, opt:%noreg
        %R1<def> = ORRrr %R0<kill>, %R1<kill>, pred:14, pred:%noreg, opt:%noreg, %R0_R1<imp-def>
        CMPri %R12<kill>, 1, pred:14, pred:%noreg, %CPSR<imp-def>   <<<<<<<<<<<<<<<<<
        %R7<def> = EORrr %R3, %LR<kill>, pred:14, pred:%noreg, opt:%noreg
        %R2<def> = ANDrr %R2<kill>, %R3<kill>, pred:14, pred:%noreg, opt:%noreg
        %R7<def> = ANDrr %R8, %R7<kill>, pred:14, pred:%noreg, opt:%noreg
        %R0<def> = ORRrr %R2<kill>, %R7<kill>, pred:14, pred:%noreg, opt:%noreg, %R0_R1<imp-use,kill>, %R0_R1<imp-def>
        INLINEASM <es:3: ldaexd $0, ${0:H}, [$2]; strexd $1, $3, ${3:H}, [$2]; cmp $1,#0; bne 3b;> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:GPRPair], %R2_R3<earlyclobber,def,dead>, $1:[regdef-ec:GPRPair], %R6_R7<earlyclobber,def,dead>, $2:[reguse:GPR], %R4<kill>, $3:[reguse:GPRPair], %R0_R1<kill>, <!11>
        Bcc <BB#23>, pred:1, pred:%CPSR<kill>   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Successors according to CFG: BB#22(0x40000000 / 0x80000000 = 50.00%) BB#23(0x40000000 / 0x80000000 = 50.00%)

--------------
After the Post RA Scheduling pass, note that the CMPri %R12 has been moved
above the INLINEASM. Since the inline asm contains a {cmp, bne} pair, it is
incorrect to move the CMPri before the inline assembly block. How do I prevent
the CMPri's result from being overridden by the cmp within the inline asm
block? Could it be that some dependency is incorrectly specified?

Thanks,
Bharathi
Tim Northover via llvm-dev
2018-Dec-04 07:23 UTC
[llvm-dev] Incorrect placement of an instruction after PostRAScheduler pass
Hi Bharathi,

On Tue, 4 Dec 2018 at 06:23, Bharathi Seshadri via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> After the Post RA Scheduling pass, note that CMPri %R12 has been moved
> up the INLINE ASM. Since the inline asm has a {cmp, bne} pair, it is
> incorrect to move the CMPri before the inline assembly block.
> How do I prevent CMPri from being overridden by the cmp within the
> inline asm block. Could it be that some dependency is incorrectly
> specified ?

You need to tell clang that you're clobbering CPSR in the asm block so
that it doesn't think it can remain live across it. For some reason
that's spelled "cc" in ARM land. So you'd write something like:

    asm("ldaxr ..." : /* outputs */ : /* inputs */ : "cc");

Cheers.

Tim.
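Applied to the write64 helper from the original message, the change would
presumably look like the sketch below; this is just the earlier snippet with
"cc" added to the clobber list, not a patch taken from the thread:

    static inline void write64(volatile void *p, uint64_t val)
    {
        uint64_t tmp = 0, prev = 0;
        asm volatile(
            "3: ldaexd %[prev], %H[prev], [%[p]];"
            "   strexd %[tmp], %[val], %H[val], [%[p]];"
            "   cmp    %[tmp], #0;"          /* overwrites the condition flags */
            "   bne    3b;"
            : [prev] "=&r" (prev), [tmp] "=&r" (tmp)
            : [p] "r" (p), [val] "r" (val)
            : "cc", "memory");               /* "cc" marks CPSR as clobbered */
    }

With the clobber in place, the inline asm is known to write CPSR, so the
post-RA scheduler can no longer move the CMPri (whose CPSR result feeds the
following Bcc) above the asm block.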
Bharathi Seshadri via llvm-dev
2018-Dec-04 18:02 UTC
[llvm-dev] Incorrect placement of an instruction after PostRAScheduler pass
Thanks Tim, that worked.

Bharathi

On Mon, Dec 3, 2018 at 11:23 PM Tim Northover <t.p.northover at gmail.com> wrote:
>
> Hi Bharathi,
>
> On Tue, 4 Dec 2018 at 06:23, Bharathi Seshadri via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > After the Post RA Scheduling pass, note that CMPri %R12 has been moved
> > up the INLINE ASM. Since the inline asm has a {cmp, bne} pair, it is
> > incorrect to move the CMPri before the inline assembly block.
> > How do I prevent CMPri from being overridden by the cmp within the
> > inline asm block. Could it be that some dependency is incorrectly
> > specified ?
>
> You need to tell clang that you're clobbering CPSR in the asm block so
> that it doesn't think it can remain live across it. For some reason
> that's spelled "cc" in ARM land. So you'd write something like:
>
>     asm("ldaxr ..." : /* outputs */ : /* inputs */ : "cc");
>
> Cheers.
>
> Tim.