Phil Tomson via llvm-dev
2016-Dec-22 00:45 UTC
[llvm-dev] struct bitfield regression between 3.6 and 3.9 (using -O0)
Here's our testcase:

  #include <stdio.h>

  struct flags {
    unsigned frog: 1;
    unsigned foo : 1;
    unsigned bar : 1;
    unsigned bat : 1;
    unsigned baz : 1;
    unsigned bam : 1;
  };

  int main() {
    struct flags flags;
    flags.bar = 1;
    flags.foo = 1;
    if (flags.foo == 1) {
      printf("Pass\n");
      return 0;
    } else {
      printf("FAIL\n");
      return 1;
    }
  }

When we compile this with LLVM 3.9 we get the "FAIL" message. However, when we compile it with LLVM 3.6 it passes. (This is only an issue at -O0; higher optimization levels work fine.)

After some investigation we found the problem. Here's the relevant part of the assembly generated by LLVM 3.9:

  load r0, r510, 24, 8
  slr r0, r0, 1, 8
  cmpimm r0, r0, 1, 0, 8, SNE
  bitop1 r0, r0, 1<<0, AND, 64
  jct .LBB0_2, r0, 0, N
  jrel .LBB0_1

Notice that the slr (shift logical right) instruction shifts right by 1 to move flags.foo into bit 0 of r0. The problem is that the compare (cmpimm) compares not just that single bit but the whole 8-bit value in r0 against 1. If we insert a logical AND with 1 to mask r0 just before the compare, it works fine.

And as it turns out, the LLVM IR generated with -O0 and -emit-llvm does include the AND:
...
  %bf.lshr = lshr i8 %bf.load4, 1
  %bf.clear5 = and i8 %bf.lshr, 1
  %bf.cast = zext i8 %bf.clear5 to i32
  %cmp = icmp eq i32 %bf.cast, 1
  br i1 %cmp, label %if.then, label %if.else

(compiled with: clang -O0 -emit-llvm -S failing.c -o failing.ll)

I reran with -debug passed to the code generator (via -mllvm) to see what happens at the various stages of DAG optimization:

  clang -O0 -mllvm -debug -S failing.c -o failing.s

The initial selection DAG has the AND node (t23):

  t22: i8 = srl t19, Constant:i64<1>
  t23: i8 = and t22, Constant:i8<1>
  t24: i32 = zero_extend t23
  t27: i1 = setcc t24, Constant:i32<1>, seteq:ch
  t29: i1 = xor t27, Constant:i1<-1>
  t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The optimized lowered selection DAG no longer contains the AND node, but it does have a truncate, which seems to stand in for it: the result is only 1 bit wide, and the xor that follows operates on 1-bit values:

  t22: i8 = srl t19, Constant:i64<1>
  t35: i1 = truncate t22
  t29: i1 = xor t35, Constant:i1<-1>
  t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

Next comes the type-legalized selection DAG:

  t22: i8 = srl t19, Constant:i64<1>
  t40: i8 = xor t22, Constant:i8<1>
  t31: ch = brcond t18, t40, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The truncate is now gone.

Next we have the optimized type-legalized DAG:

  t22: i8 = srl t19, Constant:i64<1>
  t43: i8 = setcc t22, Constant:i8<1>, setne:ch
  t31: ch = brcond t18, t43, BasicBlock:ch<if.else 0xa5f8d48>
  t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The xor has been replaced with a setcc. The legalized selection DAG is essentially the same, as is the optimized legalized selection DAG.

So if t19 contains 0b00000110, then t22 contains 0b00000011. setcc then compares t22 with the constant 1, and since they're not equal (setne), it sets bit 0 of t43.
brcond will then test bit 0 of t43, and since it's set, it branches to the else branch (printing FAIL in this case).

If instead t22 contained 0b00000001 (as it would if the mask were still there), setcc would find the two values equal, and since setne is specified, the branch in brcond would not be taken (the correct behavior).

Things seem to go wrong when the type-legalized selection DAG is optimized and the xor node is changed to a setcc (and actually, the xor seems to have been cheaper than the setcc anyway).

Any ideas about why this is happening? [In 3.6 we don't see this issue, but then again, the 3.6 assembly is a bit different: no srl is used to get at the foo field of the struct.]

Phil
Friedman, Eli via llvm-dev
2016-Dec-22 18:29 UTC
[llvm-dev] struct bitfield regression between 3.6 and 3.9 (using -O0)
On 12/21/2016 4:45 PM, Phil Tomson via llvm-dev wrote:
> [...]
> Any ideas about why this is happening?

I would suggest starting with DAGTypeLegalizer::PromoteIntOp_BRCOND, I think...

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Phil Tomson via llvm-dev
2016-Dec-23 01:45 UTC
[llvm-dev] struct bitfield regression between 3.6 and 3.9 (using -O0)
Given that this is compiled with -O0, would there be a way to skip optimizing the type-legalized selection DAG? Everything is fine until the type-legalized selection DAG is optimized into the optimized type-legalized selection DAG.

Phil

On Thu, Dec 22, 2016 at 10:29 AM, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 12/21/2016 4:45 PM, Phil Tomson via llvm-dev wrote:
> > [...]
> > Any ideas about why this is happening?
>
> I would suggest starting with DAGTypeLegalizer::PromoteIntOp_BRCOND, I
> think...
>
> -Eli