thr3ads.net - llvm dev - [llvm-dev] Wrong code bug after GVN/PRE? [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Mikael Holmén via llvm-dev

2017-Jan-13 08:31 UTC

[llvm-dev] Wrong code bug after GVN/PRE?

Hi,

I've stumbled upon a case where I think gvn does a bad (wrong) 
optimization. It's a bit messy to debug though so I'm not sure if I 
should just write a PR about it a let someone who knows the code look at 
it instead.

Anyway, for the bug to trigger I need to run the following passes in the 
same opt invocation:
  -sroa -instcombine -simplifycfg -instcombine -gvn

The problem seems to be that GVN::PerformLoadPRE triggers and I see a

GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2

printout.

If I instead first run

  -sroa -instcombine -simplifycfg -instcombine

and then

  -gvn

I don't get the

GVN REMOVING PRE LOAD

printout, and the resulting code looks ok to me.

Is this expected? Should the output differ in these two cases?


The input to gvn looks the same when running all passes and just gvn, 
but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI) 
behaves:

I get different results from

   // Step 1: Find the non-local dependencies of the load.
   LoadDepVect Deps;
   MD->getNonLocalPointerDependency(LI, Deps);

So we get different results from MemoryDependenceResults when invoking 
gvn alone or after a bunch of other passes.

Is this expected or not?


I tried to dig into 
MemoryDependenceResults::getNonLocalPointerDependency but I have to say 
I'm quite lost.

I ran some git bisect and as far as I can tell it starts going wrong 
with commit

  [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd

2016-09-01, but that commit doesn't change GVN/MemoryDependenceResults 
so I suppose it just changes the input to GVN so the problem suddenly 
surfaces.

Finally, the problem that I see is:

In the input we have something like

	for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
	{
		lb[step1] = step1 + 7;
		ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);

		switch(lb[step1]) {
		case 7:	CVAL_VERIFY(step1 == 0);
			CVAL_VERIFY(ub[step1] == 10);
                         __label(511);
			break;
		case 8:	CVAL_VERIFY(step1 == 1);
			CVAL_VERIFY(ub[step1] == 100);
                         __label(512);
			break;
		case 9:	CVAL_VERIFY(step1 == 2);
			CVAL_VERIFY(ub[step1] == 1000);
                         __label(513);
			break;
		default:;
		}

                 [...]
	}

	int i, j;
	for ( j = 10, i = 0; j < 101; j=j*10, i++ )
	{
		CVAL_VERIFY(ub[i] == j);
	}

So we have two loops. In the first one the array ub is written (low 
index first, high index last), and then in the second loop the content 
of ub is checked (low index first, high index last).

After gvn this is changed into something like

         int tmp;
	for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
	{
		lb[step1] = step1 + 7;
		ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);

		switch(lb[step1]) {
		case 7:	CVAL_VERIFY(step1 == 0);
                         tmp = ub[step1];
			CVAL_VERIFY(tmp == 10);
                         __label(511);
			break;
		case 8:	CVAL_VERIFY(step1 == 1);
                         tmp = ub[step1];
			CVAL_VERIFY(tmp == 100);
                         __label(512);
			break;
		case 9:	CVAL_VERIFY(step1 == 2);
                         tmp = ub[step1];
			CVAL_VERIFY(tmp == 1000);
                         __label(513);
			break;
		default:;
		}

                 [...]
	}

	int i, j;
	for ( j = 10, i = 0; j < 101; j=j*10, i++ )
	{
		CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
	}

Note the introduced variable tmp.

Also note that with this rewrite, after the first loop tmp will contain 
the value of the element at the highest index in ub, and then we use tmp 
in the first round of the second loop, but there we expect the value of 
the lowest index element in ub.

Any ideas? Should I file a PR?

Thanks,
Mikael
-------------- next part --------------
; llvm assembler.
target datalayout = "p:16:16"

%int4 = type i16
%int5 = type i16
%ptr7 = type %int4 *
%arr18 = type [4 x %int4]
%ptr20 = type %arr18 *
%arr22 = type [16 x %int4]
%ptr24 = type %arr22 *
%ptr26 = type %ptr20 *

declare void @CVAL_VERIFY_FUNC (%int4 %val.6.par) optsize minsize noinline;

define %int4 @main () optsize minsize {
%step1.7 = alloca %int4
%i.8 = alloca %int4
%j.9 = alloca %int4
%globcse1.10 = alloca %int4
%cach1.11 = alloca %int4
%liv1.12 = alloca %ptr20
%liv2.13 = alloca %ptr20
%liv3.14 = alloca %ptr20
%lb.15 = alloca %arr18
%ub.16 = alloca %arr18
store %ptr20 %lb.15, %ptr26 %liv2.13
store %ptr20 %ub.16, %ptr26 %liv1.12
store %int4 0, %ptr7 %step1.7
br label %bb1
bb1:
%_tmp5 = load %int4, %ptr7 %step1.7
%_tmp6 = add %int4 %_tmp5, 7
store %int4 %_tmp6, %ptr7 %cach1.11
%_tmp7 = load %int4, %ptr7 %cach1.11
%_tmp8 = load %ptr20, %ptr26 %liv2.13
%_tmp9 = load %int4, %ptr7 %step1.7
%_tmp10 = sext %int4 %_tmp9 to i64
%_tmp11 = getelementptr %arr18, %ptr20 %_tmp8, i16 0, i64 %_tmp10
store %int4 %_tmp7, %ptr7 %_tmp11
%_tmp12 = load %int4, %ptr7 %step1.7
%_tmp13 = icmp eq %int4 %_tmp12, 0
br i1 %_tmp13, label %bb3, label %bb4
bb3:
%_tmp14 = load %ptr20, %ptr26 %liv1.12
%_tmp15 = bitcast %ptr20 %_tmp14 to %ptr7
store %int4 10, %ptr7 %_tmp15
br label %bb5
bb4:
%_tmp16 = load %ptr20, %ptr26 %liv1.12
%_tmp17 = load %int4, %ptr7 %step1.7
%_tmp18 = sub %int4 %_tmp17, 1
%_tmp19 = sext %int4 %_tmp18 to i64
%_tmp20 = getelementptr %arr18, %ptr20 %_tmp16, i16 0, i64 %_tmp19
%_tmp21 = load %int4, %ptr7 %_tmp20
%_tmp22 = mul %int4 %_tmp21, 10
%_tmp23 = load %ptr20, %ptr26 %liv1.12
%_tmp24 = load %int4, %ptr7 %step1.7
%_tmp25 = sext %int4 %_tmp24 to i64
%_tmp26 = getelementptr %arr18, %ptr20 %_tmp23, i16 0, i64 %_tmp25
store %int4 %_tmp22, %ptr7 %_tmp26
br label %bb5
bb5:
%_tmp27 = load %int4, %ptr7 %cach1.11
%_tmp28 = icmp eq %int4 %_tmp27, 7
br i1 %_tmp28, label %bb6, label %bb_usw2
bb6:
%_tmp29 = load %int4, %ptr7 %step1.7
%_tmp30 = icmp eq %int4 %_tmp29, 0
%_tmp31 = select i1 %_tmp30, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp31)
%_tmp32 = load %ptr20, %ptr26 %liv1.12
%_tmp33 = load %int4, %ptr7 %step1.7
%_tmp34 = sext %int4 %_tmp33 to i64
%_tmp35 = getelementptr %arr18, %ptr20 %_tmp32, i16 0, i64 %_tmp34
%_tmp36 = load %int4, %ptr7 %_tmp35
%_tmp37 = icmp eq %int4 %_tmp36, 10
%_tmp38 = select i1 %_tmp37, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp38)
br label %bb9
bb_usw2:
%_tmp39 = load %int4, %ptr7 %cach1.11
%_tmp40 = icmp eq %int4 %_tmp39, 8
br i1 %_tmp40, label %bb7, label %bb8
bb7:
%_tmp41 = load %int4, %ptr7 %step1.7
%_tmp42 = icmp eq %int4 %_tmp41, 1
%_tmp43 = select i1 %_tmp42, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp43)
%_tmp44 = load %ptr20, %ptr26 %liv1.12
%_tmp45 = load %int4, %ptr7 %step1.7
%_tmp46 = sext %int4 %_tmp45 to i64
%_tmp47 = getelementptr %arr18, %ptr20 %_tmp44, i16 0, i64 %_tmp46
%_tmp48 = load %int4, %ptr7 %_tmp47
%_tmp49 = icmp eq %int4 %_tmp48, 100
%_tmp50 = select i1 %_tmp49, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp50)
br label %bb9
bb8:
%_tmp51 = load %int4, %ptr7 %step1.7
%_tmp52 = icmp eq %int4 %_tmp51, 2
%_tmp53 = select i1 %_tmp52, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp53)
%_tmp54 = load %ptr20, %ptr26 %liv1.12
%_tmp55 = load %int4, %ptr7 %step1.7
%_tmp56 = sext %int4 %_tmp55 to i64
%_tmp57 = getelementptr %arr18, %ptr20 %_tmp54, i16 0, i64 %_tmp56
%_tmp58 = load %int4, %ptr7 %_tmp57
%_tmp59 = icmp eq %int4 %_tmp58, 1000
%_tmp60 = select i1 %_tmp59, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp60)
br label %bb9
bb9:
%_tmp61 = load %int4, %ptr7 %cach1.11
%_tmp62 = icmp eq %int4 %_tmp61, 8
br i1 %_tmp62, label %bb10, label %bb_usw4
bb10:
%_tmp63 = load %int4, %ptr7 %step1.7
%_tmp64 = icmp eq %int4 %_tmp63, 1
%_tmp65 = select i1 %_tmp64, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp65)
br label %bb12
bb_usw4:
%_tmp66 = load %int4, %ptr7 %cach1.11
%_tmp67 = icmp eq %int4 %_tmp66, 9
br i1 %_tmp67, label %bb11, label %bb12
bb11:
%_tmp68 = load %int4, %ptr7 %step1.7
%_tmp69 = icmp eq %int4 %_tmp68, 2
%_tmp70 = select i1 %_tmp69, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp70)
br label %bb12
bb12:
%_tmp71 = load %int4, %ptr7 %step1.7
%_tmp72 = add %int4 %_tmp71, 1
store %int4 %_tmp72, %ptr7 %step1.7
%_tmp73 = load %int4, %ptr7 %step1.7
%_tmp74 = icmp slt %int4 %_tmp73, 3
br i1 %_tmp74, label %bb1, label %bb13
bb13:
store %int4 10, %ptr7 %j.9
store %ptr20 %ub.16, %ptr26 %liv3.14
store %int4 0, %ptr7 %i.8
br label %bb14
bb14:
%_tmp75 = load %ptr20, %ptr26 %liv3.14
%_tmp76 = load %int4, %ptr7 %i.8
%_tmp77 = sext %int4 %_tmp76 to i64
%_tmp78 = getelementptr %arr18, %ptr20 %_tmp75, i16 0, i64 %_tmp77
%_tmp79 = load %int4, %ptr7 %_tmp78
store %int4 %_tmp79, %ptr7 %globcse1.10
%_tmp84 = load %int4, %ptr7 %globcse1.10
%_tmp85 = load %int4, %ptr7 %j.9
%_tmp86 = icmp eq %int4 %_tmp84, %_tmp85
%_tmp87 = select i1 %_tmp86, %int4 1, %int4 0
call void @CVAL_VERIFY_FUNC(%int4 %_tmp87)
%_tmp88 = load %int4, %ptr7 %j.9
%_tmp89 = mul %int4 %_tmp88, 10
store %int4 %_tmp89, %ptr7 %j.9
%_tmp90 = load %int4, %ptr7 %i.8
%_tmp91 = add %int4 %_tmp90, 1
store %int4 %_tmp91, %ptr7 %i.8
%_tmp92 = load %int4, %ptr7 %j.9
%_tmp93 = icmp slt %int4 %_tmp92, 101
br i1 %_tmp93, label %bb14, label %bb16
bb16:
br label %bb17
bb17:
ret %int4 zeroinitializer
}

Friedman, Eli via llvm-dev

2017-Jan-13 18:31 UTC

head link

[llvm-dev] Wrong code bug after GVN/PRE?

On 1/13/2017 12:31 AM, Mikael Holmén via llvm-dev wrote:> Hi,
>
> I've stumbled upon a case where I think gvn does a bad (wrong) 
> optimization. It's a bit messy to debug though so I'm not sure if I
> should just write a PR about it a let someone who knows the code look 
> at it instead.
>
> Anyway, for the bug to trigger I need to run the following passes in 
> the same opt invocation:
>  -sroa -instcombine -simplifycfg -instcombine -gvn
>
> The problem seems to be that GVN::PerformLoadPRE triggers and I see a
>
> GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2
>
> printout.
>
> If I instead first run
>
>  -sroa -instcombine -simplifycfg -instcombine
>
> and then
>
>  -gvn
>
> I don't get the
>
> GVN REMOVING PRE LOAD
>
> printout, and the resulting code looks ok to me.
>
> Is this expected? Should the output differ in these two cases?
>
>
> The input to gvn looks the same when running all passes and just gvn, 
> but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI) 
> behaves:
>
> I get different results from
>
>   // Step 1: Find the non-local dependencies of the load.
>   LoadDepVect Deps;
>   MD->getNonLocalPointerDependency(LI, Deps);
>
> So we get different results from MemoryDependenceResults when invoking 
> gvn alone or after a bunch of other passes.
>
> Is this expected or not?
>
>
> I tried to dig into 
> MemoryDependenceResults::getNonLocalPointerDependency but I have to 
> say I'm quite lost.
>
> I ran some git bisect and as far as I can tell it starts going wrong 
> with commit
>
>  [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
>
> 2016-09-01, but that commit doesn't change GVN/MemoryDependenceResults 
> so I suppose it just changes the input to GVN so the problem suddenly 
> surfaces.
>
> Finally, the problem that I see is:
>
> In the input we have something like
>
>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>     {
>         lb[step1] = step1 + 7;
>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>
>         switch(lb[step1]) {
>         case 7:    CVAL_VERIFY(step1 == 0);
>             CVAL_VERIFY(ub[step1] == 10);
>                         __label(511);
>             break;
>         case 8:    CVAL_VERIFY(step1 == 1);
>             CVAL_VERIFY(ub[step1] == 100);
>                         __label(512);
>             break;
>         case 9:    CVAL_VERIFY(step1 == 2);
>             CVAL_VERIFY(ub[step1] == 1000);
>                         __label(513);
>             break;
>         default:;
>         }
>
>                 [...]
>     }
>
>     int i, j;
>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>     {
>         CVAL_VERIFY(ub[i] == j);
>     }
>
> So we have two loops. In the first one the array ub is written (low 
> index first, high index last), and then in the second loop the content 
> of ub is checked (low index first, high index last).
>
> After gvn this is changed into something like
>
>         int tmp;
>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>     {
>         lb[step1] = step1 + 7;
>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>
>         switch(lb[step1]) {
>         case 7:    CVAL_VERIFY(step1 == 0);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 10);
>                         __label(511);
>             break;
>         case 8:    CVAL_VERIFY(step1 == 1);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 100);
>                         __label(512);
>             break;
>         case 9:    CVAL_VERIFY(step1 == 2);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 1000);
>                         __label(513);
>             break;
>         default:;
>         }
>
>                 [...]
>     }
>
>     int i, j;
>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>     {
>         CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
>     }
>
> Note the introduced variable tmp.
>
> Also note that with this rewrite, after the first loop tmp will 
> contain the value of the element at the highest index in ub, and then 
> we use tmp in the first round of the second loop, but there we expect 
> the value of the lowest index element in ub.
>
> Any ideas? Should I file a PR? 
Ooh, wow, that's nasty: it looks like PRE is somehow deciding that 
ub[step1] and ub[0] are equivalent.  Please file a bug.

The reason you're having trouble splitting the "opt" invocation is
that
you aren't passing in "-preserve-ll-uselistorder"; it usually
doesn't
matter, but occasionally you'll run into an optimization which is 
sensitive to it.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Mehdi Amini via llvm-dev

2017-Jan-13 20:49 UTC

head link

[llvm-dev] Wrong code bug after GVN/PRE?

> On Jan 13, 2017, at 10:31 AM, Friedman, Eli via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On 1/13/2017 12:31 AM, Mikael Holmén via llvm-dev wrote:
>> Hi,
>> 
>> I've stumbled upon a case where I think gvn does a bad (wrong)
optimization. It's a bit messy to debug though so I'm not sure if I
should just write a PR about it a let someone who knows the code look at it
instead.
>> 
>> Anyway, for the bug to trigger I need to run the following passes in
the same opt invocation:
>> -sroa -instcombine -simplifycfg -instcombine -gvn
>> 
>> The problem seems to be that GVN::PerformLoadPRE triggers and I see a
>> 
>> GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2
>> 
>> printout.
>> 
>> If I instead first run
>> 
>> -sroa -instcombine -simplifycfg -instcombine
>> 
>> and then
>> 
>> -gvn
>> 
>> I don't get the
>> 
>> GVN REMOVING PRE LOAD
>> 
>> printout, and the resulting code looks ok to me.
>> 
>> Is this expected? Should the output differ in these two cases?
>> 
>> 
>> The input to gvn looks the same when running all passes and just gvn,
but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI) behaves:
>> 
>> I get different results from
>> 
>>  // Step 1: Find the non-local dependencies of the load.
>>  LoadDepVect Deps;
>>  MD->getNonLocalPointerDependency(LI, Deps);
>> 
>> So we get different results from MemoryDependenceResults when invoking
gvn alone or after a bunch of other passes.
>> 
>> Is this expected or not?
>> 
>> 
>> I tried to dig into
MemoryDependenceResults::getNonLocalPointerDependency but I have to say I'm
quite lost.
>> 
>> I ran some git bisect and as far as I can tell it starts going wrong
with commit
>> 
>> [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
>> 
>> 2016-09-01, but that commit doesn't change
GVN/MemoryDependenceResults so I suppose it just changes the input to GVN so the
problem suddenly surfaces.
>> 
>> Finally, the problem that I see is:
>> 
>> In the input we have something like
>> 
>>    for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>    {
>>        lb[step1] = step1 + 7;
>>        ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>> 
>>        switch(lb[step1]) {
>>        case 7:    CVAL_VERIFY(step1 == 0);
>>            CVAL_VERIFY(ub[step1] == 10);
>>                        __label(511);
>>            break;
>>        case 8:    CVAL_VERIFY(step1 == 1);
>>            CVAL_VERIFY(ub[step1] == 100);
>>                        __label(512);
>>            break;
>>        case 9:    CVAL_VERIFY(step1 == 2);
>>            CVAL_VERIFY(ub[step1] == 1000);
>>                        __label(513);
>>            break;
>>        default:;
>>        }
>> 
>>                [...]
>>    }
>> 
>>    int i, j;
>>    for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>    {
>>        CVAL_VERIFY(ub[i] == j);
>>    }
>> 
>> So we have two loops. In the first one the array ub is written (low
index first, high index last), and then in the second loop the content of ub is
checked (low index first, high index last).
>> 
>> After gvn this is changed into something like
>> 
>>        int tmp;
>>    for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>    {
>>        lb[step1] = step1 + 7;
>>        ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>> 
>>        switch(lb[step1]) {
>>        case 7:    CVAL_VERIFY(step1 == 0);
>>                        tmp = ub[step1];
>>            CVAL_VERIFY(tmp == 10);
>>                        __label(511);
>>            break;
>>        case 8:    CVAL_VERIFY(step1 == 1);
>>                        tmp = ub[step1];
>>            CVAL_VERIFY(tmp == 100);
>>                        __label(512);
>>            break;
>>        case 9:    CVAL_VERIFY(step1 == 2);
>>                        tmp = ub[step1];
>>            CVAL_VERIFY(tmp == 1000);
>>                        __label(513);
>>            break;
>>        default:;
>>        }
>> 
>>                [...]
>>    }
>> 
>>    int i, j;
>>    for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>    {
>>        CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
>>    }
>> 
>> Note the introduced variable tmp.
>> 
>> Also note that with this rewrite, after the first loop tmp will contain
the value of the element at the highest index in ub, and then we use tmp in the
first round of the second loop, but there we expect the value of the lowest
index element in ub.
>> 
>> Any ideas? Should I file a PR? 
> 
> Ooh, wow, that's nasty: it looks like PRE is somehow deciding that
ub[step1] and ub[0] are equivalent.  Please file a bug.
> 
> The reason you're having trouble splitting the "opt"
invocation is that you aren't passing in
"-preserve-ll-uselistorder"; it usually doesn't matter, but
occasionally you'll run into an optimization which is sensitive to it.
Do we have known cases where use-list order sensitivity is not “a bug”?

— 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170113/70c4c47b/attachment.html>

Daniel Berlin via llvm-dev

2017-Jan-13 21:29 UTC

head link

[llvm-dev] Wrong code bug after GVN/PRE?

Yeah, there's a lot of things this could be.

On the memdep side:
Note that memdep is not actually properly updated in all cases by most
passes that claim to not invalidate it (they don't invalidate dependent
pointers, only pointers they directly touch).

There's already a bug filed about this.  So far we've only seen
missed-opt,
not wrong code from this.
But it should be possible to generate wrong code bugs with it (if you place
the load/store in a place that invalidates the cached dependent result) ,
so i wonder if this is one.

Otherwise, the other reason it could give different answers after running a
bunch of passes is the random scanning limits it has.

On the PRE side:
PHITranslateAddr goes through a bunch of crazy machinations to try to be
able to prove things about inter-iteration variables instead of just doing
phi translation.

LoopLoadElimination was written so that these could be removed, because
they are pretty much crazytown.



--Dan


On Fri, Jan 13, 2017 at 10:31 AM, Friedman, Eli via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 1/13/2017 12:31 AM, Mikael Holmén via llvm-dev wrote:
>
>> Hi,
>>
>> I've stumbled upon a case where I think gvn does a bad (wrong)
>> optimization. It's a bit messy to debug though so I'm not sure
if I should
>> just write a PR about it a let someone who knows the code look at it
>> instead.
>>
>> Anyway, for the bug to trigger I need to run the following passes in
the
>> same opt invocation:
>>  -sroa -instcombine -simplifycfg -instcombine -gvn
>>
>> The problem seems to be that GVN::PerformLoadPRE triggers and I see a
>>
>> GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2
>>
>> printout.
>>
>> If I instead first run
>>
>>  -sroa -instcombine -simplifycfg -instcombine
>>
>> and then
>>
>>  -gvn
>>
>> I don't get the
>>
>> GVN REMOVING PRE LOAD
>>
>> printout, and the resulting code looks ok to me.
>>
>> Is this expected? Should the output differ in these two cases?
>>
>>
>> The input to gvn looks the same when running all passes and just gvn,
but
>> I see a difference in how GVN::processNonLocalLoad(LoadInst *LI)
behaves:
>>
>> I get different results from
>>
>>   // Step 1: Find the non-local dependencies of the load.
>>   LoadDepVect Deps;
>>   MD->getNonLocalPointerDependency(LI, Deps);
>>
>> So we get different results from MemoryDependenceResults when invoking
>> gvn alone or after a bunch of other passes.
>>
>> Is this expected or not?
>>
>>
>> I tried to dig into
MemoryDependenceResults::getNonLocalPointerDependency
>> but I have to say I'm quite lost.
>>
>> I ran some git bisect and as far as I can tell it starts going wrong
with
>> commit
>>
>>  [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
>>
>> 2016-09-01, but that commit doesn't change
GVN/MemoryDependenceResults so
>> I suppose it just changes the input to GVN so the problem suddenly
surfaces.
>>
>> Finally, the problem that I see is:
>>
>> In the input we have something like
>>
>>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>     {
>>         lb[step1] = step1 + 7;
>>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>>
>>         switch(lb[step1]) {
>>         case 7:    CVAL_VERIFY(step1 == 0);
>>             CVAL_VERIFY(ub[step1] == 10);
>>                         __label(511);
>>             break;
>>         case 8:    CVAL_VERIFY(step1 == 1);
>>             CVAL_VERIFY(ub[step1] == 100);
>>                         __label(512);
>>             break;
>>         case 9:    CVAL_VERIFY(step1 == 2);
>>             CVAL_VERIFY(ub[step1] == 1000);
>>                         __label(513);
>>             break;
>>         default:;
>>         }
>>
>>                 [...]
>>     }
>>
>>     int i, j;
>>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>     {
>>         CVAL_VERIFY(ub[i] == j);
>>     }
>>
>> So we have two loops. In the first one the array ub is written (low
index
>> first, high index last), and then in the second loop the content of ub
is
>> checked (low index first, high index last).
>>
>> After gvn this is changed into something like
>>
>>         int tmp;
>>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>     {
>>         lb[step1] = step1 + 7;
>>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>>
>>         switch(lb[step1]) {
>>         case 7:    CVAL_VERIFY(step1 == 0);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 10);
>>                         __label(511);
>>             break;
>>         case 8:    CVAL_VERIFY(step1 == 1);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 100);
>>                         __label(512);
>>             break;
>>         case 9:    CVAL_VERIFY(step1 == 2);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 1000);
>>                         __label(513);
>>             break;
>>         default:;
>>         }
>>
>>                 [...]
>>     }
>>
>>     int i, j;
>>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>     {
>>         CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
>>     }
>>
>> Note the introduced variable tmp.
>>
>> Also note that with this rewrite, after the first loop tmp will contain
>> the value of the element at the highest index in ub, and then we use
tmp in
>> the first round of the second loop, but there we expect the value of
the
>> lowest index element in ub.
>>
>> Any ideas? Should I file a PR?
>>
>
> Ooh, wow, that's nasty: it looks like PRE is somehow deciding that
> ub[step1] and ub[0] are equivalent.  Please file a bug.
>
> The reason you're having trouble splitting the "opt"
invocation is that
> you aren't passing in "-preserve-ll-uselistorder"; it usually
doesn't
> matter, but occasionally you'll run into an optimization which is
sensitive
> to it.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170113/ca4313cf/attachment.html>

Mikael Holmén via llvm-dev

2017-Jan-16 10:20 UTC

head link

[llvm-dev] Wrong code bug after GVN/PRE?

Hi,

On 01/13/2017 07:31 PM, Friedman, Eli wrote:> On 1/13/2017 12:31 AM, Mikael Holmén via llvm-dev wrote:
>> Hi,
>>
>> I've stumbled upon a case where I think gvn does a bad (wrong)
>> optimization. It's a bit messy to debug though so I'm not sure
if I
>> should just write a PR about it a let someone who knows the code look
>> at it instead.
>>
>> Anyway, for the bug to trigger I need to run the following passes in
>> the same opt invocation:
>>  -sroa -instcombine -simplifycfg -instcombine -gvn
>>
>> The problem seems to be that GVN::PerformLoadPRE triggers and I see a
>>
>> GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2
>>
>> printout.
>>
>> If I instead first run
>>
>>  -sroa -instcombine -simplifycfg -instcombine
>>
>> and then
>>
>>  -gvn
>>
>> I don't get the
>>
>> GVN REMOVING PRE LOAD
>>
>> printout, and the resulting code looks ok to me.
>>
>> Is this expected? Should the output differ in these two cases?
>>
>>
>> The input to gvn looks the same when running all passes and just gvn,
>> but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI)
>> behaves:
>>
>> I get different results from
>>
>>   // Step 1: Find the non-local dependencies of the load.
>>   LoadDepVect Deps;
>>   MD->getNonLocalPointerDependency(LI, Deps);
>>
>> So we get different results from MemoryDependenceResults when invoking
>> gvn alone or after a bunch of other passes.
>>
>> Is this expected or not?
>>
>>
>> I tried to dig into
>> MemoryDependenceResults::getNonLocalPointerDependency but I have to
>> say I'm quite lost.
>>
>> I ran some git bisect and as far as I can tell it starts going wrong
>> with commit
>>
>>  [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
>>
>> 2016-09-01, but that commit doesn't change
GVN/MemoryDependenceResults
>> so I suppose it just changes the input to GVN so the problem suddenly
>> surfaces.
>>
>> Finally, the problem that I see is:
>>
>> In the input we have something like
>>
>>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>     {
>>         lb[step1] = step1 + 7;
>>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>>
>>         switch(lb[step1]) {
>>         case 7:    CVAL_VERIFY(step1 == 0);
>>             CVAL_VERIFY(ub[step1] == 10);
>>                         __label(511);
>>             break;
>>         case 8:    CVAL_VERIFY(step1 == 1);
>>             CVAL_VERIFY(ub[step1] == 100);
>>                         __label(512);
>>             break;
>>         case 9:    CVAL_VERIFY(step1 == 2);
>>             CVAL_VERIFY(ub[step1] == 1000);
>>                         __label(513);
>>             break;
>>         default:;
>>         }
>>
>>                 [...]
>>     }
>>
>>     int i, j;
>>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>     {
>>         CVAL_VERIFY(ub[i] == j);
>>     }
>>
>> So we have two loops. In the first one the array ub is written (low
>> index first, high index last), and then in the second loop the content
>> of ub is checked (low index first, high index last).
>>
>> After gvn this is changed into something like
>>
>>         int tmp;
>>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>>     {
>>         lb[step1] = step1 + 7;
>>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>>
>>         switch(lb[step1]) {
>>         case 7:    CVAL_VERIFY(step1 == 0);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 10);
>>                         __label(511);
>>             break;
>>         case 8:    CVAL_VERIFY(step1 == 1);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 100);
>>                         __label(512);
>>             break;
>>         case 9:    CVAL_VERIFY(step1 == 2);
>>                         tmp = ub[step1];
>>             CVAL_VERIFY(tmp == 1000);
>>                         __label(513);
>>             break;
>>         default:;
>>         }
>>
>>                 [...]
>>     }
>>
>>     int i, j;
>>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>>     {
>>         CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
>>     }
>>
>> Note the introduced variable tmp.
>>
>> Also note that with this rewrite, after the first loop tmp will
>> contain the value of the element at the highest index in ub, and then
>> we use tmp in the first round of the second loop, but there we expect
>> the value of the lowest index element in ub.
>>
>> Any ideas? Should I file a PR?
>
> Ooh, wow, that's nasty: it looks like PRE is somehow deciding that
> ub[step1] and ub[0] are equivalent.
Yes seems like it :/
>  Please file a bug.
Alright!

https://llvm.org/bugs/show_bug.cgi?id=31651
>
> The reason you're having trouble splitting the "opt"
invocation is that
> you aren't passing in "-preserve-ll-uselistorder"; it usually
doesn't
> matter, but occasionally you'll run into an optimization which is
> sensitive to it.
Aha! Yes, when I used -preserve-ll-uselistorder I could split the 
passes, so now I see the problem when just running GVN. Thanks!

Regards,
Mikael
>
> -Eli
>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Jan 2017 - Wrong code bug after GVN/PRE?

[llvm-dev] Wrong code bug after GVN/PRE?

[llvm-dev] Wrong code bug after GVN/PRE?

[llvm-dev] Wrong code bug after GVN/PRE?

[llvm-dev] Wrong code bug after GVN/PRE?

[llvm-dev] Wrong code bug after GVN/PRE?

Possibly Parallel Threads