Hi all, I'm currently running into some problems with instcombine changing the type of alloca instructions. In particular, the PromoteCastOfAllocation looks at any allocation instruction that is used by a bitast. It does a few checks, but basically tries to change the type of the alloca instruction to the type pointed to by the bitcasted type. The current heuristic for determining if this is a good idea, is "do it if the aligment of the new type is larger than before" (Note that this is not just a safety check, since when the aligments are equal, the alloca is not changed). This heuristic seems to have been introduced to prevent infinite loops when there are multiple uses of the alloca. At first glance, changing the alloca type to remove a bitcast seems a sensible idea. However, the transformation seems quite ignorant of other uses of the alloca. It does properly update them by inserting extra bitcasts, but in a lot of cases, this means you can remove a single (or perhaps a few) bitcasts, while introducing a bunch of other bitcasts. IMHO, a better policy is needed. Another problem that occurs here, is that this transformation can replace a struct alloca with a single integer alloca. This, in turn, can confuse other passes like scalarrepl, and produce but ugly code. I've assembled a small example of this. declare void @bar(i16, i16) define internal void @empty(i32* %X) { ret void } define void @foo(i16 %A, i16 %B) { entry: ;; Alloc a struct %V = alloca {i16, i16} ;; Save our arguments into it %V1 = getelementptr {i16, i16}* %V, i32 0, i32 0 %V2 = getelementptr {i16, i16}* %V, i32 0, i32 1 store i16 %A, i16* %V1 store i16 %B, i16* %V2 ;; Load the values again %R1 = load i16* %V1 %R2 = load i16* %V2 ;; Pass them to an external function to make them used call void @bar(i16 %R1, i16 %R2) ;; Bitcast the alloca to i32& %V32 = bitcast {i16, i16}* %V to i32* ;; And make it appear used (at first glance, deadargelim will ;; can remove this use later on call void @empty(i32* %V32) ret void } The above program does almost nothing, but contains an alloca of {i16, i16} which is bitcasted to i32* somewhere. This i32* appears used, but can later be removed (you'll see why). Now, when I run instcombine over this code, I get the following (other functions left out, they are unchanged). define void @foo(i16 %A, i16 %B) { entry: %V = alloca i32 ; <i32*> [#uses=3] %tmpcast = bitcast i32* %V to { i16, i16 }* ; <{ i16, i16 }*> [#uses=1] %V1 = bitcast i32* %V to i16* ; <i16*> [#uses=2] %V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32 1 ; <i16*> [#uses=2] store i16 %A, i16* %V1, align 4 store i16 %B, i16* %V2 %R1 = load i16* %V1, align 4 ; <i16> [#uses=1] %R2 = load i16* %V2 ; <i16> [#uses=1] call void @bar( i16 %R1, i16 %R2 ) call void @empty( i32* %V ) ret void } Here, the alloca is replaced by an i32 alloca. This doesn't seem like an improvement to me, since now the proper uses of the struct (geps) need a bitcast first. Though this looks like a minor problem at best, it confuses scalarrepl. When I run -deadargelim -die -scalarrepl over the instcombined code (the first two to remove the i32* use and bitcast), I get the following ugly code: define void @foo(i16 %A, i16 %B) { entry: %A1 = zext i16 %A to i32 ; <i32> [#uses=1] %A12 = shl i32 %A1, 16 ; <i32> [#uses=1] %V.in.mask = and i32 undef, 65535 ; <i32> [#uses=1] %A12.ins = or i32 %V.in.mask, %A12 ; <i32> [#uses=1] %B9 = zext i16 %B to i32 ; <i32> [#uses=1] %V.in8.mask = and i32 %A12.ins, -65536 ; <i32> [#uses=1] %B9.ins = or i32 %V.in8.mask, %B9 ; <i32> [#uses=2] %R14 = lshr i32 %B9.ins, 16 ; <i32> [#uses=1] %R15 = trunc i32 %R14 to i16 ; <i16> [#uses=1] %R27 = trunc i32 %B9.ins to i16 ; <i16> [#uses=1] call void @bar( i16 %R15, i16 %R27 ) call void @empty( ) ret void } However, when I run -deadargelim -die -scalarrepl directly on the original code, I get the result I expect: define void @foo(i16 %A, i16 %B) { entry: call void @bar( i16 %A, i16 %B ) call void @empty( ) ret void } Now, I know that the shift/zext/trunc mess above might be supposed to be cleaned up by instcombine again (it isn't currently), I think that the actual problem lies with PromoteCastOfAllocation in instcombine. IMHO, the struct alloca shouldn't have been replaced with an i32 alloca in the first place. Also, in the code where I originally noticed this problem, the resulting code is more complex and less likely to be simplified again. In particular, instead of the @empty function I have a call to memmove, whose destination is passed as an argument to another function. ArgumentPromotion is able to split up this struct argument and remove the memmove, but by then the alloca is already screwed up by instcombine and scalarrepl is no longer able to properly expand it. Any comments on this problem? Do you agree that the current replacement policy of PromoteCastOfAllocation is a bit too wide? I think that some other policy should be implemented in PromoteCastOfAllocation, probably one that takes into account the number of bitcasts that can be removed and the number that would be introduced, or perhaps just refuse to change a struct alloca to an integer alloca, or... Gr. Matthijs -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080908/617834b4/attachment.sig>
Hi Matthijs, Changing PromoteCastOfAllocation to not replace aggregate allocas with non-aggregate allocas if they have GEP users sounds reasonable to me. Finding the maximum alignment is sometimes still useful though, so it would be nice to update the alignment field of the alloca even if its type is left unchanged. Dan On Sep 8, 2008, at 8:57 AM, Matthijs Kooijman wrote:> Hi all, > > I'm currently running into some problems with instcombine changing > the type of > alloca instructions. In particular, the PromoteCastOfAllocation > looks at any > allocation instruction that is used by a bitast. > > It does a few checks, but basically tries to change the type of the > alloca > instruction to the type pointed to by the bitcasted type. The current > heuristic for determining if this is a good idea, is "do it if the > aligment of > the new type is larger than before" (Note that this is not just a > safety > check, since when the aligments are equal, the alloca is not > changed). This > heuristic seems to have been introduced to prevent infinite loops > when there > are multiple uses of the alloca. > > At first glance, changing the alloca type to remove a bitcast seems > a sensible > idea. However, the transformation seems quite ignorant of other uses > of the > alloca. It does properly update them by inserting extra bitcasts, > but in a lot > of cases, this means you can remove a single (or perhaps a few) > bitcasts, > while introducing a bunch of other bitcasts. IMHO, a better policy > is needed. > > Another problem that occurs here, is that this transformation can > replace a > struct alloca with a single integer alloca. This, in turn, can > confuse other > passes like scalarrepl, and produce but ugly code. I've assembled a > small > example of this. > > declare void @bar(i16, i16) > > define internal void @empty(i32* %X) { > ret void > } > > define void @foo(i16 %A, i16 %B) { > entry: > ;; Alloc a struct > %V = alloca {i16, i16} > ;; Save our arguments into it > %V1 = getelementptr {i16, i16}* %V, i32 0, i32 0 > %V2 = getelementptr {i16, i16}* %V, i32 0, i32 1 > store i16 %A, i16* %V1 > store i16 %B, i16* %V2 > > ;; Load the values again > %R1 = load i16* %V1 > %R2 = load i16* %V2 > ;; Pass them to an external function to make them used > call void @bar(i16 %R1, i16 %R2) > > ;; Bitcast the alloca to i32& > %V32 = bitcast {i16, i16}* %V to i32* > ;; And make it appear used (at first glance, deadargelim will > ;; can remove this use later on > call void @empty(i32* %V32) > ret void > } > > The above program does almost nothing, but contains an alloca of > {i16, i16} > which is bitcasted to i32* somewhere. This i32* appears used, but > can later be > removed (you'll see why). > > Now, when I run instcombine over this code, I get the following (other > functions left out, they are unchanged). > define void @foo(i16 %A, i16 %B) { > entry: > %V = alloca i32 ; <i32*> [#uses=3] > %tmpcast = bitcast i32* %V to { i16, i16 }* ; > <{ i16, i16 }*> [#uses=1] > %V1 = bitcast i32* %V to i16* ; <i16*> [#uses=2] > %V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32 > 1 ; <i16*> [#uses=2] > store i16 %A, i16* %V1, align 4 > store i16 %B, i16* %V2 > %R1 = load i16* %V1, align 4 ; <i16> [#uses=1] > %R2 = load i16* %V2 ; <i16> [#uses=1] > call void @bar( i16 %R1, i16 %R2 ) > call void @empty( i32* %V ) > ret void > } > > Here, the alloca is replaced by an i32 alloca. This doesn't seem > like an > improvement to me, since now the proper uses of the struct (geps) > need a > bitcast first. Though this looks like a minor problem at best, it > confuses > scalarrepl. > > When I run -deadargelim -die -scalarrepl over the instcombined code > (the first > two to remove the i32* use and bitcast), I get the following ugly > code: > define void @foo(i16 %A, i16 %B) { > entry: > %A1 = zext i16 %A to i32 ; <i32> [#uses=1] > %A12 = shl i32 %A1, 16 ; <i32> [#uses=1] > %V.in.mask = and i32 undef, 65535 ; <i32> [#uses=1] > %A12.ins = or i32 %V.in.mask, %A12 ; <i32> [#uses=1] > %B9 = zext i16 %B to i32 ; <i32> [#uses=1] > %V.in8.mask = and i32 %A12.ins, -65536 ; <i32> [#uses=1] > %B9.ins = or i32 %V.in8.mask, %B9 ; <i32> [#uses=2] > %R14 = lshr i32 %B9.ins, 16 ; <i32> [#uses=1] > %R15 = trunc i32 %R14 to i16 ; <i16> [#uses=1] > %R27 = trunc i32 %B9.ins to i16 ; <i16> [#uses=1] > call void @bar( i16 %R15, i16 %R27 ) > call void @empty( ) > ret void > } > > However, when I run -deadargelim -die -scalarrepl directly on the > original code, I get the result I expect: > define void @foo(i16 %A, i16 %B) { > entry: > call void @bar( i16 %A, i16 %B ) > call void @empty( ) > ret void > } > > Now, I know that the shift/zext/trunc mess above might be supposed > to be > cleaned up by instcombine again (it isn't currently), I think that > the actual > problem lies with PromoteCastOfAllocation in instcombine. IMHO, the > struct > alloca shouldn't have been replaced with an i32 alloca in the first > place. > > Also, in the code where I originally noticed this problem, the > resulting code > is more complex and less likely to be simplified again. In > particular, instead > of the @empty function I have a call to memmove, whose destination > is passed > as an argument to another function. ArgumentPromotion is able to > split up this > struct argument and remove the memmove, but by then the alloca is > already > screwed up by instcombine and scalarrepl is no longer able to > properly expand > it. > > > Any comments on this problem? Do you agree that the current > replacement policy > of PromoteCastOfAllocation is a bit too wide? > > I think that some other policy should be implemented in > PromoteCastOfAllocation, probably one that takes into account the > number of > bitcasts that can be removed and the number that would be > introduced, or > perhaps just refuse to change a struct alloca to an integer alloca, > or... > > > Gr. > > Matthijs > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Dan,> Changing PromoteCastOfAllocation to not replace aggregate allocas with > non-aggregate allocas if they have GEP users sounds reasonable to me.This sounds reasonable indeed, but still a bit arbitrary. Haven't figured out anything better yet, though.> Finding the maximum alignment is sometimes still useful though, so > it would be nice to update the alignment field of the alloca even if > its type is left unchanged.The maximizing of the alignment is done only looking at the type's ABI alignment, the actual alignment of the alloca instruction is not used. But what you suggest is that if the alloca is casted to some type that has higher alignment, you want that higher allignment propagated to the alloca instruction? I can see why this is useful, since bitcasting to a type with higher alignment can actually produce invalid code, I think? Or how does this work exactly? Gr. Matthijs -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080913/a7cc8e2c/attachment.sig>