thr3ads.net - llvm dev - [LLVMdev] Overzealous PromoteCastOfAllocation [Sep 2008]

If this information is useful, please help other people find it:
Share via:

Matthijs Kooijman

2008-Sep-08 15:57 UTC

[LLVMdev] Overzealous PromoteCastOfAllocation

Hi all,

I'm currently running into some problems with instcombine changing the type
of
alloca instructions. In particular, the PromoteCastOfAllocation looks at any
allocation instruction that is used by a bitast.

It does a few checks, but basically tries to change the type of the alloca
instruction to the type pointed to by the bitcasted type. The current
heuristic for determining if this is a good idea, is "do it if the aligment
of
the new type is larger than before" (Note that this is not just a safety
check, since when the aligments are equal, the alloca is not changed). This
heuristic seems to have been introduced to prevent infinite loops when there
are multiple uses of the alloca.

At first glance, changing the alloca type to remove a bitcast seems a sensible
idea. However, the transformation seems quite ignorant of other uses of the
alloca. It does properly update them by inserting extra bitcasts, but in a lot
of cases, this means you can remove a single (or perhaps a few) bitcasts,
while introducing a bunch of other bitcasts. IMHO, a better policy is needed.

Another problem that occurs here, is that this transformation can replace a
struct alloca with a single integer alloca. This, in turn, can confuse other
passes like scalarrepl, and produce but ugly code. I've assembled a small
example of this.

	declare void @bar(i16, i16)

	define internal void @empty(i32* %X) {
		ret void
	}

	define void @foo(i16 %A, i16 %B) {
	entry:
		;; Alloc a struct
		%V = alloca {i16, i16}
		;; Save our arguments into it
		%V1 = getelementptr {i16, i16}* %V, i32 0, i32 0
		%V2 = getelementptr {i16, i16}* %V, i32 0, i32 1
		store i16 %A, i16* %V1
		store i16 %B, i16* %V2

		;; Load the values again
		%R1 = load i16* %V1
		%R2 = load i16* %V2
		;; Pass them to an external function to make them used
		call void @bar(i16 %R1, i16 %R2)

		;; Bitcast the alloca to i32&
		%V32 = bitcast {i16, i16}* %V to i32*
		;; And make it appear used (at first glance, deadargelim will
		;; can remove this use later on
		call void @empty(i32* %V32)
		ret void
	}

The above program does almost nothing, but contains an alloca of {i16, i16}
which is bitcasted to i32* somewhere. This i32* appears used, but can later be
removed (you'll see why).

Now, when I run instcombine over this code, I get the following (other
functions left out, they are unchanged).
	define void @foo(i16 %A, i16 %B) {
	entry:
					%V = alloca i32         ; <i32*> [#uses=3]
					%tmpcast = bitcast i32* %V to { i16, i16 }*             ; <{ i16, i16
}*> [#uses=1]
					%V1 = bitcast i32* %V to i16*           ; <i16*> [#uses=2]
					%V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32 1                ;
<i16*> [#uses=2]
					store i16 %A, i16* %V1, align 4
					store i16 %B, i16* %V2
					%R1 = load i16* %V1, align 4            ; <i16> [#uses=1]
					%R2 = load i16* %V2             ; <i16> [#uses=1]
					call void @bar( i16 %R1, i16 %R2 )
					call void @empty( i32* %V )
					ret void
	}

Here, the alloca is replaced by an i32 alloca. This doesn't seem like an
improvement to me, since now the proper uses of the struct (geps) need a
bitcast first. Though this looks like a minor problem at best, it confuses
scalarrepl.

When I run -deadargelim -die -scalarrepl over the instcombined code (the first
two to remove the i32* use and bitcast), I get the following ugly code:
	define void @foo(i16 %A, i16 %B) {
	entry:
					%A1 = zext i16 %A to i32                ; <i32> [#uses=1]
					%A12 = shl i32 %A1, 16          ; <i32> [#uses=1]
					%V.in.mask = and i32 undef, 65535               ; <i32> [#uses=1]
					%A12.ins = or i32 %V.in.mask, %A12              ; <i32> [#uses=1]
					%B9 = zext i16 %B to i32                ; <i32> [#uses=1]
					%V.in8.mask = and i32 %A12.ins, -65536          ; <i32> [#uses=1]
					%B9.ins = or i32 %V.in8.mask, %B9               ; <i32> [#uses=2]
					%R14 = lshr i32 %B9.ins, 16             ; <i32> [#uses=1]
					%R15 = trunc i32 %R14 to i16            ; <i16> [#uses=1]
					%R27 = trunc i32 %B9.ins to i16         ; <i16> [#uses=1]
					call void @bar( i16 %R15, i16 %R27 )
					call void @empty( )
					ret void
	}

However, when I run -deadargelim -die -scalarrepl directly on the original code,
I get the result I expect:
	define void @foo(i16 %A, i16 %B) {
	entry:
					call void @bar( i16 %A, i16 %B )
					call void @empty( )
					ret void
	}

Now, I know that the shift/zext/trunc mess above might be supposed to be
cleaned up by instcombine again (it isn't currently), I think that the
actual
problem lies with PromoteCastOfAllocation in instcombine. IMHO, the struct
alloca shouldn't have been replaced with an i32 alloca in the first place.

Also, in the code where I originally noticed this problem, the resulting code
is more complex and less likely to be simplified again. In particular, instead
of the @empty function I have a call to memmove, whose destination is passed
as an argument to another function. ArgumentPromotion is able to split up this
struct argument and remove the memmove, but by then the alloca is already
screwed up by instcombine and scalarrepl is no longer able to properly expand
it.


Any comments on this problem? Do you agree that the current replacement policy
of PromoteCastOfAllocation is a bit too wide?

I think that some other policy should be implemented in
PromoteCastOfAllocation, probably one that takes into account the number of
bitcasts that can be removed and the number that would be introduced, or
perhaps just refuse to change a struct alloca to an integer alloca, or...


Gr.

Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080908/617834b4/attachment.sig>

Dan Gohman

2008-Sep-08 18:53 UTC

head link

[LLVMdev] Overzealous PromoteCastOfAllocation

Hi Matthijs,

Changing PromoteCastOfAllocation to not replace aggregate allocas with
non-aggregate allocas if they have GEP users sounds reasonable to me.

Finding the maximum alignment is sometimes still useful though, so
it would be nice to update the alignment field of the alloca even if
its type is left unchanged.

Dan

On Sep 8, 2008, at 8:57 AM, Matthijs Kooijman wrote:
> Hi all,
>
> I'm currently running into some problems with instcombine changing  
> the type of
> alloca instructions. In particular, the PromoteCastOfAllocation  
> looks at any
> allocation instruction that is used by a bitast.
>
> It does a few checks, but basically tries to change the type of the  
> alloca
> instruction to the type pointed to by the bitcasted type. The current
> heuristic for determining if this is a good idea, is "do it if the  
> aligment of
> the new type is larger than before" (Note that this is not just a  
> safety
> check, since when the aligments are equal, the alloca is not  
> changed). This
> heuristic seems to have been introduced to prevent infinite loops  
> when there
> are multiple uses of the alloca.
>
> At first glance, changing the alloca type to remove a bitcast seems  
> a sensible
> idea. However, the transformation seems quite ignorant of other uses  
> of the
> alloca. It does properly update them by inserting extra bitcasts,  
> but in a lot
> of cases, this means you can remove a single (or perhaps a few)  
> bitcasts,
> while introducing a bunch of other bitcasts. IMHO, a better policy  
> is needed.
>
> Another problem that occurs here, is that this transformation can  
> replace a
> struct alloca with a single integer alloca. This, in turn, can  
> confuse other
> passes like scalarrepl, and produce but ugly code. I've assembled a  
> small
> example of this.
>
> 	declare void @bar(i16, i16)
>
> 	define internal void @empty(i32* %X) {
> 		ret void
> 	}
>
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 		;; Alloc a struct
> 		%V = alloca {i16, i16}
> 		;; Save our arguments into it
> 		%V1 = getelementptr {i16, i16}* %V, i32 0, i32 0
> 		%V2 = getelementptr {i16, i16}* %V, i32 0, i32 1
> 		store i16 %A, i16* %V1
> 		store i16 %B, i16* %V2
>
> 		;; Load the values again
> 		%R1 = load i16* %V1
> 		%R2 = load i16* %V2
> 		;; Pass them to an external function to make them used
> 		call void @bar(i16 %R1, i16 %R2)
>
> 		;; Bitcast the alloca to i32&
> 		%V32 = bitcast {i16, i16}* %V to i32*
> 		;; And make it appear used (at first glance, deadargelim will
> 		;; can remove this use later on
> 		call void @empty(i32* %V32)
> 		ret void
> 	}
>
> The above program does almost nothing, but contains an alloca of  
> {i16, i16}
> which is bitcasted to i32* somewhere. This i32* appears used, but  
> can later be
> removed (you'll see why).
>
> Now, when I run instcombine over this code, I get the following (other
> functions left out, they are unchanged).
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					%V = alloca i32         ; <i32*> [#uses=3]
> 					%tmpcast = bitcast i32* %V to { i16, i16 }*             ;  
> <{ i16, i16 }*> [#uses=1]
> 					%V1 = bitcast i32* %V to i16*           ; <i16*> [#uses=2]
> 					%V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32  
> 1                ; <i16*> [#uses=2]
> 					store i16 %A, i16* %V1, align 4
> 					store i16 %B, i16* %V2
> 					%R1 = load i16* %V1, align 4            ; <i16> [#uses=1]
> 					%R2 = load i16* %V2             ; <i16> [#uses=1]
> 					call void @bar( i16 %R1, i16 %R2 )
> 					call void @empty( i32* %V )
> 					ret void
> 	}
>
> Here, the alloca is replaced by an i32 alloca. This doesn't seem  
> like an
> improvement to me, since now the proper uses of the struct (geps)  
> need a
> bitcast first. Though this looks like a minor problem at best, it  
> confuses
> scalarrepl.
>
> When I run -deadargelim -die -scalarrepl over the instcombined code  
> (the first
> two to remove the i32* use and bitcast), I get the following ugly  
> code:
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					%A1 = zext i16 %A to i32                ; <i32> [#uses=1]
> 					%A12 = shl i32 %A1, 16          ; <i32> [#uses=1]
> 					%V.in.mask = and i32 undef, 65535               ; <i32>
[#uses=1]
> 					%A12.ins = or i32 %V.in.mask, %A12              ; <i32>
[#uses=1]
> 					%B9 = zext i16 %B to i32                ; <i32> [#uses=1]
> 					%V.in8.mask = and i32 %A12.ins, -65536          ; <i32>
[#uses=1]
> 					%B9.ins = or i32 %V.in8.mask, %B9               ; <i32>
[#uses=2]
> 					%R14 = lshr i32 %B9.ins, 16             ; <i32> [#uses=1]
> 					%R15 = trunc i32 %R14 to i16            ; <i16> [#uses=1]
> 					%R27 = trunc i32 %B9.ins to i16         ; <i16> [#uses=1]
> 					call void @bar( i16 %R15, i16 %R27 )
> 					call void @empty( )
> 					ret void
> 	}
>
> However, when I run -deadargelim -die -scalarrepl directly on the  
> original code, I get the result I expect:
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					call void @bar( i16 %A, i16 %B )
> 					call void @empty( )
> 					ret void
> 	}
>
> Now, I know that the shift/zext/trunc mess above might be supposed  
> to be
> cleaned up by instcombine again (it isn't currently), I think that  
> the actual
> problem lies with PromoteCastOfAllocation in instcombine. IMHO, the  
> struct
> alloca shouldn't have been replaced with an i32 alloca in the first  
> place.
>
> Also, in the code where I originally noticed this problem, the  
> resulting code
> is more complex and less likely to be simplified again. In  
> particular, instead
> of the @empty function I have a call to memmove, whose destination  
> is passed
> as an argument to another function. ArgumentPromotion is able to  
> split up this
> struct argument and remove the memmove, but by then the alloca is  
> already
> screwed up by instcombine and scalarrepl is no longer able to  
> properly expand
> it.
>
>
> Any comments on this problem? Do you agree that the current  
> replacement policy
> of PromoteCastOfAllocation is a bit too wide?
>
> I think that some other policy should be implemented in
> PromoteCastOfAllocation, probably one that takes into account the  
> number of
> bitcasts that can be removed and the number that would be  
> introduced, or
> perhaps just refuse to change a struct alloca to an integer alloca,  
> or...
>
>
> Gr.
>
> Matthijs
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Matthijs Kooijman

2008-Sep-13 20:07 UTC

head link

[LLVMdev] Overzealous PromoteCastOfAllocation

Hi Dan,
> Changing PromoteCastOfAllocation to not replace aggregate allocas with
> non-aggregate allocas if they have GEP users sounds reasonable to me.This sounds reasonable indeed, but still a bit arbitrary. Haven't figured
out
anything better yet, though.
> Finding the maximum alignment is sometimes still useful though, so
> it would be nice to update the alignment field of the alloca even if
> its type is left unchanged.The maximizing of the alignment is done only looking at the type's ABI
alignment, the actual alignment of the alloca instruction is not used.

But what you suggest is that if the alloca is casted to some type that has
higher alignment, you want that higher allignment propagated to the alloca
instruction? I can see why this is useful, since bitcasting to a type with
higher alignment can actually produce invalid code, I think? Or how does this
work exactly?

Gr.

Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080913/a7cc8e2c/attachment.sig>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Sep 2008 - [LLVMdev] Overzealous PromoteCastOfAllocation

[LLVMdev] Overzealous PromoteCastOfAllocation

[LLVMdev] Overzealous PromoteCastOfAllocation

[LLVMdev] Overzealous PromoteCastOfAllocation

Maybe Matching Threads