thr3ads.net - llvm dev - [LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship [May 2011]

If this information is useful, please help other people find it:
Share via:

Chuck Zhao

2011-May-04 23:16 UTC

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

While working on my optimization pass (a Function Pass), I try to 
replicate a call instruction and insert it at some earlier location 
(similar to LICM).
However, the instruction I am trying to replicate has dependencies on an 
uncertain number of instructions that are used to generate an address.

A simple example (IR segment):

define void @foo() nounwind {
entry:
%a = alloca i32, align 4;                   ; [0]
   %b = alloca i32, align 4
   %c = alloca i32, align 4
   %d = alloca i32, align 4
   %e = alloca i32, align 4
   %f = alloca i32, align 4
   %A = alloca [100 x i32], align 4
   call void @start_ckpt() nounwind
   %0 = call i32 @puts(i8* getelementptr inbounds ([8 x i8]* @.str, i32 
0, i32 0)) nounwind
%a1 = bitcast i32* %a to i8*          ;  [1]
call void @bkp_memory(i8* %a1, i32 4) nounwind            ; [2] the 
function call instruction that I am about to replicate

I need to make a copy/clone of the call instruction [2]. However, 
I->clone() failed on [2], because [2] has dependency chained 
instructions that all will need to be cloned before itself can.
The back-trace of dependent instructions go to: [1] and [0]. Thus both 
[1] and [2] need to be cloned in order before replicate [2].


A more interesting example is given below (IR Segment):

bb1:                                              ; preds = %bb29, 
%bb30.preheader
   %match_length.354 = phi i32 [ %match_length.2.ph, %bb29 ], [ 2, 
%bb30.preheader ]
   %match_available.153 = phi i32 [ %match_available.0.ph, %bb29 ], [ 0, 
%bb30.preheader ]
   tail call void @start_ckpt() nounwind noinline
   %88 = load i32* @ins_h, align 4               ; [0]
   %89 = shl i32 %88, 5                                   ;[1]
   %90 = load i32* @strstart, align 4           ; [2]
   %91 = add i32 %90, 2                                ; [3]
   %92 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %91   
; [4]
   %93 = load i8* %92, align 1                      ; [5]
   %94 = zext i8 %93 to i32                          ; [6]
   %.masked = and i32 %89, 32736                ;[7]
   %95 = xor i32 %94, %.masked                    ;[8]
   %take_addr = getelementptr i32* @ins_h
   %96 = bitcast i32* %take_addr to i8*
   call void @bkp_memory(i8* %96, i32 4)
   store i32 %95, i32* @ins_h, align 4
   %97 = and i32 %90, 32767
   %.sum39 = or i32 %95, 32768
   %98 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39
   %99 = load i16* %98, align 2
   %100 = zext i16 %99 to i32
   %101 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %97
   %take_addr1 = getelementptr i16* %101
   %102 = bitcast i16* %take_addr1 to i8*
   call void @bkp_memory(i8* %102, i32 2)
   store i16 %99, i16* %101, align 2
   %103 = trunc i32 %90 to i16
   %.sum73 = or i32 %95, 32768                    ;[9]
   %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 
%.sum73    ;[10]
   %take_addr2 = getelementptr i16* %104 ; [11]
   %105 = bitcast i16* %take_addr2 to i8*   ;[12]
   call void @bkp_memory(i8* %105, i32 2); [13]

Again, attempting to replicate a call instruction [13]. But, this time 
the dependency chain is much longer and goes way further back, which 
includes [0] .. [12].

Knowing this dependency chain could be arbitrarily long, with multiple 
levels of use-def and def-use relationships. The termination conditions 
could be one of:
(1). local variable declaration (Alloca)
(2). global variable
(3). const

I think it will need a depth-first or breath-first search to identify 
all related instructions before the I->clone() can possibly happen.

I am asking if there is a known-good way of doing this, or some LLVM API 
provided, since making a copy of an existing LLVM instruction seems to 
be a very common task in code transformations.


Thank you very much

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110504/f9136b61/attachment.html>

John Criswell

2011-May-05 00:20 UTC

head link

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

Dear Chuck,

I haven't read all of the details, but it seems that what you need to do 
is to clone defs before you clone any uses of the def.  To do that, you 
want to iterate over the instructions in dominator-tree order.

To do that, you first construct the dominator tree (there is an LLVM 
analysis pass that does that).  Then, you iterate over the basic blocks 
in the dominator tree from the root of the tree to the bottom; make sure 
you process each basic block before processing any of its children.  
When you process a basic block, you clone the instructions in that basic 
block.

This ensures that you process all defs before uses in a def-use chain 
for everything except phi-nodes.  Phi-nodes are special because they are 
not dominated by their inputs.  If your phi-node copying is simple, you 
want to create a dummy phi-node whose inputs are all undef.  After 
you've finished copying all instructions, you go back and change the 
operands of the phi-nodes to be what you want them to be.

The above works well for simple cases in which knowing where to insert 
phi-nodes is easy (e.g., you are inserting instrumentation in such a way 
that you are adding a phi-node that mirrors an existing phi-node within 
the program).  Knowing where to put phi-nodes can be difficult in 
general.  For such cases, you should make alloca'ed memory locations for 
each variable you are adding in the program.  You then use load/store 
instructions to read/write the data into/out of SSA variables, 
eliminating the need to add phi-nodes.  Then, when your pass is 
finished, run the mem2reg pass; it will convert your alloca'ed variables 
into SSA virtual registers and insert phi-nodes for you.

As an aside, knowing how the SSA construction algorithm works is what 
led me to understand how to visit defs before uses in LLVM IR.  I 
recommend reading Cytron et. al.'s paper 
(http://portal.acm.org/citation.cfm?id=115320).

-- John T.

On 5/4/11 6:16 PM, Chuck Zhao wrote:> While working on my optimization pass (a Function Pass), I try to 
> replicate a call instruction and insert it at some earlier location 
> (similar to LICM).
> However, the instruction I am trying to replicate has dependencies on 
> an uncertain number of instructions that are used to generate an address.
>
> A simple example (IR segment):
>
> define void @foo() nounwind {
> entry:
> %a = alloca i32, align 4;                   ; [0]
>   %b = alloca i32, align 4
>   %c = alloca i32, align 4
>   %d = alloca i32, align 4
>   %e = alloca i32, align 4
>   %f = alloca i32, align 4
>   %A = alloca [100 x i32], align 4
>   call void @start_ckpt() nounwind
>   %0 = call i32 @puts(i8* getelementptr inbounds ([8 x i8]* @.str, i32 
> 0, i32 0)) nounwind
> %a1 = bitcast i32* %a to i8*          ;  [1]
> call void @bkp_memory(i8* %a1, i32 4) nounwind            ; [2] the 
> function call instruction that I am about to replicate
>
> I need to make a copy/clone of the call instruction [2]. However, 
> I->clone() failed on [2], because [2] has dependency chained 
> instructions that all will need to be cloned before itself can.
> The back-trace of dependent instructions go to: [1] and [0]. Thus both 
> [1] and [2] need to be cloned in order before replicate [2].
>
>
> A more interesting example is given below (IR Segment):
>
> bb1:                                              ; preds = %bb29, 
> %bb30.preheader
>   %match_length.354 = phi i32 [ %match_length.2.ph, %bb29 ], [ 2, 
> %bb30.preheader ]
>   %match_available.153 = phi i32 [ %match_available.0.ph, %bb29 ], [ 
> 0, %bb30.preheader ]
>   tail call void @start_ckpt() nounwind noinline
>   %88 = load i32* @ins_h, align 4               ; [0]
>   %89 = shl i32 %88, 5                                   ;[1]
>   %90 = load i32* @strstart, align 4           ; [2]
>   %91 = add i32 %90, 2                                ; [3]
>   %92 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %91   
> ; [4]
>   %93 = load i8* %92, align 1                      ; [5]
>   %94 = zext i8 %93 to i32                          ; [6]
>   %.masked = and i32 %89, 32736                ;[7]
>   %95 = xor i32 %94, %.masked                    ;[8]
>   %take_addr = getelementptr i32* @ins_h
>   %96 = bitcast i32* %take_addr to i8*
>   call void @bkp_memory(i8* %96, i32 4)
>   store i32 %95, i32* @ins_h, align 4
>   %97 = and i32 %90, 32767
>   %.sum39 = or i32 %95, 32768
>   %98 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39
>   %99 = load i16* %98, align 2
>   %100 = zext i16 %99 to i32
>   %101 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %97
>   %take_addr1 = getelementptr i16* %101
>   %102 = bitcast i16* %take_addr1 to i8*
>   call void @bkp_memory(i8* %102, i32 2)
>   store i16 %99, i16* %101, align 2
>   %103 = trunc i32 %90 to i16
>   %.sum73 = or i32 %95, 32768                    ;[9]
>   %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 
> %.sum73    ;[10]
>   %take_addr2 = getelementptr i16* %104 ; [11]
>   %105 = bitcast i16* %take_addr2 to i8*   ;[12]
>   call void @bkp_memory(i8* %105, i32 2); [13]
>
> Again, attempting to replicate a call instruction [13]. But, this time 
> the dependency chain is much longer and goes way further back, which 
> includes [0] .. [12].
>
> Knowing this dependency chain could be arbitrarily long, with multiple 
> levels of use-def and def-use relationships. The termination 
> conditions could be one of:
> (1). local variable declaration (Alloca)
> (2). global variable
> (3). const
>
> I think it will need a depth-first or breath-first search to identify 
> all related instructions before the I->clone() can possibly happen.
>
> I am asking if there is a known-good way of doing this, or some LLVM 
> API provided, since making a copy of an existing LLVM instruction 
> seems to be a very common task in code transformations.
>
>
> Thank you very much
>
> Chuck
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110504/781be4d4/attachment.html>

Chuck Zhao

2011-May-09 02:11 UTC

head link

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

John, Ether,

Thank you for the reply.

With John's help and suggestions from Ether, I can now identify the full 
list of dependent instructions that I need to make a copy and insert 
somewhere else (before their original location).

Here is the list:

0    %90 = load i32* @strstart, align 4
1    %91 = add i32 %90, 2
2    %88 = load i32* @ins_h, align 4
3    %92 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %91
4    %89 = shl i32 %88, 5
5    %93 = load i8* %92, align 1
6    %.masked = and i32 %89, 32736
7    %94 = zext i8 %93 to i32
8    %95 = xor i32 %94, %.masked
9    %.sum73 = or i32 %95, 32768
10    %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73
11    %take_addr2 = getelementptr i16* %104
12    %105 = bitcast i16* %take_addr2 to i8*
13    call void @bkp_memory(i8* %105, i32 2)

But, when I try to insert them in the above order into a PREVIOUS 
location, the "Instruction does not dominate all uses!" still shows
up:

Here is the code I was trying to do the clone and insert:

std::vector<Instruction *>::iterator p;
Instruction * Pos = INSERT_POSITION;
for(p = coll.begin(); p != coll.end(); ++p){
   Instruction * CurI = *p;
   Instruction * CloneI = CurI->clone();
   CloneI->InsertAfter(Pos);
   Pos = CloneI;
}

i am not sure if this it the right approach to insert a list of 
dependent instructions. (For delete instructions, people need to collect 
them 1st before removing. Are there similar process for insertion?)
There doesn't seem to have Phi node in the list.

The instructions seem to be inside a single BasicBlock. Would it help?

Thank you very much

Chuck

P.S.
Here is the full list of errors:

Instruction does not dominate all uses!
   %88 = bitcast i16* %take_addr2 to i8*
   call void @bkp_memory(i8* %88, i32 2)
Instruction does not dominate all uses!
   %take_addr2 = getelementptr i16* %89
   %88 = bitcast i16* %take_addr2 to i8*
Instruction does not dominate all uses!
   %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73
   %take_addr2 = getelementptr i16* %89
Instruction does not dominate all uses!
   %.sum73 = or i32 %90, 32768
   %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73
Instruction does not dominate all uses!
   %90 = xor i32 %91, %.masked
   %.sum73 = or i32 %90, 32768
Instruction does not dominate all uses!
   %91 = zext i8 %92 to i32
   %90 = xor i32 %91, %.masked
Instruction does not dominate all uses!
   %92 = load i8* %94, align 1
   %91 = zext i8 %92 to i32
Instruction does not dominate all uses!
   %93 = shl i32 %95, 5
   %.masked = and i32 %93, 32736
Instruction does not dominate all uses!
   %94 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %96
   %92 = load i8* %94, align 1
Instruction does not dominate all uses!
   %95 = load i32* @ins_h, align 4
   %93 = shl i32 %95, 5
Instruction does not dominate all uses!
   %96 = add i32 %97, 2
   %94 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 %96
Instruction does not dominate all uses!
   %97 = load i32* @strstart, align 4
   %96 = add i32 %97, 2
Instruction does not dominate all uses!
   %90 = xor i32 %91, %.masked
   store i32 %90, i32* @ins_h, align 4
Instruction does not dominate all uses!
   %90 = xor i32 %91, %.masked
   %.sum39 = or i32 %90, 32768
Instruction does not dominate all uses!
   %.sum39 = or i32 %90, 32768
   %100 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39
Instruction does not dominate all uses!
   %100 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39
   %101 = load i16* %100, align 2
Instruction does not dominate all uses!
   %101 = load i16* %100, align 2
   %102 = zext i16 %101 to i32
Instruction does not dominate all uses!
   %101 = load i16* %100, align 2
   store i16 %101, i16* %103, align 2
Instruction does not dominate all uses!
   %89 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum73
   store i16 %105, i16* %89, align 2
Instruction does not dominate all uses!
   %101 = load i16* %100, align 2
   %108 = icmp ne i16 %101, 0
Instruction does not dominate all uses!
   %108 = icmp ne i16 %101, 0
   %or.cond47 = and i1 %108, %110
Instruction does not dominate all uses!
   %or.cond47 = and i1 %108, %110
   br i1 %or.cond47, label %bb3, label %bb9


Broken module found, compilation aborted!



On 5/4/2011 8:20 PM, John Criswell wrote:> Dear Chuck,
>
> I haven't read all of the details, but it seems that what you need to 
> do is to clone defs before you clone any uses of the def.  To do that, 
> you want to iterate over the instructions in dominator-tree order.
>
> To do that, you first construct the dominator tree (there is an LLVM 
> analysis pass that does that).  Then, you iterate over the basic 
> blocks in the dominator tree from the root of the tree to the bottom; 
> make sure you process each basic block before processing any of its 
> children.  When you process a basic block, you clone the instructions 
> in that basic block.
>
> This ensures that you process all defs before uses in a def-use chain 
> for everything except phi-nodes.  Phi-nodes are special because they 
> are not dominated by their inputs.  If your phi-node copying is 
> simple, you want to create a dummy phi-node whose inputs are all 
> undef.  After you've finished copying all instructions, you go back 
> and change the operands of the phi-nodes to be what you want them to be.
>
> The above works well for simple cases in which knowing where to insert 
> phi-nodes is easy (e.g., you are inserting instrumentation in such a 
> way that you are adding a phi-node that mirrors an existing phi-node 
> within the program).  Knowing where to put phi-nodes can be difficult 
> in general.  For such cases, you should make alloca'ed memory 
> locations for each variable you are adding in the program.  You then 
> use load/store instructions to read/write the data into/out of SSA 
> variables, eliminating the need to add phi-nodes.  Then, when your 
> pass is finished, run the mem2reg pass; it will convert your alloca'ed 
> variables into SSA virtual registers and insert phi-nodes for you.
>
> As an aside, knowing how the SSA construction algorithm works is what 
> led me to understand how to visit defs before uses in LLVM IR.  I 
> recommend reading Cytron et. al.'s paper 
> (http://portal.acm.org/citation.cfm?id=115320).
>
> -- John T.
>
>
> On 5/4/11 6:16 PM, Chuck Zhao wrote:
>> While working on my optimization pass (a Function Pass), I try to 
>> replicate a call instruction and insert it at some earlier location 
>> (similar to LICM).
>> However, the instruction I am trying to replicate has dependencies on 
>> an uncertain number of instructions that are used to generate an
address.
>>
>> A simple example (IR segment):
>>
>> define void @foo() nounwind {
>> entry:
>> %a = alloca i32, align 4;                   ; [0]
>>   %b = alloca i32, align 4
>>   %c = alloca i32, align 4
>>   %d = alloca i32, align 4
>>   %e = alloca i32, align 4
>>   %f = alloca i32, align 4
>>   %A = alloca [100 x i32], align 4
>>   call void @start_ckpt() nounwind
>>   %0 = call i32 @puts(i8* getelementptr inbounds ([8 x i8]* @.str, 
>> i32 0, i32 0)) nounwind
>> %a1 = bitcast i32* %a to i8*          ;  [1]
>> call void @bkp_memory(i8* %a1, i32 4) nounwind            ; [2] the 
>> function call instruction that I am about to replicate
>>
>> I need to make a copy/clone of the call instruction [2]. However, 
>> I->clone() failed on [2], because [2] has dependency chained 
>> instructions that all will need to be cloned before itself can.
>> The back-trace of dependent instructions go to: [1] and [0]. Thus 
>> both [1] and [2] need to be cloned in order before replicate [2].
>>
>>
>> A more interesting example is given below (IR Segment):
>>
>> bb1:                                              ; preds = %bb29, 
>> %bb30.preheader
>>   %match_length.354 = phi i32 [ %match_length.2.ph, %bb29 ], [ 2, 
>> %bb30.preheader ]
>>   %match_available.153 = phi i32 [ %match_available.0.ph, %bb29 ], [ 
>> 0, %bb30.preheader ]
>>   tail call void @start_ckpt() nounwind noinline
>>   %88 = load i32* @ins_h, align 4               ; [0]
>>   %89 = shl i32 %88, 5                                   ;[1]
>>   %90 = load i32* @strstart, align 4           ; [2]
>>   %91 = add i32 %90, 2                                ; [3]
>>   %92 = getelementptr inbounds [65536 x i8]* @window, i32 0, i32 
>> %91   ; [4]
>>   %93 = load i8* %92, align 1                      ; [5]
>>   %94 = zext i8 %93 to i32                          ; [6]
>>   %.masked = and i32 %89, 32736                ;[7]
>>   %95 = xor i32 %94, %.masked                    ;[8]
>>   %take_addr = getelementptr i32* @ins_h
>>   %96 = bitcast i32* %take_addr to i8*
>>   call void @bkp_memory(i8* %96, i32 4)
>>   store i32 %95, i32* @ins_h, align 4
>>   %97 = and i32 %90, 32767
>>   %.sum39 = or i32 %95, 32768
>>   %98 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %.sum39
>>   %99 = load i16* %98, align 2
>>   %100 = zext i16 %99 to i32
>>   %101 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 %97
>>   %take_addr1 = getelementptr i16* %101
>>   %102 = bitcast i16* %take_addr1 to i8*
>>   call void @bkp_memory(i8* %102, i32 2)
>>   store i16 %99, i16* %101, align 2
>>   %103 = trunc i32 %90 to i16
>>   %.sum73 = or i32 %95, 32768                    ;[9]
>>   %104 = getelementptr inbounds [65536 x i16]* @prev, i32 0, i32 
>> %.sum73    ;[10]
>>   %take_addr2 = getelementptr i16* %104 ; [11]
>>   %105 = bitcast i16* %take_addr2 to i8*   ;[12]
>>   call void @bkp_memory(i8* %105, i32 2); [13]
>>
>> Again, attempting to replicate a call instruction [13]. But, this 
>> time the dependency chain is much longer and goes way further back, 
>> which includes [0] .. [12].
>>
>> Knowing this dependency chain could be arbitrarily long, with 
>> multiple levels of use-def and def-use relationships. The termination 
>> conditions could be one of:
>> (1). local variable declaration (Alloca)
>> (2). global variable
>> (3). const
>>
>> I think it will need a depth-first or breath-first search to identify 
>> all related instructions before the I->clone() can possibly happen.
>>
>> I am asking if there is a known-good way of doing this, or some LLVM 
>> API provided, since making a copy of an existing LLVM instruction 
>> seems to be a very common task in code transformations.
>>
>>
>> Thank you very much
>>
>> Chuck
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu          http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110508/987ee53d/attachment.html>

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - May 2011 - [LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

[LLVMdev] identifying all dependent instructions through multi-levels of def-use relationship

Reasonably Related Threads