Philip Reames
2014-Sep-10 16:50 UTC
[LLVMdev] failed folding with constant array with opt -O3
I came in this morning to an email that said basically the same thing about the reduced example we were looking at. However, the original IR it came from (before hand-reduction) had the data layout set correctly, so there's probably still *something* going on. It's just not what I thought at first. :)

Philip

On 09/10/2014 02:26 AM, Roel Jordans wrote:
> The -debug output of opt shows that SROA was skipped due to missing
> target data.
>
> Adding something like:
>
> target datalayout = "e-p:32:32:32-i32:32:32"
>
> to the top seems sufficient to fix the issue at -O3.
>
> By defining the size and storage requirements for i32, SROA is able to
> rewrite the array load into a constant scalar load, which can then be
> optimized further.
>
> Cheers,
> Roel
>
> On 09/09/14 18:30, Peng Cheng wrote:
>> I have the following simplified LLVM IR, which basically returns a value
>> based on the first element of a constant array.
>>
>> ---
>> ; ModuleID = 'simple_ir3.txt'
>>
>> @f.b = constant [1 x i32] [i32 1], align 4 ; constant array with value 1 at the first element
>>
>> define void @f(i32* nocapture %l0) {
>> entry:
>>   %fc_ = alloca [1 x i32]
>>   %f.b.v = load [1 x i32]* @f.b
>>   store [1 x i32] %f.b.v, [1 x i32]* %fc_
>>   %0 = getelementptr [1 x i32]* %fc_, i64 0, i64 0 ; address of the first element of the copy, which holds 1
>>   %1 = load i32* %0
>>   %tobool = icmp ne i32 %1, 0 ; always true, since the first element of the constant array is 1
>>   br i1 %tobool, label %2, label %4
>>
>> ; <label>:2 ; true branch
>>   store i32 1, i32* %l0
>>   %3 = load i32* %l0
>>   br label %4
>>
>> ; <label>:4
>>   %storemerge = phi i32 [ %3, %2 ], [ 0, %entry ]
>>   store i32 %storemerge, i32* %l0
>>   ret void
>> }
>> ---
>>
>> I ran opt -O3 simple_ir.txt -S, and got:
>>
>> ---
>> ; ModuleID = 'simple_ir3.txt'
>>
>> @f.b = constant [1 x i32] [i32 1], align 4
>>
>> ; Function Attrs: nounwind
>> define void @f(i32* nocapture %l0) #0 {
>> entry:
>>   %fc_ = alloca [1 x i32]
>>   store [1 x i32] [i32 1], [1 x i32]* %fc_
>>   %0 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
>>   %1 = load i32* %0
>>   %tobool = icmp eq i32 %1, 0
>>   br i1 %tobool, label %3, label %2
>>
>> ; <label>:2 ; preds = %entry
>>   store i32 1, i32* %l0
>>   br label %3
>>
>> ; <label>:3 ; preds = %entry, %2
>>   %storemerge = phi i32 [ 1, %2 ], [ 0, %entry ]
>>   store i32 %storemerge, i32* %l0
>>   ret void
>> }
>>
>> attributes #0 = { nounwind }
>> ---
>>
>> I would expect constant folding, or some other transformation, to fold
>> the constant and produce the following IR:
>>
>> ---
>> define void @f(i32* nocapture %l0) #0 {
>>   store i32 1, i32* %l0
>>   ret void
>> }
>> ---
>>
>> How can I get the expected optimized IR? Should I update the original
>> IR, or use a different set of transformations?
>>
>> Any suggestions or comments?
>>
>> Thanks,
>> -Peng
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
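Roel's explanation turns on SROA knowing how much storage an i32 occupies. As a rough illustration only, here is a hypothetical Python sketch (not LLVM's DataLayout parser; `ToyLayout`, `parse_datalayout`, `int_alloc_size` and `array_element_offset` are invented names) that reads the two components used in his example string, `p:size:abi:pref` and `iN:abi:pref`, and derives the allocation size of an i32 and the byte offset of an array element. These are roughly the kind of facts needed before `getelementptr [1 x i32]* %fc_, i64 0, i64 0` can be matched against a slice of the alloca. Unlike LLVM, which falls back to built-in default alignments, this toy returns None for any integer width the string does not mention.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyLayout:
    """Hypothetical container for the few datalayout facts this sketch reads."""
    big_endian: bool = False
    pointer_bits: Optional[int] = None
    # integer width in bits -> ABI alignment in bits, e.g. {32: 32} for "i32:32:32"
    int_abi_align_bits: Dict[int, int] = field(default_factory=dict)


def parse_datalayout(spec: str) -> ToyLayout:
    """Read only 'e'/'E', 'p:size:abi[:pref]' and 'iN:abi[:pref]'; ignore the rest."""
    layout = ToyLayout()
    for part in spec.split("-"):
        if part in ("e", "E"):
            layout.big_endian = part == "E"
        elif part.startswith("p:"):
            layout.pointer_bits = int(part[2:].split(":")[0])
        elif part.startswith("i") and ":" in part:
            width, abi = part[1:].split(":")[:2]
            layout.int_abi_align_bits[int(width)] = int(abi)
    return layout


def int_alloc_size(layout: ToyLayout, width_bits: int) -> Optional[int]:
    """Bytes one iN occupies in memory: store size rounded up to its ABI alignment.

    Returns None when the layout string says nothing about this width
    (a toy simplification; LLVM itself falls back to default alignments).
    """
    abi_bits = layout.int_abi_align_bits.get(width_bits)
    if abi_bits is None:
        return None
    store_bytes = (width_bits + 7) // 8
    align_bytes = max(1, abi_bits // 8)
    return -(-store_bytes // align_bytes) * align_bytes  # ceiling division


def array_element_offset(layout: ToyLayout, width_bits: int, index: int) -> Optional[int]:
    """Byte offset of element `index` in an [N x iW] array, i.e. the address
    `getelementptr [N x iW]* %p, i64 0, i64 index` computes relative to %p."""
    size = int_alloc_size(layout, width_bits)
    return None if size is None else index * size
```

With Roel's string, `int_alloc_size(parse_datalayout("e-p:32:32:32-i32:32:32"), 32)` is 4 and element 0 of `%fc_` sits at offset 0, so the aggregate store and the scalar load cover the same 4 bytes. With an empty layout string the sketch returns None and gives up, loosely mirroring the "SROA was skipped" message in the -debug output.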
Peng Cheng
2014-Sep-10 19:00 UTC
[LLVMdev] failed folding with constant array with opt -O3
Thanks for your help! Providing a data layout works for the example I have.

While investigating, I also got the feeling that something else is going on.

I have another simple IR example, which does basically the same thing as the original one, except that it initializes the array element by element instead of copying a constant array.

---
define void @f(i32* nocapture %l0) {
entry:
  %fc_ = alloca [1 x i32]
  %0 = getelementptr inbounds [1 x i32]* %fc_, i32 0, i32 0
  store i32 1, i32* %0, align 4
  %1 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
  %2 = load i32* %1
  %tobool = icmp eq i32 %2, 0
  br i1 %tobool, label %4, label %3

; <label>:3 ; preds = %entry
  store i32 1, i32* %l0
  br label %4

; <label>:4 ; preds = %entry, %3
  %storemerge = phi i32 [ 1, %3 ], [ 0, %entry ]
  store i32 %storemerge, i32* %l0
  ret void
}
---

For this IR, without a target data layout, opt -O2 produces the expected optimization:

---
; ModuleID = 't1.txt'

; Function Attrs: nounwind
define void @f(i32* nocapture %l0) #0 {
  store i32 1, i32* %l0
  ret void
}

attributes #0 = { nounwind }
---

Checking the IR after each transformation, I see that GVN removes the load and the passes before it do nothing.

So I ran opt -gvn alone on the IR, but the load was not eliminated.

Looking into the GVN code, it seems that the memory dependency computation gives different results for the load under -O2 and under -gvn alone: with -O2 the load is not clobbered, but with -gvn alone it is reported as clobbered.

Does that sound expected?
Robin Morisset
2014-Sep-10 19:53 UTC
[LLVMdev] failed folding with constant array with opt -O3
GVN relies on alias analysis passes, which -O2 schedules but -gvn alone does not. If you look at the tests in test/Transforms/GVN/, they usually invoke opt -basicaa -gvn. Could you try adding -basicaa (basic alias analysis) when running your example?

Best regards,
Robin
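Robin's point can be made concrete with a deliberately tiny model (hypothetical Python, not LLVM's MemoryDependenceAnalysis; `Ptr`, `Store`, `no_aa`, `basic_aa_like` and `load_dependency` are invented names). In Peng's second example the store goes through `%0` and the load through `%1`: both GEPs address byte 0 of `%fc_`, but they are different SSA values. The sketch assumes that with no alias analysis every store/load pair is answered "may alias", while a basicaa-like oracle decomposes each pointer into a base object plus a constant byte offset. Scanning backwards from the load, a "must" answer lets the stored value be forwarded, and a "may" answer is treated as a clobber, which matches the difference Peng saw between -gvn alone and -O2.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

ACCESS_BYTES = 4  # assumption: every access in this toy is a 4-byte i32


@dataclass(frozen=True)
class Ptr:
    base: str    # underlying object, e.g. the alloca "%fc_"
    offset: int  # constant byte offset after folding the GEP indices


@dataclass(frozen=True)
class Store:
    ptr: Ptr
    value: int


AliasOracle = Callable[[Ptr, Ptr], str]


def no_aa(a: Ptr, b: Ptr) -> str:
    """Stand-in for running GVN without any alias analysis: always 'may'."""
    return "may"


def basic_aa_like(a: Ptr, b: Ptr) -> str:
    """Stand-in for -basicaa: compare base objects and constant offsets."""
    if a.base != b.base:
        return "may"  # this toy does not reason about distinct objects
    if a.offset == b.offset:
        return "must"
    if abs(a.offset - b.offset) >= ACCESS_BYTES:
        return "no"
    return "may"  # partially overlapping accesses


def load_dependency(stores: List[Store], load_ptr: Ptr,
                    alias: AliasOracle) -> Tuple[str, Optional[int]]:
    """Scan the stores that precede the load in its block, nearest first."""
    for st in reversed(stores):
        answer = alias(st.ptr, load_ptr)
        if answer == "no":
            continue  # provably unrelated store, keep scanning
        if answer == "must":
            return ("def", st.value)  # stored value can replace the load
        return ("clobber", None)  # may alias: give up on this load
    return ("none", None)  # reached the start of the block


if __name__ == "__main__":
    # Peng's second example: store i32 1 through %0, load through %1;
    # both addresses are byte 0 of %fc_.
    stores = [Store(Ptr("%fc_", 0), 1)]
    load = Ptr("%fc_", 0)
    print(load_dependency(stores, load, no_aa))          # ('clobber', None)
    print(load_dependency(stores, load, basic_aa_like))  # ('def', 1)
```

The only thing that changes between the two runs is the alias oracle; the dependency walk itself is identical, which is why adding -basicaa in front of -gvn is expected to be enough for this load.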
Roel Jordans
2014-Sep-10 19:58 UTC
[LLVMdev] failed folding with constant array with opt -O3
Adding some form of alias analysis, such as -basicaa, to the pass list should solve the clobbering issue. You're probably also interested in some of the following passes to get your code completely optimized again:

-constprop -simplifycfg -dse

Cheers,
Roel
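Once GVN has forwarded the constant 1 into the load, the remaining cleanup Roel lists is fairly mechanical. The hypothetical Python sketch below walks Peng's second example by hand under simplified assumptions: fold `icmp eq i32 1, 0`, pick the single live successor of the now-constant branch, resolve the phi from its only remaining predecessor, and run a straight-line dead store elimination that drops a store overwritten before any read. It models the idea only; `fold_icmp_eq`, `taken_successor`, `resolve_phi` and `dead_store_elim` are invented names, not LLVM APIs, and pointers are compared by name.

```python
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str, int]  # ("store" | "load", pointer name, value)


def fold_icmp_eq(lhs: int, rhs: int) -> bool:
    """Constant-fold `icmp eq` on two known integers."""
    return lhs == rhs


def taken_successor(cond: bool, if_true: str, if_false: str) -> str:
    """A conditional branch on a constant has exactly one live successor."""
    return if_true if cond else if_false


def resolve_phi(incoming: Dict[str, int], live_pred: str) -> int:
    """With one live predecessor, a phi is just that predecessor's value."""
    return incoming[live_pred]


def dead_store_elim(block: List[Instr]) -> List[Instr]:
    """Drop stores overwritten later in the block with no load in between.

    Toy assumption: equal pointer names must alias, different names never do.
    """
    kept: List[Instr] = []
    overwritten: Set[str] = set()  # pointers stored to later, not yet read
    for op, ptr, value in reversed(block):
        if op == "store":
            if ptr in overwritten:
                continue  # a later store fully overwrites this one
            overwritten.add(ptr)
        elif op == "load":
            overwritten.discard(ptr)  # this read keeps earlier stores alive
        kept.append((op, ptr, value))
    kept.reverse()
    return kept


if __name__ == "__main__":
    tobool = fold_icmp_eq(1, 0)                 # %tobool after GVN forwards 1
    succ = taken_successor(tobool, "%4", "%3")  # br i1 %tobool, label %4, label %3
    storemerge = resolve_phi({"%3": 1, "%entry": 0}, live_pred="%3")
    # After entry -> %3 -> %4 is merged into one block, the stores to %l0 are:
    merged = [("store", "%l0", 1), ("store", "%l0", storemerge)]
    print(succ, storemerge, dead_store_elim(merged))  # %3 1 [('store', '%l0', 1)]
```

The surviving `store i32 1, i32* %l0` corresponds to the single store in the expected IR Peng asked for.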