Louis Gerbarg
2014-May-22 23:54 UTC
[LLVMdev] RFC: Indexing of structs vs arrays in getelementpointer
On May 22, 2014, at 3:51 PM, Chandler Carruth <chandlerc at google.com> wrote:> > On Thu, May 22, 2014 at 4:42 PM, Louis Gerbarg <lgg at apple.com> wrote: > The problem that the above transform is technically illegal because “When indexing into a (optionally packed) structure, only i32 integer constants are allowed (when using a vector of indices they must all be the same i32 integer constant).” rule <http://llvm.org/docs/LangRef.html#getelementptr-instruction>. > > Wait, I don't follow. You don't violate this rule. The first index in GEP is *not* into the structure, it is along the implicit array of objects that any pointer refers to. Thus the i64 operand to the first GEP selects a different struct object in an array of them, and the first i64 operand to the second GEP re-selects the same struct object but then uses an i32 index into it. > > Perhaps you need a better example to show the illegal transform? That would help me understand the rest of your problem.Hmm… you are correct, it turns out you are correct, I only have to worry if it is constant after the second arg of the GEP. That should allow my case to continue to work. I am currently building a stage2 now to see if works cleanly. Having said that, there still exist cases where you are indexing into a homogenous struct like so. Consider the following two functions, which are the same except one is written using structs and the other is written using arrays: %struct1 = type { i32, i32 } %struct2 = type { %struct1, %struct1 } ; Function Attrs: ssp uwtable define i32 @test1(%struct2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { bb: br i1 %tmp4, label %bb1, label %bb2 bb1: ; preds = %bb5 %tmp10 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 0 br label %bb3 bb2: ; preds = %.lr.ph.i.i %tmp20 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 1 br label %bb3 bb3: ; preds = %bb2, %bb1 %phi = phi %struct1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] %tmp24 = getelementptr inbounds %struct1* %phi, i64 0, i32 1 %tmp25 = load i32* %tmp24, align 4 ret i32 %tmp25 } %array1 = type [2 x i32] %array2 = type [2 x %array1] ; Function Attrs: ssp uwtable define i32 @test2(%array2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { bb: br i1 %tmp4, label %bb1, label %bb2 bb1: ; preds = %bb5 %tmp10 = getelementptr inbounds %array2* %dm, i64 %tmp9, i32 0 br label %bb3 bb2: ; preds = %.lr.ph.i.i %tmp20 = getelementptr inbounds %array2* %dm, i64 %tmp19, i32 1 br label %bb3 bb3: ; preds = %bb2, %bb1 %phi = phi %array1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] %tmp24 = getelementptr inbounds %array1* %phi, i64 0, i32 1 %tmp25 = load i32* %tmp24, align 4 ret i32 %tmp25 } The @test1 function cannot have the optimization applied because the %tmp10 and %tmp20 GEPs vary by a field index into the struct, whereas @test2 can be optimized to: define i32 @test2([2 x [2 x i32]]* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { bb: br i1 %tmp4, label %bb1, label %bb2 bb1: ; preds = %bb br label %bb3 bb2: ; preds = %bb br label %bb3 bb3: ; preds = %bb2, %bb1 %0 = phi i32 [ 0, %bb1 ], [ 1, %bb2 ] %tmp24 = getelementptr inbounds [2 x [2 x i32]]* %dm, i64 %tmp9, i32 %0, i32 1 %tmp25 = load i32* %tmp24, align 4 ret i32 %tmp25 } Louis -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140522/435eb1e2/attachment.html>
Chandler Carruth
2014-May-23 00:09 UTC
[LLVMdev] RFC: Indexing of structs vs arrays in getelementpointer
On Thu, May 22, 2014 at 5:54 PM, Louis Gerbarg <lgg at apple.com> wrote:> Hmm… you are correct, it turns out you are correct, I only have to worry > if it is constant after the second arg of the GEP. That should allow my > case to continue to work. I am currently building a stage2 now to see if > works cleanly. >Cool. Hopefully we can make incremental progress on the problem while figuring out the right thing to do in the fully general IR case...> > Having said that, there still exist cases where you are indexing into a > homogenous struct like so. Consider the following two functions, which are > the same except one is written using structs and the other is written using > arrays: >Ah, delightful. I was pretty sure there was a more complex example that still exhibited the problem, but wanted to see it to think through things properly.> > %struct1 = type { i32, i32 } > > %struct2 = type { %struct1, %struct1 } > > ; Function Attrs: ssp uwtable > > define i32 @test1(%struct2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { > > bb: > > br i1 %tmp4, label %bb1, label %bb2 > > bb1: ; preds = %bb5 > > %tmp10 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 0 > > br label %bb3 > > > > bb2: ; preds = %.lr.ph.i.i > > %tmp20 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 1 > > br label %bb3 > > > > bb3: ; preds = %bb2, %bb1 > > %phi = phi %struct1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] > > %tmp24 = getelementptr inbounds %struct1* %phi, i64 0, i32 1 > > %tmp25 = load i32* %tmp24, align 4 > > ret i32 %tmp25 > > } > > > > > > %array1 = type [2 x i32] > > %array2 = type [2 x %array1] > > > > ; Function Attrs: ssp uwtable > > define i32 @test2(%array2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { > > bb: > > br i1 %tmp4, label %bb1, label %bb2 > > bb1: ; preds = %bb5 > > %tmp10 = getelementptr inbounds %array2* %dm, i64 %tmp9, i32 0 > > br label %bb3 > > > > bb2: ; preds = %.lr.ph.i.i > > %tmp20 = getelementptr inbounds %array2* %dm, i64 %tmp19, i32 1 > > br label %bb3 > > > > bb3: ; preds = %bb2, %bb1 > > %phi = phi %array1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] > > %tmp24 = getelementptr inbounds %array1* %phi, i64 0, i32 1 > > %tmp25 = load i32* %tmp24, align 4 > > ret i32 %tmp25 > > } > > > The @test1 function cannot have the optimization applied because the > %tmp10 and %tmp20 GEPs vary by a field index into the struct, whereas > @test2 can be optimized to: > > define i32 @test2([2 x [2 x i32]]* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { > bb: > br i1 %tmp4, label %bb1, label %bb2 > > bb1: ; preds = %bb > br label %bb3 > > bb2: ; preds = %bb > br label %bb3 > > bb3: ; preds = %bb2, %bb1 > %0 = phi i32 [ 0, %bb1 ], [ 1, %bb2 ] > %tmp24 = getelementptr inbounds [2 x [2 x i32]]* %dm, i64 %tmp9, i32 %0, > i32 1 > %tmp25 = load i32* %tmp24, align 4 > ret i32 %tmp25 > } > >So, here is why relaxing this rule is hard: Without a constant index, we don't know what sub-struct to use for interpreting the next index in the event of heterogeneity. Without this, we can't do anything, and so we definitionally preclude the transform. While clearly we can make this transform safely for homogenous structs, doing so seriously weakens the IR's guarantees. I'd not like to see us go that direction. Instead, if this transformation is indeed important (and it sounds like it is), I have a somewhat crazier idea: fold homogeneous struct type layers into arrays. Thoughts? (I've not thought about this much, there might be dragons here...) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140522/ea901ea2/attachment.html>
Hal Finkel
2014-May-23 02:19 UTC
[LLVMdev] RFC: Indexing of structs vs arrays in getelementpointer
----- Original Message -----> From: "Chandler Carruth" <chandlerc at google.com> > To: "Louis Gerbarg" <lgg at apple.com> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Thursday, May 22, 2014 7:09:49 PM > Subject: Re: [LLVMdev] RFC: Indexing of structs vs arrays in getelementpointer > > > > > > > On Thu, May 22, 2014 at 5:54 PM, Louis Gerbarg < lgg at apple.com > > wrote: > > > > Hmm… you are correct, it turns out you are correct, I only have to > worry if it is constant after the second arg of the GEP. That should > allow my case to continue to work. I am currently building a stage2 > now to see if works cleanly. > > > Cool. Hopefully we can make incremental progress on the problem while > figuring out the right thing to do in the fully general IR case... > > > > > > Having said that, there still exist cases where you are indexing into > a homogenous struct like so. Consider the following two functions, > which are the same except one is written using structs and the other > is written using arrays: > > > Ah, delightful. I was pretty sure there was a more complex example > that still exhibited the problem, but wanted to see it to think > through things properly. > > > > > > > > %struct1 = type { i32, i32 } > %struct2 = type { %struct1, %struct1 } > ; Function Attrs: ssp uwtable > define i32 @test1(%struct2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { > bb: > br i1 %tmp4, label %bb1, label %bb2 > bb1: ; preds = %bb5 > %tmp10 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 0 > br label %bb3 > > bb2: ; preds = %.lr.ph.i.i > %tmp20 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 1 > br label %bb3 > > bb3: ; preds = %bb2, %bb1 > %phi = phi %struct1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] > %tmp24 = getelementptr inbounds %struct1* %phi, i64 0, i32 1 > > %tmp25 = load i32* %tmp24, align 4 > ret i32 %tmp25 > } > > > %array1 = type [2 x i32] > %array2 = type [2 x %array1] > > ; Function Attrs: ssp uwtable > define i32 @test2(%array2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) { > bb: > br i1 %tmp4, label %bb1, label %bb2 > bb1: ; preds = %bb5 > %tmp10 = getelementptr inbounds %array2* %dm, i64 %tmp9, i32 0 > br label %bb3 > > bb2: ; preds = %.lr.ph.i.i > %tmp20 = getelementptr inbounds %array2* %dm, i64 %tmp19, i32 1 > br label %bb3 > > bb3: ; preds = %bb2, %bb1 > %phi = phi %array1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ] > %tmp24 = getelementptr inbounds %array1* %phi, i64 0, i32 1 > > %tmp25 = load i32* %tmp24, align 4 > ret i32 %tmp25 > } > > > The @test1 function cannot have the optimization applied because the > %tmp10 and %tmp20 GEPs vary by a field index into the struct, > whereas @test2 can be optimized to: > > > > > define i32 @test2([2 x [2 x i32]]* %dm, i1 %tmp4, i64 %tmp9, i64 > %tmp19) { > bb: > br i1 %tmp4, label %bb1, label %bb2 > > > bb1: ; preds = %bb > br label %bb3 > > > bb2: ; preds = %bb > br label %bb3 > > > bb3: ; preds = %bb2, %bb1 > %0 = phi i32 [ 0, %bb1 ], [ 1, %bb2 ] > %tmp24 = getelementptr inbounds [2 x [2 x i32]]* %dm, i64 %tmp9, i32 > %0, i32 1 > > %tmp25 = load i32* %tmp24, align 4 > ret i32 %tmp25 > } > > So, here is why relaxing this rule is hard: > > > Without a constant index, we don't know what sub-struct to use for > interpreting the next index in the event of heterogeneity. Without > this, we can't do anything, and so we definitionally preclude the > transform. > > > While clearly we can make this transform safely for homogenous > structs, doing so seriously weakens the IR's guarantees. I'd not > like to see us go that direction.To what guarantees are you referring? Thanks again, Hal> > > Instead, if this transformation is indeed important (and it sounds > like it is), I have a somewhat crazier idea: fold homogeneous struct > type layers into arrays. Thoughts? (I've not thought about this > much, there might be dragons here...) > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory