Chris Sakalis via llvm-dev
2016-Aug-31 08:58 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix, I can no longer reproduce the issue. As far a releases go, I am guessing that this is going to be in 4.0? Best, Chris On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote:> Yeah, i just hope it doesn't regress scatter/gather vector code badly. > But at least it's correct now? > > > On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> ------------------------------ >> >> *From: *"Daniel Berlin" <dberlin at dberlin.org> >> *To: *"Philip Reames" <listmail at philipreames.com>, "Davide Italiano" < >> davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> >> *Cc: *"Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" < >> david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" < >> llvm-dev at lists.llvm.org> >> *Sent: *Tuesday, August 30, 2016 3:07:01 PM >> *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with >> llvm.masked.scatter/gather intrinsics >> >> This is now committed and a test added to GVN. >> >> Thanks! >> >> I suspect that, in practice, we'll get little benefit from handling this >> until our AA passes learn how to deal with (i.e. look back through) pointer >> vectors. >> >> -Hal >> >> >> >> On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org> >> wrote: >> >>> Okay, so then it sounds like, for now, the right fix is to stop marking >>> masked.gather and masked.scatter with intrarg* options. >>> >>> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> >>> wrote: >>> >>>> We might have specification bug here, but we appear to implement what >>>> we specified. argmemonly is specified as only considering pointer typed >>>> arguments. It's behavior on vector-of-pointers is unspecified, but would >>>> seem to fall into the same case as inttoptr of an integer argument (i.e. >>>> explicitly undefined). We could consider changing that. >>>> >>>> >>>> >>>> On 08/29/2016 11:59 AM, Daniel Berlin wrote: >>>> >>>> + a few others. >>>> >>>> After following this rabbit hole a bit, there are a lot of mutually >>>> recursive calls, etc, that may or may not do the right thing with vectors >>>> of pointers. >>>> I can fix *this* particular bug with the attached patch. >>>> >>>> However, it's mostly papering over stuff. Nothing seems to know what >>>> to do with a memorylocation that is a vector of pointers. They all expect >>>> memorylocation to be a single pointer location. >>>> >>>> I would chalk it up to "luck" that this patch fixes the bug. >>>> >>>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot of >>>> stuff anymore (we hacked AA nodes into it, and lots of stuff now tries to >>>> figure out the invariantess of the locations, blah blah blah), but it seems >>>> like a big job to figure out what to replace it with that will work for >>>> these cases. >>>> >>>> (I'm pretty positive if we just make it MemoryLocations, and have >>>> everything loop over the locations, the compiler will get a lot larger and >>>> a lot slower) >>>> >>>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org> >>>> wrote: >>>> >>>>> it would also, for that matter, say the same about an array of >>>>> pointers. >>>>> >>>>> It's not clear to me what will break if you change this to >>>>> isPtrOrPtrVectorTy. >>>>> >>>>> In fact, i know it doesn't fix this bug. >>>>> It's a pretty deep rabbit hole of things not quite prepared to >>>>> understand vectors of pointers. >>>>> >>>>> (we prepare memorylocations of them, but memory locations expect to be >>>>> one thing, not a group of things, etc). >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org> >>>>> wrote: >>>>> >>>>>> this is definitely a bug in AA. >>>>>> >>>>>> 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; >>>>>> ++I) { >>>>>> 226 const Value *Arg = *I; >>>>>> 227 if (!Arg->getType()->isPointerTy()) >>>>>> -> 228 continue; >>>>>> 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); >>>>>> 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, >>>>>> CS2ArgIdx, TLI); >>>>>> >>>>>> AliasAnalysis.cpp:228 >>>>>> >>>>>> It ignores every argument because they are vectors of pointers, not >>>>>> pointers. >>>>>> >>>>>> >>>>>> I'm surprised this has not broken anything before. It will never say >>>>>> two intrinsics with vectors of pointers mod/ref each other. >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org> >>>>>> wrote: >>>>>> >>>>>>> + Daniel Berlin >>>>>>> >>>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev >>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>> > Hello everyone, >>>>>>> > >>>>>>> > I think I have found an gvn / alias analysis related bug, but >>>>>>> before opening >>>>>>> > an issue on the tracker I wanted to see if I am missing something. >>>>>>> I have >>>>>>> > the following testcase: >>>>>>> > >>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>> i32* %out) >>>>>>> >> { >>>>>>> >> entry: >>>>>>> >> ; Just some temporary storage >>>>>>> >> %tmp.0 = alloca i32 >>>>>>> >> %tmp.1 = alloca i32 >>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>> >> ; Read from in1 and in2 >>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in1, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in2, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in1 to the allocas >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>> >> ; Read in1 from the allocas >>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in2 to the allocas >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>> >> ; Read in2 from the allocas >>>>>>> >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in2 to out for good measure >>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0 >>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1 >>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>> >> ret void >>>>>>> >> } >>>>>>> > >>>>>>> > >>>>>>> > It uses a masked scatter operation to store a value to the two >>>>>>> allocas and >>>>>>> > then uses a masked gather operation to read that same value. This >>>>>>> is done >>>>>>> > twice in a row, with two different values. If I run this code >>>>>>> through the >>>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the >>>>>>> same as the >>>>>>> > first gather (%tmp.v.0) and it will be removed. After some >>>>>>> debugging, I >>>>>>> > realized that this is happening because the Memory Dependence >>>>>>> Analysis >>>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though >>>>>>> the scatter >>>>>>> > call in between changes the value stored at %tmp. This, in turn, is >>>>>>> > happening because the alias analysis is returning NoModRef for the >>>>>>> %tmp.v.1 >>>>>>> > and second scatter callsites. The resulting IR produces the wrong >>>>>>> result: >>>>>>> > >>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>> i32* %out) >>>>>>> >> { >>>>>>> >> entry: >>>>>>> >> %tmp.0 = alloca i32 >>>>>>> >> %tmp.1 = alloca i32 >>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in1, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in2, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>> >> ; The call to masked.gather is gone and now we are using the >>>>>>> old value >>>>>>> >> (%tmp.v.0) >>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0 >>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1 >>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>> >> ret void >>>>>>> >> } >>>>>>> > >>>>>>> > >>>>>>> > The old value read from %tmp is used, instead of the new one. I >>>>>>> tested this >>>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip (84cb7f4) >>>>>>> with the >>>>>>> > same result. >>>>>>> > >>>>>>> > I should mention that if I replace the second scatter with stores, >>>>>>> the issue >>>>>>> > persists. The only way to avoid it is to replace all scatter/gather >>>>>>> > intrinsics with equivalent sets of store/load, in which case the >>>>>>> MemDep >>>>>>> > returns the correct dependencies and the GVN pass will not remove >>>>>>> the second >>>>>>> > set of loads. >>>>>>> > >>>>>>> > So, my question is, is this a bug or am I doing something that I >>>>>>> shouldn't >>>>>>> > be in the IR? And if it is a bug, is it the AA analyses that >>>>>>> return the >>>>>>> > wrong result (I presume so) or should GVN handle such cases >>>>>>> differently? >>>>>>> > >>>>>>> > Best regards, >>>>>>> > >>>>>>> > Chris >>>>>>> > >>>>>>> > _______________________________________________ >>>>>>> > LLVM Developers mailing list >>>>>>> > llvm-dev at lists.llvm.org >>>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>> > >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Davide >>>>>>> >>>>>>> "There are no solved problems; there are only problems that are more >>>>>>> or less solved" -- Henri Poincare >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >> >> >> >> -- >> Hal Finkel >> Assistant Computational Scientist >> Leadership Computing Facility >> Argonne National Laboratory >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160831/e40f5814/attachment.html>
Hal Finkel via llvm-dev
2016-Aug-31 13:07 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message -----> From: "Chris Sakalis" <chrissakalis at gmail.com> > To: "Daniel Berlin" <dberlin at dberlin.org> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" > <david.majnemer at gmail.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, > "Philip Reames" <listmail at philipreames.com>, "Davide Italiano" > <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> > Sent: Wednesday, August 31, 2016 3:58:56 AM > Subject: Re: [llvm-dev] GVN / Alias Analysis issue with > llvm.masked.scatter/gather intrinsics> Thank you for the quick fix, I can no longer reproduce the issue. As > far a releases go, I am guessing that this is going to be in 4.0?Yes, and we can consider it for 3.9.1 as well. -Hal> Best,> Chris> On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin < dberlin at dberlin.org > > wrote:> > Yeah, i just hope it doesn't regress scatter/gather vector code > > badly. > > > But at least it's correct now? >> > On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel < hfinkel at anl.gov > > > wrote: >> > > > From: "Daniel Berlin" < dberlin at dberlin.org > > > > > > > > > > > To: "Philip Reames" < listmail at philipreames.com >, "Davide > > > > Italiano" > > > > < davide at freebsd.org >, "Chandler Carruth" < > > > > chandlerc at gmail.com > > > > > > > > > > > > > > > Cc: "Chris Sakalis" < chrissakalis at gmail.com >, "David > > > > Majnemer" > > > > < > > > > david.majnemer at gmail.com >, "Hal Finkel" < hfinkel at anl.gov >, > > > > "llvm-dev" < llvm-dev at lists.llvm.org > > > > > > > > > > > Sent: Tuesday, August 30, 2016 3:07:01 PM > > > > > > > > > > Subject: Re: [llvm-dev] GVN / Alias Analysis issue with > > > > llvm.masked.scatter/gather intrinsics > > > > > >> > > > This is now committed and a test added to GVN. > > > > > >> > > Thanks! > > >> > > I suspect that, in practice, we'll get little benefit from > > > handling > > > this until our AA passes learn how to deal with (i.e. look back > > > through) pointer vectors. > > >> > > -Hal > > >> > > > On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin < > > > > dberlin at dberlin.org > > > > > wrote: > > > > > >> > > > > Okay, so then it sounds like, for now, the right fix is to > > > > > stop > > > > > marking masked.gather and masked.scatter with intrarg* > > > > > options. > > > > > > > > > >> > > > > On Mon, Aug 29, 2016, 1:26 PM Philip Reames < > > > > > listmail at philipreames.com > wrote: > > > > > > > > > >> > > > > > We might have specification bug here, but we appear to > > > > > > implement > > > > > > what > > > > > > we specified. argmemonly is specified as only considering > > > > > > pointer > > > > > > typed arguments. It's behavior on vector-of-pointers is > > > > > > unspecified, > > > > > > but would seem to fall into the same case as inttoptr of an > > > > > > integer > > > > > > argument (i.e. explicitly undefined). We could consider > > > > > > changing > > > > > > that. > > > > > > > > > > > > > > >> > > > > > On 08/29/2016 11:59 AM, Daniel Berlin wrote: > > > > > > > > > > > > > > >> > > > > > > + a few others. > > > > > > > > > > > > > > > > > > > > >> > > > > > > After following this rabbit hole a bit, there are a lot > > > > > > > of > > > > > > > mutually > > > > > > > recursive calls, etc, that may or may not do the right > > > > > > > thing > > > > > > > with > > > > > > > vectors of pointers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I can fix *this* particular bug with the attached patch. > > > > > > > > > > > > > > > > > > > > >> > > > > > > However, it's mostly papering over stuff. Nothing seems > > > > > > > to > > > > > > > know > > > > > > > what > > > > > > > to do with a memorylocation that is a vector of pointers. > > > > > > > They > > > > > > > all > > > > > > > expect memorylocation to be a single pointer location. > > > > > > > > > > > > > > > > > > > > >> > > > > > > I would chalk it up to "luck" that this patch fixes the > > > > > > > bug. > > > > > > > > > > > > > > > > > > > > >> > > > > > > It's pretty clear that MemoryLocation doesn't fit the > > > > > > > needs > > > > > > > of > > > > > > > a > > > > > > > lot > > > > > > > of stuff anymore (we hacked AA nodes into it, and lots of > > > > > > > stuff > > > > > > > now > > > > > > > tries to figure out the invariantess of the locations, > > > > > > > blah > > > > > > > blah > > > > > > > blah), but it seems like a big job to figure out what to > > > > > > > replace > > > > > > > it > > > > > > > with that will work for these cases. > > > > > > > > > > > > > > > > > > > > >> > > > > > > (I'm pretty positive if we just make it MemoryLocations, > > > > > > > and > > > > > > > have > > > > > > > everything loop over the locations, the compiler will get > > > > > > > a > > > > > > > lot > > > > > > > larger and a lot slower) > > > > > > > > > > > > > > > > > > > > >> > > > > > > On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin < > > > > > > > dberlin at dberlin.org > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > >> > > > > > > > it would also, for that matter, say the same about an > > > > > > > > array > > > > > > > > of > > > > > > > > pointers. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > It's not clear to me what will break if you change this > > > > > > > > to > > > > > > > > isPtrOrPtrVectorTy. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > In fact, i know it doesn't fix this bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It's a pretty deep rabbit hole of things not quite > > > > > > > > prepared > > > > > > > > to > > > > > > > > understand vectors of pointers. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > (we prepare memorylocations of them, but memory > > > > > > > > locations > > > > > > > > expect > > > > > > > > to > > > > > > > > be one thing, not a group of things, etc). > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin < > > > > > > > > dberlin at dberlin.org > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > this is definitely a bug in AA. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); > > > > > > > > > I > > > > > > > > > !> > > > > > > > > E; > > > > > > > > > ++I) > > > > > > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 226 const Value *Arg = *I; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 227 if (!Arg->getType()->isPointerTy()) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -> 228 continue; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 229 unsigned CS2ArgIdx > > > > > > > > > std::distance(CS2.arg_begin(), > > > > > > > > > I); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 230 auto CS2ArgLoc > > > > > > > > > MemoryLocation::getForArgument(CS2, > > > > > > > > > CS2ArgIdx, > > > > > > > > > TLI); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > AliasAnalysis.cpp:228 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > It ignores every argument because they are vectors of > > > > > > > > > pointers, > > > > > > > > > not > > > > > > > > > pointers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > I'm surprised this has not broken anything before. It > > > > > > > > > will > > > > > > > > > never > > > > > > > > > say > > > > > > > > > two intrinsics with vectors of pointers mod/ref each > > > > > > > > > other. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano < > > > > > > > > > davide at freebsd.org > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > + Daniel Berlin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via > > > > > > > > > > llvm-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > < llvm-dev at lists.llvm.org > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hello everyone, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think I have found an gvn / alias analysis > > > > > > > > > > > related > > > > > > > > > > > bug, > > > > > > > > > > > but > > > > > > > > > > > before opening > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > an issue on the tracker I wanted to see if I am > > > > > > > > > > > missing > > > > > > > > > > > something. > > > > > > > > > > > I have > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the following testcase: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> define spir_kernel void @test(<2 x i32*> %in1, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in2, > > > > > > > > > > >> i32* %out) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> entry: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Just some temporary storage > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.0 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.1 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.i = insertelement <2 x i32*> undef, i32* > > > > > > > > > > >> %tmp.0, > > > > > > > > > > >> i32 > > > > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp = insertelement <2 x i32*> %tmp.i, i32* > > > > > > > > > > >> %tmp.1, > > > > > > > > > > >> i32 > > > > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read from in1 and in2 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in1.v = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in1, > > > > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in2.v = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in2, > > > > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in1 to the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> > > > > > > > > > > >> %in1.v, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read in1 from the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.0 = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in2 to the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> > > > > > > > > > > >> %in2.v, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read in2 from the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1 = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in2 to out for good measure > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, > > > > > > > > > > >> i32 > > > > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, > > > > > > > > > > >> i32 > > > > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.0, i32* %out > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %out.1 = getelementptr i32, i32* %out, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.1, i32* %out.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ret void > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It uses a masked scatter operation to store a > > > > > > > > > > > value > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > two > > > > > > > > > > > allocas and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > then uses a masked gather operation to read that > > > > > > > > > > > same > > > > > > > > > > > value. > > > > > > > > > > > This > > > > > > > > > > > is done > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > twice in a row, with two different values. If I > > > > > > > > > > > run > > > > > > > > > > > this > > > > > > > > > > > code > > > > > > > > > > > through the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GVN pass, the second gather (%tmp.v.1) will be > > > > > > > > > > > deemed > > > > > > > > > > > to > > > > > > > > > > > be > > > > > > > > > > > the > > > > > > > > > > > same as the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > first gather (%tmp.v.0) and it will be removed. > > > > > > > > > > > After > > > > > > > > > > > some > > > > > > > > > > > debugging, I > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > realized that this is happening because the > > > > > > > > > > > Memory > > > > > > > > > > > Dependence > > > > > > > > > > > Analysis > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > returns %tmp.v.0 as the Def dependency for > > > > > > > > > > > %tmp.v.1, > > > > > > > > > > > even > > > > > > > > > > > though > > > > > > > > > > > the scatter > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > call in between changes the value stored at %tmp. > > > > > > > > > > > This, > > > > > > > > > > > in > > > > > > > > > > > turn, > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > happening because the alias analysis is returning > > > > > > > > > > > NoModRef > > > > > > > > > > > for > > > > > > > > > > > the > > > > > > > > > > > %tmp.v.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and second scatter callsites. The resulting IR > > > > > > > > > > > produces > > > > > > > > > > > the > > > > > > > > > > > wrong > > > > > > > > > > > result: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> define spir_kernel void @test(<2 x i32*> %in1, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in2, > > > > > > > > > > >> i32* %out) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> entry: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.0 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.1 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.i = insertelement <2 x i32*> undef, i32* > > > > > > > > > > >> %tmp.0, > > > > > > > > > > >> i32 > > > > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp = insertelement <2 x i32*> %tmp.i, i32* > > > > > > > > > > >> %tmp.1, > > > > > > > > > > >> i32 > > > > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in1.v = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in1, > > > > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in2.v = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %in2, > > > > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> > > > > > > > > > > >> %in1.v, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.0 = call <2 x i32> > > > > > > > > > > >> @llvm.masked.gather.v2i32(<2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) > > > > > > > > > > >> #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> > > > > > > > > > > >> %in2.v, > > > > > > > > > > >> <2 > > > > > > > > > > >> x > > > > > > > > > > >> i32*> > > > > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; The call to masked.gather is gone and now we > > > > > > > > > > >> are > > > > > > > > > > >> using > > > > > > > > > > >> the > > > > > > > > > > >> old > > > > > > > > > > >> value > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> (%tmp.v.0) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, > > > > > > > > > > >> i32 > > > > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, > > > > > > > > > > >> i32 > > > > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.0, i32* %out > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %out.1 = getelementptr i32, i32* %out, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.1, i32* %out.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ret void > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The old value read from %tmp is used, instead of > > > > > > > > > > > the > > > > > > > > > > > new > > > > > > > > > > > one. > > > > > > > > > > > I > > > > > > > > > > > tested this > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > code using `opt -gvn`, with LLVM 3.8.1. I also > > > > > > > > > > > tried > > > > > > > > > > > tip > > > > > > > > > > > (84cb7f4) > > > > > > > > > > > with the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > same result. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I should mention that if I replace the second > > > > > > > > > > > scatter > > > > > > > > > > > with > > > > > > > > > > > stores, > > > > > > > > > > > the issue > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > persists. The only way to avoid it is to replace > > > > > > > > > > > all > > > > > > > > > > > scatter/gather > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > intrinsics with equivalent sets of store/load, in > > > > > > > > > > > which > > > > > > > > > > > case > > > > > > > > > > > the > > > > > > > > > > > MemDep > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > returns the correct dependencies and the GVN pass > > > > > > > > > > > will > > > > > > > > > > > not > > > > > > > > > > > remove > > > > > > > > > > > the second > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > set of loads. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, my question is, is this a bug or am I doing > > > > > > > > > > > something > > > > > > > > > > > that > > > > > > > > > > > I > > > > > > > > > > > shouldn't > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > be in the IR? And if it is a bug, is it the AA > > > > > > > > > > > analyses > > > > > > > > > > > that > > > > > > > > > > > return > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong result (I presume so) or should GVN handle > > > > > > > > > > > such > > > > > > > > > > > cases > > > > > > > > > > > differently? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > LLVM Developers mailing list > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > llvm-dev at lists.llvm.org > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Davide > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > "There are no solved problems; there are only > > > > > > > > > > problems > > > > > > > > > > that > > > > > > > > > > are > > > > > > > > > > more > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > or less solved" -- Henri Poincare > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > -- > > >> > > Hal Finkel > > > > > > Assistant Computational Scientist > > > > > > Leadership Computing Facility > > > > > > Argonne National Laboratory > > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160831/6f0e0e00/attachment.html>
Chris Sakalis via llvm-dev
2016-Aug-31 16:00 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Great, thank you! On Wed, Aug 31, 2016 at 2:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:> > ------------------------------ > > *From: *"Chris Sakalis" <chrissakalis at gmail.com> > *To: *"Daniel Berlin" <dberlin at dberlin.org> > *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" < > david.majnemer at gmail.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Philip > Reames" <listmail at philipreames.com>, "Davide Italiano" <davide at freebsd.org>, > "Chandler Carruth" <chandlerc at gmail.com> > *Sent: *Wednesday, August 31, 2016 3:58:56 AM > *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with > llvm.masked.scatter/gather intrinsics > > Thank you for the quick fix, I can no longer reproduce the issue. As far a > releases go, I am guessing that this is going to be in 4.0? > > Yes, and we can consider it for 3.9.1 as well. > > -Hal > > > Best, > > Chris > > On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> > wrote: > >> Yeah, i just hope it doesn't regress scatter/gather vector code badly. >> But at least it's correct now? >> >> >> On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel <hfinkel at anl.gov> wrote: >> >>> >>> ------------------------------ >>> >>> *From: *"Daniel Berlin" <dberlin at dberlin.org> >>> *To: *"Philip Reames" <listmail at philipreames.com>, "Davide Italiano" < >>> davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> >>> *Cc: *"Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" < >>> david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" < >>> llvm-dev at lists.llvm.org> >>> *Sent: *Tuesday, August 30, 2016 3:07:01 PM >>> *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with >>> llvm.masked.scatter/gather intrinsics >>> >>> This is now committed and a test added to GVN. >>> >>> Thanks! >>> >>> I suspect that, in practice, we'll get little benefit from handling this >>> until our AA passes learn how to deal with (i.e. look back through) pointer >>> vectors. >>> >>> -Hal >>> >>> >>> >>> On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org> >>> wrote: >>> >>>> Okay, so then it sounds like, for now, the right fix is to stop marking >>>> masked.gather and masked.scatter with intrarg* options. >>>> >>>> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> >>>> wrote: >>>> >>>>> We might have specification bug here, but we appear to implement what >>>>> we specified. argmemonly is specified as only considering pointer typed >>>>> arguments. It's behavior on vector-of-pointers is unspecified, but would >>>>> seem to fall into the same case as inttoptr of an integer argument (i.e. >>>>> explicitly undefined). We could consider changing that. >>>>> >>>>> >>>>> >>>>> On 08/29/2016 11:59 AM, Daniel Berlin wrote: >>>>> >>>>> + a few others. >>>>> >>>>> After following this rabbit hole a bit, there are a lot of mutually >>>>> recursive calls, etc, that may or may not do the right thing with vectors >>>>> of pointers. >>>>> I can fix *this* particular bug with the attached patch. >>>>> >>>>> However, it's mostly papering over stuff. Nothing seems to know what >>>>> to do with a memorylocation that is a vector of pointers. They all expect >>>>> memorylocation to be a single pointer location. >>>>> >>>>> I would chalk it up to "luck" that this patch fixes the bug. >>>>> >>>>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot >>>>> of stuff anymore (we hacked AA nodes into it, and lots of stuff now tries >>>>> to figure out the invariantess of the locations, blah blah blah), but it >>>>> seems like a big job to figure out what to replace it with that will work >>>>> for these cases. >>>>> >>>>> (I'm pretty positive if we just make it MemoryLocations, and have >>>>> everything loop over the locations, the compiler will get a lot larger and >>>>> a lot slower) >>>>> >>>>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org> >>>>> wrote: >>>>> >>>>>> it would also, for that matter, say the same about an array of >>>>>> pointers. >>>>>> >>>>>> It's not clear to me what will break if you change this to >>>>>> isPtrOrPtrVectorTy. >>>>>> >>>>>> In fact, i know it doesn't fix this bug. >>>>>> It's a pretty deep rabbit hole of things not quite prepared to >>>>>> understand vectors of pointers. >>>>>> >>>>>> (we prepare memorylocations of them, but memory locations expect to >>>>>> be one thing, not a group of things, etc). >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org> >>>>>> wrote: >>>>>> >>>>>>> this is definitely a bug in AA. >>>>>>> >>>>>>> 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I !>>>>>>> E; ++I) { >>>>>>> 226 const Value *Arg = *I; >>>>>>> 227 if (!Arg->getType()->isPointerTy()) >>>>>>> -> 228 continue; >>>>>>> 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), >>>>>>> I); >>>>>>> 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, >>>>>>> CS2ArgIdx, TLI); >>>>>>> >>>>>>> AliasAnalysis.cpp:228 >>>>>>> >>>>>>> It ignores every argument because they are vectors of pointers, not >>>>>>> pointers. >>>>>>> >>>>>>> >>>>>>> I'm surprised this has not broken anything before. It will never say >>>>>>> two intrinsics with vectors of pointers mod/ref each other. >>>>>>> >>>>>>> >>>>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org >>>>>>> > wrote: >>>>>>> >>>>>>>> + Daniel Berlin >>>>>>>> >>>>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev >>>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>>> > Hello everyone, >>>>>>>> > >>>>>>>> > I think I have found an gvn / alias analysis related bug, but >>>>>>>> before opening >>>>>>>> > an issue on the tracker I wanted to see if I am missing >>>>>>>> something. I have >>>>>>>> > the following testcase: >>>>>>>> > >>>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>>> i32* %out) >>>>>>>> >> { >>>>>>>> >> entry: >>>>>>>> >> ; Just some temporary storage >>>>>>>> >> %tmp.0 = alloca i32 >>>>>>>> >> %tmp.1 = alloca i32 >>>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>>> >> ; Read from in1 and in2 >>>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %in1, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %in2, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> ; Store in1 to the allocas >>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>>> i32*> %tmp, >>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>>> >> ; Read in1 from the allocas >>>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %tmp, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> ; Store in2 to the allocas >>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>>> i32*> %tmp, >>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>>> >> ; Read in2 from the allocas >>>>>>>> >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %tmp, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> ; Store in2 to out for good measure >>>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0 >>>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1 >>>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>>> >> ret void >>>>>>>> >> } >>>>>>>> > >>>>>>>> > >>>>>>>> > It uses a masked scatter operation to store a value to the two >>>>>>>> allocas and >>>>>>>> > then uses a masked gather operation to read that same value. This >>>>>>>> is done >>>>>>>> > twice in a row, with two different values. If I run this code >>>>>>>> through the >>>>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the >>>>>>>> same as the >>>>>>>> > first gather (%tmp.v.0) and it will be removed. After some >>>>>>>> debugging, I >>>>>>>> > realized that this is happening because the Memory Dependence >>>>>>>> Analysis >>>>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though >>>>>>>> the scatter >>>>>>>> > call in between changes the value stored at %tmp. This, in turn, >>>>>>>> is >>>>>>>> > happening because the alias analysis is returning NoModRef for >>>>>>>> the %tmp.v.1 >>>>>>>> > and second scatter callsites. The resulting IR produces the wrong >>>>>>>> result: >>>>>>>> > >>>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>>> i32* %out) >>>>>>>> >> { >>>>>>>> >> entry: >>>>>>>> >> %tmp.0 = alloca i32 >>>>>>>> >> %tmp.1 = alloca i32 >>>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %in1, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %in2, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>>> i32*> %tmp, >>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>>> %tmp, i32 >>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>>> i32*> %tmp, >>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>>> >> ; The call to masked.gather is gone and now we are using the >>>>>>>> old value >>>>>>>> >> (%tmp.v.0) >>>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0 >>>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1 >>>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>>> >> ret void >>>>>>>> >> } >>>>>>>> > >>>>>>>> > >>>>>>>> > The old value read from %tmp is used, instead of the new one. I >>>>>>>> tested this >>>>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip >>>>>>>> (84cb7f4) with the >>>>>>>> > same result. >>>>>>>> > >>>>>>>> > I should mention that if I replace the second scatter with >>>>>>>> stores, the issue >>>>>>>> > persists. The only way to avoid it is to replace all >>>>>>>> scatter/gather >>>>>>>> > intrinsics with equivalent sets of store/load, in which case the >>>>>>>> MemDep >>>>>>>> > returns the correct dependencies and the GVN pass will not remove >>>>>>>> the second >>>>>>>> > set of loads. >>>>>>>> > >>>>>>>> > So, my question is, is this a bug or am I doing something that I >>>>>>>> shouldn't >>>>>>>> > be in the IR? And if it is a bug, is it the AA analyses that >>>>>>>> return the >>>>>>>> > wrong result (I presume so) or should GVN handle such cases >>>>>>>> differently? >>>>>>>> > >>>>>>>> > Best regards, >>>>>>>> > >>>>>>>> > Chris >>>>>>>> > >>>>>>>> > _______________________________________________ >>>>>>>> > LLVM Developers mailing list >>>>>>>> > llvm-dev at lists.llvm.org >>>>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>> > >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Davide >>>>>>>> >>>>>>>> "There are no solved problems; there are only problems that are more >>>>>>>> or less solved" -- Henri Poincare >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>> >>> >>> >>> -- >>> Hal Finkel >>> Assistant Computational Scientist >>> Leadership Computing Facility >>> Argonne National Laboratory >>> >> >> > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160831/aff3aa13/attachment.html>