Hal Finkel via llvm-dev
2016-Aug-30 20:11 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message -----> From: "Daniel Berlin" <dberlin at dberlin.org> > To: "Philip Reames" <listmail at philipreames.com>, "Davide Italiano" > <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> > Cc: "Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" > <david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, > "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Tuesday, August 30, 2016 3:07:01 PM > Subject: Re: [llvm-dev] GVN / Alias Analysis issue with > llvm.masked.scatter/gather intrinsics> This is now committed and a test added to GVN.Thanks! I suspect that, in practice, we'll get little benefit from handling this until our AA passes learn how to deal with (i.e. look back through) pointer vectors. -Hal> On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin < dberlin at dberlin.org > > wrote:> > Okay, so then it sounds like, for now, the right fix is to stop > > marking masked.gather and masked.scatter with intrarg* options. >> > On Mon, Aug 29, 2016, 1:26 PM Philip Reames < > > listmail at philipreames.com > wrote: >> > > We might have specification bug here, but we appear to implement > > > what > > > we specified. argmemonly is specified as only considering pointer > > > typed arguments. It's behavior on vector-of-pointers is > > > unspecified, > > > but would seem to fall into the same case as inttoptr of an > > > integer > > > argument (i.e. explicitly undefined). We could consider changing > > > that. > > >> > > On 08/29/2016 11:59 AM, Daniel Berlin wrote: > > >> > > > + a few others. > > > > > >> > > > After following this rabbit hole a bit, there are a lot of > > > > mutually > > > > recursive calls, etc, that may or may not do the right thing > > > > with > > > > vectors of pointers. > > > > > > > > > > I can fix *this* particular bug with the attached patch. > > > > > >> > > > However, it's mostly papering over stuff. Nothing seems to know > > > > what > > > > to do with a memorylocation that is a vector of pointers. They > > > > all > > > > expect memorylocation to be a single pointer location. > > > > > >> > > > I would chalk it up to "luck" that this patch fixes the bug. > > > > > >> > > > It's pretty clear that MemoryLocation doesn't fit the needs of > > > > a > > > > lot > > > > of stuff anymore (we hacked AA nodes into it, and lots of stuff > > > > now > > > > tries to figure out the invariantess of the locations, blah > > > > blah > > > > blah), but it seems like a big job to figure out what to > > > > replace > > > > it > > > > with that will work for these cases. > > > > > >> > > > (I'm pretty positive if we just make it MemoryLocations, and > > > > have > > > > everything loop over the locations, the compiler will get a lot > > > > larger and a lot slower) > > > > > >> > > > On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin < > > > > dberlin at dberlin.org > > > > > wrote: > > > > > >> > > > > it would also, for that matter, say the same about an array > > > > > of > > > > > pointers. > > > > > > > > > >> > > > > It's not clear to me what will break if you change this to > > > > > isPtrOrPtrVectorTy. > > > > > > > > > >> > > > > In fact, i know it doesn't fix this bug. > > > > > > > > > > > > > > > It's a pretty deep rabbit hole of things not quite prepared > > > > > to > > > > > understand vectors of pointers. > > > > > > > > > >> > > > > (we prepare memorylocations of them, but memory locations > > > > > expect > > > > > to > > > > > be one thing, not a group of things, etc). > > > > > > > > > >> > > > > On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin < > > > > > dberlin at dberlin.org > > > > > > wrote: > > > > > > > > > >> > > > > > this is definitely a bug in AA. > > > > > > > > > > > > > > >> > > > > > 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I !> > > > > > E; > > > > > > ++I) > > > > > > { > > > > > > > > > > > > > > > > > > > > > 226 const Value *Arg = *I; > > > > > > > > > > > > > > > > > > > > > 227 if (!Arg->getType()->isPointerTy()) > > > > > > > > > > > > > > > > > > > > > -> 228 continue; > > > > > > > > > > > > > > > > > > > > > 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); > > > > > > > > > > > > > > > > > > > > > 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, > > > > > > CS2ArgIdx, > > > > > > TLI); > > > > > > > > > > > > > > >> > > > > > AliasAnalysis.cpp:228 > > > > > > > > > > > > > > >> > > > > > It ignores every argument because they are vectors of > > > > > > pointers, > > > > > > not > > > > > > pointers. > > > > > > > > > > > > > > >> > > > > > I'm surprised this has not broken anything before. It will > > > > > > never > > > > > > say > > > > > > two intrinsics with vectors of pointers mod/ref each other. > > > > > > > > > > > > > > >> > > > > > On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano < > > > > > > davide at freebsd.org > > > > > > > wrote: > > > > > > > > > > > > > > >> > > > > > > + Daniel Berlin > > > > > > > > > > > > > > > > > > > > >> > > > > > > On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via > > > > > > > llvm-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > < llvm-dev at lists.llvm.org > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hello everyone, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think I have found an gvn / alias analysis related > > > > > > > > bug, > > > > > > > > but > > > > > > > > before opening > > > > > > > > > > > > > > > > > > > > > > > > > > > > > an issue on the tracker I wanted to see if I am missing > > > > > > > > something. > > > > > > > > I have > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the following testcase: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> define spir_kernel void @test(<2 x i32*> %in1, <2 x > > > > > > > >> i32*> > > > > > > > >> %in2, > > > > > > > >> i32* %out) > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> { > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> entry: > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Just some temporary storage > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.0 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.1 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, > > > > > > > >> i32 > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, > > > > > > > >> i32 > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read from in1 and in2 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x > > > > > > > >> i32*> > > > > > > > >> %in1, > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x > > > > > > > >> i32*> > > > > > > > >> %in2, > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in1 to the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, > > > > > > > >> <2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>); > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read in1 from the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in2 to the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, > > > > > > > >> <2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>); > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Read in2 from the allocas > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; Store in2 to out for good measure > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.0, i32* %out > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %out.1 = getelementptr i32, i32* %out, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.1, i32* %out.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ret void > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It uses a masked scatter operation to store a value to > > > > > > > > the > > > > > > > > two > > > > > > > > allocas and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > then uses a masked gather operation to read that same > > > > > > > > value. > > > > > > > > This > > > > > > > > is done > > > > > > > > > > > > > > > > > > > > > > > > > > > > > twice in a row, with two different values. If I run > > > > > > > > this > > > > > > > > code > > > > > > > > through the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GVN pass, the second gather (%tmp.v.1) will be deemed > > > > > > > > to > > > > > > > > be > > > > > > > > the > > > > > > > > same as the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > first gather (%tmp.v.0) and it will be removed. After > > > > > > > > some > > > > > > > > debugging, I > > > > > > > > > > > > > > > > > > > > > > > > > > > > > realized that this is happening because the Memory > > > > > > > > Dependence > > > > > > > > Analysis > > > > > > > > > > > > > > > > > > > > > > > > > > > > > returns %tmp.v.0 as the Def dependency for %tmp.v.1, > > > > > > > > even > > > > > > > > though > > > > > > > > the scatter > > > > > > > > > > > > > > > > > > > > > > > > > > > > > call in between changes the value stored at %tmp. This, > > > > > > > > in > > > > > > > > turn, > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > > > > > > > happening because the alias analysis is returning > > > > > > > > NoModRef > > > > > > > > for > > > > > > > > the > > > > > > > > %tmp.v.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and second scatter callsites. The resulting IR produces > > > > > > > > the > > > > > > > > wrong > > > > > > > > result: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> define spir_kernel void @test(<2 x i32*> %in1, <2 x > > > > > > > >> i32*> > > > > > > > >> %in2, > > > > > > > >> i32* %out) > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> { > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> entry: > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.0 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.1 = alloca i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, > > > > > > > >> i32 > > > > > > > >> 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, > > > > > > > >> i32 > > > > > > > >> 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x > > > > > > > >> i32*> > > > > > > > >> %in1, > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x > > > > > > > >> i32*> > > > > > > > >> %in2, > > > > > > > >> i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, > > > > > > > >> <2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>) > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, i32 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, > > > > > > > >> <2 > > > > > > > >> x > > > > > > > >> i32*> > > > > > > > >> %tmp, > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> i32 1, <2 x i1> <i1 true, i1 true>) > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ; The call to masked.gather is gone and now we are > > > > > > > >> using > > > > > > > >> the > > > > > > > >> old > > > > > > > >> value > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> (%tmp.v.0) > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.0, i32* %out > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> %out.1 = getelementptr i32, i32* %out, i32 1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> store i32 %tmp.v.1.1, i32* %out.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> ret void > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The old value read from %tmp is used, instead of the > > > > > > > > new > > > > > > > > one. > > > > > > > > I > > > > > > > > tested this > > > > > > > > > > > > > > > > > > > > > > > > > > > > > code using `opt -gvn`, with LLVM 3.8.1. I also tried > > > > > > > > tip > > > > > > > > (84cb7f4) > > > > > > > > with the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > same result. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I should mention that if I replace the second scatter > > > > > > > > with > > > > > > > > stores, > > > > > > > > the issue > > > > > > > > > > > > > > > > > > > > > > > > > > > > > persists. The only way to avoid it is to replace all > > > > > > > > scatter/gather > > > > > > > > > > > > > > > > > > > > > > > > > > > > > intrinsics with equivalent sets of store/load, in which > > > > > > > > case > > > > > > > > the > > > > > > > > MemDep > > > > > > > > > > > > > > > > > > > > > > > > > > > > > returns the correct dependencies and the GVN pass will > > > > > > > > not > > > > > > > > remove > > > > > > > > the second > > > > > > > > > > > > > > > > > > > > > > > > > > > > > set of loads. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, my question is, is this a bug or am I doing > > > > > > > > something > > > > > > > > that > > > > > > > > I > > > > > > > > shouldn't > > > > > > > > > > > > > > > > > > > > > > > > > > > > > be in the IR? And if it is a bug, is it the AA analyses > > > > > > > > that > > > > > > > > return > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong result (I presume so) or should GVN handle such > > > > > > > > cases > > > > > > > > differently? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > LLVM Developers mailing list > > > > > > > > > > > > > > > > > > > > > > > > > > > > > llvm-dev at lists.llvm.org > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > Davide > > > > > > > > > > > > > > > > > > > > >> > > > > > > "There are no solved problems; there are only problems > > > > > > > that > > > > > > > are > > > > > > > more > > > > > > > > > > > > > > > > > > > > > > > > > > > > or less solved" -- Henri Poincare > > > > > > > > > > > > > > > > > > > > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160830/8ed9f221/attachment-0001.html>
Daniel Berlin via llvm-dev
2016-Aug-30 20:26 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Yeah, i just hope it doesn't regress scatter/gather vector code badly. But at least it's correct now? On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:> > ------------------------------ > > *From: *"Daniel Berlin" <dberlin at dberlin.org> > *To: *"Philip Reames" <listmail at philipreames.com>, "Davide Italiano" < > davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> > *Cc: *"Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" < > david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" < > llvm-dev at lists.llvm.org> > *Sent: *Tuesday, August 30, 2016 3:07:01 PM > *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with > llvm.masked.scatter/gather intrinsics > > This is now committed and a test added to GVN. > > Thanks! > > I suspect that, in practice, we'll get little benefit from handling this > until our AA passes learn how to deal with (i.e. look back through) pointer > vectors. > > -Hal > > > > On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org> > wrote: > >> Okay, so then it sounds like, for now, the right fix is to stop marking >> masked.gather and masked.scatter with intrarg* options. >> >> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> >> wrote: >> >>> We might have specification bug here, but we appear to implement what we >>> specified. argmemonly is specified as only considering pointer typed >>> arguments. It's behavior on vector-of-pointers is unspecified, but would >>> seem to fall into the same case as inttoptr of an integer argument (i.e. >>> explicitly undefined). We could consider changing that. >>> >>> >>> >>> On 08/29/2016 11:59 AM, Daniel Berlin wrote: >>> >>> + a few others. >>> >>> After following this rabbit hole a bit, there are a lot of mutually >>> recursive calls, etc, that may or may not do the right thing with vectors >>> of pointers. >>> I can fix *this* particular bug with the attached patch. >>> >>> However, it's mostly papering over stuff. Nothing seems to know what to >>> do with a memorylocation that is a vector of pointers. They all expect >>> memorylocation to be a single pointer location. >>> >>> I would chalk it up to "luck" that this patch fixes the bug. >>> >>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot of >>> stuff anymore (we hacked AA nodes into it, and lots of stuff now tries to >>> figure out the invariantess of the locations, blah blah blah), but it seems >>> like a big job to figure out what to replace it with that will work for >>> these cases. >>> >>> (I'm pretty positive if we just make it MemoryLocations, and have >>> everything loop over the locations, the compiler will get a lot larger and >>> a lot slower) >>> >>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org> >>> wrote: >>> >>>> it would also, for that matter, say the same about an array of >>>> pointers. >>>> >>>> It's not clear to me what will break if you change this to >>>> isPtrOrPtrVectorTy. >>>> >>>> In fact, i know it doesn't fix this bug. >>>> It's a pretty deep rabbit hole of things not quite prepared to >>>> understand vectors of pointers. >>>> >>>> (we prepare memorylocations of them, but memory locations expect to be >>>> one thing, not a group of things, etc). >>>> >>>> >>>> >>>> >>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org> >>>> wrote: >>>> >>>>> this is definitely a bug in AA. >>>>> >>>>> 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; >>>>> ++I) { >>>>> 226 const Value *Arg = *I; >>>>> 227 if (!Arg->getType()->isPointerTy()) >>>>> -> 228 continue; >>>>> 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); >>>>> 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, >>>>> CS2ArgIdx, TLI); >>>>> >>>>> AliasAnalysis.cpp:228 >>>>> >>>>> It ignores every argument because they are vectors of pointers, not >>>>> pointers. >>>>> >>>>> >>>>> I'm surprised this has not broken anything before. It will never say >>>>> two intrinsics with vectors of pointers mod/ref each other. >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org> >>>>> wrote: >>>>> >>>>>> + Daniel Berlin >>>>>> >>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev >>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>> > Hello everyone, >>>>>> > >>>>>> > I think I have found an gvn / alias analysis related bug, but >>>>>> before opening >>>>>> > an issue on the tracker I wanted to see if I am missing something. >>>>>> I have >>>>>> > the following testcase: >>>>>> > >>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>> i32* %out) >>>>>> >> { >>>>>> >> entry: >>>>>> >> ; Just some temporary storage >>>>>> >> %tmp.0 = alloca i32 >>>>>> >> %tmp.1 = alloca i32 >>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>> >> ; Read from in1 and in2 >>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %in1, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %in2, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> ; Store in1 to the allocas >>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>> i32*> %tmp, >>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>> >> ; Read in1 from the allocas >>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %tmp, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> ; Store in2 to the allocas >>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>> i32*> %tmp, >>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>> >> ; Read in2 from the allocas >>>>>> >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %tmp, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> ; Store in2 to out for good measure >>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0 >>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1 >>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>> >> ret void >>>>>> >> } >>>>>> > >>>>>> > >>>>>> > It uses a masked scatter operation to store a value to the two >>>>>> allocas and >>>>>> > then uses a masked gather operation to read that same value. This >>>>>> is done >>>>>> > twice in a row, with two different values. If I run this code >>>>>> through the >>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the >>>>>> same as the >>>>>> > first gather (%tmp.v.0) and it will be removed. After some >>>>>> debugging, I >>>>>> > realized that this is happening because the Memory Dependence >>>>>> Analysis >>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though >>>>>> the scatter >>>>>> > call in between changes the value stored at %tmp. This, in turn, is >>>>>> > happening because the alias analysis is returning NoModRef for the >>>>>> %tmp.v.1 >>>>>> > and second scatter callsites. The resulting IR produces the wrong >>>>>> result: >>>>>> > >>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>> i32* %out) >>>>>> >> { >>>>>> >> entry: >>>>>> >> %tmp.0 = alloca i32 >>>>>> >> %tmp.1 = alloca i32 >>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %in1, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %in2, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>> i32*> %tmp, >>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>> %tmp, i32 >>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>> i32*> %tmp, >>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>> >> ; The call to masked.gather is gone and now we are using the old >>>>>> value >>>>>> >> (%tmp.v.0) >>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0 >>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1 >>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>> >> ret void >>>>>> >> } >>>>>> > >>>>>> > >>>>>> > The old value read from %tmp is used, instead of the new one. I >>>>>> tested this >>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip (84cb7f4) >>>>>> with the >>>>>> > same result. >>>>>> > >>>>>> > I should mention that if I replace the second scatter with stores, >>>>>> the issue >>>>>> > persists. The only way to avoid it is to replace all scatter/gather >>>>>> > intrinsics with equivalent sets of store/load, in which case the >>>>>> MemDep >>>>>> > returns the correct dependencies and the GVN pass will not remove >>>>>> the second >>>>>> > set of loads. >>>>>> > >>>>>> > So, my question is, is this a bug or am I doing something that I >>>>>> shouldn't >>>>>> > be in the IR? And if it is a bug, is it the AA analyses that return >>>>>> the >>>>>> > wrong result (I presume so) or should GVN handle such cases >>>>>> differently? >>>>>> > >>>>>> > Best regards, >>>>>> > >>>>>> > Chris >>>>>> > >>>>>> > _______________________________________________ >>>>>> > LLVM Developers mailing list >>>>>> > llvm-dev at lists.llvm.org >>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> > >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Davide >>>>>> >>>>>> "There are no solved problems; there are only problems that are more >>>>>> or less solved" -- Henri Poincare >>>>>> >>>>> >>>>> >>>> >>> >>> > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160830/c4b87893/attachment.html>
Chris Sakalis via llvm-dev
2016-Aug-31 08:58 UTC
[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix, I can no longer reproduce the issue. As far a releases go, I am guessing that this is going to be in 4.0? Best, Chris On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote:> Yeah, i just hope it doesn't regress scatter/gather vector code badly. > But at least it's correct now? > > > On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> ------------------------------ >> >> *From: *"Daniel Berlin" <dberlin at dberlin.org> >> *To: *"Philip Reames" <listmail at philipreames.com>, "Davide Italiano" < >> davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> >> *Cc: *"Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" < >> david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" < >> llvm-dev at lists.llvm.org> >> *Sent: *Tuesday, August 30, 2016 3:07:01 PM >> *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with >> llvm.masked.scatter/gather intrinsics >> >> This is now committed and a test added to GVN. >> >> Thanks! >> >> I suspect that, in practice, we'll get little benefit from handling this >> until our AA passes learn how to deal with (i.e. look back through) pointer >> vectors. >> >> -Hal >> >> >> >> On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org> >> wrote: >> >>> Okay, so then it sounds like, for now, the right fix is to stop marking >>> masked.gather and masked.scatter with intrarg* options. >>> >>> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> >>> wrote: >>> >>>> We might have specification bug here, but we appear to implement what >>>> we specified. argmemonly is specified as only considering pointer typed >>>> arguments. It's behavior on vector-of-pointers is unspecified, but would >>>> seem to fall into the same case as inttoptr of an integer argument (i.e. >>>> explicitly undefined). We could consider changing that. >>>> >>>> >>>> >>>> On 08/29/2016 11:59 AM, Daniel Berlin wrote: >>>> >>>> + a few others. >>>> >>>> After following this rabbit hole a bit, there are a lot of mutually >>>> recursive calls, etc, that may or may not do the right thing with vectors >>>> of pointers. >>>> I can fix *this* particular bug with the attached patch. >>>> >>>> However, it's mostly papering over stuff. Nothing seems to know what >>>> to do with a memorylocation that is a vector of pointers. They all expect >>>> memorylocation to be a single pointer location. >>>> >>>> I would chalk it up to "luck" that this patch fixes the bug. >>>> >>>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot of >>>> stuff anymore (we hacked AA nodes into it, and lots of stuff now tries to >>>> figure out the invariantess of the locations, blah blah blah), but it seems >>>> like a big job to figure out what to replace it with that will work for >>>> these cases. >>>> >>>> (I'm pretty positive if we just make it MemoryLocations, and have >>>> everything loop over the locations, the compiler will get a lot larger and >>>> a lot slower) >>>> >>>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org> >>>> wrote: >>>> >>>>> it would also, for that matter, say the same about an array of >>>>> pointers. >>>>> >>>>> It's not clear to me what will break if you change this to >>>>> isPtrOrPtrVectorTy. >>>>> >>>>> In fact, i know it doesn't fix this bug. >>>>> It's a pretty deep rabbit hole of things not quite prepared to >>>>> understand vectors of pointers. >>>>> >>>>> (we prepare memorylocations of them, but memory locations expect to be >>>>> one thing, not a group of things, etc). >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org> >>>>> wrote: >>>>> >>>>>> this is definitely a bug in AA. >>>>>> >>>>>> 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; >>>>>> ++I) { >>>>>> 226 const Value *Arg = *I; >>>>>> 227 if (!Arg->getType()->isPointerTy()) >>>>>> -> 228 continue; >>>>>> 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); >>>>>> 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, >>>>>> CS2ArgIdx, TLI); >>>>>> >>>>>> AliasAnalysis.cpp:228 >>>>>> >>>>>> It ignores every argument because they are vectors of pointers, not >>>>>> pointers. >>>>>> >>>>>> >>>>>> I'm surprised this has not broken anything before. It will never say >>>>>> two intrinsics with vectors of pointers mod/ref each other. >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org> >>>>>> wrote: >>>>>> >>>>>>> + Daniel Berlin >>>>>>> >>>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev >>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>> > Hello everyone, >>>>>>> > >>>>>>> > I think I have found an gvn / alias analysis related bug, but >>>>>>> before opening >>>>>>> > an issue on the tracker I wanted to see if I am missing something. >>>>>>> I have >>>>>>> > the following testcase: >>>>>>> > >>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>> i32* %out) >>>>>>> >> { >>>>>>> >> entry: >>>>>>> >> ; Just some temporary storage >>>>>>> >> %tmp.0 = alloca i32 >>>>>>> >> %tmp.1 = alloca i32 >>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>> >> ; Read from in1 and in2 >>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in1, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in2, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in1 to the allocas >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>> >> ; Read in1 from the allocas >>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in2 to the allocas >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>); >>>>>>> >> ; Read in2 from the allocas >>>>>>> >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> ; Store in2 to out for good measure >>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0 >>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1 >>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>> >> ret void >>>>>>> >> } >>>>>>> > >>>>>>> > >>>>>>> > It uses a masked scatter operation to store a value to the two >>>>>>> allocas and >>>>>>> > then uses a masked gather operation to read that same value. This >>>>>>> is done >>>>>>> > twice in a row, with two different values. If I run this code >>>>>>> through the >>>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the >>>>>>> same as the >>>>>>> > first gather (%tmp.v.0) and it will be removed. After some >>>>>>> debugging, I >>>>>>> > realized that this is happening because the Memory Dependence >>>>>>> Analysis >>>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though >>>>>>> the scatter >>>>>>> > call in between changes the value stored at %tmp. This, in turn, is >>>>>>> > happening because the alias analysis is returning NoModRef for the >>>>>>> %tmp.v.1 >>>>>>> > and second scatter callsites. The resulting IR produces the wrong >>>>>>> result: >>>>>>> > >>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, >>>>>>> i32* %out) >>>>>>> >> { >>>>>>> >> entry: >>>>>>> >> %tmp.0 = alloca i32 >>>>>>> >> %tmp.1 = alloca i32 >>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0 >>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1 >>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in1, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %in2, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> >>>>>>> %tmp, i32 >>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1 >>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x >>>>>>> i32*> %tmp, >>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>) >>>>>>> >> ; The call to masked.gather is gone and now we are using the >>>>>>> old value >>>>>>> >> (%tmp.v.0) >>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0 >>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1 >>>>>>> >> store i32 %tmp.v.1.0, i32* %out >>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1 >>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1 >>>>>>> >> ret void >>>>>>> >> } >>>>>>> > >>>>>>> > >>>>>>> > The old value read from %tmp is used, instead of the new one. I >>>>>>> tested this >>>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip (84cb7f4) >>>>>>> with the >>>>>>> > same result. >>>>>>> > >>>>>>> > I should mention that if I replace the second scatter with stores, >>>>>>> the issue >>>>>>> > persists. The only way to avoid it is to replace all scatter/gather >>>>>>> > intrinsics with equivalent sets of store/load, in which case the >>>>>>> MemDep >>>>>>> > returns the correct dependencies and the GVN pass will not remove >>>>>>> the second >>>>>>> > set of loads. >>>>>>> > >>>>>>> > So, my question is, is this a bug or am I doing something that I >>>>>>> shouldn't >>>>>>> > be in the IR? And if it is a bug, is it the AA analyses that >>>>>>> return the >>>>>>> > wrong result (I presume so) or should GVN handle such cases >>>>>>> differently? >>>>>>> > >>>>>>> > Best regards, >>>>>>> > >>>>>>> > Chris >>>>>>> > >>>>>>> > _______________________________________________ >>>>>>> > LLVM Developers mailing list >>>>>>> > llvm-dev at lists.llvm.org >>>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>> > >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Davide >>>>>>> >>>>>>> "There are no solved problems; there are only problems that are more >>>>>>> or less solved" -- Henri Poincare >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >> >> >> >> -- >> Hal Finkel >> Assistant Computational Scientist >> Leadership Computing Facility >> Argonne National Laboratory >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160831/e40f5814/attachment.html>