----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov> > To: "Elena Demikhovsky" <elena.demikhovsky at intel.com> > Cc: dag at cray.com, llvmdev at cs.uiuc.edu > Sent: Friday, October 24, 2014 8:39:56 AM > Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics > > ----- Original Message ----- > > From: "Elena Demikhovsky" <elena.demikhovsky at intel.com> > > To: "Hal Finkel" <hfinkel at anl.gov> > > Cc: dag at cray.com, llvmdev at cs.uiuc.edu, "Ayal Zaks" > > <ayal.zaks at intel.com> > > Sent: Friday, October 24, 2014 8:07:18 AM > > Subject: RE: [LLVMdev] Adding masked vector load and store > > intrinsics > > > > > For the loads, I'm must less sure. Why can't we represent the > > > loads > > > as select(mask, load(addr), passthru)? It is true, that the load > > > might get separated from the select so that isel might not see it > > > (because isel if basic-block local), but we can add some code in > > > CodeGenPrep to fix that for targets on which it is useful to do > > > so > > > (which is a more-general solution than the intrinsic anyhow). > > > What > > > do you think? > > > > We generate the vector-masked-intrinsic on IR-to-IR pass. It is too > > far from instruction selection. We'll need to guarantee that all > > subsequent IR-to-IR passes will not break the sequence. > > I'm fully aware of this issue. This needs to be weighed against the > cost of updating all other optimizations that operate on loads to > also understand this intrinsic. > > > And only for > > one or two specific targets. > > Regardless, they're certainly targets many users care about ;) > > > Then we'll keep the logic in type > > legalizer, which may split or extend operations. Then we are taking > > care in DAG-combine. > > In my opinion, this is just unsafe. > > If this were really a question of safety, I'd agree. And if we were > talking about gather loads, I'd agree. For a regular vector loads, I > don't see this as a safety issue. We should outline what the > downside of emitting a regular load would actually be should some > optimization be done to the select. Can you please elaborate on > this?Nevermind ;) -- I changed my mind, the safety issue is with non-aligned loads that might cross page boundaries. Is that right? If so, I think this proposal is good (although obviously the docs need to make clear what the faulting behavior of these intrinsics is). Thanks again, Hal> > Thanks again, > Hal > > > > > - Elena > > > > > > -----Original Message----- > > From: Hal Finkel [mailto:hfinkel at anl.gov] > > Sent: Friday, October 24, 2014 15:50 > > To: Demikhovsky, Elena > > Cc: dag at cray.com; llvmdev at cs.uiuc.edu > > Subject: Re: [LLVMdev] Adding masked vector load and store > > intrinsics > > > > ----- Original Message ----- > > > From: "Elena Demikhovsky" <elena.demikhovsky at intel.com> > > > To: llvmdev at cs.uiuc.edu > > > Cc: dag at cray.com > > > Sent: Friday, October 24, 2014 6:24:15 AM > > > Subject: [LLVMdev] Adding masked vector load and store intrinsics > > > > > > > > > > > > Hi, > > > > > > We would like to add support for masked vector loads and stores > > > by > > > introducing new target-independent intrinsics. The loop > > > vectorizer > > > will then be enhanced to optimize loops containing conditional > > > memory > > > accesses by generating these intrinsics for existing targets such > > > as > > > AVX2 and AVX-512. The vectorizer will first ask the target about > > > availability of masked vector loads and stores. The SLP > > > vectorizer > > > can > > > potentially be enhanced to use these intrinsics as well. > > > > > > The intrinsics would be legal for all targets; targets that do > > > not > > > support masked vector loads or stores will scalarize them. > > > The addressed memory will not be touched for masked-off lanes. In > > > particular, if all lanes are masked off no address will be > > > accessed. > > > > > > call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 > > > 4, > > > <16 x i1> %mask) > > > > > > %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> > > > %passthru, i32 4, <8 x i1> %mask) > > > > > > where %passthru is used to fill the elements of %data that are > > > masked-off (if any; can be zeroinitializer or undef). > > > > > > Comments so far, before we dive into more details? > > > > For the stores, I think this is a reasonable idea. The alternative > > is > > to represent them in scalar form with a lot of control flow, and I > > think that expecting the backend to properly pattern match that > > after isel is not realistic. > > > > For the loads, I'm must less sure. Why can't we represent the loads > > as select(mask, load(addr), passthru)? It is true, that the load > > might get separated from the select so that isel might not see it > > (because isel if basic-block local), but we can add some code in > > CodeGenPrep to fix that for targets on which it is useful to do so > > (which is a more-general solution than the intrinsic anyhow). What > > do you think? > > > > Thanks again, > > Hal > > > > > > > > Thank you. > > > > > > - Elena and Ayal > > > > > > > > > > > > --------------------------------------------------------------------- > > > Intel Israel (74) Limited > > > > > > This e-mail and any attachments may contain confidential material > > > for > > > the sole use of the intended recipient(s). Any review or > > > distribution > > > by others is strictly prohibited. If you are not the intended > > > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory > > --------------------------------------------------------------------- > > Intel Israel (74) Limited > > > > This e-mail and any attachments may contain confidential material > > for > > the sole use of the intended recipient(s). Any review or > > distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
dag at cray.com
2014-Oct-24 16:56 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Hal Finkel <hfinkel at anl.gov> writes:>> If this were really a question of safety, I'd agree. And if we were >> talking about gather loads, I'd agree. For a regular vector loads, I >> don't see this as a safety issue. We should outline what the >> downside of emitting a regular load would actually be should some >> optimization be done to the select. Can you please elaborate on >> this? > > Nevermind ;) -- I changed my mind, the safety issue is with > non-aligned loads that might cross page boundaries. Is that right?That's just one safety issue. There are others.> If so, I think this proposal is good (although obviously the docs need > to make clear what the faulting behavior of these intrinsics is).The behavior should be not to ever fault on an element whose mask bit is false, and behave as a regular load (wrt trapping) for any element whose mask bit is true. -David
----- Original Message -----> From: dag at cray.com > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Elena Demikhovsky" <elena.demikhovsky at intel.com>, llvmdev at cs.uiuc.edu > Sent: Friday, October 24, 2014 11:56:14 AM > Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics > > Hal Finkel <hfinkel at anl.gov> writes: > > >> If this were really a question of safety, I'd agree. And if we > >> were > >> talking about gather loads, I'd agree. For a regular vector loads, > >> I > >> don't see this as a safety issue. We should outline what the > >> downside of emitting a regular load would actually be should some > >> optimization be done to the select. Can you please elaborate on > >> this? > > > > Nevermind ;) -- I changed my mind, the safety issue is with > > non-aligned loads that might cross page boundaries. Is that right? > > That's just one safety issue. There are others.Can you be more specific? You mentioned overindexing in your other e-mail, exactly what do you mean by that? Thanks again, Hal> > > If so, I think this proposal is good (although obviously the docs > > need > > to make clear what the faulting behavior of these intrinsics is). > > The behavior should be not to ever fault on an element whose mask bit > is > false, and behave as a regular load (wrt trapping) for any element > whose > mask bit is true. > > -David >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory