> Why can't we represent the loads as select(mask, load(addr), passthru)?

This suggests masked-off lanes are free to speculatively load from memory, whereas the proposed semantics is that:

> The addressed memory will not be touched for masked-off lanes. In
> particular, if all lanes are masked off no address will be accessed.

Ayal.

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
Sent: Friday, October 24, 2014 15:50
To: Demikhovsky, Elena
Cc: dag at cray.com; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----
> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> To: llvmdev at cs.uiuc.edu
> Cc: dag at cray.com
> Sent: Friday, October 24, 2014 6:24:15 AM
> Subject: [LLVMdev] Adding masked vector load and store intrinsics
>
> Hi,
>
> We would like to add support for masked vector loads and stores by
> introducing new target-independent intrinsics. The loop vectorizer
> will then be enhanced to optimize loops containing conditional memory
> accesses by generating these intrinsics for existing targets such as
> AVX2 and AVX-512. The vectorizer will first ask the target about
> availability of masked vector loads and stores. The SLP vectorizer can
> potentially be enhanced to use these intrinsics as well.
>
> The intrinsics would be legal for all targets; targets that do not
> support masked vector loads or stores will scalarize them.
> The addressed memory will not be touched for masked-off lanes. In
> particular, if all lanes are masked off no address will be accessed.
>
> call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4, <16 x i1> %mask)
>
> %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)
>
> where %passthru is used to fill the elements of %data that are
> masked-off (if any; can be zeroinitializer or undef).
>
> Comments so far, before we dive into more details?

For the stores, I think this is a reasonable idea. The alternative is to represent them in scalar form with a lot of control flow, and I think that expecting the backend to properly pattern match that after isel is not realistic.

For the loads, I'm much less sure. Why can't we represent the loads as select(mask, load(addr), passthru)? It is true that the load might get separated from the select so that isel might not see it (because isel is basic-block local), but we can add some code in CodeGenPrep to fix that for targets on which it is useful to do so (which is a more general solution than the intrinsic anyhow). What do you think?

Thanks again,
Hal

> Thank you.
>
> - Elena and Ayal
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
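The semantics described in the proposal (masked-off lanes are never accessed, and targets without native support scalarize the operation) can be illustrated with a scalar reference model. The sketch below is Python rather than LLVM IR, models memory as a plain list indexed by element, and ignores the alignment operand. The helper names masked_load and masked_store are hypothetical and only mirror the proposed intrinsics; they are not an LLVM API.

```python
# Scalar reference model of the proposed semantics (illustrative only).
# Assumption: memory is a Python list and addr is an element index.

def masked_load(memory, addr, passthru, mask):
    """Return one value per lane, reading memory only where the mask is set."""
    result = []
    for lane, enabled in enumerate(mask):
        if enabled:
            result.append(memory[addr + lane])  # enabled lane: read memory
        else:
            result.append(passthru[lane])       # masked-off lane: no access
    return result


def masked_store(memory, addr, data, mask):
    """Write data[lane] to memory only where the mask is set."""
    for lane, enabled in enumerate(mask):
        if enabled:
            memory[addr + lane] = data[lane]


memory = [10, 20, 30, 40]
loaded = masked_load(memory, 0, [0, 0, 0, 0], [True, False, True, False])
masked_store(memory, 0, [1, 2, 3, 4], [False, True, False, True])
print(loaded)  # [10, 0, 30, 0]
print(memory)  # [10, 2, 30, 4]
```

With an all-false mask the loop never indexes memory at all, which corresponds to the statement that no address is accessed when every lane is masked off.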
----- Original Message -----
> From: "Ayal Zaks" <ayal.zaks at intel.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: dag at cray.com, llvmdev at cs.uiuc.edu
> Sent: Friday, October 24, 2014 9:46:01 AM
> Subject: RE: [LLVMdev] Adding masked vector load and store intrinsics
>
> > Why can't we represent the loads as select(mask, load(addr),
> > passthru)?
>
> This suggests masked-off lanes are free to speculatively load from
> memory, whereas the proposed semantics is that:
>
> > The addressed memory will not be touched for masked-off lanes. In
> > particular, if all lanes are masked off no address will be
> > accessed.

Agreed -- as I said in an e-mail that you probably did not see before you wrote this ;) -- but we should make sure to explicitly state this in the rationale. "Touched" is not really the right term here. The underlying issue is that it allows us to deal with unaligned loads that cross page boundaries, i.e. that a masked-off load is safe to speculate.

On a related note, I presume that the 'i32 4' in the provided example is the alignment. Is that correct?

Thanks again,
Hal

> Ayal.

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
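To make the page-boundary point concrete, here is a small Python sketch comparing select(mask, load(addr), passthru) with the proposed masked load. It assumes a toy memory in which reading past the end stands in for touching an unmapped page. ToyMemory, PageFault, select_of_load and masked_load are illustrative names only, not LLVM APIs, and the model says nothing about how any real target behaves.

```python
class PageFault(Exception):
    """Raised by the toy memory model when an unmapped element is read."""


class ToyMemory:
    """Element-addressed memory; only indices inside the list are mapped."""

    def __init__(self, values):
        self.values = list(values)

    def read(self, index):
        if not 0 <= index < len(self.values):
            raise PageFault(index)
        return self.values[index]


def select_of_load(mem, addr, passthru, mask):
    # select(mask, load(addr), passthru): the full-width load happens first,
    # so every lane is read regardless of the mask.
    loaded = [mem.read(addr + lane) for lane in range(len(mask))]
    return [v if m else p for v, m, p in zip(loaded, mask, passthru)]


def masked_load(mem, addr, passthru, mask):
    # Proposed semantics: masked-off lanes are never read.
    return [mem.read(addr + lane) if m else p
            for lane, (m, p) in enumerate(zip(mask, passthru))]


mem = ToyMemory([1, 2, 3])          # the fourth element is "unmapped"
mask = [True, True, True, False]    # the lane that would fault is masked off
print(masked_load(mem, 0, [0, 0, 0, 0], mask))  # [1, 2, 3, 0]
try:
    select_of_load(mem, 0, [0, 0, 0, 0], mask)
except PageFault as fault:
    print("select-of-load faulted at element", fault.args[0])
```

In this model the two forms produce the same lanes whenever the whole vector is mapped; they differ only when a masked-off lane falls on unmapped memory, which is the case the intrinsic is meant to cover.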
Demikhovsky, Elena
2014-Oct-24 15:37 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> On a related note, I presume that the 'i32 4' in the provided example is the alignment. Is that correct?

Yes.

- Elena

-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Friday, October 24, 2014 18:28
To: Zaks, Ayal
Cc: dag at cray.com; llvmdev at cs.uiuc.edu; Demikhovsky, Elena
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics