Demikhovsky, Elena
2014-Oct-28 12:26 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Many overloaded intrinsics could be replaced with instructions: fabs, fma, or sqrt, for example.
Chandler will probably explain the criteria. What is the difference between fma and fadd? Or between fptrunc and fabs?

A new instruction like
%a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1> %mask
is possible, but may not be very useful for most targets.
So we are starting with intrinsics.

- Elena

From: Owen Anderson [mailto:resistor at mac.com]
Sent: Monday, October 27, 2014 18:59
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; dag at cray.com
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

Since this is something that you expect to be supported on all targets, and which requires extensive type overloading, it seems like a perfect candidate for being an Instruction rather than an intrinsic.

--Owen

On Oct 27, 2014, at 12:02 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

We are just following the common recommendation to start with intrinsics:
http://llvm.org/docs/ExtendingLLVM.html

- Elena

From: Owen Anderson [mailto:resistor at mac.com]
Sent: Sunday, October 26, 2014 23:57
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu; dag at cray.com
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

What is the motivation for using intrinsics versus adding new instructions?

--Owen

On Oct 24, 2014, at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

Hi,

We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP vectorizer can potentially be enhanced to use these intrinsics as well.

The intrinsics would be legal for all targets; targets that do not support masked vector loads or stores will scalarize them.
The addressed memory will not be touched for masked-off lanes. In particular, if all lanes are masked off no address will be accessed.

call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4, <16 x i1> %mask)

%data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)

where %passthru is used to fill the elements of %data that are masked-off (if any; can be zeroinitializer or undef).

Comments so far, before we dive into more details?

Thank you.

- Elena and Ayal

---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
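For readers following the thread, the lane semantics described in the original proposal can be modeled in scalar form. The sketch below is only an illustration in C++, not the proposed implementation and not LLVM's scalarization code: the names masked_load and masked_store are hypothetical, std::array stands in for an IR vector type, and the alignment operand (the i32 4 in the examples) is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of the proposed masked load. Lanes whose mask
// bit is set read from memory; masked-off lanes keep the passthru value and
// their memory locations are never dereferenced.
template <typename T, std::size_t N>
std::array<T, N> masked_load(const T* addr, const std::array<T, N>& passthru,
                             const std::array<bool, N>& mask) {
  std::array<T, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      assert(addr != nullptr && "enabled lanes need a valid address");
      result[i] = addr[i];
    }
  }
  return result;
}

// Hypothetical scalar model of the proposed masked store. Only lanes whose
// mask bit is set are written; memory behind masked-off lanes is untouched.
template <typename T, std::size_t N>
void masked_store(T* addr, const std::array<T, N>& data,
                  const std::array<bool, N>& mask) {
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i]) {
      assert(addr != nullptr && "enabled lanes need a valid address");
      addr[i] = data[i];
    }
  }
}
```

With an all-false mask neither function dereferences addr at all, which mirrors the statement that no address is accessed when every lane is masked off.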
Owen Anderson
2014-Oct-28 16:30 UTC
[LLVMdev] Adding masked vector load and store intrinsics
I would have no issue promoting some of the fundamental floating point operations that are currently intrinsics to instructions, though I don’t think there’s a strong impetus to do so at this time.

The only “deep” reasons for the guidance to start with intrinsics are (1) it’s more work to add an instruction, and (2) in theory the instruction opcode space is bounded, though this has never been a practical problem. The advantages of an instruction include better compile time (not having to string-match function names), a more compact bitcode representation, and cleaner IR syntax, particularly vis-a-vis type overloading.

There’s a big qualitative difference between fabs and these masked operations, mostly because of the degree of type overloading you intend to support. I am very concerned that the IR that will contain these constructs will be dramatically harder to read because of it.

--Owen

> On Oct 28, 2014, at 5:26 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
> Many overloaded intrinsics could be replaced with instructions: fabs, fma, or sqrt, for example.
> Chandler will probably explain the criteria. What is the difference between fma and fadd? Or between fptrunc and fabs?
>
> A new instruction like
> %a = loadm <4 x i32>* %addr, <4 x i32> %passthru, i32 4, <4 x i1> %mask
> is possible, but may not be very useful for most targets.
> So we are starting with intrinsics.
>
> - Elena
----- Original Message -----
> From: "Owen Anderson" <resistor at mac.com>
> To: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: dag at cray.com, llvmdev at cs.uiuc.edu
> Sent: Tuesday, October 28, 2014 11:30:15 AM
> Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
>
> I would have no issue promoting some of the fundamental floating
> point operations that are currently intrinsics to instructions,
> though I don’t think there’s a strong impetus to do so at this time.
>
> The only “deep” reasons for the guidance to start with intrinsics is
> (1) it’s more work to add an instruction, and (2) in theory the
> instruction opcode space is bounded, though this has never been a
> practical problem. The advantages include better compile-time (not
> having to string-match function names), more compact bitcode
> representation, and cleaner IR syntax particularly vis-a-vis type
> overloading.

I think that starting with the intrinsics, for now, will be the right path while we figure out exactly what the design space and use cases are. For the moment, if these are primarily generated by the loop vectorizer, it should not be a big problem. Obviously, when adding new instructions, there are a lot of switch statements to update ;)

> There’s a big qualitative difference between fabs and these masked
> operations, mostly because of the degree of type overloading you
> intend to support. I am very concerned that the IR that will contain
> these constructs will be dramatically harder to read because of it.

I think this ties back to the other thread on intrinsics name mangling (and the lack of the need for it). I think that, at least, Elena, Philip and I agree that, generally speaking, we'd like to clean this up, but that we should do this as a separate change independent of this one. The memcpy.whatever names are not easy to read either ;) -- and I agree that this could make things worse in that regard.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory