Smith, Kevin B
2014-Oct-24 17:58 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> So %passthrough can *only* be undef or zeroinitializer?

No, that wasn't the intent. %passthrough can be any other definition that is needed. Zero and undef were simply two possible values that illustrated some interesting behavior.

Many vector instruction sets have masked instructions that leave the masked-off elements of the destination unchanged. Mapping %passthrough onto those semantics is done in much the same way as three-address instructions are turned into two-address instructions: a copy is placed as necessary so that dest and passthrough are in the same register.

Kevin B. Smith

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of dag at cray.com
Sent: Friday, October 24, 2014 10:21 AM
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

"Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:

> %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32>
> %passthru, i32 4, <8 x i1> %mask)
> where %passthru is used to fill the elements of %data that are
> masked-off (if any; can be zeroinitializer or undef).

So %passthrough can *only* be undef or zeroinitializer? If that's the case it might make more sense to have two intrinsics, one that fills with undef and one that fills with zero. Using a general vector operand with a restriction on valid values seems odd and potentially misleading.

Another option is to always fill with undef and require a select on top of the load to fill with zero. The load + select would be easily matchable to a target instruction.

I'm trying to think beyond just AVX-512 to what other future architectures might want. It's not a given that future architectures will fill with zero *or* undef, though those are the two most likely fill values.
-David
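To make the lane-wise behavior under discussion concrete, here is a minimal Python sketch of a masked load with an arbitrary pass-through vector. It is only a scalar model, not LLVM code: `masked_load` and `read_lane` are hypothetical names, and the sketch assumes that masked-off lanes are never read from memory, which the thread above does not state explicitly.

```python
def masked_load(read_lane, passthru, mask):
    """Model of a masked vector load (hypothetical helper, not an LLVM API).

    Enabled lanes take their value from memory via read_lane(i);
    disabled lanes take the corresponding element of passthru.
    """
    assert len(passthru) == len(mask)
    result = []
    for i, enabled in enumerate(mask):
        if enabled:
            result.append(read_lane(i))   # assumption: only enabled lanes touch memory
        else:
            result.append(passthru[i])    # masked-off lane keeps the pass-through value
    return result


memory = [10, 11, 12, 13]
touched = []                              # records which lanes were actually read


def read_lane(i):
    touched.append(i)
    return memory[i]


mask = [True, False, True, False]
zero_fill = masked_load(read_lane, [0, 0, 0, 0], mask)   # zeroinitializer case
general = masked_load(read_lane, [7, 8, 9, 6], mask)     # arbitrary %passthru case
```

The zero-fill case is just one choice of `passthru`; nothing in the model restricts the operand to zero or undef, which matches Kevin's reply.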
dag at cray.com
2014-Oct-24 19:22 UTC
[LLVMdev] Adding masked vector load and store intrinsics
"Smith, Kevin B" <kevin.b.smith at intel.com> writes:

>> So %passthrough can *only* be undef or zeroinitializer?
>
> No, that wasn't the intent. %passthrough can be any other definition
> that is needed. Zero and undef were simply two possible values that
> illustrated some interesting behavior.

> Mapping of the %passthrough to the actual semantics of many vector
> instruction sets where the masked instructions leave the masked-off
> elements of the destination unchanged is done in a similar manner as
> three-address instructions are turned into two address instructions,
> by placing a copy as necessary so that dest and passthrough are in the
> same register.

How would one express such semantics in LLVM IR with this intrinsic? By definition, %data and %passthrough are different IR virtual registers and there are no copy instructions in LLVM IR.

In the more general case:

    %b = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %a, i32 4, <8 x i1> %mask)

where %a and %b have no relation to each other, I presume the backend would be responsible for doing a select/merge after the load if the ISA didn't directly support the merge as part of the load operation. Right?

-David
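The select/merge fallback David describes can be sketched as follows. This is a toy model under stated assumptions, not backend code: `hw_masked_load_zero` stands in for a hypothetical target instruction that can only zero-fill masked-off lanes, and `select` models a lane-wise blend.

```python
def hw_masked_load_zero(memory, mask):
    """Hypothetical target instruction: masked load that zero-fills disabled lanes."""
    return [memory[i] if m else 0 for i, m in enumerate(mask)]


def select(mask, on_true, on_false):
    """Lane-wise select/blend: pick on_true where the mask is set, else on_false."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def lower_masked_load(memory, passthru, mask):
    """Lower a pass-through masked load to load + select when the ISA cannot merge."""
    loaded = hw_masked_load_zero(memory, mask)
    return select(mask, loaded, passthru)


memory = [1, 2, 3, 4, 5, 6, 7, 8]
a = [100, 101, 102, 103, 104, 105, 106, 107]   # plays the role of %a (pass-through)
mask = [True, True, False, False, True, False, True, False]
b = lower_masked_load(memory, a, mask)          # plays the role of %b
```

Because the select uses the same mask as the load, whatever the hardware put in the disabled lanes (zero here) never reaches `b`.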
Smith, Kevin B
2014-Oct-24 19:40 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> How would one express such semantics in LLVM IR with this intrinsic? By definition, %data and %passthrough are different IR virtual registers and there are no copy instructions in LLVM IR.

You never need to express this semantic in LLVM IR, because in SSA form the result of an operation is always a different SSA def from the inputs to the operation. Someplace late in the CG needs to handle this, in exactly the same way as it already has to handle mapping to regular X86 two-address code.

For example, this LLVM IR

    %add = add nsw i32 %b, %a

gets converted into

    # *** IR Dump After Expand ISel Pseudo-instructions ***:
    # Machine code for function foo: SSA
    Function Live Ins: %EDI in %vreg0, %ESI in %vreg1

    BB#0: derived from LLVM BB %entry
        Live Ins: %EDI %ESI
            %vreg1<def> = COPY %ESI; GR32:%vreg1
            %vreg0<def> = COPY %EDI; GR32:%vreg0
            %vreg2<def,tied1> = ADD32rr %vreg1<tied0>, %vreg0, %EFLAGS<imp-def,dead>; GR32:%vreg2,%vreg1,%vreg0

in ISEL.

So the necessary instruction semantic needn't be represented in LLVM IR. It is created once you have to map to "real" machine instructions using virtual registers, where copies, and the ability to mark a destination and a source as "tied" together, are representable.

Kevin
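The tied-operand idea can be illustrated with a small Python sketch that rewrites a three-address pseudo-instruction into a copy followed by a two-address form. This is a toy under stated assumptions, not how LLVM implements it: `tie_first_source` is a hypothetical helper, `ADD` and `MASKED_LOAD` are made-up opcode names, and the input is assumed to be in SSA form so the destination is always a fresh register.

```python
def tie_first_source(op, dst, tied_src, other_srcs):
    """Rewrite 'dst = op tied_src, others...' so that dst doubles as the first source.

    Assumes SSA input: dst is a fresh def, distinct from every source, so the
    inserted copy cannot clobber a value that is still needed.
    """
    assert dst != tied_src and dst not in other_srcs
    copy = f"{dst} = COPY {tied_src}"                       # bring tied source into dst
    two_addr = f"{dst} = {op} {dst}, " + ", ".join(other_srcs)
    return [copy, two_addr]


# Scalar add, mirroring the ADD32rr example (register names reused for illustration).
add = tie_first_source("ADD", "%vreg2", "%vreg1", ["%vreg0"])

# Masked load with a pass-through operand tied to the destination (hypothetical opcode).
load = tie_first_source("MASKED_LOAD", "%b", "%a", ["%addr", "%mask"])
```

A real code generator would typically try to avoid the copy when the tied source is dead after the instruction; the sketch always emits it to keep the transformation visible.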