Smith, Kevin B
2014-Oct-24 19:40 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> How would one express such semantics in LLVM IR with this intrinsic? By
> definition, %data and %passthrough are different IR virtual registers and
> there are no copy instructions in LLVM IR.

You never need to express this semantic in LLVM IR, because in SSA form they are always different SSA defs for the result of the operation versus the inputs to the operation. Someplace late in the CG needs to handle this, in exactly an analogous fashion as it already has to handle this for mapping to regular X86 two address code.

For example, this LLVM IR

    %add = add nsw i32 %b, %a

gets converted into

    # *** IR Dump After Expand ISel Pseudo-instructions ***:
    # Machine code for function foo: SSA
    Function Live Ins: %EDI in %vreg0, %ESI in %vreg1

    BB#0: derived from LLVM BB %entry
        Live Ins: %EDI %ESI
        %vreg1<def> = COPY %ESI; GR32:%vreg1
        %vreg0<def> = COPY %EDI; GR32:%vreg0
        %vreg2<def,tied1> = ADD32rr %vreg1<tied0>, %vreg0, %EFLAGS<imp-def,dead> ; GR32:%vreg2,%vreg1,%vreg0

in ISEL.

So, the necessary instruction semantic needn't be represented in LLVM IR. It is created once you have to do mapping to "real" machine instructions using virtual registers, where copies, and the ability to mark a destination and a source as "tied" together, are representable.

Kevin

-----Original Message-----
From: dag at cray.com [mailto:dag at cray.com]
Sent: Friday, October 24, 2014 12:23 PM
To: Smith, Kevin B
Cc: Demikhovsky, Elena; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

"Smith, Kevin B" <kevin.b.smith at intel.com> writes:

>> So %passthrough can *only* be undef or zeroinitializer?
>
> No, that wasn't the intent. %passthrough can be any other definition
> that is needed. Zero and undef were simply two possible values that
> illustrated some interesting behavior.

> Mapping of the %passthrough to the actual semantics of many vector
> instruction sets where the masked instructions leave the masked-off
> elements of the destination unchanged is done in a similar manner as
> three-address instructions are turned into two address instructions,
> by placing a copy as necessary so that dest and passthrough are in the
> same register.

How would one express such semantics in LLVM IR with this intrinsic? By definition, %data and %passthrough are different IR virtual registers and there are no copy instructions in LLVM IR.

In the more general case:

    %b = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %a, i32 4, <8 x i1> %mask)

where %a and %b have no relation to each other, I presume the backend would be responsible for doing a select/merge after the load if the ISA didn't directly support the merge as part of the load operation. Right?

-David
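The select/merge fallback David asks about can be modelled lane by lane in a few lines of Python. This is only a sketch of the semantics under discussion, assuming an ISA whose masked load zeroes the masked-off lanes; the names `masked_load_zeroing`, `select` and `masked_load` are hypothetical and do not correspond to any LLVM API.

```python
from typing import List, Sequence


def masked_load_zeroing(memory: Sequence[int], addr: int,
                        mask: Sequence[bool]) -> List[int]:
    """Hypothetical ISA primitive: load enabled lanes, zero the rest.

    Masked-off lanes never index into memory, so they cannot fault.
    """
    return [memory[addr + i] if m else 0 for i, m in enumerate(mask)]


def select(mask: Sequence[bool], on_true: Sequence[int],
           on_false: Sequence[int]) -> List[int]:
    """Lane-wise select: take on_true where the mask is set, else on_false."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def masked_load(memory: Sequence[int], addr: int, mask: Sequence[bool],
                passthrough: Sequence[int]) -> List[int]:
    """Masked load with passthrough, built as zeroing load + select/merge."""
    loaded = masked_load_zeroing(memory, addr, mask)
    return select(mask, loaded, passthrough)


# Only four elements exist in memory; lanes 4..7 are masked off and untouched.
memory = [10, 11, 12, 13]
mask = [True, False, True, False, False, False, False, False]
passthrough = [-1] * 8

result = masked_load(memory, 0, mask, passthrough)
print(result)  # [10, -1, 12, -1, -1, -1, -1, -1]
```

Here %a plays the role of `passthrough` and %b the role of `result`; the two are unrelated values, and the merge happens only after the load.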
dag at cray.com
2014-Oct-24 22:09 UTC
[LLVMdev] Adding masked vector load and store intrinsics
"Smith, Kevin B" <kevin.b.smith at intel.com> writes:

>> How would one express such semantics in LLVM IR with this intrinsic?
>> By definition, %data and %passthrough are different IR virtual
>> registers and there are no copy instructions in LLVM IR.
>
> You never need to express this semantic in LLVM IR, because in SSA
> form they are always different SSA defs for the result of the
> operation versus the inputs to the operation. Someplace late in the
> CG needs to handle this, in exactly an analogous fashion as it already
> has to handle this for mapping to regular X86 two address code.

Ok, I think that works. I was concerned there may be some reason to express this at the IR level for, say, AVX-512 because of masks, but I think you're right: the normal two-operand handling scheme can take care of it.

-David
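The two-operand handling scheme referred to here can be illustrated with a toy rewrite in Python. This is a sketch only, not LLVM's actual implementation: the `Instr` record, the `lower_tied` helper and the `MASKED_LOAD` opcode name are all hypothetical. The idea is that a copy places the tied source in the destination register, after which the instruction reads and writes that same register.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy SSA machine instruction (hypothetical, for illustration only)."""
    dst: str
    op: str
    operands: Tuple[str, ...]
    tied: int  # index of the operand that must share a register with dst


def lower_tied(ins: Instr) -> List[str]:
    """Insert a COPY so that dst and the tied operand are the same register."""
    ops = list(ins.operands)
    copy = f"{ins.dst} = COPY {ops[ins.tied]}"
    ops[ins.tied] = ins.dst  # the instruction now reads and writes dst
    return [copy, f"{ins.dst} = {ins.op} " + ", ".join(ops)]


# Ordinary X86-style two-address add: dst is tied to the first source.
add = Instr("%vreg2", "ADD32rr", ("%vreg1", "%vreg0"), tied=0)
print(lower_tied(add))
# ['%vreg2 = COPY %vreg1', '%vreg2 = ADD32rr %vreg2, %vreg0']

# Masked load: dst is tied to the passthrough operand instead.
load = Instr("%b", "MASKED_LOAD", ("%addr", "%mask", "%a"), tied=2)
print(lower_tied(load))
# ['%b = COPY %a', '%b = MASKED_LOAD %addr, %mask, %b']
```

In both cases the IR-level values stay distinct SSA defs; only the lowered form ties them together.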
dag at cray.com
2014-Oct-24 22:12 UTC
[LLVMdev] Adding masked vector load and store intrinsics
"Smith, Kevin B" <kevin.b.smith at intel.com> writes:

>> How would one express such semantics in LLVM IR with this intrinsic?
>> By definition, %data and %passthrough are different IR virtual
>> registers and there are no copy instructions in LLVM IR.
>
> You never need to express this semantic in LLVM IR, because in SSA
> form they are always different SSA defs for the result of the
> operation versus the inputs to the operation. Someplace late in the
> CG needs to handle this, in exactly an analogous fashion as it already
> has to handle this for mapping to regular X86 two address code.

Following up, doing it this way will require that additional intrinsics (for example, all FP operations) each have an additional passthrough register operand:

    %result = llvm.masked.fadd(%a, %b, %mask, %passthrough)

Otherwise we would need some implicit specification that either %a or %b is the passthrough, which seems very wrong for a general intrinsic.

Is this how you see this going?

-David
Smith, Kevin B
2014-Oct-24 22:30 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Yes, IMO that has to be the direction in order for SSA form to work properly for masked operations.

Kevin B. Smith

-----Original Message-----
From: dag at cray.com [mailto:dag at cray.com]
Sent: Friday, October 24, 2014 3:13 PM
To: Smith, Kevin B
Cc: dag at cray.com; Demikhovsky, Elena; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

"Smith, Kevin B" <kevin.b.smith at intel.com> writes:

>> How would one express such semantics in LLVM IR with this intrinsic?
>> By definition, %data and %passthrough are different IR virtual
>> registers and there are no copy instructions in LLVM IR.
>
> You never need to express this semantic in LLVM IR, because in SSA
> form they are always different SSA defs for the result of the
> operation versus the inputs to the operation. Someplace late in the
> CG needs to handle this, in exactly an analogous fashion as it already
> has to handle this for mapping to regular X86 two address code.

Following up, doing it this way will require that additional intrinsics (for example, all FP operations) each have an additional passthrough register operand:

    %result = llvm.masked.fadd(%a, %b, %mask, %passthrough)

Otherwise we would need some implicit specification that either %a or %b is the passthrough, which seems very wrong for a general intrinsic.

Is this how you see this going?

-David
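A lane-wise model of the proposed masked FP add may help show why the explicit passthrough operand is the general form. This is a minimal Python sketch of the intended semantics only; `masked_fadd` is a hypothetical stand-in for the `llvm.masked.fadd` spelling used in the thread, not an existing intrinsic or API.

```python
from typing import List, Sequence


def masked_fadd(a: Sequence[float], b: Sequence[float],
                mask: Sequence[bool],
                passthrough: Sequence[float]) -> List[float]:
    """Add enabled lanes; masked-off lanes take the passthrough value."""
    return [x + y if m else p
            for x, y, m, p in zip(a, b, mask, passthrough)]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
mask = [True, False, True, False]

# Explicit, unrelated passthrough value.
explicit = masked_fadd(a, b, mask, [9.0, 9.0, 9.0, 9.0])
print(explicit)  # [1.5, 9.0, 3.5, 9.0]

# The "implicit" convention is just the special case passthrough == a.
implicit = masked_fadd(a, b, mask, a)
print(implicit)  # [1.5, 2.0, 3.5, 4.0]
```

With the separate operand, the result is a fresh SSA def and the choice of which input (if any) supplies the masked-off lanes is stated in the IR rather than implied by the instruction.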