thr3ads.net - llvm dev - [LLVMdev] Adding masked vector load and store intrinsics [Oct 2014]


Smith, Kevin B

2014-Oct-24 17:58 UTC

[LLVMdev] Adding masked vector load and store intrinsics

> So %passthrough can *only* be undef or zeroinitializer?

No, that wasn't the intent.  %passthrough can be any other definition that
is needed.  Zero and undef were simply two possible values that illustrated
some interesting behavior.

Mapping %passthrough onto the actual semantics of many vector instruction
sets, where the masked instructions leave the masked-off elements of the
destination unchanged, is done in a similar manner to the way three-address
instructions are turned into two-address instructions: by placing a copy as
necessary so that dest and passthrough are in the same register.

Kevin B. Smith
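
The semantics described above can be sketched as a scalar reference model.
This is only an illustration in Python rather than LLVM IR: the function
name `masked_load`, the list-based "memory", and the sample values are all
assumptions made for the example, not part of the proposal.

```python
def masked_load(memory, addr, mask, passthru):
    """Hypothetical scalar model of the proposed masked load.

    Lane i takes memory[addr + i] when mask[i] is set; otherwise it takes
    passthru[i]. Masked-off lanes never touch memory.
    """
    return [memory[addr + i] if m else p
            for i, (m, p) in enumerate(zip(mask, passthru))]


memory = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [True, False, True, False, True, True, False, False]

# passthru may be any vector value, not just zero or undef.
passthru = [-1, -2, -3, -4, -5, -6, -7, -8]
data = masked_load(memory, 0, mask, passthru)
print(data)  # [10, -2, 12, -4, 14, 15, -7, -8]
```

Because the conditional only indexes `memory` for enabled lanes, a
masked-off lane that would fall outside the buffer is harmless in this
model, which mirrors the reason a masked load cannot simply be written as
a full load followed by a select.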

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of dag at cray.com
Sent: Friday, October 24, 2014 10:21 AM
To: Demikhovsky, Elena
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

"Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:
> %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32>
> %passthru, i32 4, <8 x i1> %mask)
> where %passthru is used to fill the elements of %data that are
> masked-off (if any; can be zeroinitializer or undef).
So %passthrough can *only* be undef or zeroinitializer?  If that's the
case it might make more sense to have two intrinsics, one that fills
with undef and one that fills with zero.  Using a general vector operand
with a restriction on valid values seems odd and potentially misleading.

Another option is to always fill with undef and require a select on top
of the load to fill with zero.  The load + select would be easily
matchable to a target instruction.

I'm trying to think beyond just AVX-512 to what other future
architectures might want.  It's not a given that future architectures
will fill with zero *or* undef though those are the two most likely fill
values.

                             -David

dag at cray.com

2014-Oct-24 19:22 UTC



"Smith, Kevin B" <kevin.b.smith at intel.com> writes:
>> So %passthrough can *only* be undef or zeroinitializer?
>
> No, that wasn't the intent.  %passthrough can be any other definition
> that is needed.  Zero and undef were simply two possible values that
> illustrated some interesting behavior.
> Mapping of the %passthrough to the actual semantics of many vector
> instruction sets where the masked instructions leave the masked-off
> elements of the destination unchanged is done in a similar manner as
> three-address instructions are turned into two address instructions,
> by placing a copy as necessary so that dest and passthrough are in the
> same register.
How would one express such semantics in LLVM IR with this intrinsic?  By
definition, %data and %passthrough are different IR virtual registers
and there are no copy instructions in LLVM IR.

In the more general case:

%b = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %a, i32 4, <8 x i1> %mask)

where %a and %b have no relation to each other, I presume the backend
would be responsible for doing a select/merge after the load if the ISA
didn't directly support the merge as part of the load operation.  Right?

                                 -David
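
The select/merge fallback asked about above can be sketched as follows.
This is a Python model, not backend code: `masked_load_zero_fill` stands in
for a hypothetical target instruction that can only zero the masked-off
lanes, and all names here are assumptions made for the illustration.

```python
def masked_load_zero_fill(memory, addr, mask):
    """Hypothetical target instruction: masked-off lanes are not read
    from memory and come back as 0."""
    return [memory[addr + i] if m else 0 for i, m in enumerate(mask)]


def select(mask, on_true, on_false):
    """Lane-wise select, analogous to a vector `select` in LLVM IR."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def masked_load_via_select(memory, addr, mask, passthru):
    """General passthru built from the restricted load plus a select."""
    loaded = masked_load_zero_fill(memory, addr, mask)
    return select(mask, loaded, passthru)


memory = [10, 11, 12, 13]
mask = [True, False, False, True]
a = [7, 8, 9, 6]  # unrelated to the result, as in the %a / %b example
b = masked_load_via_select(memory, 0, mask, a)
print(b)  # [10, 8, 9, 13]
```

The select reuses the same mask as the load, so the fill value chosen by
the restricted instruction (zero here) never reaches the result.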

Smith, Kevin B

2014-Oct-24 19:40 UTC



> How would one express such semantics in LLVM IR with this intrinsic?  By
> definition, %data and %passthrough are different IR virtual registers and
> there are no copy instructions in LLVM IR.

You never need to express this semantic in LLVM IR, because in SSA form the
result of an operation is always a different SSA def from its inputs.
Somewhere late in the CG needs to handle this, in exactly the same fashion
as it already has to handle it when mapping to regular X86 two-address code.

For example, this LLVM IR

%add = add nsw i32 %b, %a

gets converted into

# *** IR Dump After Expand ISel Pseudo-instructions ***:
# Machine code for function foo: SSA
Function Live Ins: %EDI in %vreg0, %ESI in %vreg1

BB#0: derived from LLVM BB %entry
    Live Ins: %EDI %ESI
        %vreg1<def> = COPY %ESI; GR32:%vreg1
        %vreg0<def> = COPY %EDI; GR32:%vreg0
        %vreg2<def,tied1> = ADD32rr %vreg1<tied0>, %vreg0, %EFLAGS<imp-def,dead>; GR32:%vreg2,%vreg1,%vreg0

in ISEL.  So the necessary instruction semantic needn't be represented in
LLVM IR.  It is created once you have to map to "real" machine instructions
using virtual registers, where copies, and the ability to mark a destination
and a source as "tied" together, are representable.

Kevin
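
The copy insertion described above can be sketched as a small rewrite on
textual instructions. This is a deliberately simplified model, assuming the
destination is a fresh SSA virtual register distinct from both sources; the
function name `to_two_address` and the string format are hypothetical, and
the real two-address conversion in the code generator also handles cases
such as commuting operands that are ignored here.

```python
def to_two_address(dst, opcode, src1, src2):
    """Rewrite `dst = opcode src1, src2` so that the destination and the
    tied (first) source are the same register.

    Assumes dst is a fresh SSA def, so copying src1 into dst cannot
    clobber src2.
    """
    if dst == src1:
        # Already in two-address form; no copy needed.
        return [f"{dst} = {opcode} {dst}, {src2}"]
    return [
        f"{dst} = COPY {src1}",
        f"{dst} = {opcode} {dst}, {src2}",
    ]


lowered = to_two_address("%vreg2", "ADD32rr", "%vreg1", "%vreg0")
for line in lowered:
    print(line)
# %vreg2 = COPY %vreg1
# %vreg2 = ADD32rr %vreg2, %vreg0
```

Under the scheme discussed in this thread, the passthrough operand of a
masked load would play the role of the tied source, so the same copy would
place it in the destination register before the load executes.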

