Demikhovsky, Elena
2014-Oct-24 11:24 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Hi,

We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about the availability of masked vector loads and stores. The SLP vectorizer can potentially be enhanced to use these intrinsics as well.

The intrinsics would be legal for all targets; targets that do not support masked vector loads or stores will scalarize them. The addressed memory will not be touched for masked-off lanes. In particular, if all lanes are masked off, no address will be accessed.

call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4, <16 x i1> %mask)

%data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)

where %passthru is used to fill the elements of %data that are masked-off (if any; it can be zeroinitializer or undef).

Comments so far, before we dive into more details?

Thank you.

- Elena and Ayal
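[For illustration, here is the kind of transformation the loop vectorizer could perform with these intrinsics. This is a hand-written sketch, not compiler output: the source loop and the value names (%trigger.addr, %a.addr, %out.addr) are invented, and the operand order follows the proposal above.]

; Scalar source, conceptually:  if (trigger[i] > 0) out[i] = a[i] + 1;
; Vectorized loop body, 8 lanes per iteration:
%trig = load <8 x i32>* %trigger.addr, align 4
%mask = icmp sgt <8 x i32> %trig, zeroinitializer
%a    = call <8 x i32> @llvm.masked.load (i32* %a.addr, <8 x i32> undef, i32 4, <8 x i1> %mask)
%sum  = add <8 x i32> %a, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
call void @llvm.masked.store (i32* %out.addr, <8 x i32> %sum, i32 4, <8 x i1> %mask)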
Hal Finkel
2014-Oct-24 12:50 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> Comments so far, before we dive into more details?

For the stores, I think this is a reasonable idea. The alternative is to represent them in scalar form with a lot of control flow, and I think that expecting the backend to properly pattern match that after isel is not realistic.

For the loads, I'm much less sure. Why can't we represent the loads as select(mask, load(addr), passthru)? It is true that the load might get separated from the select so that isel might not see it (because isel is basic-block local), but we can add some code in CodeGenPrep to fix that for targets on which it is useful to do so (which is a more general solution than the intrinsic anyhow). What do you think?

Thanks again,
Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
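[Hal's alternative, written out as IR for comparison; a minimal sketch with invented value names.]

%vptr = bitcast i32* %addr to <8 x i32>*
%wide = load <8 x i32>* %vptr, align 4   ; unconditional full-width load
%data = select <8 x i1> %mask, <8 x i32> %wide, <8 x i32> %passthru

Under this scheme, CodeGenPrepare would keep the load and the select adjacent on targets with masked loads, so that instruction selection can match the pair into a single masked instruction.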
Demikhovsky, Elena
2014-Oct-24 13:07 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> For the loads, I'm much less sure. Why can't we represent the loads as
> select(mask, load(addr), passthru)? It is true that the load might get
> separated from the select so that isel might not see it (because isel
> is basic-block local), but we can add some code in CodeGenPrep to fix
> that for targets on which it is useful to do so (which is a more
> general solution than the intrinsic anyhow). What do you think?

We generate the masked vector intrinsic in an IR-to-IR pass. That is too far from instruction selection: we would need to guarantee that all subsequent IR-to-IR passes do not break the load/select sequence, and only for one or two specific targets. Then we would have to keep the logic in the type legalizer, which may split or extend operations, and then take care of it again in DAG combine. In my opinion, this is just unsafe.

- Elena
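[A hypothetical illustration of the breakage Elena is worried about; the pass behavior and names are invented.]

; Before: the pair isel would need to see as a unit
;   %wide = load <8 x i32>* %vptr, align 4
;   %data = select <8 x i1> %mask, <8 x i32> %wide, <8 x i32> %passthru
;
; After hoisting/code motion, the load may end up in a different basic
; block than the select. Since isel works one basic block at a time, it
; then emits an ordinary full-width load plus a blend instead of a single
; masked load -- and every IR-level pass between the vectorizer and isel
; would have to be audited to prevent this.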
Das, Dibyendu
2014-Oct-24 13:19 UTC
[LLVMdev] Adding masked vector load and store intrinsics
This looks to be a reasonable proposal. However, might native instructions that support such masked ld/st have high latency? Also, it would be good to state some workloads where this will have a positive impact.

-dibyendu
Demikhovsky, Elena
2014-Oct-24 13:36 UTC
[LLVMdev] Adding masked vector load and store intrinsics
I wrote a loop with a conditional load and store and measured performance on AVX2, where masking support is very basic relative to AVX-512. I got a 2x speedup with vpmaskmovd. The maskmov instruction is slower than a plain vector load or store, but much faster than 8 scalar memory operations and 8 branches. Usage of masked instructions on AVX-512 will give much more: there, the masked memory operation has no extra latency compared to a regular vector memory operation.

- Elena
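[For reference, a sketch of how the intrinsic pair could map onto AVX2; the lowering in the comments is plausible rather than authoritative, and the register choices are arbitrary.]

%v = call <8 x i32> @llvm.masked.load (i32* %src, <8 x i32> undef, i32 4, <8 x i1> %mask)
call void @llvm.masked.store (i32* %dst, <8 x i32> %v, i32 4, <8 x i1> %mask)
;
; could become, with the i1 mask sign-extended into the lanes of ymm1:
;   vpmaskmovd (%rdi), %ymm1, %ymm0   ; load: masked-off lanes are zeroed
;   vpmaskmovd %ymm0, %ymm1, (%rsi)   ; store: masked-off memory untouched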
Zaks, Ayal
[LLVMdev] Adding masked vector load and store intrinsics
> Why can't we represent the loads as select(mask, load(addr), passthru)?

This suggests masked-off lanes are free to speculatively load from memory, whereas the proposed semantics are that:

> The addressed memory will not be touched for masked-off lanes. In
> particular, if all lanes are masked off no address will be accessed.

Ayal.
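[A hedged example of what "speculatively load" costs in practice; the buffer layout is invented for illustration.]

; Suppose %addr points at the last 3 valid i32 elements of a buffer and the
; next page is unmapped. With the proposed intrinsic only the 3 active
; lanes are accessed, so this is safe:
%v = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> undef, i32 4,
       <8 x i1> <i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false>)
;
; The select form performs a full 32-byte load before masking, reading 20
; bytes past the buffer and potentially faulting on the unmapped page:
%vptr = bitcast i32* %addr to <8 x i32>*
%wide = load <8 x i32>* %vptr, align 4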
dag at cray.com
2014-Oct-24 16:48 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Hal Finkel <hfinkel at anl.gov> writes:

> For the loads, I'm much less sure. Why can't we represent the loads as
> select(mask, load(addr), passthru)?

Because that does not specify the correct semantics. This formulation expects the load to happen before the mask is applied, and the load could trap. The operation needs to be presented as an atomic unit.

The same problem exists with any potentially trapping instruction (e.g. all floating-point computations). The need for intrinsics goes way beyond loads and stores.

-David
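[The same argument extends past memory operations; a sketch for floating point, with fdiv chosen only as a convenient potentially trapping operation.]

; With FP traps enabled, masking the result does not mask the operation:
; the fdiv executes on all lanes, including masked-off lanes whose divisor
; may be zero, so the exception fires anyway.
%q = fdiv <8 x float> %num, %den
%r = select <8 x i1> %mask, <8 x float> %q, <8 x float> %safe
; A genuinely masked operation must suppress the computation itself on the
; masked-off lanes.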
dag at cray.com
2014-Oct-24 17:20 UTC
[LLVMdev] Adding masked vector load and store intrinsics
"Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:> %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> > %passthru, i32 4, <8 x i1> %mask) > where %passthru is used to fill the elements of %data that are > masked-off (if any; can be zeroinitializer or undef).So %passthrough can *only* be undef or zeroinitializer? If that's the case it might make more sense to have two intrinsics, one that fills with undef and one that fills with zero. Using a general vector operand with a restriction on valid values seems odd and potentially misleading. Another option is to always fill with undef and require a select on top of the load to fill with zero. The load + select would be easily matchable to a target instruction. I'm trying to think beyond just AVX-512 to what other future architectures might want. It's not a given that future architectures will fill with zero *or* undef though those are the two most likely fill values. -David
dag at cray.com
2014-Oct-24 17:22 UTC
[LLVMdev] Adding masked vector load and store intrinsics
"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:> This looks to be a reasonable proposal. However native instructions > that support such masked ld/st may have a high latency ? Also, it > would be good to state some workloads where this will have a positive > impact.Any significant vector workload will see a giant gain from this. The masked operations really shouldn't have any more latency. The time of the memory operation itself dominates. -David
Adam Nemet
2014-Oct-24 17:57 UTC
[LLVMdev] Adding masked vector load and store intrinsics
On Oct 24, 2014, at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

> The intrinsics would be legal for all targets; targets that do not
> support masked vector loads or stores will scalarize them.

I do agree that we would like to have one IR node to capture these so that they survive until ISel and so that their specific semantics can be expressed. However, can you discuss the other options (new IR instructions, target-specific intrinsics) and why you went with target-independent intrinsics?

My intuition would have been to go with target-specific intrinsics until we have something solid implemented and then potentially turn this into native IR instructions as the next step (for other targets, etc.). I am particularly worried whether we really want to generate these for targets that don't have vector predication support.

There is also the related question of vector-predicating any other instruction beyond just loads and stores, which AVX512 supports. This is probably a smaller gain but should probably be part of the plan as well.

Adam
Smith, Kevin B
2014-Oct-24 17:58 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> So %passthru can *only* be undef or zeroinitializer?

No, that wasn't the intent. %passthru can be any other definition that is needed. Zero and undef were simply two possible values that illustrate some interesting behavior.

Mapping %passthru onto the semantics of the many vector instruction sets where masked instructions leave the masked-off elements of the destination unchanged is done in a similar manner to the way three-address instructions are turned into two-address instructions: by placing a copy as necessary so that the destination and %passthru end up in the same register.

Kevin B. Smith
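[A sketch of the lowering Kevin describes, assuming an AVX-512-style merge-masking target; the machine sequence in the comments is illustrative and the register names are arbitrary.]

%data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)
;
; The backend first copies %passthru into the destination register, then
; issues a merge-masked load into it, so masked-off lanes keep their
; passthru values -- analogous to the copy inserted when a three-address
; instruction is turned into a two-address one:
;   vmovdqa   ymm0, ymm1            ; ymm0 := passthru
;   vmovdqu32 ymm0 {k1}, [rdi]      ; load active lanes only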
Nadav Rotem
2014-Oct-24 18:38 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> On Oct 24, 2014, at 10:57 AM, Adam Nemet <anemet at apple.com> wrote:
>
>> The vectorizer will first ask the target about availability of masked
>> vector loads and stores. The SLP vectorizer can potentially be
>> enhanced to use these intrinsics as well.

I am happy to hear that you are working on this, because it means that in the future we would be able to teach the SLP Vectorizer to vectorize types such as <3 x float>.

>> The intrinsics would be legal for all targets; targets that do not
>> support masked vector loads or stores will scalarize them.

+1. I think that this is an important requirement.

> I do agree that we would like to have one IR node to capture these so
> that they survive until ISel and that their specific semantics can be
> expressed. However, can you discuss the other options (new IR
> instructions, target-specific intrinsics) and why you went with
> target-independent intrinsics.

I agree with the approach of adding target-independent masked memory intrinsics. One reason is that I would like to keep the vectorizers target-independent (and use the target transform info to query the backends). I oppose adding new first-class instructions, because we would need to teach all of the existing optimizations about the new instructions, and considering the limited usefulness of masked operations it is not worth the effort.

> I am particularly worried whether we really want to generate these for
> targets that don't have vector predication support.

Probably not, but this is a cost-benefit decision that the vectorizers would need to make.
Tian, Xinmin
2014-Oct-24 18:48 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Adam, yes, there is more we need to consider, e.g. masked gather/scatter, masked arithmetic ops, etc. This proposal serves as the first step, and an important one, as a direction check with the community.

Xinmin Tian
dag at cray.com
2014-Oct-24 19:59 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Adam Nemet <anemet at apple.com> writes:

> I am particularly worried whether we really want to generate these for
> targets that don't have vector predication support.

We almost certainly don't want to do that. Clang or whatever is generating LLVM IR will need to be aware of target vector capabilities. Still, legalization needs to be available to handle this situation if it arises.

> There is also the related question of vector-predicating any other
> instruction beyond just loads and stores, which AVX512 supports. This
> is probably a smaller gain but should probably be part of the plan as
> well.

It's not a small gain, it is a *critical* thing to do. We have customers that always run with traps enabled; without masking, that severely limits what code can be vectorized.

-David
Hal Finkel
2014-Oct-24 21:02 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Elena,

As far as I can tell, consensus is strongly in favor. Please submit a patch :-)

Thanks again,
Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Demikhovsky, Elena
2014-Oct-25 11:22 UTC
[LLVMdev] Adding masked vector load and store intrinsics
> So %passthru can *only* be undef or zeroinitializer?

No, it can be any value, including undef and zeroinitializer. While designing this we considered both zero semantics and merge semantics, and decided that merge semantics is better because it covers zero semantics if you use zeroinitializer as the %passthru.

- Elena
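[Stated with the proposal's own signature; a two-line sketch.]

; Merge semantics with an arbitrary fill vector:
%a = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %prev, i32 4, <8 x i1> %mask)
; Zero semantics is just the special case %passthru = zeroinitializer:
%b = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> zeroinitializer, i32 4, <8 x i1> %mask)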
Demikhovsky, Elena
2014-Oct-25 11:40 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Thank you, Hal. Meanwhile, I implemented something quick to be sure that it works and to estimate which pieces of LLVM code would have to be touched. I'll prepare a patch soon.

- Elena
shahid shahid
2014-Oct-25 14:53 UTC
[LLVMdev] Adding masked vector load and store intrinsics
Hi Elena,

Nice to see that your thinking is quite similar to mine. Do you plan to generate this intrinsic in the loop vectorizer based on a subtarget feature? If so, it would be better to generate it there in a target-independent manner. Later on, during lowering, based on the availability of target support for masked ops, you can decide either to scalarize or to generate the target's masked-op instructions.

Shahid
Demikhovsky, Elena
2014-Oct-26 07:07 UTC
[LLVMdev] Adding masked vector load and store intrinsics
We may receive less optimal code on other targets as a result: a user may want the sequence of scalar instructions to be optimized when vectorization did not pay off.

- Elena
Owen Anderson
2014-Oct-26 21:56 UTC
[LLVMdev] Adding masked vector load and store intrinsics
What is the motivation for using intrinsics versus adding new instructions?

—Owen
Demikhovsky, Elena
2014-Oct-27 07:02 UTC
[LLVMdev] Adding masked vector load and store intrinsics
We just followed the common recommendation to start with intrinsics: http://llvm.org/docs/ExtendingLLVM.html

- Elena