thr3ads.net - llvm dev - [LLVMdev] Adding masked vector load and store intrinsics [Oct 2014]

If this information is useful, please help other people find it:
Share via:

Demikhovsky, Elena

2014-Oct-24 13:07 UTC

[LLVMdev] Adding masked vector load and store intrinsics

> For the loads, I'm must less sure. Why can't we represent the loads
as select(mask, load(addr), passthru)? It is true, that the load might get
separated from the select so that isel might not see it (because isel if
basic-block local), but we can add some code in CodeGenPrep to fix that for
targets on which it is useful to do so (which is a more-general solution than
the intrinsic anyhow). What do you think?
We generate the vector-masked-intrinsic on IR-to-IR pass. It is too far from
instruction selection. We'll need to guarantee that all subsequent IR-to-IR
passes will not break the sequence. And only for one or two specific targets.
Then we'll keep the logic in type legalizer, which may split or extend
operations. Then we are taking care in DAG-combine.
In my opinion, this is just unsafe.

-  Elena


-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: Friday, October 24, 2014 15:50
To: Demikhovsky, Elena
Cc: dag at cray.com; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> To: llvmdev at cs.uiuc.edu
> Cc: dag at cray.com
> Sent: Friday, October 24, 2014 6:24:15 AM
> Subject: [LLVMdev] Adding masked vector load and store intrinsics
> 
> 
> 
> Hi,
> 
> We would like to add support for masked vector loads and stores by 
> introducing new target-independent intrinsics. The loop vectorizer 
> will then be enhanced to optimize loops containing conditional memory 
> accesses by generating these intrinsics for existing targets such as 
> AVX2 and AVX-512. The vectorizer will first ask the target about 
> availability of masked vector loads and stores. The SLP vectorizer can 
> potentially be enhanced to use these intrinsics as well.
> 
> The intrinsics would be legal for all targets; targets that do not 
> support masked vector loads or stores will scalarize them.
> The addressed memory will not be touched for masked-off lanes. In 
> particular, if all lanes are masked off no address will be accessed.
> 
> call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4,
> <16 x i1> %mask)
> 
> %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32>
> %passthru, i32 4, <8 x i1> %mask)
> 
> where %passthru is used to fill the elements of %data that are 
> masked-off (if any; can be zeroinitializer or undef).
> 
> Comments so far, before we dive into more details?
For the stores, I think this is a reasonable idea. The alternative is to
represent them in scalar form with a lot of control flow, and I think that
expecting the backend to properly pattern match that after isel is not
realistic.

For the loads, I'm must less sure. Why can't we represent the loads as
select(mask, load(addr), passthru)? It is true, that the load might get
separated from the select so that isel might not see it (because isel if
basic-block local), but we can add some code in CodeGenPrep to fix that for
targets on which it is useful to do so (which is a more-general solution than
the intrinsic anyhow). What do you think?

Thanks again,
Hal
> 
> Thank you.
> 
> - Elena and Ayal
> 
> 
> 
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Hal Finkel

2014-Oct-24 13:39 UTC

head link

[LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: dag at cray.com, llvmdev at cs.uiuc.edu, "Ayal Zaks"
<ayal.zaks at intel.com>
> Sent: Friday, October 24, 2014 8:07:18 AM
> Subject: RE: [LLVMdev] Adding masked vector load and store intrinsics
> 
> > For the loads, I'm must less sure. Why can't we represent the
loads
> > as select(mask, load(addr), passthru)? It is true, that the load
> > might get separated from the select so that isel might not see it
> > (because isel if basic-block local), but we can add some code in
> > CodeGenPrep to fix that for targets on which it is useful to do so
> > (which is a more-general solution than the intrinsic anyhow). What
> > do you think?
> 
> We generate the vector-masked-intrinsic on IR-to-IR pass. It is too
> far from instruction selection. We'll need to guarantee that all
> subsequent IR-to-IR passes will not break the sequence.
I'm fully aware of this issue. This needs to be weighed against the cost of
updating all other optimizations that operate on loads to also understand this
intrinsic.
> And only for
> one or two specific targets.
Regardless, they're certainly targets many users care about ;)
> Then we'll keep the logic in type
> legalizer, which may split or extend operations. Then we are taking
> care in DAG-combine.
> In my opinion, this is just unsafe.
If this were really a question of safety, I'd agree. And if we were talking
about gather loads, I'd agree. For a regular vector loads, I don't see
this as a safety issue. We should outline what the downside of emitting a
regular load would actually be should some optimization be done to the select.
Can you please elaborate on this?

Thanks again,
Hal
> 
> -  Elena
> 
> 
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Friday, October 24, 2014 15:50
> To: Demikhovsky, Elena
> Cc: dag at cray.com; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
> 
> ----- Original Message -----
> > From: "Elena Demikhovsky" <elena.demikhovsky at
intel.com>
> > To: llvmdev at cs.uiuc.edu
> > Cc: dag at cray.com
> > Sent: Friday, October 24, 2014 6:24:15 AM
> > Subject: [LLVMdev] Adding masked vector load and store intrinsics
> > 
> > 
> > 
> > Hi,
> > 
> > We would like to add support for masked vector loads and stores by
> > introducing new target-independent intrinsics. The loop vectorizer
> > will then be enhanced to optimize loops containing conditional
> > memory
> > accesses by generating these intrinsics for existing targets such
> > as
> > AVX2 and AVX-512. The vectorizer will first ask the target about
> > availability of masked vector loads and stores. The SLP vectorizer
> > can
> > potentially be enhanced to use these intrinsics as well.
> > 
> > The intrinsics would be legal for all targets; targets that do not
> > support masked vector loads or stores will scalarize them.
> > The addressed memory will not be touched for masked-off lanes. In
> > particular, if all lanes are masked off no address will be
> > accessed.
> > 
> > call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32
4,
> > <16 x i1> %mask)
> > 
> > %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x
i32>
> > %passthru, i32 4, <8 x i1> %mask)
> > 
> > where %passthru is used to fill the elements of %data that are
> > masked-off (if any; can be zeroinitializer or undef).
> > 
> > Comments so far, before we dive into more details?
> 
> For the stores, I think this is a reasonable idea. The alternative is
> to represent them in scalar form with a lot of control flow, and I
> think that expecting the backend to properly pattern match that
> after isel is not realistic.
> 
> For the loads, I'm must less sure. Why can't we represent the loads
> as select(mask, load(addr), passthru)? It is true, that the load
> might get separated from the select so that isel might not see it
> (because isel if basic-block local), but we can add some code in
> CodeGenPrep to fix that for targets on which it is useful to do so
> (which is a more-general solution than the intrinsic anyhow). What
> do you think?
> 
> Thanks again,
> Hal
> 
> > 
> > Thank you.
> > 
> > - Elena and Ayal
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > Intel Israel (74) Limited
> > 
> > This e-mail and any attachments may contain confidential material
> > for
> > the sole use of the intended recipient(s). Any review or
> > distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel

2014-Oct-24 14:28 UTC

head link

[LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov>
> To: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: dag at cray.com, llvmdev at cs.uiuc.edu
> Sent: Friday, October 24, 2014 8:39:56 AM
> Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
> 
> ----- Original Message -----
> > From: "Elena Demikhovsky" <elena.demikhovsky at
intel.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: dag at cray.com, llvmdev at cs.uiuc.edu, "Ayal Zaks"
> > <ayal.zaks at intel.com>
> > Sent: Friday, October 24, 2014 8:07:18 AM
> > Subject: RE: [LLVMdev] Adding masked vector load and store
> > intrinsics
> > 
> > > For the loads, I'm must less sure. Why can't we represent
the
> > > loads
> > > as select(mask, load(addr), passthru)? It is true, that the load
> > > might get separated from the select so that isel might not see it
> > > (because isel if basic-block local), but we can add some code in
> > > CodeGenPrep to fix that for targets on which it is useful to do
> > > so
> > > (which is a more-general solution than the intrinsic anyhow).
> > > What
> > > do you think?
> > 
> > We generate the vector-masked-intrinsic on IR-to-IR pass. It is too
> > far from instruction selection. We'll need to guarantee that all
> > subsequent IR-to-IR passes will not break the sequence.
> 
> I'm fully aware of this issue. This needs to be weighed against the
> cost of updating all other optimizations that operate on loads to
> also understand this intrinsic.
> 
> > And only for
> > one or two specific targets.
> 
> Regardless, they're certainly targets many users care about ;)
> 
> > Then we'll keep the logic in type
> > legalizer, which may split or extend operations. Then we are taking
> > care in DAG-combine.
> > In my opinion, this is just unsafe.
> 
> If this were really a question of safety, I'd agree. And if we were
> talking about gather loads, I'd agree. For a regular vector loads, I
> don't see this as a safety issue. We should outline what the
> downside of emitting a regular load would actually be should some
> optimization be done to the select. Can you please elaborate on
> this?
Nevermind ;) -- I changed my mind, the safety issue is with non-aligned loads
that might cross page boundaries. Is that right? If so, I think this proposal is
good (although obviously the docs need to make clear what the faulting behavior
of these intrinsics is).

Thanks again,
Hal
> 
> Thanks again,
> Hal
> 
> > 
> > -  Elena
> > 
> > 
> > -----Original Message-----
> > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > Sent: Friday, October 24, 2014 15:50
> > To: Demikhovsky, Elena
> > Cc: dag at cray.com; llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Adding masked vector load and store
> > intrinsics
> > 
> > ----- Original Message -----
> > > From: "Elena Demikhovsky" <elena.demikhovsky at
intel.com>
> > > To: llvmdev at cs.uiuc.edu
> > > Cc: dag at cray.com
> > > Sent: Friday, October 24, 2014 6:24:15 AM
> > > Subject: [LLVMdev] Adding masked vector load and store intrinsics
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > We would like to add support for masked vector loads and stores
> > > by
> > > introducing new target-independent intrinsics. The loop
> > > vectorizer
> > > will then be enhanced to optimize loops containing conditional
> > > memory
> > > accesses by generating these intrinsics for existing targets such
> > > as
> > > AVX2 and AVX-512. The vectorizer will first ask the target about
> > > availability of masked vector loads and stores. The SLP
> > > vectorizer
> > > can
> > > potentially be enhanced to use these intrinsics as well.
> > > 
> > > The intrinsics would be legal for all targets; targets that do
> > > not
> > > support masked vector loads or stores will scalarize them.
> > > The addressed memory will not be touched for masked-off lanes. In
> > > particular, if all lanes are masked off no address will be
> > > accessed.
> > > 
> > > call void @llvm.masked.store (i32* %addr, <16 x i32> %data,
i32
> > > 4,
> > > <16 x i1> %mask)
> > > 
> > > %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8
x i32>
> > > %passthru, i32 4, <8 x i1> %mask)
> > > 
> > > where %passthru is used to fill the elements of %data that are
> > > masked-off (if any; can be zeroinitializer or undef).
> > > 
> > > Comments so far, before we dive into more details?
> > 
> > For the stores, I think this is a reasonable idea. The alternative
> > is
> > to represent them in scalar form with a lot of control flow, and I
> > think that expecting the backend to properly pattern match that
> > after isel is not realistic.
> > 
> > For the loads, I'm must less sure. Why can't we represent the
loads
> > as select(mask, load(addr), passthru)? It is true, that the load
> > might get separated from the select so that isel might not see it
> > (because isel if basic-block local), but we can add some code in
> > CodeGenPrep to fix that for targets on which it is useful to do so
> > (which is a more-general solution than the intrinsic anyhow). What
> > do you think?
> > 
> > Thanks again,
> > Hal
> > 
> > > 
> > > Thank you.
> > > 
> > > - Elena and Ayal
> > > 
> > > 
> > > 
> > >
---------------------------------------------------------------------
> > > Intel Israel (74) Limited
> > > 
> > > This e-mail and any attachments may contain confidential material
> > > for
> > > the sole use of the intended recipient(s). Any review or
> > > distribution
> > > by others is strictly prohibited. If you are not the intended
> > > recipient, please contact the sender and delete all copies.
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > ---------------------------------------------------------------------
> > Intel Israel (74) Limited
> > 
> > This e-mail and any attachments may contain confidential material
> > for
> > the sole use of the intended recipient(s). Any review or
> > distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

dag at cray.com

2014-Oct-24 16:52 UTC

head link

[LLVMdev] Adding masked vector load and store intrinsics

Hal Finkel <hfinkel at anl.gov> writes:
> I'm fully aware of this issue. This needs to be weighed against the
> cost of updating all other optimizations that operate on loads to also
> understand this intrinsic.
In my experience, LLVM's behavior of treating unknwon intrinsics
conservatively works just fine.
> If this were really a question of safety, I'd agree. And if we were
> talking about gather loads, I'd agree. For a regular vector loads, I
> don't see this as a safety issue.
It absolutely is a safety issue.  Not only could loop control flow cause
some vector elements to be skipped that would otherwise trap if loaded,
there are some vector optimizations that assume masking behavior will
handle overindexing and other such problems.

Masking is an extremely powerful concept and the sooner LLVM understands
it, the better.

                           -David

llvm dev - Oct 2014 - [LLVMdev] Adding masked vector load and store intrinsics

[LLVMdev] Adding masked vector load and store intrinsics

[LLVMdev] Adding masked vector load and store intrinsics

[LLVMdev] Adding masked vector load and store intrinsics

[LLVMdev] Adding masked vector load and store intrinsics