thr3ads.net - llvm dev - [LLVMdev] Adding masked vector load and store intrinsics [Oct 2014]

Hal Finkel

2014-Oct-24 14:28 UTC

[LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----
> From: "Hal Finkel" <hfinkel at anl.gov>
> To: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: dag at cray.com, llvmdev at cs.uiuc.edu
> Sent: Friday, October 24, 2014 8:39:56 AM
> Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
> 
> ----- Original Message -----
> > From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: dag at cray.com, llvmdev at cs.uiuc.edu, "Ayal Zaks"
> > <ayal.zaks at intel.com>
> > Sent: Friday, October 24, 2014 8:07:18 AM
> > Subject: RE: [LLVMdev] Adding masked vector load and store
> > intrinsics
> > 
> > > For the loads, I'm much less sure. Why can't we represent the loads
> > > as select(mask, load(addr), passthru)? It is true that the load
> > > might get separated from the select so that isel might not see it
> > > (because isel is basic-block local), but we can add some code in
> > > CodeGenPrep to fix that for targets on which it is useful to do so
> > > (which is a more general solution than the intrinsic anyhow). What
> > > do you think?
> > 
> > We generate the vector-masked intrinsic in an IR-to-IR pass. It is too
> > far from instruction selection. We'll need to guarantee that no
> > subsequent IR-to-IR pass breaks the sequence.
> 
> I'm fully aware of this issue. This needs to be weighed against the
> cost of updating all other optimizations that operate on loads to
> also understand this intrinsic.
> 
> > And only for
> > one or two specific targets.
> 
> Regardless, they're certainly targets many users care about ;)
> 
> > Then we'll have to keep the logic in the type
> > legalizer, which may split or extend operations. Then we'll have to
> > take care of it in DAG combine.
> > In my opinion, this is just unsafe.
> 
> If this were really a question of safety, I'd agree. And if we were
> talking about gather loads, I'd agree. For regular vector loads, I
> don't see this as a safety issue. We should outline what the
> downside of emitting a regular load would actually be should some
> optimization be done to the select. Can you please elaborate on
> this?

Never mind ;) -- I changed my mind: the safety issue is with non-aligned loads
that might cross page boundaries. Is that right? If so, I think this proposal is
good (although obviously the docs need to make clear what the faulting behavior
of these intrinsics is).

Thanks again,
Hal
> 
> Thanks again,
> Hal
> 
> > 
> > -  Elena
> > 
> > 
> > -----Original Message-----
> > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > Sent: Friday, October 24, 2014 15:50
> > To: Demikhovsky, Elena
> > Cc: dag at cray.com; llvmdev at cs.uiuc.edu
> > Subject: Re: [LLVMdev] Adding masked vector load and store
> > intrinsics
> > 
> > ----- Original Message -----
> > > From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> > > To: llvmdev at cs.uiuc.edu
> > > Cc: dag at cray.com
> > > Sent: Friday, October 24, 2014 6:24:15 AM
> > > Subject: [LLVMdev] Adding masked vector load and store intrinsics
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > > We would like to add support for masked vector loads and stores by
> > > > introducing new target-independent intrinsics. The loop vectorizer
> > > > will then be enhanced to optimize loops containing conditional
> > > > memory accesses by generating these intrinsics for existing targets
> > > > such as AVX2 and AVX-512. The vectorizer will first ask the target
> > > > about availability of masked vector loads and stores. The SLP
> > > > vectorizer can potentially be enhanced to use these intrinsics as
> > > > well.
> > > 
> > > > The intrinsics would be legal for all targets; targets that do not
> > > > support masked vector loads or stores will scalarize them.
> > > > The addressed memory will not be touched for masked-off lanes. In
> > > > particular, if all lanes are masked off no address will be accessed.
> > > 
> > > > call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4, <16 x i1> %mask)
> > > > 
> > > > %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)
> > > 
> > > where %passthru is used to fill the elements of %data that are
> > > masked-off (if any; can be zeroinitializer or undef).
> > > 
> > > Comments so far, before we dive into more details?
> > 
> > For the stores, I think this is a reasonable idea. The alternative is
> > to represent them in scalar form with a lot of control flow, and I
> > think that expecting the backend to properly pattern match that
> > after isel is not realistic.
> > 
> > For the loads, I'm much less sure. Why can't we represent the loads
> > as select(mask, load(addr), passthru)? It is true that the load
> > might get separated from the select so that isel might not see it
> > (because isel is basic-block local), but we can add some code in
> > CodeGenPrep to fix that for targets on which it is useful to do so
> > (which is a more general solution than the intrinsic anyhow). What
> > do you think?
> > 
> > Thanks again,
> > Hal
> > 
> > > 
> > > Thank you.
> > > 
> > > - Elena and Ayal
> > > 
> > > 
> > > 
> > >
---------------------------------------------------------------------
> > > Intel Israel (74) Limited
> > > 
> > > This e-mail and any attachments may contain confidential material
> > > for
> > > the sole use of the intended recipient(s). Any review or
> > > distribution
> > > by others is strictly prohibited. If you are not the intended
> > > recipient, please contact the sender and delete all copies.
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
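
The semantics described in the quoted proposal (masked-off lanes never touch memory, and %passthru fills them on a load) can be modelled with a small scalarized sketch. This is only an illustration in Python, not LLVM code: the helper names masked_load and masked_store, the list-based memory, and the touched log are assumptions introduced here, and alignment is ignored.

```python
# Illustrative model only: Python lists stand in for memory and vectors.
# masked_load / masked_store are hypothetical helpers, not LLVM APIs.

def masked_load(memory, base, passthru, mask, touched=None):
    """Return memory[base + i] for enabled lanes and passthru[i] otherwise."""
    result = []
    for lane, (enabled, fallback) in enumerate(zip(mask, passthru)):
        if enabled:
            if touched is not None:
                touched.append(base + lane)  # record the address that was read
            result.append(memory[base + lane])
        else:
            result.append(fallback)  # masked-off lane: memory is not read
    return result


def masked_store(memory, base, data, mask, touched=None):
    """Write data[i] to memory[base + i] only for enabled lanes."""
    for lane, (enabled, value) in enumerate(zip(mask, data)):
        if enabled:
            if touched is not None:
                touched.append(base + lane)  # record the address that was written
            memory[base + lane] = value


memory = [10, 11, 12, 13]
touched = []

# Lanes 1 and 3 are masked off, so they take their values from passthru.
loaded = masked_load(memory, 0, [0, 0, 0, 0], [True, False, True, False], touched)

# Only lane 1 is enabled, so only memory[1] is overwritten.
masked_store(memory, 0, [7, 7, 7, 7], [False, True, False, False], touched)

# With every lane masked off, no address is accessed, even an invalid one.
none_touched = []
all_off = masked_load(memory, 1000, [5, 5], [False, False], none_touched)
```

The per-lane loop corresponds to what the proposal calls scalarizing on targets without native masked loads or stores; a target with hardware support would perform the same operation in a single instruction.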

dag at cray.com

2014-Oct-24 16:56 UTC

[LLVMdev] Adding masked vector load and store intrinsics

Hal Finkel <hfinkel at anl.gov> writes:
>> If this were really a question of safety, I'd agree. And if we were
>> talking about gather loads, I'd agree. For regular vector loads, I
>> don't see this as a safety issue. We should outline what the
>> downside of emitting a regular load would actually be should some
>> optimization be done to the select. Can you please elaborate on
>> this?
>
> Never mind ;) -- I changed my mind: the safety issue is with
> non-aligned loads that might cross page boundaries. Is that right?

That's just one safety issue.  There are others.

> If so, I think this proposal is good (although obviously the docs need
> to make clear what the faulting behavior of these intrinsics is).

The behavior should be never to fault on an element whose mask bit is
false, and to behave as a regular load (with respect to trapping) for any
element whose mask bit is true.

                              -David
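
The faulting rule above, and the page-boundary concern it answers, can be sketched with a toy memory model. Everything here is an assumption made for illustration: PAGE_SIZE, the Memory class, the PageFault exception, and the helper names are hypothetical, and real hardware and LLVM behave in far more detail than this. The sketch only contrasts select(mask, load(addr), passthru), where the whole vector is loaded before the select, with a masked load that reads enabled lanes only.

```python
# Illustrative model only: PAGE_SIZE, Memory, PageFault and the helpers are
# assumptions for this sketch, not part of LLVM or any real target.
PAGE_SIZE = 8  # deliberately tiny page, measured in elements


class PageFault(Exception):
    pass


class Memory:
    def __init__(self, mapped_pages):
        self.mapped_pages = set(mapped_pages)
        self.cells = {}

    def read(self, address):
        # Reading any address on an unmapped page faults.
        if address // PAGE_SIZE not in self.mapped_pages:
            raise PageFault(address)
        return self.cells.get(address, 0)


def select_of_load(memory, base, passthru, mask):
    """select(mask, load(addr), passthru): every lane is read before selecting."""
    loaded = [memory.read(base + lane) for lane in range(len(mask))]
    return [value if enabled else fallback
            for value, enabled, fallback in zip(loaded, mask, passthru)]


def masked_load(memory, base, passthru, mask):
    """Masked load: only lanes whose mask bit is true are read."""
    return [memory.read(base + lane) if enabled else fallback
            for lane, (enabled, fallback) in enumerate(zip(mask, passthru))]


def faults(operation, *args):
    """Report whether calling operation(*args) raises PageFault."""
    try:
        operation(*args)
    except PageFault:
        return True
    return False


memory = Memory(mapped_pages=[0])   # page 0 is mapped, page 1 is not
memory.cells[6] = 60
memory.cells[7] = 70

base = PAGE_SIZE - 2                # lanes 0-1 sit on page 0, lanes 2-3 on page 1
passthru = [-1, -1, -1, -1]
mask = [True, True, False, False]   # only the lanes on the mapped page are enabled

plain_faults = faults(select_of_load, memory, base, passthru, mask)
masked_faults = faults(masked_load, memory, base, passthru, mask)
masked_result = masked_load(memory, base, passthru, mask)

# Enabling a lane on the unmapped page faults, as a regular load would.
active_lane_faults = faults(masked_load, memory, base, passthru,
                            [True, True, True, False])
```

In this model the select-of-load form faults even though the lanes on the unmapped page are masked off, while the masked load faults only when an enabled lane reaches the unmapped page.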

Hal Finkel

2014-Oct-24 16:58 UTC

[LLVMdev] Adding masked vector load and store intrinsics

----- Original Message -----
> From: dag at cray.com
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Elena Demikhovsky" <elena.demikhovsky at intel.com>, llvmdev at cs.uiuc.edu
> Sent: Friday, October 24, 2014 11:56:14 AM
> Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics
> 
> Hal Finkel <hfinkel at anl.gov> writes:
> 
> >> If this were really a question of safety, I'd agree. And if we
> >> were talking about gather loads, I'd agree. For regular vector
> >> loads, I don't see this as a safety issue. We should outline what
> >> the downside of emitting a regular load would actually be should
> >> some optimization be done to the select. Can you please elaborate
> >> on this?
> >
> > Never mind ;) -- I changed my mind: the safety issue is with
> > non-aligned loads that might cross page boundaries. Is that right?
> 
> That's just one safety issue.  There are others.

Can you be more specific? You mentioned overindexing in your other e-mail;
what exactly do you mean by that?

Thanks again,
Hal
> 
> > If so, I think this proposal is good (although obviously the docs
> > need to make clear what the faulting behavior of these intrinsics
> > is).
> 
> The behavior should be not to ever fault on an element whose mask bit
> is false, and behave as a regular load (wrt trapping) for any element
> whose mask bit is true.
> 
>                               -David
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
