thr3ads.net - llvm dev - [LLVMdev] Indexed Load and Store Intrinsics

Philip Reames

2014-Dec-21 18:24 UTC

[LLVMdev] Indexed Load and Store Intrinsics - proposal

On 12/18/2014 11:56 AM, dag at cray.com wrote:
> "Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:
>
>> Semantics:
>> For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale)
>> = VectorValue[i];}
>> VectorValue: any float or integer vector type.
>> BaseAddr: a pointer; may be zero if full address is placed in the
>> index.
>> VectorOfIndices: a vector of i32 or i64 signed or unsigned integer
>> values.
> What about the case of a gather/scatter where the BaseAddr is zero and
> the indices are pointers?  Must we do a ptrtoint?  llvm.org is down at
> the moment but I don't think we currently have a vector ptrtoint.

I would be opposed to any representation which required the introduction
of ptrtoint casts by the vectorizer.  If it were the only option
available, I could be argued around, but I think we should try to avoid
this.

More generally, I'm somewhat hesitant about representing a scatter with
explicit base and offsets at all.  Why shouldn't the IR representation 
simply be a load from a vector of arbitrary pointers?  The backend can 
pattern match the actual gather instructions it supports and scalarize 
the rest.  The proposal being made seems very specific to the current 
generation of x86 hardware.
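
For reference, the two representations under discussion can be read as scalar loops. The following is a minimal sketch in Python, not the proposed intrinsics themselves: memory is modeled as a dict from address to value, the function names scatter_indexed and gather_pointers are hypothetical, and the passthru operand for masked-off lanes is an assumption that the quoted semantics do not specify.

```python
# Toy model: "memory" is a dict from integer address to stored value.
# scatter_indexed / gather_pointers are hypothetical names for illustration.

def scatter_indexed(memory, base_addr, indices, scale, values, mask):
    """For i = 0..N-1: if mask[i], store values[i] at base_addr + indices[i]*scale."""
    for i in range(len(values)):
        if mask[i]:
            memory[base_addr + indices[i] * scale] = values[i]


def gather_pointers(memory, pointers, mask, passthru):
    """Alternative form: load from a vector of arbitrary addresses.

    Masked-off lanes never touch memory; they take the value from
    passthru (an assumption made for this sketch).
    """
    return [memory[p] if m else d for p, m, d in zip(pointers, mask, passthru)]


memory = {}
scatter_indexed(memory, base_addr=1000, indices=[0, 3, 5, 7], scale=4,
                values=[10, 20, 30, 40], mask=[True, False, True, True])

# Lane 1 is masked off, so address 1012 was never written and is never read.
loaded = gather_pointers(memory, pointers=[1000, 1012, 1020, 1028],
                         mask=[True, False, True, True],
                         passthru=[0, -1, 0, 0])
```

In the second form there is no base or scale operand at all: any address arithmetic happens before the access, which is what leaves the backend free to match it or scalarize it.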

p.s. Where is the documentation for the existing masked load intrinsics?
I can't find it with a quick search through the LangRef.

Philip

Zaks, Ayal

2014-Dec-22 14:05 UTC

> Why shouldn't the IR representation simply be a load from a vector of
> arbitrary pointers?

Such a load could indeed serve as a general form of a gather or scatter. As
Elena responded, we can propose two distinct intrinsics: one with a vector of
pointers, and another with (non-zero) base, a vector of indices, and a scale
implicitly inferred from the element type.

The motivation for the latter stems from vectorizing a load or store to
"b[i]", where b is invariant. Broadcasting b and using a vector GEP to
feed a vector of pointers, to be pattern matched and folded later, may work. The
alternative intrinsic proposed keeps b scalar and uses a vector of indices for
i. In any case, it's important to recognize such common patterns, at least
for x86, so it could deserve an x86 intrinsic. But it's a general pattern that
could potentially serve other implementations; are there any other gathers to
consider at the moment?
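
To make the "b[i]" case concrete, here is a small sketch in Python showing that both forms compute the same per-lane addresses. The helper names and the 4-byte element size are assumptions chosen for illustration; only the idea (broadcast base plus vector GEP versus scalar base plus index vector) comes from the discussion above.

```python
ELEM_SIZE = 4  # assumed element size in bytes (e.g. i32 or float)


def addresses_from_vector_gep(base, indices, elem_size=ELEM_SIZE):
    """Broadcast the scalar base to every lane, then do a per-lane GEP."""
    splat = [base] * len(indices)
    return [b + i * elem_size for b, i in zip(splat, indices)]


def addresses_from_base_and_indices(base, indices, elem_size=ELEM_SIZE):
    """Keep the base scalar; the scale is implied by the element type."""
    return [base + i * elem_size for i in indices]


idx = [7, 0, 3, 12]
ptrs = addresses_from_vector_gep(0x1000, idx)
same = ptrs == addresses_from_base_and_indices(0x1000, idx)
```

The two functions differ only in where the broadcast happens, so the choice between the intrinsics is about which form is easier to keep intact until instruction selection, not about what is computed.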

Documentation indeed needs to be provided.

Ayal.

Hal Finkel

2014-Dec-24 01:16 UTC

----- Original Message -----
> From: "Ayal Zaks" <ayal.zaks at intel.com>
> To: "Philip Reames" <listmail at philipreames.com>, dag at cray.com,
> "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> Cc: "Robert Khasanov" <robert.khasanov at intel.com>, llvmdev at cs.uiuc.edu
> Sent: Monday, December 22, 2014 8:05:43 AM
> Subject: Re: [LLVMdev] Indexed Load and Store Intrinsics - proposal
>
> > Why shouldn't the IR representation simply be a load from a vector
> > of arbitrary pointers?
>
> Such a load could indeed serve as a general form of a gather or
> scatter. As Elena responded, we can propose two distinct intrinsics:
> one with a vector of pointers, and another with (non-zero) base, a
> vector of indices, and a scale implicitly inferred from the element
> type.
>
> The motivation for the latter stems from vectorizing a load or store
> to "b[i]", where b is invariant. Broadcasting b and using a vector
> gep to feed a vector of pointers, to be pattern matched and folded
> later, may work.

I would like you to explore this direction, where we use a vector GEP and the
intrinsic simply takes a vector of pointers. The backend should pattern-match
this as appropriate. I see no reason why we can't make this work, especially
because we don't have any real uses of vector GEPs now, so we can *define*
the canonical optimized form of them to be conducive to the kind of pattern
matching we'd like to perform in the backends.

This, I imagine, will require some additional infrastructure work. Currently,
GEPs, including vector GEPs, are expanded very early during SDAG building, and
the form produced may not be appropriate for reliable pattern matching during
later lowering phases. The way this is done is not set in stone, however, and we
can certainly change it (including via the introduction of new SDAG nodes) to
keep the necessary information together in compact form.
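
As a rough illustration of the kind of matching described here, the sketch below models a pointer vector as a tiny expression tree and tries to recover (base, indices, scale) from it. It is written in Python rather than against SelectionDAG, and every name in it (Splat, VectorGEP, match_base_index_scale, lower_gather, the "hw_gather" string) is hypothetical; it only shows the shape of the decision, assuming the GEP information survives until lowering.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Splat:
    """A scalar pointer broadcast to all lanes."""
    scalar: int


@dataclass
class VectorGEP:
    """Per-lane address: base[i] + indices[i] * elem_size."""
    base: Union[Splat, List[int]]
    indices: List[int]
    elem_size: int


def match_base_index_scale(ptrs) -> Optional[Tuple[int, List[int], int]]:
    """Return (base, indices, scale) when ptrs is a GEP off a splatted base."""
    if isinstance(ptrs, VectorGEP) and isinstance(ptrs.base, Splat):
        return ptrs.base.scalar, ptrs.indices, ptrs.elem_size
    return None  # no match: the caller falls back to scalarizing


def lower_gather(ptrs) -> str:
    """Pick a lowering strategy for a gather from the given pointer vector."""
    matched = match_base_index_scale(ptrs)
    if matched is not None:
        base, indices, scale = matched
        return f"hw_gather base={base} scale={scale} indices={indices}"
    return "scalarize"
```

For example, lower_gather(VectorGEP(Splat(4096), [0, 2, 5], 4)) takes the matched path, while a plain list of unrelated addresses is scalarized. The match only succeeds if the base, indices, and element size are still visible as one unit, which is the reason for keeping them together in compact form.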

Thanks again,
Hal
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Pierre-Andre Saulais

2015-Jan-05 16:13 UTC

On 22/12/14 14:05, Zaks, Ayal wrote:
>> Why shouldn't the IR representation simply be a load from a vector
>> of arbitrary pointers?
> Such a load could indeed serve as a general form of a gather or scatter. As
> Elena responded, we can propose two distinct intrinsics: one with a vector of
> pointers, and another with (non-zero) base, a vector of indices, and a scale
> implicitly inferred from the element type.
>
> The motivation for the latter stems from vectorizing a load or store to
> "b[i]", where b is invariant. Broadcasting b and using a vector gep to
> feed a vector of pointers, to be pattern matched and folded later, may work. The
> alternative intrinsic proposed keeps b scalar and uses a vector of indices for
> i. In any case, it's important to recognize such common patterns, at-least
> for x86, so could deserve an x86 intrinsic. But it's a general pattern that
> could potentially serve other implementations; any other gathers to consider
> atm?

I think ARM supports a very limited form of gather/scatter through
VLD2/3/4 and VST2/3/4 interleaved load and store instructions.

For example, this instruction performs one gather load to d0, and 
another one to d1:

VLD2.8 {d0, d1}, [r0]

Using the proposed 'vector index' intrinsics, this would translate to IR
like:

%scale = i32 1
%indices0 = <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%indices1 = <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%d0 = @llvm.uindex.load(i8* %addr.r0, <8 x i32> %indices0, i32 %scale) ; <8 x i8>
%d1 = @llvm.uindex.load(i8* %addr.r0, <8 x i32> %indices1, i32 %scale) ; <8 x i8>

Same for the 3-element variant:

VLD3.8 {d0, d1, d2}, [r0]

%scale = i32 1
%indices0 = <8 x i32> <i32 0, i32 3, i32 6, i32 9, i32 12, i32 15, i32 18, i32 21>
%indices1 = <8 x i32> <i32 1, i32 4, i32 7, i32 10, i32 13, i32 16, i32 19, i32 22>
%indices2 = <8 x i32> <i32 2, i32 5, i32 8, i32 11, i32 14, i32 17, i32 20, i32 23>
%d0 = @llvm.uindex.load(i8* %addr.r0, <8 x i32> %indices0, i32 %scale) ; <8 x i8>
%d1 = @llvm.uindex.load(i8* %addr.r0, <8 x i32> %indices1, i32 %scale) ; <8 x i8>
%d2 = @llvm.uindex.load(i8* %addr.r0, <8 x i32> %indices2, i32 %scale) ; <8 x i8>

This pattern comes up with code that converts data from AoS to SoA, for 
example when doing whole-function vectorization (e.g. if b is an array 
of vectors, due to scalarization).
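
The index vectors in the listings above follow a simple rule: stream k of an n-way interleaved access reads elements k, k+n, k+2n, and so on. A minimal Python sketch of that rule and of the resulting AoS-to-SoA split follows; the helper names interleave_indices and index_load are hypothetical, and memory is modeled as a plain list of bytes.

```python
def interleave_indices(num_streams, lanes):
    """Index vector k selects elements k, k+n, k+2n, ... (n = num_streams)."""
    return [[k + num_streams * lane for lane in range(lanes)]
            for k in range(num_streams)]


def index_load(memory, base, indices, scale=1):
    """Scalar model of an unmasked indexed load from a flat byte array."""
    return [memory[base + i * scale] for i in indices]


# 16 bytes of AoS data laid out as pairs: x0, y0, x1, y1, ...
memory = list(range(100, 116))

idx0, idx1 = interleave_indices(num_streams=2, lanes=8)
d0 = index_load(memory, 0, idx0)  # every x component, as for d0 in VLD2.8
d1 = index_load(memory, 0, idx1)  # every y component, as for d1 in VLD2.8
```

Calling interleave_indices(3, 8) produces the three index vectors used in the VLD3.8 listing.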

It is quite limited though (sequential indices, fixed scale), and
probably more difficult to match than single intrinsics.
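
To give a feel for why this is harder to match, a backend would have to look at several indexed loads from the same base together and check that their index vectors form a complete interleaved group. The Python sketch below shows one such check under simplifying assumptions: the vectors are supplied in stream order, the group starts at offset 0, and only 2-, 3-, and 4-way groups are accepted. The function name match_interleaved_group is hypothetical.

```python
def match_interleaved_group(index_vectors):
    """Return the stride n if the vectors are exactly the n streams of a
    VLDn-style interleaved access starting at offset 0, else None."""
    n = len(index_vectors)
    if n not in (2, 3, 4):
        return None
    lanes = len(index_vectors[0])
    for k, vec in enumerate(index_vectors):
        expected = [k + n * lane for lane in range(lanes)]
        if list(vec) != expected:
            return None
    return n
```

A single out-of-place index in any one of the vectors is enough to defeat the match, whereas a lone gather intrinsic can be lowered without looking at its neighbours.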

Pierre-Andre

-- 
Pierre-Andre Saulais
Compiler Developer
Codeplay Software Ltd
45 York Place, Edinburgh, EH1 3HP
Tel: 0131 466 0503
Fax: 0131 557 6600
Website: http://www.codeplay.com
Twitter: https://twitter.com/codeplaysoft
