thr3ads.net - llvm dev - [LLVMdev] SIMD Projects with LLVM [Apr 2014]

Rob Cameron

2014-Apr-03 01:52 UTC

[LLVMdev] SIMD Projects with LLVM

Hi everyone.   After lurking for a while, this is my
first post to the list.

I am working with some graduate students on the general
topic of compiler support for SIMD programming and specific
projects related to LLVM and my own Parabix technology
(parabix.costar.sfu.ca).

Right now we have a few course projects on the go and
already a question arising out of one of them (SSE2 Hoisting).
We're not sure how much has been tried before, or even
makes sense, but we're eager to learn.

Briefly the projects are:

SSE2 Hoisting: translating programs that directly use SSE2
intrinsics into platform-independent code expressed with LLVM IR.

Long integer support: systematic support for i128, i256, ... targeting
SIMD registers.

Systematic strategies for the shufflevector operation.   This
is a very powerful operation that can be used to code for arbitrary
rearrangement of data in SIMD registers.   No architecture we
know of supports it in its full generality.   But there are
many special cases that are recognized in code generation and
potentially many more that might be.

Systematic support for all power-of-2 field widths with
vector types.   For example, we are interested in <64 x i2> being
a legal type with appropriate expansion operations.   A student
has made a GSoC submission for this project.
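
To make the idea of 2-bit fields concrete, here is a minimal sketch in portable C of adding 32 independent i2 values packed into one 64-bit word. It only models the semantics; it is not how LLVM expands such a type. The name add_i2x32 is hypothetical, and a 128-bit SSE2 register would hold two such words.

```c
#include <stdint.h>

/* Add 32 independent 2-bit fields packed in a 64-bit word
   (a scalar model of <32 x i2>; hypothetical helper name). */
uint64_t add_i2x32(uint64_t a, uint64_t b)
{
    const uint64_t HI = 0xAAAAAAAAAAAAAAAAULL; /* high bit of every field */
    const uint64_t LO = 0x5555555555555555ULL; /* low bit of every field  */

    /* Add only the low bits. Each carry lands in its own field's high
       bit, so nothing can spill into the neighbouring field. */
    uint64_t low_sum = (a & LO) + (b & LO);

    /* Fold in the high bits with XOR, which is addition modulo 2. */
    return low_sum ^ ((a ^ b) & HI);
}
```

Each field wraps modulo 4 on its own, so 3 + 1 in one field yields 0 without disturbing the field next to it.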

The question I have right now actually relates to the i2 type.
In our SSE2 hoisting, we found an issue with the movemask_pd
operation, which extracts the sign bits of the 2 doubles in
a <2 x double> and returns them as an int32.   We would
like to use the icmp slt as the LLVM IR operation for this,
but have a problem when we bitcast the <2 x i1> vector to i2,
it seems.  We use the following LLVM IR code.

define i32 @signmaskd(<2 x double> %a) alwaysinline #5
{
        %bits = bitcast <2 x double> %a to <2 x i64>
        %b = icmp slt <2 x i64> %bits, zeroinitializer
        %c = bitcast <2 x i1> %b to i2
        %result = zext i2 %c to i32
        ret i32 %result
}

Unfortunately, we only get 1 bit of data out; the assembly language
output seems to confirm that the individual bit extractions take
place, but the second one clobbers the first.   We are using the 3.4
tool chain.

There is more detail at the following URL.
http://parabix.costar.sfu.ca/wiki/I2Result

Anyway the question is whether we should just try to treat
this as a bug to be fixed or whether our idea of working with
i2 types is misguided in a more fundamental way.
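
For reference, the result expected from the sign-mask operation can be sketched as a scalar model in portable C. This assumes 64-bit IEEE 754 doubles and that element 0 supplies bit 0 of the result; the function name signmask_pd_model is hypothetical, and the code is a model of the intended semantics rather than a workaround for the code generator.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the sign mask of two doubles: bit 0 is the sign of
   d[0], bit 1 is the sign of d[1], and all higher bits are zero.
   Assumes double is a 64-bit IEEE 754 value. */
int signmask_pd_model(const double d[2])
{
    uint64_t bits[2];
    memcpy(bits, d, sizeof bits);   /* reinterpret, like the IR bitcast */
    int lo = (int)(bits[0] >> 63);  /* sign bit of element 0 */
    int hi = (int)(bits[1] >> 63);  /* sign bit of element 1 */
    return lo | (hi << 1);          /* both bits must survive */
}
```

A correct lowering has to keep both bits, so the inputs (1.0, -2.0) should give 2 and (-1.0, -2.0) should give 3.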

Nadav Rotem

2014-Apr-03 05:04 UTC

Hi Rob, 

This is a codegen bug. At the moment we don’t support bitcasting or
storing/loading memory of 'illegal' vector element types that are
smaller than i8.

Thanks,
Nadav

Dmitry Babokin

2014-Apr-03 11:28 UTC

Rob,

If you care about SIMD performance, I advise generating LLVM IR that
has an obvious mapping to a particular instruction set. What i2 is, what a
vector of i2 is, and how these should map to an instruction set are all
a mystery to me. You should extend i2 to i8/i16/i32, depending on the
desired code generation. In the ISPC project, for example, we typically map
a vector of booleans to a vector of i32 when targeting the SSE2/SSE4 and
AVX1/AVX2 instruction sets.

Also, representing SIMD registers as i128/i256/i512 is a bad idea IMO, as you
are not able to perform arithmetic operations on them. A vector representation
is a much better idea, and you can freely cast between vectors when you need
to switch from one element type to another (from <4 x i32> to <4 x float>,
for example).

-Dmitry.
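
The wide-mask representation described above can be sketched in portable C as follows. Each boolean lane is a full 32-bit value, all ones for true and all zeros for false, so a select needs only bitwise operations. The function names are hypothetical and are not taken from ISPC; the loops stand in for single SIMD instructions.

```c
#include <stdint.h>

/* Compare four 32-bit lanes. Each result lane is all ones (true) or
   all zeros (false), the form SSE2 compare instructions produce. */
void cmp_lt_i32x4(const int32_t a[4], const int32_t b[4], uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        mask[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Lane-wise select: take t where the mask is set, f elsewhere. */
void select_i32x4(const uint32_t mask[4], const int32_t t[4],
                  const int32_t f[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)((mask[i] & (uint32_t)t[i]) |
                           (~mask[i] & (uint32_t)f[i]));
}
```

Selecting a where a < b and b elsewhere computes a lane-wise minimum, with the mask never leaving its 32-bit-per-lane form.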




Rob Cameron

2014-Apr-08 00:54 UTC

Hi, Nadav.

Thanks for this.  I am now wondering about the
treatment of i1 and <N x i1>.   These are both
fundamental types for LLVM primitives like icmp.  

My understanding right now is that i1 is probably
dealt with by "promoting" i1 to i8 in the type
legalizer phase.  I haven't located where in the codebase
this happens, yet, though.

Another option that occurs to me is to assert that
the types i1 and <N x i1> are legal, and require all 
the i1 operations to be implemented.   Certainly, this
would be straightforward for all the binary arithmetic
and bitwise logic operations.   But perhaps it is
load/store where it becomes problematic.    I am 
wondering if there exists any discussion of this design
possibility that I might read.

I expect i2 and <N x i2> should be treated similarly
(but not identically) to i1.  I am guessing that
the codegen bug you mention is related to the promotion
of i2 to i8 in type legalization; I can certainly imagine
that this code has not been tested so thoroughly.
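
As a rough illustration of why arithmetic and logic on <N x i1> would be straightforward, here is a minimal sketch in portable C that models <64 x i1> as one bit per element in a 64-bit word. The type and function names are hypothetical, and nothing here reflects how LLVM actually legalizes these types.

```c
#include <stdint.h>

typedef uint64_t v64i1;  /* hypothetical model of <64 x i1>, one bit each */

/* On 1-bit values, add and sub are both addition modulo 2 (XOR),
   and mul is AND. Each call handles all 64 elements at once. */
v64i1 add_v64i1(v64i1 a, v64i1 b) { return a ^ b; }
v64i1 sub_v64i1(v64i1 a, v64i1 b) { return a ^ b; }
v64i1 mul_v64i1(v64i1 a, v64i1 b) { return a & b; }

/* Reading one element takes a shift and a mask, because memory is
   byte-addressed and a single bit has no address of its own. */
unsigned extract_v64i1(v64i1 v, unsigned i)
{
    return (unsigned)((v >> i) & 1u);
}
```

The extract helper hints at where load/store gets awkward: the whole-vector operations are single instructions, but touching an individual element means working around the byte granularity of memory.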
  

