Hi everyone. After lurking for a while, this is my first post to the list.

I am working with some graduate students on the general topic of compiler support for SIMD programming, and on specific projects related to LLVM and my own Parabix technology (parabix.costar.sfu.ca).

Right now we have a few course projects on the go, and one of them (SSE2 Hoisting) has already raised a question. We're not sure how much of this has been tried before, or even whether it makes sense, but we're eager to learn.

Briefly, the projects are:

SSE2 Hoisting: translating programs that directly use SSE2 intrinsics into platform-independent code expressed in LLVM IR.

Long integer support: systematic support for i128, i256, ... targeting SIMD registers.

Systematic strategies for the shufflevector operation. This is a very powerful operation that can express arbitrary rearrangements of data in SIMD registers. No architecture we know of supports it in its full generality, but many special cases are already recognized in code generation, and potentially many more could be.

Systematic support for all power-of-2 field widths with vector types. For example, we are interested in <64 x i2> being a legal type with appropriate expansion operations. A student has made a GSoC submission for this project.

The question I have right now relates to the i2 type. In our SSE2 hoisting, we found an issue with the movemask_pd operation, which extracts the sign bits of the two doubles in a <2 x double> and returns them as an int32. We would like to use icmp slt as the LLVM IR operation for this, but we seem to have a problem when we bitcast the resulting <2 x i1> vector to i2. We use the following LLVM IR code.

    define i32 @signmaskd(<2 x double> %a) alwaysinline #5
    {
      %bits = bitcast <2 x double> %a to <2 x i64>
      %b = icmp slt <2 x i64> %bits, zeroinitializer
      %c = bitcast <2 x i1> %b to i2
      %result = zext i2 %c to i32
      ret i32 %result
    }

Unfortunately, we only get 1 bit of data out; the assembly output seems to confirm that the individual bit extractions take place, but the second one clobbers the first. We are using the LLVM 3.4 toolchain.

There is more detail at the following URL:
http://parabix.costar.sfu.ca/wiki/I2Result

Anyway, the question is whether we should treat this as a bug to be fixed, or whether our idea of working with i2 types is misguided in a more fundamental way.
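For reference, the value the signmaskd function above is expected to return can be written as a small portable C model. This is a sketch only: the name signmask_pd_ref is hypothetical, and it models the semantics of SSE2's movemask_pd (bit i of the result is the sign bit of element i) rather than calling the intrinsic. Because the two sign bits land in distinct result bits, correct output ranges over 0..3; a result that never has more than one bit set matches the clobbering described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model of SSE2 movemask_pd:
 * bit i of the result is the sign bit of element i of a <2 x double>. */
static int32_t signmask_pd_ref(const double a[2]) {
    int32_t result = 0;
    for (int i = 0; i < 2; i++) {
        int64_t bits;
        /* Reinterpret the double's bit pattern, like bitcast <2 x double> to <2 x i64>. */
        memcpy(&bits, &a[i], sizeof bits);
        /* icmp slt %bits, 0 is true exactly when the sign bit is set (including -0.0). */
        int lane = bits < 0;
        /* Place lane i in bit i; the upper bits of the i32 stay zero, as with zext. */
        result |= (int32_t)lane << i;
    }
    return result;
}
```

For example, {-1.0, 2.0} gives 1, {1.0, -2.0} gives 2, and {-0.0, -3.5} gives 3.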
Hi Rob,

This is a codegen bug. At the moment we don’t support bitcasting or storing/loading memory of 'illegal' vector element types that are smaller than i8.

Thanks,
Nadav

On Apr 2, 2014, at 6:52 PM, Rob Cameron <cameron at cs.sfu.ca> wrote:
> [...]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Rob,

If you care about SIMD performance, I advise you to generate LLVM IR that has an obvious mapping to a particular instruction set. What i2 is, what a vector of i2 is, and how these should be mapped to an instruction set are all a mystery to me. You should extend i2 to i8/i16/i32, depending on the desired code generation. In the ISPC project, for example, we typically map a vector of booleans to a vector of i32 when targeting the SSE2/4 and AVX1/2 instruction sets.

Also, representing SIMD registers as i128/i256/i512 is a bad idea IMO, as you are not able to perform element-wise arithmetic operations on them. A vector representation is a much better idea, and you can freely bitcast between vector types when you need to switch from one element type to another (from <4 x i32> to <4 x float>, for example).

-Dmitry.

On Thu, Apr 3, 2014 at 9:04 AM, Nadav Rotem <nrotem at apple.com> wrote:
> [...]
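To make the widened-mask suggestion concrete, here is a small portable C sketch. All names are hypothetical, and plain arrays stand in for <4 x float> and <4 x i32> vectors. A comparison produces one 32-bit lane per element, all ones for true and all zeros for false (the layout SSE packed compares such as cmpltps produce); the mask is applied with a bitwise AND on the reinterpreted float bits, and the lane sign bits can still be gathered into an integer when a movemask-style result is needed.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Hypothetical model of a boolean vector widened to <4 x i32>:
 * lane i is all ones (-1) when a[i] < b[i], otherwise all zeros. */
static void cmplt_mask(const float a[LANES], const float b[LANES], int32_t mask[LANES]) {
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? -1 : 0;
}

/* Keep a[i] where the mask is set and +0.0f elsewhere, by reinterpreting each
 * float lane as a 32-bit integer (like bitcast <4 x float> to <4 x i32>) and ANDing. */
static void and_mask(const float a[LANES], const int32_t mask[LANES], float out[LANES]) {
    for (int i = 0; i < LANES; i++) {
        uint32_t bits;
        memcpy(&bits, &a[i], sizeof bits);
        bits &= (uint32_t)mask[i];
        memcpy(&out[i], &bits, sizeof bits);
    }
}

/* Gather the sign bit of each mask lane into bit i of an int (movemask-style). */
static int mask_to_bits(const int32_t mask[LANES]) {
    int result = 0;
    for (int i = 0; i < LANES; i++)
        result |= (mask[i] < 0) << i;
    return result;
}
```

Because every lane is a full 32 bits, each step maps onto ordinary SSE/AVX register operations, with no sub-byte element types involved.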
Hi, Nadav. Thanks for this.

I am now wondering about the treatment of i1 and <N x i1>. These are both fundamental types for LLVM primitives like icmp. My understanding right now is that i1 is probably dealt with by "promoting" it to i8 in the type legalizer phase, although I haven't yet located where in the codebase this happens.

Another option that occurs to me is to declare the types i1 and <N x i1> legal and require all the i1 operations to be implemented. This would certainly be straightforward for all the binary arithmetic and bitwise logic operations, but perhaps load/store is where it becomes problematic. Is there any existing discussion of this design possibility that I might read?

i2 and <N x i2> should probably be treated similarly (but not identically) to i1. I am guessing that the codegen bug you mention is related to the promotion of i2 to i8 in type legalization; I can certainly imagine that this code has not been as thoroughly tested.
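As a rough illustration of what promotion means for i2, the following C sketch widens each 2-bit field of a packed vector into its own byte, performs an i2 add in the wider lanes, and truncates back to 2 bits before repacking. The names are hypothetical and a uint64_t stands in for a packed <32 x i2> value; this is not LLVM's legalizer code, only a model of the idea. The unpack and repack steps correspond roughly to the bitcast and load/store handling for sub-i8 element types that Nadav mentions.

```c
#include <stdint.h>

#define NFIELDS 32 /* a packed <32 x i2> vector occupies 64 bits */

/* Promote: copy field i (bits 2i..2i+1) of the packed word into byte lane i. */
static void promote_i2(uint64_t packed, uint8_t lanes[NFIELDS]) {
    for (int i = 0; i < NFIELDS; i++)
        lanes[i] = (uint8_t)((packed >> (2 * i)) & 0x3);
}

/* Truncate each byte lane back to 2 bits and repack into a 64-bit word. */
static uint64_t demote_i2(const uint8_t lanes[NFIELDS]) {
    uint64_t packed = 0;
    for (int i = 0; i < NFIELDS; i++)
        packed |= (uint64_t)(lanes[i] & 0x3) << (2 * i);
    return packed;
}

/* An i2 add computed in promoted i8 lanes; it wraps modulo 4 once truncated,
 * and no carry ever crosses from one field into the next. */
static uint64_t add_v32i2(uint64_t a, uint64_t b) {
    uint8_t la[NFIELDS], lb[NFIELDS];
    promote_i2(a, la);
    promote_i2(b, lb);
    for (int i = 0; i < NFIELDS; i++)
        la[i] = (uint8_t)(la[i] + lb[i]); /* bits above bit 1 may be set until truncation */
    return demote_i2(la);
}
```

For example, adding 3 and 1 in field 0 yields 0 in that field rather than carrying into field 1, which is what a plain 64-bit add would do.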