thr3ads.net - llvm dev - [llvm-dev] [LLVM IR] Possible compiler bug: <N x i1> vector instructions ordering causing different results [Aug 2016]


Ginger Bill via llvm-dev

2016-Aug-15 08:54 UTC

[llvm-dev] [LLVM IR] Possible compiler bug: <N x i1> vector instructions ordering causing different results

I have been using LLVM as a backend for my compiler (for numerous
reasons I generate the necessary IR with my own code rather than the
LLVM libraries). At the moment, I am implementing vector operations.
Comparisons of vectors produce vectors of booleans, <N x i1>, and these
are causing me problems.

To access vector elements, I have been using extractelement and
insertelement. However, I get some weird behaviour when I execute these
instructions in a different order. The two code examples below contain
the same instructions and should be logically equivalent, yet Version 1
outputs BAA while Version 2 outputs BAB. Version 2 is the logically
correct one, but I cannot figure out why Version 1 produces the wrong
output when it has exactly the same instructions, just in a different
order.

I suspect this may be a compiler bug rather than an error in my code.
It may be due to the way vectors are laid out. I could not find any
documentation on the size of a vector, but it seems that vector
elements get packed: the size of <N x iM> appears to be (N*M+7)/8
bytes, so <8 x i1> would occupy (8*1+7)/8 = 1 byte. This would also
explain why the FAQ advises against using getelementptr on vectors, as
the elements may not be byte aligned.

As a workaround, is there a way to make sure a boolean vector is not
packed, so that each element takes up a byte? Alternatively, can I
convert a vector of booleans to a vector of i8, or add extra
instructions to prevent this from happening?

Regards,

Bill

--------


    Version 1

; Version 1 - Generated by my naïve SSA generator
; Outputs: BAA (incorrect)
declare i32 @putchar(i32)
define void @main() {
entry:
  %0 = alloca <8 x i1>, align 8 ; v
  store <8 x i1> zeroinitializer, <8 x i1>* %0
  %1 = alloca <8 x i1>, align 8
  store <8 x i1> zeroinitializer, <8 x i1>* %1
  %2 = load <8 x i1>, <8 x i1>* %1, align 8
  %3 = insertelement <8 x i1> %2, i1 true, i64 0
  %4 = insertelement <8 x i1> %3, i1 false, i64 1
  %5 = insertelement <8 x i1> %4, i1 true, i64 2
  %6 = insertelement <8 x i1> %5, i1 false, i64 3
  %7 = insertelement <8 x i1> %6, i1 true, i64 4
  %8 = insertelement <8 x i1> %7, i1 false, i64 5
  %9 = insertelement <8 x i1> %8, i1 true, i64 6
  %10 = insertelement <8 x i1> %9, i1 false, i64 7
  store <8 x i1> %10, <8 x i1>* %0
  %11 = load <8 x i1>, <8 x i1>* %0, align 8
  %12 = extractelement <8 x i1> %11, i64 0
  %13 = zext i1 %12 to i32
  %14 = add i32 %13, 65 ; + 'A'
  %15 = call i32 @putchar(i32 %14)
  %16 = load <8 x i1>, <8 x i1>* %0, align 8
  %17 = extractelement <8 x i1> %16, i64 1
  %18 = zext i1 %17 to i32
  %19 = add i32 %18, 65 ; + 'A'
  %20 = call i32 @putchar(i32 %19)
  %21 = load <8 x i1>, <8 x i1>* %0, align 8
  %22 = extractelement <8 x i1> %21, i64 2
  %23 = zext i1 %22 to i32
  %24 = add i32 %23, 65 ; + 'A'
  %25 = call i32 @putchar(i32 %24)
  %26 = call i32 @putchar(i32 10) ; \n
  ret void
}

------------------------------------------------------------------------


    Version 2

; Version 2 - Manually modified version of Version 1
; Outputs: BAB (correct)
declare i32 @putchar(i32)
define void @main() {
entry:
  %0 = alloca <8 x i1>, align 8 ; v
  store <8 x i1> zeroinitializer, <8 x i1>* %0
  %1 = alloca <8 x i1>, align 8
  store <8 x i1> zeroinitializer, <8 x i1>* %1
  %2 = load <8 x i1>, <8 x i1>* %1, align 8
  %3 = insertelement <8 x i1> %2, i1 true, i64 0
  %4 = insertelement <8 x i1> %3, i1 false, i64 1
  %5 = insertelement <8 x i1> %4, i1 true, i64 2
  %6 = insertelement <8 x i1> %5, i1 false, i64 3
  %7 = insertelement <8 x i1> %6, i1 true, i64 4
  %8 = insertelement <8 x i1> %7, i1 false, i64 5
  %9 = insertelement <8 x i1> %8, i1 true, i64 6
  %10 = insertelement <8 x i1> %9, i1 false, i64 7
  store <8 x i1> %10, <8 x i1>* %0
  %11 = load <8 x i1>, <8 x i1>* %0, align 8
  %12 = load <8 x i1>, <8 x i1>* %0, align 8
  %13 = load <8 x i1>, <8 x i1>* %0, align 8
  %14 = extractelement <8 x i1> %11, i64 0
  %15 = extractelement <8 x i1> %12, i64 1
  %16 = extractelement <8 x i1> %13, i64 2
  %17 = zext i1 %14 to i32
  %18 = zext i1 %15 to i32
  %19 = zext i1 %16 to i32
  %20 = add i32 %17, 65 ; + 'A'
  %21 = add i32 %18, 65 ; + 'A'
  %22 = add i32 %19, 65 ; + 'A'
  %23 = call i32 @putchar(i32 %20)
  %24 = call i32 @putchar(i32 %21)
  %25 = call i32 @putchar(i32 %22)
  %26 = call i32 @putchar(i32 10) ; \n
  ret void
}


Hal Finkel via llvm-dev

2016-Aug-15 20:49 UTC


Hi Bill, 

I highly recommend that you use only vectors whose elements have a size that
is a whole number of bytes. There are known issues with how we handle the
more general cases; see:

https://llvm.org/bugs/show_bug.cgi?id=1784 
https://llvm.org/bugs/show_bug.cgi?id=22603 
https://llvm.org/bugs/show_bug.cgi?id=27600 

In short, different parts of the compiler disagree on whether <8 x i1> is
one or eight bytes long, and some parts do nonsensical things altogether. There
is a limited subset of cases where the current infrastructure works well
(mostly handling vectors of i1 for vectorized comparisons), but if you stray
too far you'll run into problems. That said, we would like to fix these
things, so if you find problems, please do file bug reports about them.

-Hal 
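
One way to follow the advice above is sketched below: widen the boolean vector to <8 x i8> with zext before it is stored, so that only byte-sized elements ever reach memory. This is an illustrative sketch, not something posted in the thread. The value names (%slot, %wide, and so on) are made up, and the typed-pointer syntax is assumed to match the LLVM version used in the examples above (newer releases spell the pointer type as ptr). If the widening works as intended, the program should print BAB.

```llvm
; Sketch only: the mask lives in memory as <8 x i8>, never as <8 x i1>.
; All value names here are hypothetical and chosen for illustration.
declare i32 @putchar(i32)

define void @main() {
entry:
  %slot = alloca <8 x i8>, align 8

  ; Widen the boolean vector to one byte per element before it touches memory.
  %wide = zext <8 x i1> <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false> to <8 x i8>
  store <8 x i8> %wide, <8 x i8>* %slot, align 8

  ; Reload the vector and extract byte-sized elements.
  %v = load <8 x i8>, <8 x i8>* %slot, align 8
  %e0 = extractelement <8 x i8> %v, i64 0
  %e1 = extractelement <8 x i8> %v, i64 1
  %e2 = extractelement <8 x i8> %v, i64 2

  ; Each element is 0 or 1, so adding 'A' (65) yields 'A' or 'B'.
  %c0 = zext i8 %e0 to i32
  %c1 = zext i8 %e1 to i32
  %c2 = zext i8 %e2 to i32
  %ch0 = add i32 %c0, 65
  %ch1 = add i32 %c1, 65
  %ch2 = add i32 %c2, 65

  %r0 = call i32 @putchar(i32 %ch0)
  %r1 = call i32 @putchar(i32 %ch1)
  %r2 = call i32 @putchar(i32 %ch2)
  %nl = call i32 @putchar(i32 10) ; \n
  ret void
}
```

Where an i1 is needed again after extraction (for a branch or a select, say), an extracted element can be narrowed with trunc i8 ... to i1. The cost of this approach is eight bytes of storage per mask instead of one.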

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
