Ginger Bill via llvm-dev
2016-Aug-15 08:54 UTC
[llvm-dev] [LLVM IR] Possible compiler bug: <N x i1> vector instructions ordering causing different results
I have been using LLVM as a backend for my compiler (I'm not using the LLVM libraries but my own to generate the necessary IR, for numerous reasons). At the moment, I am implementing vector operations. Comparisons of vectors produce vectors of booleans, <N x i1>, and these are causing me problems.

To access vector elements, I have been using extractelement and insertelement; however, I am getting some weird behaviour when I execute these instructions in a different order. The code examples below have the same instructions and should be logically the same. Version 1 outputs BAA while Version 2 outputs BAB. Version 2 is the logically correct one, but I cannot figure out why Version 1 outputs the wrong result when it has the exact same instructions, just in a different order.

I suspect this may be a compiler bug rather than a code error. It may be due to the way vectors are structured. I could not find any documentation on what the size of a vector is, but it seems that vector elements get packed: <N x iM> size == (N*M+7)/8 bytes. This would also explain why the FAQ suggests not using getelementptr on vectors, as the elements may not be byte aligned.

As a workaround, is there a way to make sure a boolean vector isn't packed, so that each element takes up a byte, or to convert a vector of booleans to a vector of i8, or are there extra instructions that prevent this from happening?

Regards,
Bill

--------
Version 1

; Version 1 - Generated by my naïve SSA generator
; Outputs: BAA (incorrect)
declare i32 @putchar(i32)

define void @main() {
entry:
  %0 = alloca <8 x i1>, align 8 ; v
  store <8 x i1> zeroinitializer, <8 x i1>* %0
  %1 = alloca <8 x i1>, align 8
  store <8 x i1> zeroinitializer, <8 x i1>* %1
  %2 = load <8 x i1>, <8 x i1>* %1, align 8
  %3 = insertelement <8 x i1> %2, i1 true, i64 0
  %4 = insertelement <8 x i1> %3, i1 false, i64 1
  %5 = insertelement <8 x i1> %4, i1 true, i64 2
  %6 = insertelement <8 x i1> %5, i1 false, i64 3
  %7 = insertelement <8 x i1> %6, i1 true, i64 4
  %8 = insertelement <8 x i1> %7, i1 false, i64 5
  %9 = insertelement <8 x i1> %8, i1 true, i64 6
  %10 = insertelement <8 x i1> %9, i1 false, i64 7
  store <8 x i1> %10, <8 x i1>* %0

  %11 = load <8 x i1>, <8 x i1>* %0, align 8
  %12 = extractelement <8 x i1> %11, i64 0
  %13 = zext i1 %12 to i32
  %14 = add i32 %13, 65 ; + 'A'
  %15 = call i32 @putchar(i32 %14)

  %16 = load <8 x i1>, <8 x i1>* %0, align 8
  %17 = extractelement <8 x i1> %16, i64 1
  %18 = zext i1 %17 to i32
  %19 = add i32 %18, 65 ; + 'A'
  %20 = call i32 @putchar(i32 %19)

  %21 = load <8 x i1>, <8 x i1>* %0, align 8
  %22 = extractelement <8 x i1> %21, i64 2
  %23 = zext i1 %22 to i32
  %24 = add i32 %23, 65 ; + 'A'
  %25 = call i32 @putchar(i32 %24)

  %26 = call i32 @putchar(i32 10) ; \n

  ret void
}

------------------------------------------------------------------------
Version 2

; Version 2 - Manually modified version of Version 1
; Outputs: BAB (correct)
declare i32 @putchar(i32)

define void @main() {
entry:
  %0 = alloca <8 x i1>, align 8 ; v
  store <8 x i1> zeroinitializer, <8 x i1>* %0
  %1 = alloca <8 x i1>, align 8
  store <8 x i1> zeroinitializer, <8 x i1>* %1
  %2 = load <8 x i1>, <8 x i1>* %1, align 8
  %3 = insertelement <8 x i1> %2, i1 true, i64 0
  %4 = insertelement <8 x i1> %3, i1 false, i64 1
  %5 = insertelement <8 x i1> %4, i1 true, i64 2
  %6 = insertelement <8 x i1> %5, i1 false, i64 3
  %7 = insertelement <8 x i1> %6, i1 true, i64 4
  %8 = insertelement <8 x i1> %7, i1 false, i64 5
  %9 = insertelement <8 x i1> %8, i1 true, i64 6
  %10 = insertelement <8 x i1> %9, i1 false, i64 7
  store <8 x i1> %10, <8 x i1>* %0

  %11 = load <8 x i1>, <8 x i1>* %0, align 8
  %12 = load <8 x i1>, <8 x i1>* %0, align 8
  %13 = load <8 x i1>, <8 x i1>* %0, align 8

  %14 = extractelement <8 x i1> %11, i64 0
  %15 = extractelement <8 x i1> %12, i64 1
  %16 = extractelement <8 x i1> %13, i64 2

  %17 = zext i1 %14 to i32
  %18 = zext i1 %15 to i32
  %19 = zext i1 %16 to i32

  %20 = add i32 %17, 65 ; + 'A'
  %21 = add i32 %18, 65 ; + 'A'
  %22 = add i32 %19, 65 ; + 'A'

  %23 = call i32 @putchar(i32 %20)
  %24 = call i32 @putchar(i32 %21)
  %25 = call i32 @putchar(i32 %22)

  %26 = call i32 @putchar(i32 10) ; \n

  ret void
}

Hal Finkel via llvm-dev
2016-Aug-15 20:49 UTC
[llvm-dev] [LLVM IR] Possible compiler bug: <N x i1> vector instructions ordering causing different results
Hi Bill,

I highly recommend that you use only vectors whose element size is a whole number of bytes. There are known issues with how we handle the more general cases; see:

https://llvm.org/bugs/show_bug.cgi?id=1784
https://llvm.org/bugs/show_bug.cgi?id=22603
https://llvm.org/bugs/show_bug.cgi?id=27600

In short, different parts of the compiler disagree on whether <8 x i1> is one or eight bytes long, and some parts do nonsensical things altogether. There is a limited subset of cases where the current infrastructure works well (mostly for handling vectors of i1 for vectorized comparisons), but if you stray too far you'll run into problems.

That having been said, we would like to fix these things, so if you find problems, please do file bug reports about them.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
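
For illustration, here is a minimal sketch of the byte-sized workaround asked about in the original message, in line with the recommendation to keep element sizes at a whole number of bytes. It assumes that only the in-memory type needs to change: the mask is built as an <8 x i1> SSA value, widened with zext to <8 x i8> before it is stored, and read back as bytes. The value names (%v, %mask, %wide, %bytes and so on) are made up for this example, and the typed-pointer spelling <8 x i8>* follows the thread (LLVM releases with opaque pointers write ptr instead).

```llvm
; Sketch only: the boolean vector lives in memory as <8 x i8>, one byte per
; element. All value names here are illustrative, not taken from the thread.
declare i32 @putchar(i32)

define void @main() {
entry:
  %v = alloca <8 x i8>, align 8

  ; Build the mask as an SSA value: elements 0, 2, 4 and 6 are true.
  %m0 = insertelement <8 x i1> zeroinitializer, i1 true, i64 0
  %m2 = insertelement <8 x i1> %m0, i1 true, i64 2
  %m4 = insertelement <8 x i1> %m2, i1 true, i64 4
  %mask = insertelement <8 x i1> %m4, i1 true, i64 6

  ; Widen before the store so that memory only ever holds whole bytes.
  %wide = zext <8 x i1> %mask to <8 x i8>
  store <8 x i8> %wide, <8 x i8>* %v, align 8

  ; Read elements back from the byte vector rather than from an <8 x i1>.
  %bytes = load <8 x i8>, <8 x i8>* %v, align 8
  %e0 = extractelement <8 x i8> %bytes, i64 0
  %e1 = extractelement <8 x i8> %bytes, i64 1
  %e2 = extractelement <8 x i8> %bytes, i64 2

  %w0 = zext i8 %e0 to i32
  %w1 = zext i8 %e1 to i32
  %w2 = zext i8 %e2 to i32

  %c0 = add i32 %w0, 65 ; + 'A'
  %c1 = add i32 %w1, 65 ; + 'A'
  %c2 = add i32 %w2, 65 ; + 'A'

  call i32 @putchar(i32 %c0)
  call i32 @putchar(i32 %c1)
  call i32 @putchar(i32 %c2)
  call i32 @putchar(i32 10) ; \n

  ret void
}
```

If a later operation needs an <8 x i1> again (for example as a select condition), trunc <8 x i8> %bytes to <8 x i1>, or icmp ne <8 x i8> %bytes, zeroinitializer, would recover it from the loaded bytes without ever storing the i1 vector itself.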