Markus Lavin via llvm-dev
2021-Jan-11 10:21 UTC
[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
While debugging an out-of-tree (OOT) issue with masked memory intrinsics, I came across lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp, where bitcasts of the following form are introduced:

  %scalar_mask = bitcast <8 x i1> %interleaved.mask to i8

That is, when emulating masked stores on a machine that lacks hardware support, the <8 x i1> mask vector is bitcast to an i8 scalar type. The problem is that this appears to yield different results for big-endian and little-endian targets.

AFAIK, LLVM IR vectors are in general laid out in memory with the first element at the lowest address (i.e. independent of endianness), but for the i1 type (and possibly all sub-byte-sized types) there seems to be a dependence on target endianness. For example:

  define i8 @foo() {
  entry:
    %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
    %bc = bitcast <8 x i1> %v to i8
    ret i8 %bc
  }

  $ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
  $ llc -O3 bitcast.ll --mtriple armeb -o -   # msb is set in scalar

with similar results for mips (big-endian) and amd64 (little-endian).

For ScalarizeMaskedMemIntrin.cpp this must surely be a problem, since the mask gets reversed for big-endian targets. I tried addressing this by compensating for endianness when, later in the pass, checking the individual bits of the scalar. This compensation seemed to work well for our big-endian target, but, rather surprisingly (to me), ARM-specific lit-tests then started failing:

  Failed Tests (3):
    LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
    LLVM :: CodeGen/Thumb2/mve-masked-load.ll
    LLVM :: CodeGen/Thumb2/mve-masked-store.ll

This leaves me with several questions:

1. Is a bitcast <8 x i1> %v to i8 well defined, and if so, is the result supposed to depend on target endianness?
2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets, aren't the three lit-tests also broken, since they break when I try to fix the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?

Best regards,
-Markus
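To make the observed behaviour concrete, here is a minimal standalone C++ model. It assumes, as the llc output for @foo suggests, that lane 0 of an <8 x i1> vector ends up in the least significant bit of the i8 on little-endian targets and in the most significant bit on big-endian targets. The names bitForLane, packMask, laneIsSetNaive and laneIsSet are hypothetical: this is neither LLVM API nor the actual code in ScalarizeMaskedMemIntrin.cpp. It only encodes the observed layout and shows why a per-lane test that assumes "lane i is bit i" reverses the mask on big-endian, and what an endianness-compensated test would look like. It does not answer question 1, i.e. whether the IR semantics actually require this layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM API and not the actual pass code.
// Layout of `bitcast <8 x i1> %v to i8` as suggested by the llc output:
//   little-endian: lane 0 -> bit 0 (LSB)
//   big-endian:    lane 0 -> bit 7 (MSB)
constexpr unsigned kLanes = 8;

// Bit position of the scalar that holds vector lane `lane`.
inline unsigned bitForLane(unsigned lane, bool bigEndian) {
  assert(lane < kLanes && "lane out of range");
  return bigEndian ? kLanes - 1 - lane : lane;
}

// Pack an 8-lane i1 vector into an i8 the way the bitcast appears to.
inline std::uint8_t packMask(const std::array<bool, kLanes> &lanes,
                             bool bigEndian) {
  std::uint8_t scalar = 0;
  for (unsigned lane = 0; lane < kLanes; ++lane)
    if (lanes[lane])
      scalar |= static_cast<std::uint8_t>(1u << bitForLane(lane, bigEndian));
  return scalar;
}

// Naive per-lane test: assumes lane i always sits at bit i, which only
// matches the little-endian layout.
inline bool laneIsSetNaive(std::uint8_t scalar, unsigned lane) {
  return ((scalar >> lane) & 1u) != 0;
}

// Endianness-compensated per-lane test.
inline bool laneIsSet(std::uint8_t scalar, unsigned lane, bool bigEndian) {
  return ((scalar >> bitForLane(lane, bigEndian)) & 1u) != 0;
}
```

For the @foo vector (only lane 0 set), packMask gives 0x01 for little-endian and 0x80 for big-endian. On 0x80 the naive test reports lane 7 as active instead of lane 0, while the compensated test reports lane 0, which is the kind of per-lane adjustment described above.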
Björn Pettersson A via llvm-dev
2021-Jan-15 11:39 UTC
[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Markus Lavin via llvm-dev
> Sent: 11 January 2021 11:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
>
> While debugging an OOT issue with masked memory intrinsics I came across
> lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp where bitcasts of the
> following form are introduced
>
>   %scalar_mask = bitcast <8 x i1> %interleaved.mask to i8
>
> That is when emulating masked stores on machine that is lacking hardware
> support the <8 x i1> mask vector is bitcasted to a i8 scalar type. Now
> the problem is that this appears to yield different results for
> big-endian and little-endian targets.
>
> AFIK in general LLVM IR vectors are laid out in memory with the first
> element at the lowest address (i.e. independent of endianness) but for
> the i1 type (and possibly all sub-byte sized types) there seem to be a
> dependence on target endianness.
>
> For example
>
>   define i8 @foo() {
>   entry:
>     %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
>     %bc = bitcast <8 x i1> %v to i8
>     ret i8 %bc
>   }
>
>   $ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
>   $ llc -O3 bitcast.ll --mtriple armeb -o -   # msb is set in scalar
>
> with similar results for mips (big-endian) and amd64 (little-endian)
>
> Now for ScalarizeMaskedMemIntrin.cpp this must surely be a problem since
> the mask gets reversed for big-endian targets. I tried addressing this by
> compensating for endianness when, later in the pass, checking the
> individual bits of the scalar.
> This compensation seemed to work well for
> our big-endian target but rather surprisingly (to me) ARM specific
> lit-tests then started failing
>
>   Failed Tests (3):
>     LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
>     LLVM :: CodeGen/Thumb2/mve-masked-load.ll
>     LLVM :: CodeGen/Thumb2/mve-masked-store.ll

It would be nice if someone from ARM could acknowledge that the codegen actually is faulty for big-endian now. (All I know is that, according to git log, David Green has made lots of changes to those test cases in the past, but anyone with MVE knowledge could perhaps look at it.)

@markus: Could you help out by locating the functions in those tests that we think are wrong? Maybe even upload your fixes to ScalarizeMaskedMemIntrin.cpp to Phabricator, to show the differences both in the LLVM code and in the new codegen for those test cases?

>
> This leaves me with several questions:
>
> 1. Is a bitcast <8 x i1> %v to i8 well defined and if so is the result
> supposed to be dependent on target endianness?
> 2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
> 3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets then
> aren't the three lit-tests also broken since they brake when I try to fix
> the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?
>
> Best regards,
> -Markus