thr3ads.net - llvm dev - [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness? [Jan 2021]

Markus Lavin via llvm-dev

2021-Jan-11 10:21 UTC

[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

While debugging an OOT issue with masked memory intrinsics I came across
lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp where bitcasts of the
following form are introduced

%scalar_mask = bitcast <8 x i1> %interleaved.mask to i8

That is, when emulating masked stores on a machine that lacks hardware support,
the <8 x i1> mask vector is bitcast to an i8 scalar. The problem is that this
appears to yield different results for big-endian and little-endian targets.

AFAIK, LLVM IR vectors are in general laid out in memory with the first element
at the lowest address (i.e. independent of endianness), but for the i1 type (and
possibly all sub-byte sized types) there seems to be a dependence on target
endianness.

For example

define i8 @foo() {
entry:
  %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
  %bc = bitcast <8 x i1> %v to i8
  ret i8 %bc
}

$ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
$ llc -O3 bitcast.ll --mtriple armeb -o -     # msb is set in scalar

with similar results for mips (big-endian) and amd64 (little-endian)
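
To make the observation concrete, here is a minimal C++ model of the packing
that the llc output above suggests. It is only a sketch of the observed
behaviour, not LLVM code; the function name packMask and its parameters are
made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code): pack eight i1 lanes into one byte the
// way the llc output above suggests. Lane 0 lands in the LSB on a
// little-endian target and in the MSB on a big-endian target.
inline std::uint8_t packMask(const std::array<bool, 8> &lanes, bool bigEndian) {
  std::uint8_t byte = 0;
  for (unsigned lane = 0; lane < lanes.size(); ++lane) {
    if (!lanes[lane])
      continue;
    // Mirror the bit position on big-endian targets.
    unsigned bit = bigEndian ? 7 - lane : lane;
    assert(bit < 8 && "bit position out of range");
    byte = static_cast<std::uint8_t>(byte | (1u << bit));
  }
  return byte;
}
```

With only lane 0 set, as in @foo above, this model gives 0x01 for the
little-endian case and 0x80 for the big-endian case.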

Now for ScalarizeMaskedMemIntrin.cpp this must surely be a problem, since the
mask gets reversed for big-endian targets. I tried addressing this by
compensating for endianness where, later in the pass, the individual bits of
the scalar are checked. This compensation seemed to work well for our
big-endian target, but rather surprisingly (to me) ARM-specific lit tests then
started failing:

Failed Tests (3):
  LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
  LLVM :: CodeGen/Thumb2/mve-masked-load.ll
  LLVM :: CodeGen/Thumb2/mve-masked-store.ll
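
The compensation described above amounts to mirroring the bit index on
big-endian targets before testing a lane. A minimal sketch of the idea follows;
the helper names laneToBit and laneEnabled and their signatures are assumptions
for illustration and are not taken from ScalarizeMaskedMemIntrin.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name and signature are assumptions): map a vector lane
// index to the bit position that holds it in the bitcast scalar.
inline unsigned laneToBit(bool bigEndian, unsigned vectorWidth, unsigned lane) {
  assert(lane < vectorWidth && "lane out of range");
  return bigEndian ? vectorWidth - 1 - lane : lane;
}

// Hypothetical helper: test whether a given lane is enabled in the scalar
// mask, taking the target endianness into account.
inline bool laneEnabled(std::uint64_t scalarMask, bool bigEndian,
                        unsigned vectorWidth, unsigned lane) {
  return ((scalarMask >> laneToBit(bigEndian, vectorWidth, lane)) & 1u) != 0;
}
```

For an 8-lane mask, lane 0 is then looked up at bit 0 on little-endian targets
and at bit 7 on big-endian targets, which matches the llc results shown earlier.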

This leaves me with several questions:

1. Is a bitcast <8 x i1> %v to i8 well defined and if so is the result
supposed to be dependent on target endianness?
2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets then
aren't the three lit-tests also broken since they break when I try to fix
the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?

Best regards,
-Markus

Björn Pettersson A via llvm-dev

2021-Jan-15 11:39 UTC

[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Markus
> Lavin via llvm-dev
> Sent: 11 January 2021 11:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
> 
> While debugging an OOT issue with masked memory intrinsics I came across
> lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp where bitcasts of the
> following form are introduced
> 
> %scalar_mask = bitcast <8 x i1> %interleaved.mask to i8
> 
> That is, when emulating masked stores on a machine that lacks hardware
> support, the <8 x i1> mask vector is bitcast to an i8 scalar. The
> problem is that this appears to yield different results for big-endian
> and little-endian targets.
> 
> AFAIK, LLVM IR vectors are in general laid out in memory with the first
> element at the lowest address (i.e. independent of endianness), but for
> the i1 type (and possibly all sub-byte sized types) there seems to be a
> dependence on target endianness.
> 
> For example
> 
> define i8 @foo() {
> entry:
>   %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
>   %bc = bitcast <8 x i1> %v to i8
>   ret i8 %bc
> }
> 
> $ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
> $ llc -O3 bitcast.ll --mtriple armeb -o -     # msb is set in scalar
> 
> with similar results for mips (big-endian) and amd64 (little-endian)
> 
> Now for ScalarizeMaskedMemIntrin.cpp this must surely be a problem since
> the mask gets reversed for big-endian targets. I tried addressing this by
> compensating for endianness when, later in the pass, checking the
> individual bits of the scalar. This compensation seemed to work well for
> our big-endian target but rather surprisingly (to me) ARM specific lit-
> tests then started failing
> 
> Failed Tests (3):
>   LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-load.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-store.ll

It would be nice if someone from ARM could confirm that the codegen is actually
faulty for big-endian at the moment. All I know is that, according to git log,
David Green has made lots of changes to those test cases in the past, but
anyone with MVE knowledge could perhaps look at it.

@Markus: Could you help locate the functions that we think are wrong in those
tests? Maybe even upload your fixes to ScalarizeMaskedMemIntrin.cpp to
Phabricator, to show the differences both in the LLVM code and in the new
codegen for those test cases?

> 
> This leaves me with several questions:
> 
> 1. Is a bitcast <8 x i1> %v to i8 well defined and if so is the result
> supposed to be dependent on target endianness?
> 2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
> 3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets then
> aren't the three lit-tests also broken since they break when I try to fix
> the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?
> 
> Best regards,
> -Markus