thr3ads.net - llvm dev - [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness? [Jan 2021]

Björn Pettersson A via llvm-dev

2021-Jan-15 11:39 UTC

[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Markus
> Lavin via llvm-dev
> Sent: 11 January 2021 11:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?
> 
> While debugging an OOT issue with masked memory intrinsics I came across
> lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp where bitcasts of the
> following form are introduced
> 
> %scalar_mask = bitcast <8 x i1> %interleaved.mask to i8
> 
> That is, when emulating masked stores on a machine that lacks hardware
> support, the <8 x i1> mask vector is bitcast to an i8 scalar. Now the
> problem is that this appears to yield different results for big-endian
> and little-endian targets.
> 
> AFAIK, in general LLVM IR vectors are laid out in memory with the first
> element at the lowest address (i.e. independent of endianness), but for
> the i1 type (and possibly all sub-byte sized types) there seems to be a
> dependence on target endianness.
> 
> For example
> 
> define i8 @foo() {
> entry:
>   %v = insertelement <8 x i1> zeroinitializer, i1 true, i8 0
>   %bc = bitcast <8 x i1> %v to i8
>   ret i8 %bc
> }
> 
> $ llc -O3 bitcast.ll --mtriple arm -o -     # lsb is set in scalar
> $ llc -O3 bitcast.ll --mtriple armeb -o -     # msb is set in scalar
> 
> with similar results for mips (big-endian) and amd64 (little-endian)
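
The behaviour reported above can be modelled outside LLVM with a small Python sketch. It is only an illustration, not code from LLVM: the helper name `bitcast_v8i1_to_i8` is hypothetical, and the lane-to-bit mapping is an assumption taken from the llc output described in the message (lane 0 in the lsb on little-endian, in the msb on big-endian).

```python
def bitcast_v8i1_to_i8(lanes, big_endian):
    """Model packing eight i1 lanes into one byte.

    Assumption (from the observed llc output, not from a specification):
    lane 0 maps to the least significant bit on little-endian targets and
    to the most significant bit on big-endian targets.
    """
    assert len(lanes) == 8
    value = 0
    for idx, bit in enumerate(lanes):
        pos = 7 - idx if big_endian else idx
        value |= int(bool(bit)) << pos
    return value


# Same vector as in the IR example: only lane 0 is true.
v = [True] + [False] * 7
little = bitcast_v8i1_to_i8(v, big_endian=False)
big = bitcast_v8i1_to_i8(v, big_endian=True)
print(f"{little:#04x} {big:#04x}")  # prints "0x01 0x80"
```

Under this model the same mask yields 0x01 on a little-endian target and 0x80 on a big-endian one, which is the lsb/msb difference seen with arm and armeb.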
> 
> Now for ScalarizeMaskedMemIntrin.cpp this must surely be a problem, since
> the mask gets reversed for big-endian targets. I tried addressing this by
> compensating for endianness when, later in the pass, checking the
> individual bits of the scalar. This compensation seemed to work well for
> our big-endian target, but rather surprisingly (to me) ARM-specific lit
> tests then started failing:
> 
> Failed Tests (3):
>   LLVM :: CodeGen/Thumb2/mve-masked-ldst.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-load.ll
>   LLVM :: CodeGen/Thumb2/mve-masked-store.ll

It would be nice if someone from ARM could confirm that the codegen actually
is faulty for big-endian now (all I know is that, according to git log, David
Green has made lots of changes to those test cases in the past, but anyone with
MVE knowledge could perhaps look at it).

@markus: Could you help locate the functions that we think are wrong in
those tests? Maybe even upload your fixes in ScalarizeMaskedMemIntrin.cpp to
Phabricator to show the differences both in the LLVM code and in the new
codegen for those test cases?

> 
> This leaves me with several questions:
> 
> 1. Is a bitcast <8 x i1> %v to i8 well defined, and if so, is the result
> supposed to be dependent on target endianness?
> 2. Is ScalarizeMaskedMemIntrin.cpp broken for big-endian targets?
> 3. If ScalarizeMaskedMemIntrin.cpp is broken for big-endian targets, then
> aren't the three lit tests also broken, since they break when I try to fix
> the alleged brokenness of ScalarizeMaskedMemIntrin.cpp?
> 
> Best regards,
> -Markus
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Green via llvm-dev

2021-Jan-15 12:03 UTC

[llvm-dev] bitcast <8 x i1> to i8 - dependence on endianness?

Sorry - yes. The ARM/MVE tests are correct as-is, in that they produce the
correct output under big endian as far as I can tell. (That is, the aligned
test, which is not scalarized, produces the same output as the unaligned
case.) When MVE is enabled, the backend assumes that low lanes end up in the
low bits of the predicate mask. So the two cancel each other out and we happen
to end up with the correct code.
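
The cancellation described here can be sketched as a toy Python model. This is an illustration under stated assumptions, not the actual pass or backend code: `pack` and `active_lanes` are hypothetical helpers, `lane0_in_msb=True` stands for the generic big-endian bitcast reported in the original message, `lane0_in_msb=False` stands for the MVE predicate layout, and `compensate=True` stands for an endianness-corrected bit test in the scalarizer.

```python
def pack(lanes, lane0_in_msb):
    """Pack i1 lanes into an integer, with lane 0 in the msb or the lsb."""
    n = len(lanes)
    value = 0
    for idx, bit in enumerate(lanes):
        pos = n - 1 - idx if lane0_in_msb else idx
        value |= int(bool(bit)) << pos
    return value


def active_lanes(scalar, width, compensate):
    """Recover per-lane flags by testing one bit of the scalar per lane.

    compensate=False tests bit idx for lane idx; compensate=True tests
    bit (width - 1 - idx), i.e. the hypothetical big-endian correction.
    """
    lanes = []
    for idx in range(width):
        pos = width - 1 - idx if compensate else idx
        lanes.append(bool(scalar & (1 << pos)))
    return lanes


mask = [True, False, False, True, False, False, False, False]

# MVE-style packing with the uncompensated test: lanes come back intact.
mve_plain = active_lanes(pack(mask, lane0_in_msb=False), 8, compensate=False)
# Generic big-endian packing with the compensated test: also intact.
generic_fixed = active_lanes(pack(mask, lane0_in_msb=True), 8, compensate=True)
# MVE-style packing with the compensated test: the lane order is reversed.
mve_fixed = active_lanes(pack(mask, lane0_in_msb=False), 8, compensate=True)
```

In this model the lanes survive only when the packing convention and the bit test agree, which is one way to read why compensating in the pass alone would break the MVE tests until the backend convention changes as well.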

Apparently this is different from the rest of LLVM, which assumes the opposite
for non-byte-sized vectors? That is surprising; we even have some MVE
instructions for storing predicates which, under big endian, assume the low
lane is in the low bits. I would not be surprised if this was causing problems
somewhere under big endian, though; it does not get nearly as much use as
little endian.

> @markus: Could you help locate the functions that we think are wrong in
> those tests? Maybe even upload your fixes in ScalarizeMaskedMemIntrin.cpp to
> Phabricator to show the differences both in the LLVM code and in the new
> codegen for those test cases?

Yeah, if you can upload a Phabricator review for the changes in the expansion of
masked intrinsics, I can take a look into the MVE codegen and see if I can get
it to store in the opposite order sensibly. I have not looked at what that would
take yet, but I'm hoping it's not too difficult.

Thanks,
Dave

