Gao, Yunzhong via llvm-dev
2015-Dec-01 19:52 UTC
[llvm-dev] Endianness for multi-word types
> -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: Tuesday, December 01, 2015 1:01 AM > To: Tim Shen > Cc: Gao, Yunzhong; llvm-dev at lists.llvm.org; Kit Barton; Nemanja Ivanovic > Subject: Re: [llvm-dev] Endianness for multi-word types > > ----- Original Message ----- > > From: "Tim Shen via llvm-dev" <llvm-dev at lists.llvm.org> > > To: "Yunzhong Gao" <yunzhong_gao at playstation.sony.com>, > > llvm-dev at lists.llvm.org > > Sent: Monday, November 30, 2015 10:14:05 PM > > Subject: Re: [llvm-dev] Endianness for multi-word types > > > > On Mon, Nov 30, 2015 at 7:24 PM Gao, Yunzhong < > > yunzhong_gao at playstation.sony.com > wrote: > > > > > > According to > > http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html , "The > > high-order double-precision value (the one that comes first in > > storage) must have the larger magnitude." > > > > So the order of the two doubles in your fp128 is not affected by the > > endianness; although each individual double is. > > > > Well I'm still trying to understand the code base, so I may get this > > completely wrong, but here is my understanding: > > > > Looking at the comment of TargetLowering::hasBigEndianPartOrdering, > > it seems that, all "internal types" (e.g. ppc_fp128, i128) are assumed > > to be little-endian, and for non-ppc_fp128 external values (raw > > register tuples or memory), the endianness follows the data layout > > endianness; for ppc_fp128 external values (a pair of float registers > > or memory chunk), it's always "big-endian" due to the policy. > > > > Right. ppc_fp128 is special in this regard. Its layout is fixed in an Endian- > independent way. This can be confusing, however, because it means that > casts from i128 <-> ppc_fp128 have an Endianness-dependent result. This > requires a number of special cases in various places (unfortunately). > > This is why we needed to disable ppc_fp128 constant folding through bitcasts, > for example. From IR/ConstantFold.cpp: > > // Handle ConstantFP input: FP -> Integral. > if (ConstantFP *FP = dyn_cast<ConstantFP>(V)) { > // PPC_FP128 is really the sum of two consecutive doubles, where the first > // double is always stored first in memory, regardless of the target > // endianness. The memory layout of i128, however, depends on the > target > // endianness, and so we can't fold this without target endianness > // information. This should instead be handled by > // Analysis/ConstantFolding.cpp > if (FP->getType()->isPPC_FP128Ty()) > return nullptr; > > > The problem is when we bitcast from ppc_fp128 to i128, do we exchange > > the two registers? Today we do, because ppc_fp128 requires a mandatory > > big-endian representation, but at the same time it's a i128, which is > > interpreted as a little-endian 128-bit integer. > > That's why the combiner I previously mentioned gets confused. > > If we don't exchange the two registers, combiners are happy because > > they get a little-endian view of the two doubles (e.g. the most > > significant bit is indeed the sign bit); but if someone stores this > > casted i128 to memory, the two words are in little-endian, which > > breaks the ppc_fp128 layout policy. It also breaks the bitcast > > semantic: "The conversion is done as if the value had been stored to > > memory and read back as type ty2." > > If that's what's happening, I agree, that seems broken. The ppc_fp128 <-> > i128 bitcast should legitimately have different results on big-Endian and little- > Endian systems. Swapping the registers when we do a bitcast in some > attempt to remove this effect is not self consistent. However, we seem to > get this right currently: > > $ cat /tmp/ld.c > long foo(long double x) { > return *(long *) &x; > // returns f1 on both powerpc64 and powerpc64le } > > long bar(long double x) { > __int128 y = *(__int128 *) &x; > return (long) y; > // returns f2 on powerpc64 and f1 on powerpc64le } > > Given that the higher-order double is defined to be the one that comes first > in storage, in the first case, foo(), regardless of Endianness, should return the > higher-order double as an integer. In the second case, bar(), we're returning > the lower-order (integer) part of the i128 overlaying the ppc_fp128 value. On > a little-Endian system, this should be the first double stored in memory, as an > integer. That's the higher-order double, and so the code should be the same > as for foo(). For a big-Endian system, the lower-order part of the i128 integer > is stored in the second double (in layout), and so it should return the lower- > order double, as an integer. This is the case that should differ from foo(), but > only on big-Endian systems. > > Luckily, this is what happens (and what GCC does as well).What Hal said is correct, when bitcasting from ppc_fp128 to i128, you need to exchange the two registers, otherwise you will break the test case in Hal's email.> > I did some investigation, it turns out in > > lib/CodeGen/SelectionDAG/DAGCombiner.cpp, a combination is wrongly > > assuming the endianness for the i128 bitcasted from a ppc_fp128 (two > > doubles bit-concatenated together): > > // fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit)) > > > > This reveals that endianness conversion concerning ppc_fp128 or i128, > > and big-endian or little-endian target is too confusing to me. > > > > I can certainly fix this case by using a "smarter signbit" (e.g. > > 0x00000000000000001000000000000000), but this seems more confusing. > > I wonder if anyone has the best background knowledge here? > > And, unfortunately, this is exactly what you should do. ppc_fp128 is a special > case here (again).Actually, I think the smart signbit here is not correct. For the fneg case, you need to toggle the sign bits of both components of a ppc_fp128: negate(d1 + d2) should produce (-d1 - d2) instead of (-d1 + d2). It probably indicates that this pass is wrong for both big-endian and little-endian ppc64 targets, but existing test cases were not strong enough to catch such cases. For the fabs case, you can test positivity using only one signbit, but to negate the number, you still need to toggle both sign bits. I have a feeling that you will run into more cases in the DAG combiner where you want to add special handling codes for ppc_fp128, and I also feel that this type is peculiar enough that other backend owners upon trying to add more DAG combiner passes are likely to miss adding the special handling codes as they probably should. In that case it would be a better idea to handle the ppc_fp128 DAG combiner cases somewhere under llvm::PPCTargetLowering::PerformDAGCombine() instead of in the target-independent DAGCombiner.cpp, - Gao> > -Hal > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
On Tue, Dec 1, 2015 at 11:52 AM Gao, Yunzhong < yunzhong_gao at playstation.sony.com> wrote:> Actually, I think the smart signbit here is not correct. For the fneg > case, you need to >toggle the sign bits of both components of a ppc_fp128: negate(d1 + d2)> should > produce (-d1 - d2) instead of (-d1 + d2). It probably indicates that this > pass is wrong > for both big-endian and little-endian ppc64 targets, but existing test > cases were not strong > enough to catch such cases. For the fabs case, you can test positivity > using only one > signbit, but to negate the number, you still need to toggle both sign bits. >Sure, I didn't pay much attention on crafting the signbit.> > I have a feeling that you will run into more cases in the DAG combiner > where you want > to add special handling codes for ppc_fp128, and I also feel that this > type is peculiar > enough that other backend owners upon trying to add more DAG combiner > passes are > likely to miss adding the special handling codes as they probably should.This is what I was worrying about.> In that case it > would be a better idea to handle the ppc_fp128 DAG combiner cases > somewhere under > llvm::PPCTargetLowering::PerformDAGCombine() instead of in the > target-independent > DAGCombiner.cpp, >It will work, but it seems doubling the maintenance for floating point combiners. I wonder if there is any better solution to make both combiners and endianness less painful - the problem sounds not as hard as what we have now. As a simple solution, when see a LLVM IR bitcast, instead of generating (ISD::BITCAST x), can we generate (exchange_hi_lo (ISD::BITCAST x)) instead? ISD::BITCAST itself doesn't swap the registers anymore. In such case: (exchange_hi_lo (ISD::BITCAST (fabs x)) is transformed to: (exchange_hi_lo (and (ISD::BITCAST x) (not signbit)), which is correct. I'm not sure if exchange_hi_lo exists or not. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151201/398cf751/attachment.html>
On Tue, Dec 1, 2015 at 1:41 PM Tim Shen <timshen at google.com> wrote:> (ISD::BITCAST x), can we generate (exchange_hi_lo (ISD::BITCAST x)) >Generate this for only when x is ppcf128, of course. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151201/58aa3c51/attachment.html>
Tim Northover via llvm-dev
2015-Dec-01 22:00 UTC
[llvm-dev] Endianness for multi-word types
On 1 December 2015 at 13:41, Tim Shen via llvm-dev <llvm-dev at lists.llvm.org> wrote:> As a simple solution, when see a LLVM IR bitcast, instead of generating > (ISD::BITCAST x), can we generate (exchange_hi_lo (ISD::BITCAST x)) instead?An LLVM bitcast is defined to be equivalent to a store/load pair. Changing that for ISD::BITCAST would be very surprising, and I wouldn't recommend it. It's a very useful invariant for reasoning about what should happen. When we had to work around similar endian nightmares in ARM I think we ended up creating AArch64ISD::NVCAST to represent a true nop cast. Cheers. Tim.