Hi,

I'd like to use LLVM to compile and optimise code when I don't know
whether the target CPU is big- or little-endian. This would allow me
to create a single optimised LLVM bitcode binary of an application,
and then run it through a JIT compiler on systems of differing
endianness.

I realise that in general the LLVM IR depends on various
characteristics of the target; I'd just like to be able to remove
this dependency for the specific case of unknown target endianness.

Here's a sketch of how it would work:

1. Extend TargetData::isBigEndian() and LLVM bitcode's "target data
layout string" so that endianness is represented as either big,
little or unknown. (I see there's already support for something like
this in Module::getEndianness().)

2. For optimisations (like parts of SRA) that depend on knowing the
target endianness, restrict or disable them as necessary if the
target endianness is unknown. I think this will only affect a small
handful of optimisations.

3. In llvm-gcc, if the LLVM backend reports unknown endianness, make
sure that the conversion from GCC trees to LLVM IR doesn't depend on
endianness. This seems to be fairly straightforward, *except* for
access to bitfields, which is a bit convoluted.

4. In llvm-gcc, if the LLVM backend reports unknown endianness, make
sure that GCC's optimisations on trees don't depend on endianness.

5. Have the linker refuse to link a big-endian module with a
little-endian one, but allow linking a module of unknown endianness
with a module of any endianness at all. (I think this might work
already.)

I'm already working on this myself. Would you be interested in having
this work contributed back to LLVM?

Thanks,
Jay.
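A minimal C++ sketch of what steps 1, 2 and 5 above might look like.
All names here (Endianness, parseEndianness, canReorderBytes,
canLink) are hypothetical, not LLVM's actual API; the data layout
letters 'E' (big-endian) and 'e' (little-endian) do match LLVM's
target data layout string.

    #include <string>

    // Three-state endianness, extending the two-state
    // TargetData::isBigEndian().
    enum Endianness { BigEndian, LittleEndian, UnknownEndian };

    // Step 1: read the endianness letter of a target data layout
    // string: 'E' = big, 'e' = little, anything else = unknown.
    Endianness parseEndianness(const std::string &Layout) {
      if (!Layout.empty() && Layout[0] == 'E')
        return BigEndian;
      if (!Layout.empty() && Layout[0] == 'e')
        return LittleEndian;
      return UnknownEndian;
    }

    // Step 2: an optimisation that reasons about byte order (e.g.
    // parts of SRA) bails out conservatively on unknown endianness.
    bool canReorderBytes(Endianness E) {
      return E != UnknownEndian;
    }

    // Step 5: the linker rejects big-vs-little mixes, but a module
    // of unknown endianness links with anything.
    bool canLink(Endianness A, Endianness B) {
      return A == B || A == UnknownEndian || B == UnknownEndian;
    }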
I would find this functionality useful if it made it back into trunk.

scott

On Tue, Oct 21, 2008 at 2:27 AM, Jay Foad <jay.foad at gmail.com> wrote:
> I'd like to use LLVM to compile and optimise code when I don't know
> whether the target CPU is big- or little-endian. This would allow me
> to create a single optimised LLVM bitcode binary of an application,
> and then run it through a JIT compiler on systems of differing
> endianness.
> [...]
On Oct 21, 2008, at 2:27 AM, Jay Foad wrote:
> I'd like to use LLVM to compile and optimise code when I don't know
> whether the target CPU is big- or little-endian. This would allow me
> to create a single optimised LLVM bitcode binary of an application,
> and then run it through a JIT compiler on systems of differing
> endianness.

Ok.

> I realise that in general the LLVM IR depends on various
> characteristics of the target; I'd just like to be able to remove
> this dependency for the specific case of unknown target endianness.

Sure. In practice, it should be possible to produce
target-independent LLVM IR if you have a target-independent input
language. The trick is making it so that the optimizers preserve this
property. Endianness is only one piece of this puzzle.

> 3. In llvm-gcc, if the LLVM backend reports unknown endianness, make
> sure that the conversion from GCC trees to LLVM IR doesn't depend on
> endianness. This seems to be fairly straightforward, *except* for
> access to bitfields, which is a bit convoluted.

This will never work for llvm-gcc. Too much target-specific stuff is
already folded before the llvm backend is even involved.

> I'm already working on this myself. Would you be interested in having
> this work contributed back to LLVM?

If this were to better support target independent languages, it would
be very useful. If you're just trying to *reduce* the endianness
assumptions that leak through, I don't think it's a good approach.
There is just no way to solve this problem with C. By the time the
preprocessor has run, your C code has already had #ifdef
__LITTLE_ENDIAN__ etc evaluated, for example. How do you propose to
handle things like:

struct foo {
#ifdef __LITTLE_ENDIAN__
  int x, y;
#else
  int y, x;
#endif
};

-Chris
>> I'm already working on this myself. Would you be interested in having
>> this work contributed back to LLVM?
>
> If this were to better support target independent languages, it would
> be very useful. If you're just trying to *reduce* the endianness
> assumptions that leak through, I don't think it's a good approach.
> There is just no way to solve this problem with C.

Yes, I can see that the llvm part of this is more straightforward and
less controversial than the llvm-gcc part. Maybe I should submit the
llvm part (since it applies to all source languages) and keep the
llvm-gcc part as a local hack.

> How do you propose to handle things like:
>
> struct foo {
> #ifdef __LITTLE_ENDIAN__
>   int x, y;
> #else
>   int y, x;
> #endif
> };

I can't make all C programs work regardless of target endianness.
This one will only work on little-endian:

    int x = 1;
    assert(*(char *)&x == 1);

You've just highlighted another restriction that I'll have to impose:
you shouldn't expect to be able to detect target endianness at
compile time. All I want is that, if you write your source code so
that it doesn't make assumptions about endianness, then the compiler
and its optimisations won't introduce any new assumptions about
endianness.

Thanks,
Jay.
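A short sketch of the distinction Jay is drawing, using nothing
beyond standard C/C++ (the helper names are illustrative). The first
accessor is endianness-neutral because it assembles the value from
bytes by arithmetic; the second depends on the target's byte order,
just like the assert example above, so a compiler honouring "unknown
endianness" could not fold one into the other.

    #include <cstdint>
    #include <cstring>

    // Endianness-neutral: the result is defined purely by shifts
    // and ORs, so it means the same thing on any target.
    uint32_t load_le32(const unsigned char *p) {
      return (uint32_t)p[0]
           | (uint32_t)p[1] << 8
           | (uint32_t)p[2] << 16
           | (uint32_t)p[3] << 24;
    }

    // NOT endianness-neutral: the result depends on how the target
    // lays out the bytes of a uint32_t in memory.
    uint32_t load_punned32(const unsigned char *p) {
      uint32_t v;
      std::memcpy(&v, p, sizeof v);
      return v;
    }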
Hi,

Chris Lattner wrote:
> How do you propose to handle things like:
>
> struct foo {
> #ifdef __LITTLE_ENDIAN__
>   int x, y;
> #else
>   int y, x;
> #endif
> };

Define a fixed endianness as in-memory representation and let the
optimizer at JIT time optimize the shuffling access patterns away, if
possible. E.g. either all big-endian as it's the network order, or
all little-endian because more "not so well written" software would
continue to just work.

--
René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
http://exactcode.de | http://t2-project.org | http://rene.rebe.name
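A sketch of what René is suggesting, with hypothetical helper names:
the bitcode always touches memory through fixed-endian accessors
(big-endian / network order here). On a big-endian host an optimizer
can collapse the shifts into a single 32-bit load or store; on a
little-endian host they become a load or store plus a byte swap.

    #include <cstdint>

    // Fixed big-endian in-memory representation, regardless of the
    // host's native byte order.
    uint32_t load_be32(const unsigned char *p) {
      return (uint32_t)p[0] << 24
           | (uint32_t)p[1] << 16
           | (uint32_t)p[2] << 8
           | (uint32_t)p[3];
    }

    void store_be32(unsigned char *p, uint32_t v) {
      p[0] = (unsigned char)(v >> 24);
      p[1] = (unsigned char)(v >> 16);
      p[2] = (unsigned char)(v >> 8);
      p[3] = (unsigned char)v;
    }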