Christian Plessl
2008-Sep-29 13:18 UTC
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
On 29.09.2008, at 11:53, Jonathan S. Shapiro wrote:
> Watching this thread, it occurs to me that the "V" in "LLVM" is creating
> confusion. So far as I know, LLVM is the first project to use "virtual"
> to refer to the instruction set of the intermediate form. I understand
> why this labeling made sense (sort of), but it was unfortunate. The
> machine is abstract, not virtual, and the use of "virtual" here is so
> out of keeping with any other use of the term that it really does
> generate confusion.

The topic whether LLVM bitcode is independent of the target platform was raised several times on the mailing list, but it was never discussed in detail. I would appreciate learning more about the following questions:

- Is the architecture dependence of LLVM IR only an artifact of llvm-gcc producing architecture dependent results?
- What architecture-specific features are present in the IR that prevent running the same LLVM bitcode on different architectures?

> Is this worth a FAQ entry?

I would definitely appreciate such a FAQ entry.

Best regards,
Christian
Andrew Lenharth
2008-Sep-29 13:46 UTC
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
On Mon, Sep 29, 2008 at 8:18 AM, Christian Plessl <christian.plessl at uni-paderborn.de> wrote:
> - Is the architecture dependence of LLVM IR only an artifact of llvm-gcc
> producing architecture dependent results?

No. It is also an artifact of code compiling in architecture- and OS-dependent features based on what it detects at configure time. It is an artifact of compiling non-type-safe languages. It is an artifact of system headers including inline asm. It is an artifact of the endianness of the system.

> - What architecture-specific features are present in the IR that
> prevent running the same LLVM bitcode on different architectures?

A better question is: what architecture-abstracting features would make writing target-independent LLVM bitcode easier? There is one that I think is critical, and three more that would make life much easier (though technically redundant).

hton and ntoh intrinsics. These are needed to allow target code to deal with endianness in a target-independent way. (OK, you could potentially write code that detected endianness at runtime and chose multiversioned code based on that, but that is ugly and prohibits optimization.)

Redundant, but greatly simplifying:

iptr aliased type. There are legitimate cases where you want to perform arithmetic and comparisons on pointers that the semantics of GEP make illegal, so the only way to do so in a target-independent way is either to cast to an int you hope is at least as large as any pointer, or to violate the GEP semantics (which generally works).

GBP instruction (GetBasePointer). The inverse of a GEP. A GEP selects an offset into an object in a target-independent way based on the type. What GBP would do is get a pointer to the base of an object, given a pointer to a field, a type, and the same specifier that the GEP would use to get the field.

    x == GBP (GEP x, 0, 1, 1), typeof(x), 0, 1, 1

This would mean that upcasts, or any conversion from an embedded object to a parent object, would not need arch-dependent offsets and raw pointer manipulation. (Yes, you could figure out the offset with the GEP-off-null trick and use raw pointer manipulation and casts.)

sizeof instruction. Again, you can use the GEP-off-null trick, which isn't very obvious, though it doesn't involve raw pointer manipulation.

Andrew
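For readers who think in C rather than IR, the round trip that the proposed GBP would express looks roughly like the familiar offsetof idiom. This is only a sketch: the struct layout and the helper names field_of and base_of are made up for illustration and are not part of LLVM or of the proposal.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Hypothetical types, for illustration only. */
struct inner { int a; int b; };
struct outer { int tag; struct inner in; };

/* The "GEP" direction: from the parent object to the embedded field. */
static struct inner *field_of(struct outer *o) {
    return &o->in;
}

/* The "GBP" direction done by hand: step back by the field's byte offset.
   This is the raw pointer manipulation the proposal would hide. */
static struct outer *base_of(struct inner *p) {
    return (struct outer *)((char *)p - offsetof(struct outer, in));
}

/* Round trip, mirroring x == GBP(GEP x, ...). */
static void check_round_trip(void) {
    struct outer o = { 1, { 2, 3 } };
    assert(base_of(field_of(&o)) == &o);
}
```

In C the compiler resolves offsetof for the target at hand; the point made above is that, once lowered to IR, that offset becomes a target-specific constant unless the IR has a symbolic way to express it.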
Sherief N. Farouk
2008-Sep-29 19:03 UTC
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
> hton and ntoh intrinsics. These are needed to allow target code to
> deal with endianness in a target independent way. (Ok, you could
> potentially write code that detected endiannes at runtime and chose
> multiversioned code based on that, but that is ugly and optimization
> prohibiting).

Why not add types with explicit endianness? A trick I use for reading binary files across platforms is to define the types int32, int32_le and int32_be: int32 is platform-native, while _le and _be are little- and big-endian, respectively. I use an #ifdef in my types.hpp to determine which of _le and _be is a typedef for the standard uint32; the other is implemented as a class with operator int32().

"add i32_be %X, 8" looks elegant to me, and quite easy (for someone writing the IR output to a text file, like me :) to bolt on to existing code.

- Sherief
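A rough C analogue of the explicit-endian type idea, without the #ifdef or the C++ conversion operator: the value is kept as four bytes in a fixed order and converted on every load and store. The names u32_be, u32_be_store and u32_be_load are hypothetical, and this is not the types.hpp described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical big-endian 32-bit type: bytes[0] is always the most
   significant byte, whatever the host byte order is. */
typedef struct { unsigned char bytes[4]; } u32_be;

/* Convert a native value into its big-endian byte representation. */
static u32_be u32_be_store(uint32_t v) {
    u32_be r;
    r.bytes[0] = (unsigned char)(v >> 24);
    r.bytes[1] = (unsigned char)(v >> 16);
    r.bytes[2] = (unsigned char)(v >> 8);
    r.bytes[3] = (unsigned char)v;
    return r;
}

/* Convert the big-endian byte representation back to a native value. */
static uint32_t u32_be_load(u32_be x) {
    return ((uint32_t)x.bytes[0] << 24) |
           ((uint32_t)x.bytes[1] << 16) |
           ((uint32_t)x.bytes[2] << 8)  |
            (uint32_t)x.bytes[3];
}

/* Usage: the in-memory layout is fixed and the value round-trips. */
static void check_u32_be(void) {
    u32_be x = u32_be_store(0x01020304u);
    assert(x.bytes[0] == 0x01 && x.bytes[3] == 0x04);
    assert(u32_be_load(x) == 0x01020304u);
}
```

Because only shifts and byte stores are used, the same source behaves identically on little- and big-endian hosts, at the cost of a conversion on each access.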
Eli Friedman
2008-Sep-29 19:28 UTC
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
On Mon, Sep 29, 2008 at 6:46 AM, Andrew Lenharth <andrewl at lenharth.org> wrote:
> hton and ntoh intrinsics.

You can write these portably already; just store to an i32, cast the pointer to i8*, read out the bytes, then reconstruct the i32. If I recall correctly, scalarrepl+instcombine should be able to eliminate the abstraction if they have target information.

-Eli
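The recipe above, written out in C as a sketch. The function name host_to_be32 is hypothetical, and memcpy stands in for the pointer cast so that the C version stays within the language rules; the idea is the same: look at the value's bytes in memory order, then rebuild an integer from them in a fixed order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Hypothetical portable hton for 32-bit values. The source contains no
   endianness test; the result still depends on the host byte order
   because b[] holds the bytes in memory order. */
static uint32_t host_to_be32(uint32_t v) {
    unsigned char b[4];
    memcpy(b, &v, sizeof b);               /* "store, then read out the bytes" */
    return ((uint32_t)b[0] << 24) |        /* "reconstruct the i32" */
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}

/* Usage: the result, viewed in memory, is in big-endian order, and
   applying the function twice gives back the original value. */
static void check_host_to_be32(void) {
    uint32_t n = host_to_be32(0x11223344u);
    unsigned char m[4];
    memcpy(m, &n, sizeof m);
    assert(m[0] == 0x11 && m[1] == 0x22 && m[2] == 0x33 && m[3] == 0x44);
    assert(host_to_be32(n) == 0x11223344u);
}
```

On a big-endian host the function is the identity and on a little-endian host it is a byte swap, which is the simplification an optimizer with target information could make.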
Chris Lattner
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
On Sep 29, 2008, at 6:18 AM, Christian Plessl wrote:
> On 29.09.2008, at 11:53, Jonathan S. Shapiro wrote:
>> Watching this thread, it occurs to me that the "V" in "LLVM" is creating
>> confusion. So far as I know, LLVM is the first project to use "virtual"
>> to refer to the instruction set of the intermediate form. I understand
>> why this labeling made sense (sort of), but it was unfortunate. The
>> machine is abstract, not virtual, and the use of "virtual" here is so
>> out of keeping with any other use of the term that it really does
>> generate confusion.
>
> The topic whether LLVM bitcode is independent of the target platform
> was raised several times on the mailing list,

Wow, there is a lot of FUD and misinformation on this thread.

> but it was never discussed in detail. I would appreciate learning more
> about the following questions:
>
> - Is the architecture dependence of LLVM IR only an artifact of llvm-gcc
> producing architecture dependent results?

No, it is inherent to any C compiler. The preprocessor introduces target-specific details and things just go downhill from there:
http://llvm.org/docs/tutorial/LangImpl8.html#targetindep

If you start from a target-independent *language*, you can generate target-independent LLVM IR.

> - What architecture-specific features are present in the IR that
> prevent running the same LLVM bitcode on different architectures?

Many things are target independent, but the most significant is that LLVM allows unrestricted pointer casting. An example that allows the programmer to "see" the underlying endianness of the target is C code like this:

    int X = ...
    char C = *(char*)&X

A language that is target independent (Java, Perl, ...) would not allow the programmer to express such things.

>> Is this worth a FAQ entry?
>
> I would definitely appreciate such a FAQ entry.

Patches welcome :)

-Chris
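A slightly fuller version of the two-line C example, as a sketch. The function names first_byte and host_is_little_endian are hypothetical, and uint32_t replaces int so that the width is fixed.

```c
#include <assert.h>
#include <stdint.h>

/* Read the byte stored at the lowest address of x. Accessing an object
   through an unsigned char pointer is permitted in C, which is exactly
   what lets the program observe the byte order. */
static unsigned char first_byte(uint32_t x) {
    return *(unsigned char *)&x;
}

/* The same source, two possible answers depending on the target. */
static int host_is_little_endian(void) {
    return first_byte(0x01020304u) == 0x04;
}

/* Usage: only the two conventional byte orders are considered here. */
static void check_first_byte(void) {
    unsigned char c = first_byte(0x01020304u);
    assert(c == 0x04 || c == 0x01);
}
```

Once a front end has lowered code like this to IR, the load through the cast pointer is still there, so the bitcode's behavior depends on the byte order of whatever target finally runs it.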
Christian Plessl
[LLVMdev] Architecture Dependency of LLVM bitcode (was Re: compile linux kernel)
Thanks to everyone for these helpful answers. At least to me, the causes of architecture dependencies in LLVM IR are much clearer now.

On 29.09.2008, at 23:19, Chris Lattner wrote:
>>> Is this worth a FAQ entry?
>> I would definitely appreciate such a FAQ entry.
> Patches welcome :)
>
> Many things are target independent, but the most significant is that
> LLVM allows unrestricted pointer casting. An example that allows the
> programmer to "see" the underlying endianness of the target is C code
> like this:
>
>     int X = ...
>     char C = *(char*)&X

I don't feel sufficiently confident with the matter to write such a FAQ entry myself. But wouldn't it make sense to move the notes on target independence from the Kaleidoscope tutorial (http://llvm.org/docs/tutorial/LangImpl8.html#targetindep) to the FAQ page? In my opinion, these explanations, together with the additional endianness example you gave, explain the issues with target dependencies quite well.

Best regards,
Christian