Renato Golin
2011-Mar-25 15:27 UTC
[LLVMdev] Named metadata to represent language specific logic
Hi all,

I was wondering if we could use named metadata to store some C++ logic without changing the IR. This is primarily for front-end building, and the resulting IR (with or without metadata) should be the same as it is today (or better).

I say this because of the number of global variables front-ends need to keep, since LLVM IR cannot represent all the information about types, variables and functions (sizes, offsets, alignment, linkage semantics, etc.).

So, if we could generate some generic IR with annotations, and run a pass before validation that would convert all those annotations into another, lower, IR, coding front-ends would be much simpler.

That would also allow back-ends to understand that named metadata and possibly generate correct code without needing the final pass, but I gather that some people find it repulsive to have metadata with meaning in IR, so I won't go as far as to suggest that... ;)

Some examples below. Don't pay too much attention to the syntax or the contents, I'm just brainstorming...
;===================================
; Unions & bitfields
; union U { int a; int b:3; int c:3; char d; }

%union.U = type { i32 }, !union;

!union = metadata { metadata !U.a, metadata !U.bc, metadata !U.d };
!U.a = metadata { metadata !intID, metadata !"align", i8 4 };
!U.bc = metadata { metadata !U.b, metadata !U.c };
!U.b = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
!U.c = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
!U.d = metadata { metadata !charID, metadata !"align", i8 4 };

;===================================
; Linkage information on a function
; extern inline f_() { return "const string"; } // "const string" HAS to be common to ALL comp.units

define linkonce_odr i8* @_Z2f_f() nounwind inlinehint, !extern {
entry:
  ret i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0)
}

!extern = metadata { metadata !"common group", metadata !"_Z2f_f" };

-> so, inside a function that has metadata "extern", returning the constant string should place the string into a common group, even though the string itself is not declared as such.

;===================================
; Class size
; struct Base { char a[3]; Base() {} };
; struct Derived : Base { char b; }

%struct.Base = type { [3 x i8] }, !BasePadding;
%struct.Derived = type { %struct.Base, i8 }, !DerivedPadding;

!BasePadding = { metadata !"size", i8 1 };
!DerivedPadding = { metadata !"size", i8 3 };

So, Base's padding is only applied when inside Derived, and GEP can still work on the element directly.

Sizes could be relative to WORD size, if one wanted a truly generic IR, but that would raise a lot of questions... Ignore that for now.

The final pass would lower all GEPs into those classes and unions, and the constant returns, into the confusing IR we have today.
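To make the "final pass" idea a little more concrete, here is a minimal C++ sketch of what lowering a bit-field access from the "size"/"offset" annotations could compute. It is not LLVM code: `BitFieldInfo`, `lowMask`, `loadField` and `storeField` are hypothetical names, the values are taken from the !U.b and !U.c entries as written above, and sign extension of signed fields is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical annotation for one bit-field, mirroring the "size" and
// "offset" entries of the !U.b / !U.c metadata (both assumed to be in bits).
struct BitFieldInfo {
    unsigned offsetBits; // position of the field's lowest bit in the storage word
    unsigned sizeBits;   // width of the field
};

// Mask with the low sizeBits bits set.
inline std::uint32_t lowMask(unsigned sizeBits) {
    assert(sizeBits > 0 && sizeBits <= 32);
    return sizeBits == 32 ? 0xFFFFFFFFu : ((1u << sizeBits) - 1u);
}

// What a lowered load of the field would compute: shift down, then mask.
// The result is zero-extended; a real lowering of "int b:3" would also
// need sign extension, which this sketch leaves out.
inline std::uint32_t loadField(std::uint32_t storage, BitFieldInfo f) {
    assert(f.offsetBits + f.sizeBits <= 32);
    return (storage >> f.offsetBits) & lowMask(f.sizeBits);
}

// What a lowered store would compute: clear the field's bits, then insert
// the new value truncated to the field width.
inline std::uint32_t storeField(std::uint32_t storage, BitFieldInfo f,
                                std::uint32_t value) {
    assert(f.offsetBits + f.sizeBits <= 32);
    const std::uint32_t mask = lowMask(f.sizeBits) << f.offsetBits;
    return (storage & ~mask) | ((value << f.offsetBits) & mask);
}
```

With `BitFieldInfo b{0, 3}` and `BitFieldInfo c{3, 3}`, storing into one field leaves the other untouched, which is the property the pass would have to preserve when it rewrites accesses on the plain `i32` storage.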
I know each front-end could do that on its own, but if there is interest among other front-end developers (especially C++) in having such a feature, we could take a more generic approach, so that we could extend support for specific languages without drastically changing the LangRef. (As a matter of fact, is that something we want in the long run?)

Would that benefit other languages that cannot be properly represented in IR? OpenCL?

Thoughts welcome, even harsh ones. ;)

--
cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
Jin Gu Kang
2011-Mar-26 01:09 UTC
[LLVMdev] Named metadata to represent language specific logic
Hi Renato,

I was also applying extensible metadata to my project. :)

>;===================================
>; Unions & bitfields
>; union U { int a; int b:3; int c:3; char d; }
>%union.U = type { i32 }, !union;
>!union = metadata { metadata !U.a, metadata !U.bc, metadata !U.d };
>!U.a = metadata { metadata !intID, metadata !"align", i8 4 };
>!U.bc = metadata { metadata !U.b, metadata !U.c };
>!U.b = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
>!U.c = metadata { metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
>!U.d = metadata { metadata !charID, metadata !"align", i8 4 };

I think a type cannot carry named metadata in the current LLVM code. If you distinguish a type with named metadata from a type without it, you will also have to change the type system and the code related to it.

For example, to distinguish "type { i32 } !union" from "type { i32 }", StructValtype has to be changed, and then the type of a value with "type { i32 } !union" becomes distinct from the type of a value with "type { i32 }". This property will cause confusion in the related code.

I suggest a single named metadata node that lists all union types, as follows:

%union.U = type { i32 };

!llvm.uniontypes = metadata !{!0}
!0 = metadata !{metadata !"union.U", metadata !1, metadata !2, metadata !5 };
!1 = metadata !{ metadata !intID, metadata !"align", i8 4 };
!2 = metadata !{ metadata !3, metadata !4 };
!3 = metadata !{ metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3 };
!4 = metadata !{ metadata !charID, metadata !"align", i8 4, metadata !"size", i8 3, metadata !"offset", i8 3 };
!5 = metadata !{ metadata !charID, metadata !"align", i8 4 };

"!0" must point to its own IR type, for instance through its first operand (metadata !"union.U").
Another method may be needed to point to the IR type, because the initializer of a union type sometimes has a temporary type.

>;===================================
>; Class size
>; struct Base { char a[3]; Base() {} };
>; struct Derived : Base { char b; }
>%struct.Base = type { [3 x i8] }, !BasePadding;
>%struct.Derived = type { %struct.Base, i8 }, !DerivedPadding;
>!BasePadding = { metadata !"size", i8 1 };
>!DerivedPadding = { metadata !"size", i8 3 };

%struct.Base = type { [3 x i8] }
%struct.Derived = type { %struct.Base, i8 };

!llvm.classtypes = metadata !{!0, !1}
!0 = !{ metadata !"struct.Base", metadata !"size", i8 1, and other information };
!1 = !{ metadata !"struct.Derived", metadata !"size", i8 3, and other information };

I agree with using extensible metadata to store information from the high-level language without changing the IR. Changing the IR itself would be very hard work for me because of the many side effects and compatibility concerns.

Thanks,
Jin-Gu Kang
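The module-level list suggested above amounts to a side table keyed by the IR type name, so the type itself stays a plain "type { ... }". A minimal C++ sketch of that lookup follows, using only the standard library. `TypeAnnotation` and `AnnotationTable` are hypothetical names and not part of LLVM, and the unit of "size" is left open because the thread does not specify it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-type record, standing in for one operand list of
// !llvm.classtypes (only the "size" entry from the example is modelled).
struct TypeAnnotation {
    unsigned size; // the "size" operand; its unit is not specified in the thread
};

// Side table keyed by IR type name. The IR type carries no attachment, so
// two structurally identical types are still the same type to the rest of
// the compiler.
class AnnotationTable {
public:
    void add(const std::string& irTypeName, TypeAnnotation a) {
        assert(!irTypeName.empty()); // unnamed types cannot be keyed this way
        table_[irTypeName] = a;
    }

    // Returns nullptr when the type has no annotation, for example a
    // temporary type that never received a name.
    const TypeAnnotation* lookup(const std::string& irTypeName) const {
        std::map<std::string, TypeAnnotation>::const_iterator it =
            table_.find(irTypeName);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, TypeAnnotation> table_;
};
```

Keying by name keeps the type system unchanged, but it also shows the weak spot mentioned above: a type without a stable name simply has no entry, so a pass would need a fallback for that case.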