Hi folks,

I have a few questions I was saving for later and never got around to
asking them, so I'll send a few emails to the list, one with each
question, to ease the further discussions that may come from them...

The first question is:

According to the language reference, LLVM IR is type safe. It means,
for instance, that you won't be able to perform ADD operations on two
different types or call functions with the wrong arguments, etc.

But, when declaring two types that happen to (supposedly) have the
same layout, LLVM ignores the second type and uses the first's name
instead.

In one module, it doesn't matter, but once you join different modules
with, possibly, different data layouts, the data types are not the
same any more.

Is this a declaration that you will never be able (with an error
message, assert or whatever) to join two IRs with different data
layouts? Or was it never thought that you could mix them?

In my view, that is the precise reason why we have the data layout.
Unions can't rely on them (which is why we don't have unions any more)
and compiler data (RTTI, VT, VTT, etc) are all statically created with
the correct size.

--
cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
Renato Golin wrote:
> Hi folks,
>
> I have a few questions I was saving for later and never got around to
> asking them, so I'll send a few emails to the list, one with each
> question, to ease the further discussions that may come from them...
>
> The first question is:
>
> According to the language reference, LLVM IR is type safe. It means,
> for instance, that you won't be able to perform ADD operations on two
> different types or call functions with the wrong arguments, etc.
>

First, this is only partially correct. LLVM IR is typed, and most
operations are type-safe. However, LLVM can represent type-unsafe code
through at least the following:

1) LLVM has a cast instruction (and cast constant expression) that can
   cast one type to another. It's possible to take a float, cast it to
   an int, and add it to another int.

2) LLVM does not require garbage collection or region-based memory
   management. You can get implicit casting of values if you
   dereference a dangling pointer.

3) LLVM does not prevent a function from returning a pointer to
   stack-allocated memory. Dangling pointers to stack-allocated
   objects are possible.

That said, you can generate type-safe LLVM IR, and if you force your
front-end to generate IR with certain restrictions, you can probably
prove that it is type-safe.

> But, when declaring two types that happen to (supposedly) have the
> same layout, LLVM ignores the second type and uses the first's name
> instead.
>
> In one module, it doesn't matter, but once you join different modules
> with, possibly, different data layouts, the data types are not the
> same any more.
>
> Is this a declaration that you will never be able (with an error
> message, assert or whatever) to join two IRs with different data
> layouts? Or was it never thought that you could mix them?
>

I think linking two LLVM bitcode files with different data layouts
would be hard (especially given different endianness); I think LLVM 2.7
prints a warning when the data layouts don't match.

However, I'll let people more knowledgeable about LLVM data layout
answer this part of your question.

-- John T.

> In my view, that is the precise reason why we have the data layout.
> Unions can't rely on them (which is why we don't have unions any more)
> and compiler data (RTTI, VT, VTT, etc) are all statically created with
> the correct size.
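Point 1 in the reply above (cast a float to an int, then add it to another int) can be sketched outside of LLVM. The snippet below is a minimal Python illustration, not LLVM code. It assumes the cast in question reinterprets the bits of the value (in the spirit of a bit-level cast) rather than converting 1.0 to 1. The helper names `bitcast_float_to_i32` and `add_i32` are made up for this sketch.

```python
import struct


def bitcast_float_to_i32(value: float) -> int:
    """Reinterpret the 32-bit IEEE-754 encoding of `value` as a signed 32-bit int.

    Assumption: this models a bit-reinterpreting cast, not a numeric conversion.
    """
    return struct.unpack("<i", struct.pack("<f", value))[0]


def add_i32(a: int, b: int) -> int:
    """Add two integers with 32-bit two's-complement wraparound."""
    total = (a + b) & 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total


# The float 1.0 is encoded as 0x3F800000. After the cast it is "just an int".
bits = bitcast_float_to_i32(1.0)

# The add is well typed (int + int), yet the result has no meaning as a number
# derived from 1.0: the type check passed, the value-level intent did not.
result = add_i32(bits, 1)
```

Every individual operation here is well typed, which is the point being made: typed operations alone do not make a whole program type-safe once such casts are allowed.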
On Sep 21, 2010, at 9:26 AM, Renato Golin wrote:
> But, when declaring two types that happen to (supposedly) have the
> same layout, LLVM ignores the second type and uses the first's name
> instead.
>
> In one module, it doesn't matter, but once you join different modules
> with, possibly, different data layouts, the data types are not the
> same any more.

Try linking the following two modules using llvm-ld and see what happens.

--- x.ll ---
%struct.x = type { i32, i32 }
%struct.y = type { i32, i32 }
@p = common global %struct.x zeroinitializer, align 4
@p2 = common global %struct.x zeroinitializer, align 4
---

--- y.ll ---
%struct.x = type { i32, i32, i32 }
@p3 = common global %struct.x zeroinitializer, align 4
---

In the combined LLVM IR, @p3 and @p won't match, as expected.

- Devang
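The outcome described above can be mimicked with a toy model. The Python sketch below is not how llvm-ld is implemented. It only assumes that a linker must keep two same-named types apart when their bodies differ. The function `link_type_tables` and the `.1` renaming suffix are hypothetical, chosen for illustration.

```python
def link_type_tables(dst: dict, src: dict) -> dict:
    """Merge the named types of `src` into `dst` (mutating `dst`).

    Returns a map from each name in `src` to the name it ends up with in `dst`.
    A name already bound to a different body gets a fresh name; the ".N"
    suffix is a made-up scheme, not llvm-ld's actual behaviour.
    """
    renames = {}
    for name, body in src.items():
        new_name = name
        counter = 0
        while new_name in dst and dst[new_name] != body:
            counter += 1
            new_name = f"{name}.{counter}"
        dst[new_name] = body
        renames[name] = new_name
    return renames


# Type tables and globals transcribed from x.ll and y.ll.
x_types = {"%struct.x": ("i32", "i32"), "%struct.y": ("i32", "i32")}
y_types = {"%struct.x": ("i32", "i32", "i32")}
x_globals = {"@p": "%struct.x", "@p2": "%struct.x"}
y_globals = {"@p3": "%struct.x"}

combined_types = dict(x_types)
renames = link_type_tables(combined_types, y_types)

combined_globals = dict(x_globals)
combined_globals.update({g: renames[t] for g, t in y_globals.items()})

# @p and @p3 were both declared as %struct.x, but they end up with
# different types in the combined module.
p_type = combined_types[combined_globals["@p"]]
p3_type = combined_types[combined_globals["@p3"]]
```

In this model `p_type` is the two-field body and `p3_type` the three-field body, so a shared name across modules does not imply a shared type after linking.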
On 21 September 2010 17:40, John Criswell <criswell at illinois.edu> wrote:
> That said, you can generate type-safe LLVM IR, and if you force your
> front-end to generate IR with certain restrictions, you can probably prove
> that it is type-safe.

Indeed, I was referring to that kind of type safety... ;)

--
cheers,
--renato
On 21 September 2010 17:48, Devang Patel <dpatel at apple.com> wrote:
> In the combined LLVM IR, @p3 and @p won't match, as expected.

Hi Devang,

That's not quite what I was thinking... Maybe I explained it badly...

Imagine this:

-- a.ll --
%struct.x = type { i32, i32 }
%a = call i32 @func (%struct.x %b)

-- b.ll --
%struct.y = type { i32, i32 }
declare i32 @func (%struct.y)

Now, imagine that X and Y are completely different structures: they
don't reflect the same type in the code, but in the IR they got
flattened out, so the modules can't distinguish between X and Y.

If I distribute IR (with the same data layout, target triple, etc), and
you try to link against it, it will allow you to put apples in place of
bananas...

Does it make sense?

--
cheers,
--renato
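The concern in this message can be restated as the difference between structural and nominal type identity. The Python sketch below is a toy model under the thread's own premise that two struct types with the same body are treated as one type. `StructType`, `structural_match` and `nominal_match` are hypothetical names and do not correspond to any LLVM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructType:
    """A struct type identified only by its field list (structural identity)."""
    fields: tuple


def structural_match(param: StructType, arg: StructType) -> bool:
    """Accept the argument if the two types have the same body."""
    return param == arg


def nominal_match(param_name: str, arg_name: str) -> bool:
    """Accept the argument only if the two types share the same name."""
    return param_name == arg_name


# a.ll passes a %struct.x; b.ll declares @func as taking a %struct.y.
module_a = {"%struct.x": StructType(("i32", "i32"))}
module_b = {"%struct.y": StructType(("i32", "i32"))}

# Structurally, the "apple" is accepted where a "banana" is declared.
call_ok_structural = structural_match(module_b["%struct.y"], module_a["%struct.x"])

# A name-based check would have rejected the same call.
call_ok_nominal = nominal_match("%struct.y", "%struct.x")
```

Under the structural rule the call type-checks, which is the apples-for-bananas case. Only a name-based rule, or extra information carried alongside the IR, would reject it.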