Tim Armstrong via llvm-dev
2016-Mar-24 16:18 UTC
[llvm-dev] Possible bug with struct types and linking
Hi All, I ran into what I think is a linker bug around structural uniquing of types when trying to make a long-overdue upgrade from LLVM 3.3 to LLVM 3.8.0. Verification can fail after linking together two well-formed modules due to mismatched pointer types. What happens is that both of the types in the source module get mapped to the same type in the destination module. The old behaviour was to map each type to the type in the destination with matching name and structure. This means that the type signatures of functions called from the Dst module can change, causing verification failures, e.g. if a call instruction originally from Dst calls a function in Src with the wrong typed pointer. Dst Module =======%"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } %"struct.impala_udf::AnyVal" = type { i8 } %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } <forward declarations of IdentityTiny and IdentityBool> <functions that call IdentityTiny and IdentityBool> Src Module =======%"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } %"struct.impala_udf::AnyVal" = type { i8 } %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture readonly dereferenceable(2) %arg) #0 { %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* %2 = load i16, i16* %1, align 1 ret i16 %2 } define i16 @IdentityBool(%"struct.impala_udf::BooleanVal"* nocapture readonly dereferenceable(2) %arg) #0 { %1 = bitcast %"struct.impala_udf::BooleanVal"* %arg to i16* %2 = load i16, i16* %1, align 1 ret i16 %2 } Combined Module =============%"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } %"struct.impala_udf::AnyVal" = type { i8 } %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } <functions that call IdentityTiny and IdentityBool with TinyIntVal* and BooleanVal* args> define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture readonly dereferenceable(2) %arg) #0 { %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* %2 = load i16, i16* %1, align 1 ret i16 %2 } ; Function signature changed => define i16 @IdentityBool(%"struct.impala_udf::TinyIntVal"* nocapture readonly dereferenceable(2) %arg) #0 { %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* %2 = load i16, i16* %1, align 1 ret i16 %2 } This seems like a bug to me, I wouldn't expect this scenario to result in malformed modules but it's not really clear what the expected behaviour of the linker is in situations like this. It seems like this hasn't been hit by the LLVM linker since it starts with an empty module and links in all non-empty modules, so all non-opaque struct types in the destination module are already structurally unique. In our case we are starting with a main module then adding in IRBuilder code and code from other modules. I have a fix and I'm happy to put together a patch with regression tests, but I wanted to check that this would be considered useful, given that it doesn't seem to be a common use-case for LLVM and the planned work on typeless pointers will make it irrelevant. - Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160324/c91e020c/attachment.html>
Rafael Espíndola via llvm-dev
2016-Mar-24 17:08 UTC
[llvm-dev] Possible bug with struct types and linking
On 24 March 2016 at 12:18, Tim Armstrong via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi All, > > I ran into what I think is a linker bug around structural uniquing of > types when trying to make a long-overdue upgrade from LLVM 3.3 to LLVM > 3.8.0. Verification can fail after linking together two well-formed modules > due to mismatched pointer types. > > What happens is that both of the types in the source module get mapped to > the same type in the destination module. The old behaviour was to map each > type to the type in the destination with matching name and structure. This > means that the type signatures of functions called from the Dst module can > change, causing verification failures, e.g. if a call instruction originally > from Dst calls a function in Src with the wrong typed pointer. > > Dst Module > =======> %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } > %"struct.impala_udf::AnyVal" = type { i8 } > %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } > > <forward declarations of IdentityTiny and IdentityBool> > > <functions that call IdentityTiny and IdentityBool> > > Src Module > =======> %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } > %"struct.impala_udf::AnyVal" = type { i8 } > %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } > > define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture > readonly dereferenceable(2) %arg) #0 { > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > %2 = load i16, i16* %1, align 1 > ret i16 %2 > } > > define i16 @IdentityBool(%"struct.impala_udf::BooleanVal"* nocapture > readonly dereferenceable(2) %arg) #0 { > %1 = bitcast %"struct.impala_udf::BooleanVal"* %arg to i16* > %2 = load i16, i16* %1, align 1 > ret i16 %2 > } > > Combined Module > =============> %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", i8 } > %"struct.impala_udf::AnyVal" = type { i8 } > %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", i8 } > > <functions that call IdentityTiny and IdentityBool with TinyIntVal* and > BooleanVal* args> > > define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture > readonly dereferenceable(2) %arg) #0 { > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > %2 = load i16, i16* %1, align 1 > ret i16 %2 > } > > ; Function signature changed => > define i16 @IdentityBool(%"struct.impala_udf::TinyIntVal"* nocapture > readonly dereferenceable(2) %arg) #0 { > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > %2 = load i16, i16* %1, align 1 > ret i16 %2 > } > > This seems like a bug to me, I wouldn't expect this scenario to result in > malformed modules but it's not really clear what the expected behaviour of > the linker is in situations like this. > > It seems like this hasn't been hit by the LLVM linker since it starts with > an empty module and links in all non-empty modules, so all non-opaque struct > types in the destination module are already structurally unique. In our case > we are starting with a main module then adding in IRBuilder code and code > from other modules. > > I have a fix and I'm happy to put together a patch with regression tests, > but I wanted to check that this would be considered useful, given that it > doesn't seem to be a common use-case for LLVM and the planned work on > typeless pointers will make it irrelevant.I would be interested in seeing the fix and full testcase. It has been a long time since I looked at the type merging, but I would expect it to work for this case. Cheers, Rafael
Tim Armstrong via llvm-dev
2016-Mar-24 17:36 UTC
[llvm-dev] Possible bug with struct types and linking
Thanks for the quick response Rafael, that's what I hoped to hear - I'll see what I can do! The original repro is ~150000 lines of IR so I'll see if I can reduce it :) - Tim On Thu, Mar 24, 2016 at 10:08 AM, Rafael Espíndola < rafael.espindola at gmail.com> wrote:> On 24 March 2016 at 12:18, Tim Armstrong via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi All, > > > > I ran into what I think is a linker bug around structural uniquing of > > types when trying to make a long-overdue upgrade from LLVM 3.3 to LLVM > > 3.8.0. Verification can fail after linking together two well-formed > modules > > due to mismatched pointer types. > > > > What happens is that both of the types in the source module get mapped to > > the same type in the destination module. The old behaviour was to map > each > > type to the type in the destination with matching name and structure. > This > > means that the type signatures of functions called from the Dst module > can > > change, causing verification failures, e.g. if a call instruction > originally > > from Dst calls a function in Src with the wrong typed pointer. > > > > Dst Module > > =======> > %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > %"struct.impala_udf::AnyVal" = type { i8 } > > %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > > > <forward declarations of IdentityTiny and IdentityBool> > > > > <functions that call IdentityTiny and IdentityBool> > > > > Src Module > > =======> > %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > %"struct.impala_udf::AnyVal" = type { i8 } > > %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > > > define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture > > readonly dereferenceable(2) %arg) #0 { > > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > > %2 = load i16, i16* %1, align 1 > > ret i16 %2 > > } > > > > define i16 @IdentityBool(%"struct.impala_udf::BooleanVal"* nocapture > > readonly dereferenceable(2) %arg) #0 { > > %1 = bitcast %"struct.impala_udf::BooleanVal"* %arg to i16* > > %2 = load i16, i16* %1, align 1 > > ret i16 %2 > > } > > > > Combined Module > > =============> > %"struct.impala_udf::TinyIntVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > %"struct.impala_udf::AnyVal" = type { i8 } > > %"struct.impala_udf::BooleanVal" = type { %"struct.impala_udf::AnyVal", > i8 } > > > > <functions that call IdentityTiny and IdentityBool with TinyIntVal* and > > BooleanVal* args> > > > > define i16 @IdentityTiny(%"struct.impala_udf::TinyIntVal"* nocapture > > readonly dereferenceable(2) %arg) #0 { > > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > > %2 = load i16, i16* %1, align 1 > > ret i16 %2 > > } > > > > ; Function signature changed => > > define i16 @IdentityBool(%"struct.impala_udf::TinyIntVal"* nocapture > > readonly dereferenceable(2) %arg) #0 { > > %1 = bitcast %"struct.impala_udf::TinyIntVal"* %arg to i16* > > %2 = load i16, i16* %1, align 1 > > ret i16 %2 > > } > > > > This seems like a bug to me, I wouldn't expect this scenario to result in > > malformed modules but it's not really clear what the expected behaviour > of > > the linker is in situations like this. > > > > It seems like this hasn't been hit by the LLVM linker since it starts > with > > an empty module and links in all non-empty modules, so all non-opaque > struct > > types in the destination module are already structurally unique. In our > case > > we are starting with a main module then adding in IRBuilder code and code > > from other modules. > > > > I have a fix and I'm happy to put together a patch with regression tests, > > but I wanted to check that this would be considered useful, given that it > > doesn't seem to be a common use-case for LLVM and the planned work on > > typeless pointers will make it irrelevant. > > I would be interested in seeing the fix and full testcase. > > It has been a long time since I looked at the type merging, but I > would expect it to work for this case. > > Cheers, > Rafael >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160324/b12ee17d/attachment-0001.html>