Daniel Berlin
2007-Nov-15 17:56 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
On 11/15/07, Duncan Sands <baldrick at free.fr> wrote:
> Hi,
>
> > Sadly, this will break a very common idiom. In GCC, we discovered it
> > to be common enough that it broke a *bunch* of C code.
> >
> > In particular, you will break
> >
> > struct foo {
> >     int a;
> >     char name[0];
> > };
> >
> > bar = malloc(sizeof (struct foo) + strlen("thisismyname") + 1);
> > strcpy(bar->name, "thisismyname");
> >
> >
> > It only started turning up when we started doing higher level loop
> > opts and used alias info in dependence testing. It would end up
> > reversing or interchanging loops around these things which while
> > legal, broke enough software that we got yelled at.
> >
> > So we special case the [0] at end of struct case.
>
> as noted in LangRef,
>
> "Note that 'variable sized arrays' can be implemented in LLVM with a zero
> length array. Normally, accesses past the end of an array are undefined in
> LLVM (e.g. it is illegal to access the 5th element of a 3 element array). As
> a special case, however, zero length arrays are recognized to be variable
> length. This allows implementation of 'pascal style arrays' with the LLVM
> type "{ i32, [0 x float]}", for example."
>
> so this example should work fine (it wouldn't work if it was char name[1]
> though).
>

Then the originally reported code is fine, and the bug is in llvm or llvm-gcc (i.e. Owen is wrong).

Note:

struct device;

struct usb_bus {
    struct device *controller;
};

struct usb_hcd {
    struct usb_bus self;

    unsigned long hcd_priv[0];
};

...
> Ciao,
>
> Duncan.
>
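A minimal C sketch of the trailing-array idiom quoted above, under two assumptions: the member is spelled as a C99 flexible array member (char name[]) rather than GCC's char name[0] extension, and make_foo is a hypothetical helper, not code from the thread. The fixed part of the struct is followed by however many extra bytes the caller allocates, and name indexes into those bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trailing array with no declared length: storage for `name` is the
   extra space allocated past the fixed-size part of the struct. */
struct foo {
    int a;
    char name[]; /* C99 spelling; GCC also accepts the older name[0] */
};

/* Hypothetical helper: allocate a foo with room for `s` (plus its NUL)
   and copy the string into the trailing array. Returns NULL on failure. */
static struct foo *make_foo(int a, const char *s) {
    assert(s != NULL);
    struct foo *bar = malloc(sizeof(struct foo) + strlen(s) + 1);
    if (bar == NULL)
        return NULL;
    bar->a = a;
    strcpy(bar->name, s); /* deliberately writes past sizeof(struct foo) */
    return bar;
}
```

Typical use is make_foo(1, "thisismyname") followed by free() on the result. sizeof(struct foo) does not count any name bytes, which is why the allocation adds strlen(s) + 1.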
Owen Anderson
2007-Nov-15 19:49 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
On Nov 15, 2007, at 11:56 AM, Daniel Berlin wrote:
> Then the originally reported code is fine, and the bug is in llvm or
> llvm-gcc (i.e. Owen is wrong)
> Note:
>
>
> struct device;
>
> struct usb_bus {
>     struct device *controller;
> };
>
> struct usb_hcd {
>     struct usb_bus self;
>
>     unsigned long hcd_priv[0];
> };

You'll notice that the LLVM assembly example he gave is NOT equivalent to the C code. The LLVM assembly uses arrays of declared size, whereas a correct translation of the above C code into LLVM assembly would not (hcd_priv would become a zero-length array).

--Owen
Wojciech Matyjewicz
2007-Nov-15 20:17 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
Hi,

Daniel Berlin wrote:
> Then the originally reported code is fine, and the bug is in llvm or
> llvm-gcc (i.e. Owen is wrong)

There is, actually, no problem with this example. I attached it because it uses a specific programming technique for which, after instcombine, a weird GEP is generated. I've pasted fragments of the generated assembly code below, if anyone is interested.

These are the types declared in the code:

%struct.device = type opaque ; simplified
%struct.ehci_hcd = type opaque ; simplified
%struct.usb_bus = type { %struct.device* }
%struct.usb_hcd = type { %struct.usb_bus, [0 x i32] }

And these are the interesting instructions. %hcd is a function argument of type %struct.usb_hcd*:

-= before instcombine =-

; based on some facts it is known the second field of
; the structure pointed to by %hcd is of type %struct.ehci_hcd
%tmp9 = getelementptr %struct.usb_hcd* %hcd, i32 0, i32 1
%tmp910 = bitcast [0 x i32]* %tmp9 to %struct.ehci_hcd*

; later in the source, a pointer to the parent struct is obtained
; from %tmp910 using knowledge of the inner field's offset
; (__builtin_offsetof operator in the C source)
%tmp1415 = bitcast %struct.ehci_hcd* %tmp910 to [0 x i32]*
%tmp1617 = bitcast [0 x i32]* %tmp1415 to i8*
%tmp18 = getelementptr i8* %tmp1617, i32 -4
%tmp1819 = bitcast i8* %tmp18 to %struct.usb_hcd*

-= after instcombine =-

%tmp18 = getelementptr %struct.usb_hcd* %hcd, i32 0, i32 1, i32 -1
%tmp1819 = bitcast i32* %tmp18 to %struct.usb_hcd*

This is an example of a GEP instruction that deliberately crosses the bounds of an inner field to reach a field of the outer structure. It seems to be correct, assuming a negative index is also allowed for a variable-sized array. BasicAA is correct in this case, as it seems to treat any indexing of a variable-sized array conservatively.

--Wojtek
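The offsetof technique Wojciech describes (step from the struct to its trailing private area, then back to the parent by subtracting the field offset) can be sketched in C as follows. Assumptions: hcd_to_priv and priv_to_hcd are hypothetical helper names, and hcd_priv is written as a C99 flexible array member instead of hcd_priv[0]. On a 32-bit target, where offsetof(struct usb_hcd, hcd_priv) is 4, priv_to_hcd performs the same subtract-4-bytes step as the i8 GEP with index -4 in the pre-instcombine code.

```c
#include <assert.h>
#include <stddef.h>

struct device; /* opaque, as in the thread */

struct usb_bus {
    struct device *controller;
};

struct usb_hcd {
    struct usb_bus self;
    unsigned long hcd_priv[]; /* C99 form of hcd_priv[0] */
};

/* Hypothetical helper: pointer to the driver-private area that follows
   the fixed part of usb_hcd (the GEP to field 1 in the IR). */
static void *hcd_to_priv(struct usb_hcd *hcd) {
    return hcd->hcd_priv;
}

/* Hypothetical helper: walk back from the private area to the enclosing
   usb_hcd by subtracting the field's byte offset (the offsetof trick). */
static struct usb_hcd *priv_to_hcd(void *priv) {
    assert(priv != NULL);
    return (struct usb_hcd *)((char *)priv - offsetof(struct usb_hcd, hcd_priv));
}
```

Computed as raw byte arithmetic like this, the step back is unremarkable; it only looks odd once instcombine folds it into a GEP with index -1 on the [0 x i32] field.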
Owen Anderson
2007-Nov-15 20:41 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
On Nov 15, 2007, at 2:17 PM, Wojciech Matyjewicz wrote:
> BasicAA is correct in this case, as it seems to treat
> any indexing of a variable-sized array conservatively.

Right, that's the key distinction. For a VLA (here, a zero-length, variable-sized array), you don't know where it might end (without more complex reasoning), so you can't assume that a given GEP is undefined. However, for arrays with a declared length, it is easy to tell when you're doing something undefined by GEPing past the end. This latter case is the optimization opportunity.

--Owen
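To make Owen's distinction concrete, here is a toy C model of the bounds reasoning. It is a hypothetical simplification, not LLVM's BasicAliasAnalysis: a declared length of 0 stands for a variable-sized array, about which nothing is assumed, while a constant index outside a declared length means the access is undefined, which is what would let an optimizer assume it never happens.

```c
#include <assert.h>

/* Hypothetical toy model of the bounds reasoning discussed in this
   thread; it is NOT the actual BasicAliasAnalysis implementation. */
enum access_verdict {
    ACCESS_IN_BOUNDS, /* 0 <= idx < len for a declared length */
    ACCESS_UNDEFINED, /* idx outside a declared length: access is UB */
    ACCESS_UNKNOWN    /* zero-length (variable-sized) array: stay conservative */
};

/* Classify a constant element index `idx` used to access an array of
   type [len x T]. A length of 0 marks a variable-sized array. */
static enum access_verdict classify_access(long idx, unsigned long len) {
    if (len == 0)
        return ACCESS_UNKNOWN;
    if (idx >= 0 && (unsigned long)idx < len)
        return ACCESS_IN_BOUNDS;
    return ACCESS_UNDEFINED;
}
```

For instance, index 4 into a 3-element array (the LangRef's "5th element of a 3 element array") classifies as undefined, whereas index -1 into [0 x i32], as in Wojciech's instcombined GEP, classifies as unknown.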