Daniel Berlin
2007-Nov-15 07:36 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
Sadly, this will break a very common idiom. In GCC, we discovered it to be common enough that it broke a *bunch* of C code. In particular, you will break struct foo { int a; char name[0]; } bar = malloc(sizeof (struct foo) + strlen("thisismyname") + 1); strcpy(bar->name, "thisismyname"); It only started turning up when we started doing higher level loop opts and used alias info in dependence testing. It would end up reversing or interchanging loops around these things which while legal, broke enough software that we got yelled at. So we special case the [0] at end of struct case. On 11/13/07, Owen Anderson <resistor at mac.com> wrote:> It's an optimization opportunity! > > When behavior is undefined, we're free to interpret it to be "whatever > makes optimization easiest." If the two do actually happen to alias, > well, it's the programmer's fault anyways, because they were doing > something undefined! > > --Owen > > On Nov 13, 2007, at 4:13 PM, Wojciech Matyjewicz wrote: > > > Hi! > > > > While investigating into the PR1782 I spent some time analyzing > > BasicAliasAnalysis.cpp. While the mentioned problem should be fixed > > now > > (I hope), I have discovered some other possibilities for a bug to > > occur. > > > > In the case of checking for aliasing of two pointer values, where at > > least one of them is a GEP instruction with out-of-bound indices, > > BasicAliasAnalysis can return NoAlias, even if the values point the > > same > > memory cell. I'm not sure if it's an error, as out-of-bound indexing > > can > > bring undefined result (as stated in LangRef.html). > > > > Please, take a look at this code (BasicAA results are in comments). > > --- > > target datalayout > > "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32- > > f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32" > > > > %ty1 = type [2 x [2 x i8]] > > %ty2 = type [2 x {i8, [2 x i8]}] > > > > @G = external global %ty1 > > @H = external global %ty2 > > > > ; alias(%a, %b, 1, 1) returns NoAlias, > > ; although they point to the same memory cell > > define void @fun1() { > > entry: > > %a = getelementptr %ty1* @G, i32 0, i32 1, i32 -2 > > %b = bitcast %ty1* @G to i8* > > ret void > > } > > > > ; alias(%c, %d, 1, 1) returns NoAlias, > > ; although they may point to the same memory cell (if %x == -2) > > define void @fun2(i32 %x) { > > entry: > > %c = getelementptr %ty1* @G, i32 0, i32 1, i32 %x > > %d = bitcast %ty1* @G to i8* > > ret void > > } > > > > ; alias(%e, %f, 1, 1) returns NoAlias, > > ; although they may point to the same memory cell (if %z == 2) > > define void @fun3(i32 %z) { > > entry: > > %e = getelementptr %ty2* @H, i32 0, i32 1, i32 0 > > %f = getelementptr %ty2* @H, i32 0, i32 0, i32 1, i32 %z > > ret void > > } > > --- > > > > Should we treat this as a bug (I may attempt to prepare a patch > > then)?. > > I suppose these are real corner cases, rather improbable to occur in > > real-world programs. However, something similiar (but no so weird) can > > be produced from the attached C source. It is a reduced version of > > preprocessed ehci-hcd.c file (from Linux kernel; compilation of this > > file discovered bug #1782). > > > > --Wojtek > > struct device; > > > > struct usb_bus { > > struct device *controller; > > }; > > > > struct usb_hcd { > > struct usb_bus self; > > > > unsigned long hcd_priv[0]; > > }; > > > > struct ehci_hcd; > > > > static inline __attribute__((always_inline)) struct ehci_hcd > > *hcd_to_ehci (struct usb_hcd *hcd) > > { > > return (struct ehci_hcd *) (hcd->hcd_priv); > > } > > > > static inline __attribute__((always_inline)) struct usb_hcd > > *ehci_to_hcd (struct ehci_hcd *ehci) > > { > > return ({ const typeof( ((struct usb_hcd *)0)->hcd_priv ) *__mptr > > ((void *) ehci); (struct usb_hcd *)( (char *)__mptr - > > __builtin_offsetof(struct usb_hcd,hcd_priv) );}); > > } > > > > struct device *x; > > > > static int ehci_pci_setup(struct usb_hcd *hcd) > > { > > struct ehci_hcd *ehci = hcd_to_ehci(hcd); > > // [...] > > // store of expression added in order to prevent it from being DCE'ed > > x = ehci_to_hcd(ehci)->self.controller; > > // [...] > > } > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >
Duncan Sands
2007-Nov-15 09:18 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
Hi,> Sadly, this will break a very common idiom. In GCC, we discovered it > to be common enough that it broke a *bunch* of C code. > > In particular, you will break > > struct foo { > int a; > char name[0]; > } > > bar = malloc(sizeof (struct foo) + strlen("thisismyname") + 1); > strcpy(bar->name, "thisismyname"); > > > It only started turning up when we started doing higher level loop > opts and used alias info in dependence testing. It would end up > reversing or interchanging loops around these things which while > legal, broke enough software that we got yelled at. > > So we special case the [0] at end of struct case.as noted in LangRef, "Note that 'variable sized arrays' can be implemented in LLVM with a zero length array. Normally, accesses past the end of an array are undefined in LLVM (e.g. it is illegal to access the 5th element of a 3 element array). As a special case, however, zero length arrays are recognized to be variable length. This allows implementation of 'pascal style arrays' with the LLVM type "{ i32, [0 x float]}", for example." so this example should work fine (it wouldn't work if it was char name[1] though). Ciao, Duncan.
Daniel Berlin
2007-Nov-15 17:56 UTC
[LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
On 11/15/07, Duncan Sands <baldrick at free.fr> wrote:> Hi, > > > Sadly, this will break a very common idiom. In GCC, we discovered it > > to be common enough that it broke a *bunch* of C code. > > > > In particular, you will break > > > > struct foo { > > int a; > > char name[0]; > > } > > > > bar = malloc(sizeof (struct foo) + strlen("thisismyname") + 1); > > strcpy(bar->name, "thisismyname"); > > > > > > It only started turning up when we started doing higher level loop > > opts and used alias info in dependence testing. It would end up > > reversing or interchanging loops around these things which while > > legal, broke enough software that we got yelled at. > > > > So we special case the [0] at end of struct case. > > as noted in LangRef, > > "Note that 'variable sized arrays' can be implemented in LLVM with a zero > length array. Normally, accesses past the end of an array are undefined in > LLVM (e.g. it is illegal to access the 5th element of a 3 element array). As > a special case, however, zero length arrays are recognized to be variable > length. This allows implementation of 'pascal style arrays' with the LLVM > type "{ i32, [0 x float]}", for example." > > so this example should work fine (it wouldn't work if it was char name[1] > though). >Then the original reported code is fine, and the bug is in llvm or llvm-gc (IE Owen is wrong) Note: struct device; struct usb_bus { struct device *controller; }; struct usb_hcd { struct usb_bus self; unsigned long hcd_priv[0]; }; ...> Ciao, > > Duncan. >
Possibly Parallel Threads
- [LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
- [LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
- [LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
- [LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices
- [LLVMdev] BasicAliasAnalysis and out-of-bound GEP indices