Chris Lattner
2011-Dec-16 22:45 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Dec 16, 2011, at 2:41 PM, John Criswell wrote:

> On 12/16/11 4:14 PM, Chris Lattner wrote:
>>
>> On Dec 16, 2011, at 12:39 PM, Kostya Serebryany wrote:
>>> > Do we consider the above transformation legal?
>>
>> Yes, the transformation is perfectly legal for the normal compiler.
>
> So how do you guarantee that the behavior is predictable regardless of
> hardware platform if you don't define what the behavior should be?

I'm not sure what you mean. What isn't defined?

>>> > I would argue that it should not be legal. We don't actually know what
>>> > comes after the 22 byte object. Is it another memory object? A
>>> > memory-mapped I/O device? Unmapped memory? Padded junk space? Reading
>>> > memory-mapped I/O could have nasty side effects, and accessing unmapped
>>> > memory could cause the program to fault even though it was written
>>> > correctly at the source-language level.
>>
>> Device memory accesses need to be done with volatile. This can't cause a
>> paging problem (e.g. causing an additional page fault where none existed
>> before) on systems that use power-of-two sized pages.
>
> I think people are misunderstanding my point about I/O memory. I wasn't
> saying that the alloca is supposed to access I/O memory; I was saying that
> it is possible for I/O memory to be located contiguously after the memory
> object should the memory object be the last object on its memory page.

There is no way for this transformation to introduce a page-spanning load.

> Now, after thinking about it, I realize why that can't happen if the
> memory is aligned to a 16-byte boundary on most architectures. However,
> does load-widening actually check that the memory is 16-byte aligned?

Yes.

> What if you have a funky architecture that someone is porting LLVM to, or
> someone is using x86-32 segments in an interesting way?

We'll burn that bridge when we get to it ;-)

> Moreover, I don't really understand the rationale for allowing a transform
> to introduce undefined behavior into programs that exhibit no undefined
> behavior.

There is no undefined behavior here. This is exactly analogous to the code
you get for bitfield accesses. If you have an uninitialized struct and start
storing into its fields (to initialize it) you get a series of
"load + mask + or + store" operations. These are loading and touching
"undefined" bits in a completely defined way.

-Chris
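The page-spanning claim above comes down to simple address arithmetic, which can be sketched in C. This is only an illustration under stated assumptions: the page size is taken to be 4096 bytes (any power of two no smaller than the load width would do), and `PAGE_SIZE`, `crosses_page`, and `is_naturally_aligned` are hypothetical names, not anything from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration only; the argument needs just a
   power of two that is at least as large as the load width. */
#define PAGE_SIZE 4096u

/* Returns 1 if a load of `width` bytes starting at `addr` touches two
   different pages, 0 otherwise. */
static int crosses_page(uint64_t addr, uint64_t width)
{
    uint64_t first_page = addr / PAGE_SIZE;
    uint64_t last_page  = (addr + width - 1) / PAGE_SIZE;
    return first_page != last_page;
}

/* Returns 1 if `addr` is a multiple of `width`.
   `width` must be a non-zero power of two. */
static int is_naturally_aligned(uint64_t addr, uint64_t width)
{
    assert(width != 0 && (width & (width - 1)) == 0);
    return (addr & (width - 1)) == 0;
}
```

In the case under discussion the alloca is 16-byte aligned and the widened 8-byte load starts at offset 16, so its address is a multiple of 8 and `crosses_page` is false for it. An unaligned 8-byte load, for example at address 4093 with this assumed page size, would cross into the next page.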
John Criswell
2011-Dec-16 23:29 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On 12/16/11 4:45 PM, Chris Lattner wrote:

>
> On Dec 16, 2011, at 2:41 PM, John Criswell wrote:
>
>> On 12/16/11 4:14 PM, Chris Lattner wrote:
>>> On Dec 16, 2011, at 12:39 PM, Kostya Serebryany wrote:
>>>>
>>>> > Do we consider the above transformation legal?
>>>>
>>>
>>> Yes, the transformation is perfectly legal for the normal compiler.
>>
>> So how do you guarantee that the behavior is predictable regardless
>> of hardware platform if you don't define what the behavior should be?
>
> I'm not sure what you mean. What isn't defined?

The alloca in question allocates 22 bytes. The 64-bit load in Kostya's
original email is accessing two additional bytes past the end of the alloca
(i.e., it is accessing array "elements" a[22] and a[23]). Accessing that
memory with a read or write is undefined behavior. The program could fault,
read zeros, read arbitrary bit patterns, etc.

In other words, the compiler is transforming this:

    return a[16] + a[21];

into something like this:

    unsigned long * p = &(a[16]);
    unsigned long v = *p; // This accesses memory locations a[22] and a[23];
                          // doing so is undefined behavior
    (do some bit fiddling to extract a[16] and a[21] from v)

The original code is memory safe and exhibits defined behavior. You can do
whatever crazy, semantics-preserving optimization you want, run it on any
crazy architecture you want, and it'll always exhibit the same behavior.

The optimized code exhibits undefined behavior. On most systems, it just
reads garbage data that is ignored by the compiler, but that's really just a
side-effect of how most OSes and architectures do things. If you do some
crazy transforms or run on some obscure architecture, the optimized code may
break.

>
> [snip]
>> What if you have a funky architecture that someone is porting LLVM
>> to, or someone is using x86-32 segments in an interesting way?
>
> We'll burn that bridge when we get to it ;-)

ASAN got burnt; SAFECode probably got burnt, too. If we work around it, some
poor researcher or developer may get burnt by it, too, and spend some time
figuring out why his correct-looking program is not acting properly. In
other words, you're burning someone else's bridge.

Granted, perhaps the benefits of an incorrect optimization may outweigh the
loss of using LLVM on novel systems, but are you sure that making the
optimization work properly is going to be so detrimental?

>
>> Moreover, I don't really understand the rationale for allowing a
>> transform to introduce undefined behavior into programs that exhibit
>> no undefined behavior.
>
> There is no undefined behavior here. This is exactly analogous to the
> code you get for bitfield accesses. If you have an uninitialized
> struct and start storing into its fields (to initialize it) you get a
> series of "load + mask + or + store" operations. These are loading
> and touching "undefined" bits in a completely defined way.

I'll agree that they're both undefined behavior, but I don't think they fall
within the same category. The bit-mask initializing issue is a compromise
you made because there either isn't an alternative way to do it that has
defined behavior, or such an alternative is too expensive and difficult to
implement and/or use.

This appears to be a different case. Fixing the optimization looks simple
enough to me (am I missing something?), and I'm not convinced that fixing it
would hurt performance (although since I haven't run an experiment, that is
conjecture).

So, perhaps I should ask this: if someone took the time to fix the transform
so that it checks both the alignment *and* the allocation size and measures
the resulting performance change, how much would performance need to suffer
before the cure was deemed worse than the disease?

-- John T.

>
> -Chris
>
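The stricter rule proposed above, checking both the alignment and the allocation size before widening, might look roughly like the following C sketch. The function `can_widen_load` and all of its parameters are hypothetical and chosen purely for illustration; this is not LLVM's implementation or API, and it assumes the widened width and the base alignment are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: may a load at byte `offset` into an object of
   `alloc_size` bytes, whose base address is aligned to `base_align`, be
   widened to `wide_bytes` bytes? */
static int can_widen_load(size_t alloc_size, size_t base_align,
                          size_t offset, size_t wide_bytes)
{
    /* Precondition of this sketch: power-of-two load width. */
    assert(wide_bytes != 0 && (wide_bytes & (wide_bytes - 1)) == 0);

    /* Alignment check: the widened load must be naturally aligned,
       which also keeps it from spanning a page. */
    if (base_align % wide_bytes != 0 || offset % wide_bytes != 0)
        return 0;

    /* Size check (the additional condition proposed in the message):
       every byte read must lie inside the allocation. */
    if (offset > alloc_size || wide_bytes > alloc_size - offset)
        return 0;

    return 1;
}
```

With the numbers from this thread (a 22-byte alloca, 16-byte alignment, an 8-byte load at offset 16), the alignment check passes but the size check fails, because the load would cover bytes 16 through 23. The same load on a 24-byte object would be accepted.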
Chris Lattner
2011-Dec-17 01:56 UTC
[LLVMdev] load widening conflicts with AddressSanitizer
On Dec 16, 2011, at 3:29 PM, John Criswell <criswell at illinois.edu> wrote:

> On 12/16/11 4:45 PM, Chris Lattner wrote:
>>
>> On Dec 16, 2011, at 2:41 PM, John Criswell wrote:
>>
>>> On 12/16/11 4:14 PM, Chris Lattner wrote:
>>>>
>>>> On Dec 16, 2011, at 12:39 PM, Kostya Serebryany wrote:
>>>>> > Do we consider the above transformation legal?
>>>>
>>>> Yes, the transformation is perfectly legal for the normal compiler.
>>>
>>> So how do you guarantee that the behavior is predictable regardless of
>>> hardware platform if you don't define what the behavior should be?
>>
>> I'm not sure what you mean. What isn't defined?
>
> The alloca in question allocates 22 bytes. The 64-bit load in Kostya's
> original email is accessing two additional bytes past the end of the
> alloca (i.e., it is accessing array "elements" a[22] and a[23]). Accessing
> that memory with a read or write is undefined behavior. The program could
> fault, read zeros, read arbitrary bit patterns, etc.

John, I think that you are missing that these operations are fully defined
by LLVM IR. I'm not sure what language rules you are drawing these from, but
they are not the rules of IR. Doing this inside a compiler (the way we do)
also is not invalid according to the C notions of undefined behavior, as it
has the "as if" rule. I agree that doing this at the source level would be
invalid.

Again, I'm not opposed to having a way to disable these transformations, we
just need a clean way to express it.

-Chris

> [snip]
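The "as if" argument can be illustrated at the C level, with one important caveat: the sketch below gives the buffer 24 readable bytes so that the C code itself stays well defined, whereas the compiler performs the equivalent step on a 22-byte alloca under LLVM IR rules. The function names are hypothetical, and the code is an emulation of the idea rather than what LLVM actually emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The source-level computation: reads only a[16] and a[21]. */
static unsigned sum_original(const unsigned char *a)
{
    return (unsigned)a[16] + (unsigned)a[21];
}

/* Returns byte `i` (in memory order) of a 64-bit value that was loaded
   from memory, detecting host endianness at run time. */
static unsigned byte_in_memory_order(uint64_t v, unsigned i)
{
    const uint16_t probe = 1;
    unsigned char low;
    unsigned shift;

    assert(i < 8);
    memcpy(&low, &probe, 1);                /* 1 on little-endian hosts */
    shift = low ? 8u * i : 8u * (7u - i);
    return (unsigned)((v >> shift) & 0xffu);
}

/* Emulates the widened form. `a` must point to at least 24 readable
   bytes; bytes 22 and 23 are loaded but never influence the result. */
static unsigned sum_widened(const unsigned char *a)
{
    uint64_t v;
    memcpy(&v, a + 16, sizeof v);           /* one 8-byte load of a[16..23] */
    return byte_in_memory_order(v, 0)       /* a[16] */
         + byte_in_memory_order(v, 5);      /* a[21] */
}
```

Whatever values sit in bytes 22 and 23, `sum_widened` returns the same result as `sum_original`, which is the sense in which the extra bits are touched "in a completely defined way". It does not address the separate concern raised in the thread, namely tools such as ASan that instrument the widened access itself.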