Jonathan S. Shapiro
2008-May-01 01:35 UTC
[LLVMdev] optimization assumes malloc return is non-null
On Wed, 2008-04-30 at 18:26 -0400, David Vandevoorde wrote:

> > Daveed:
> >
> > Good to know that I was looking at the correct section. I do not
> > agree that your interpretation follows the as-if rule, because I do
> > not agree with your interpretation of the C library specification of
> > malloc().
>
> Before I go on, let me state that this is not a contentious issue
> among WG14: There is no doubt that the intent of the standard is that
> this be a valid optimization.

And of course, ANSI C working groups are infallible. Consider "noalias". :-)

But I think I am starting to understand your argument, and if so I concur that the optimization is correct in this case. Let me ask some further questions just to confirm.

Let me attempt to replay what I believe is happening here:

1. There is an observably linear chain of calls to malloc() in the example, and because of the structure of the example, the compiler can see that there are no *unobserved* calls to malloc() in the program.

2. After all inlining of malloc() has proceeded, the compiler is free to observe that all calls succeed by partial evaluation (hence the non-NULL return value), and that after this is done there is no witness to the internal heap limit variable.

3. Progressive application of dead variable elimination will then delete the entire chain of updates to the heap limit variable, eliminating the side effects that would normally be induced by malloc(). It will also delete the uncaptured pointer result from malloc().

4. Since the result pointer is not captured, and the heap limit variable is not consulted, the entire result of the call to malloc() is the return of a non-NULL pointer.

5. Constant folding then reduces the condition (malloc() == NULL) to false.

6. Dead code elimination then reduces the if/else to simply the else body.

7. The optimizer now reduces the whole thing to "return 1".

Is this more or less the chain of events here (or if not this, then something of a similar nature)?

If so, I understand and concur that this optimization is valid in this example, primarily due to perverse and non-obvious consequences of the particular bit of sample code.

Just to cement my understanding, I suspect that *either* of the following two changes to the sample code would make this optimization invalid:

1. Rename main() to f().

2. Introduce a call to some external procedure anywhere in main().

In either case, the compiler can no longer witness all reachable calls to malloc(), and must assume conservatively that the internally produced side effects of malloc() may be sourced by other units of compilation, and therefore cannot be elided. Is this correct?

Hmm. No. I can weasel out of this in case (2), because the standard does not require that calls to malloc() be processed in order, so in some cases calls to malloc() appearing in main() can still be eliminated by logically rearranging them to occur before any of the external calls to malloc(). I bet LLVM doesn't know about this. :-)

> (I should have been more careful in my earlier response and clarified
> that modifying a nonvolatile object is a side effect that can be
> optimized away in many cases.)

If I understand the standard correctly, this can be done exactly when there is no witness to the side effect. Furthermore, chains of intermediate unwitnessed side effects can be collapsed if the optimizer can prove that the intermediate results are unwitnessed. Yes?

> ...However, the optimization is allowed even without that: Just by
> noting that the return value of malloc isn't used (nor leaked); I
> suspect that's what the compiler does in this case (though I don't
> know).

I think that is not quite strong enough. If there is any possibility of preceding or succeeding calls to malloc() in compilation units that are not visible at compile time, then the compiler cannot determine that the current call to malloc() must succeed or that its side effects are unwitnessed.

I suppose that the compiler can do special-case things in main() on the theory that these calls to malloc() may be executed out of order by a conforming implementation, but I suspect that goes well beyond the cleverness of the current LLVM implementation. It's the sort of thing that the VAX/VMS optimizer or the IBM PL.8 research optimizer might actually have done. And certainly the Bulldog optimizer.

> > Setting the matter of the standard entirely aside, the currently
> > implemented behavior deviates so blatantly from common sense and
> > universal convention that it really must be viewed as a bug.

Just to be explicit: I now see that my statement above is wrong in the present example.

Daveed: Thanks for being patient on this. I appreciate your willingness to help me see this one.

shap
David Vandevoorde
2008-May-01 13:57 UTC
[LLVMdev] optimization assumes malloc return is non-null
On Apr 30, 2008, at 9:35 PM, Jonathan S. Shapiro wrote:

> On Wed, 2008-04-30 at 18:26 -0400, David Vandevoorde wrote:
>>> Daveed:
>>>
>>> Good to know that I was looking at the correct section. I do not
>>> agree that your interpretation follows the as-if rule, because I do
>>> not agree with your interpretation of the C library specification
>>> of malloc().
>>
>> Before I go on, let me state that this is not a contentious issue
>> among WG14: There is no doubt that the intent of the standard is
>> that this be a valid optimization.
>
> And of course, ANSI C working groups are infallible. Consider
> "noalias". :-)

Ah, but with the help of DMR, noalias was avoided and infallibility maintained. (Kidding.)

Seriously, I don't think this particular item is a committee booboo, but I didn't mean to imply it cannot be. I only intended to note that the intent is that such optimization be allowed (and that, to the best of my knowledge, the words reflect that).

> But I think I am starting to understand your argument, and if so I
> concur that the optimization is correct in this case. Let me ask some
> further questions just to confirm.
>
> Let me attempt to replay what I believe is happening here:
>
> 1. There is an observably linear chain of calls to malloc() in the
> example, and because of the structure of the example, the compiler
> can see that there are no *unobserved* calls to malloc() in the
> program.

(You meant "no *observed*", right?)

> 2. After all inlining of malloc() has proceeded, the compiler is free
> to observe that all calls succeed by partial evaluation (hence the
> non-NULL return value), and that after this is done there is no
> witness to the internal heap limit variable.
>
> 3. Progressive application of dead variable elimination will then
> delete the entire chain of updates to the heap limit variable,
> eliminating the side effects that would normally be induced by
> malloc(). It will also delete the uncaptured pointer result from
> malloc().
>
> 4. Since the result pointer is not captured, and the heap limit
> variable is not consulted, the entire result of the call to malloc()
> is the return of a non-NULL pointer.
>
> 5. Constant folding then reduces the condition (malloc() == NULL) to
> false.
>
> 6. Dead code elimination then reduces the if/else to simply the else
> body.
>
> 7. The optimizer now reduces the whole thing to "return 1".
>
> Is this more or less the chain of events here (or if not this, then
> something of a similar nature)?

I think it's a valid set of transformations, but I don't think it's the only approach that can lead to this result. I was thinking that a compiler could be aware of the semantics of malloc: it wouldn't necessarily "inline" the call, but if it could prove that the result was not leaked or "used" (indirected), then a non-heap address could be produced. Alternatively, the malloc call could be promoted to alloca (as Chris noted in his reply, only if a small bound on the allocation is known, but that is clearly the case here), and then it becomes akin to an ordinary dead-local-variable situation.

> If so, I understand and concur that this optimization is valid in
> this example, primarily due to perverse and non-obvious consequences
> of the particular bit of sample code.
>
> Just to cement my understanding, I suspect that *either* of the
> following two changes to the sample code would make this optimization
> invalid:
>
> 1. Rename main() to f().
>
> 2. Introduce a call to some external procedure anywhere in main().
>
> In either case, the compiler can no longer witness all reachable
> calls to malloc(), and must assume conservatively that the internally
> produced side effects of malloc() may be sourced by other units of
> compilation, and therefore cannot be elided.
>
> Is this correct?

I'm of the opinion that even with such changes the optimization is valid. Let me rename main to f (and drop the unused parameters) and add calls to a function g. The original example then becomes:

   #include <stdlib.h>

   int f() {
      g();
      if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
      g();
   }

The malloc value cannot escape and the allocated storage isn't used (in other words, it can be "garbage collected" right away). So it can be replaced by a unique address (though the same address can be used for each invocation, IMO).

Note that within the confines of the C (or C++) standard, g has no way of observing the result of the call to malloc. (We might "know" that there is a static-lifetime variable somewhere that changed, but that variable isn't required by the standard, and a compiler may instead posit that for this very call to malloc, a separate heap is used.)

There is (at least) a subtlety with that reasoning in C having to do with "lifetime": the allocated _object_ has a lifetime (7.20.3/1) that extends until deallocation. My view is that the storage only becomes an object when the returned pointer is bound to a typed pointer (7.20.3/1). If that's right, every instance of the particular malloc call can produce the same value; if not, we'd need to be able to bound the number of instances of that call. (C++ has a convenient "conformance within resource limits" clause that provides a different way out.)

> Hmm. No. I can weasel out of this in case (2), because the standard
> does not require that calls to malloc() be processed in order, so in
> some cases calls to malloc() appearing in main() can still be
> eliminated by logically rearranging them to occur before any of the
> external calls to malloc(). I bet LLVM doesn't know about this. :-)
>
>> (I should have been more careful in my earlier response and
>> clarified that modifying a nonvolatile object is a side effect that
>> can be optimized away in many cases.)
>
> If I understand the standard correctly, this can be done exactly when
> there is no witness to the side effect. Furthermore, chains of
> intermediate unwitnessed side effects can be collapsed if the
> optimizer can prove that the intermediate results are unwitnessed.
> Yes?

Yes, I think so.

>> ...However, the optimization is allowed even without that: Just by
>> noting that the return value of malloc isn't used (nor leaked); I
>> suspect that's what the compiler does in this case (though I don't
>> know).
>
> I think that is not quite strong enough. If there is any possibility
> of preceding or succeeding calls to malloc() in compilation units
> that are not visible at compile time, then the compiler cannot
> determine that the current call to malloc() must succeed or that its
> side effects are unwitnessed.

See above. If the allocation is small enough that it can be modelled by a "reserved heap", I think it can.

> I suppose that the compiler can do special-case things in main() on
> the theory that these calls to malloc() may be executed out of order
> by a conforming implementation, but I suspect that goes well beyond
> the cleverness of the current LLVM implementation. It's the sort of
> thing that the VAX/VMS optimizer or the IBM PL.8 research optimizer
> might actually have done. And certainly the Bulldog optimizer.

(I should probably mention that I'm no optimizer pro, so my reasoning is based on language specs.)

>>> Setting the matter of the standard entirely aside, the currently
>>> implemented behavior deviates so blatantly from common sense and
>>> universal convention that it really must be viewed as a bug.
>
> Just to be explicit: I now see that my statement above is wrong in
> the present example.
>
> Daveed: Thanks for being patient on this. I appreciate your
> willingness to help me see this one.

No problem at all. I should probably worry that I find ISO exegesis mildly fun :-P

Daveed
Jonathan S. Shapiro
2008-May-01 15:01 UTC
[LLVMdev] optimization assumes malloc return is non-null
On Thu, 2008-05-01 at 09:57 -0400, David Vandevoorde wrote:

> > 1. There is an observably linear chain of calls to malloc() in the
> > example, and because of the structure of the example, the compiler
> > can see that there are no *unobserved* calls to malloc() in the
> > program.
>
> (You meant "no *observed*", right?)

I think I mean "unobserved". The requirement here seems to be that the compiler has to know that no call to malloc() exists someplace else in the overall program that might rely on the side effects of a malloc() call performed in the current compilation unit. Unless the compiler can determine this, or can show that no such side effects exist, the internal side effects are not provably unwitnessed.

If I am reading the standard correctly, merely having an unused result is insufficient to permit deleting the call. Any side effects of the call that are witnessed must still be performed. It is clearly legal to perform them directly rather than by calling malloc() if the malloc() result is unused, but the side effects of the malloc() call cannot simply disappear if later code may exist that observes those effects. But see below, because I think your "malloc from alternate heap" argument is also relevant.

> > Just to cement my understanding, I suspect that *either* of the
> > following two changes to the sample code would make this
> > optimization invalid:
> >
> > 1. Rename main() to f().
> >
> > 2. Introduce a call to some external procedure anywhere in main().
> >
> > In either case, the compiler can no longer witness all reachable
> > calls to malloc(), and must assume conservatively that the
> > internally produced side effects of malloc() may be sourced by
> > other units of compilation, and therefore cannot be elided.
> >
> > Is this correct?
>
> I'm of the opinion that even with such changes the optimization is
> valid. Let me rename main to f (and drop the unused parameters) and
> add calls to a function g. The original example then becomes:
>
>    #include <stdlib.h>
>
>    int f() {
>       g();
>       if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
>       g();
>    }
>
> The malloc value cannot escape and the allocated storage isn't used
> (in other words, it can be "garbage collected" right away). So it
> can be replaced by a unique address (though the same address can be
> used for each invocation, IMO).
>
> Note that within the confines of the C (or C++) standard, g has no
> way of observing the result of the call to malloc. (We might "know"
> that there is a static-lifetime variable somewhere that changed, but
> that variable isn't required by the standard, and a compiler may
> instead posit that for this very call to malloc, a separate heap is
> used.)

I agree that g() cannot observe the malloc return value, but g() *can* observe the side effect on the heap bound pointer. It is not sufficient under the standard for the compiler to assume that g() *might not* do so. It is required that the compiler demonstrate that g() *does not in fact* do so.

Or are you arguing that since malloc() is a built-in procedure, the compiler is entitled to substitute an alternate implementation of malloc() when it observes that the return pointer is not captured (and therefore cannot be handed back to free() or realloc()), and that once it does this, it is entitled to reason about this call to malloc() without regard to any other call to malloc()? This would effectively constitute built-in knowledge of a two-heap implementation, and I concur that that would be within bounds according to the standard.

> My view is that the storage only becomes an object when the pointer
> returned is bound to a typed pointer (7.20.3/1).

I see where you are going, but I might be careful about this one. Binding to a void* pointer is sufficient to be able to observe pointer non-uniqueness, because it is legal to compare void* values for equality. In light of which, I am inclined to think that the returned value has to be unique if it is captured, without regard to the type of the capturing pointer.

> See above. If the allocation is small enough that it can be modelled
> by a "reserved heap", I think it can.

Both small enough and partially evaluable at compile time, I should think. This would preclude certain non-freed return results from malloc() within loops, for example. But basically I agree.

> > Daveed: Thanks for being patient on this. I appreciate your
> > willingness to help me see this one.
>
> No problem at all. I should probably worry that I find ISO exegesis
> mildly fun :-P

Not to worry. If the bottom falls out of the software market you have a promising second career opportunity as an orthodox rabbi. :-)

Thanks again.

shap