Here are some comments. It seems to me there are 2 (mostly separate) aspects: 1. Teaching clang how to do domination / post-domination analysis, so that the lifetime information (alloca/dealloca, or lifetime markers) can be set in the right locations in the IR. Such an analysis should be able to understand the language scopes, as well as labels / gotos and EH. The results of this analysis would be used at IR codegen. Caveat: I have no expertise in this area of Clang, so take everything with a grain of salt and please feel free to correct me. Where should this analysis be run ? Presumably at the beginning of each function’s codegen’s time. This analysis looks a bit special (at least to me), as it will work over the AST, but it also requires a pretty good understanding of LLVM’s IR, which sets it apart from other clang analyses. Maybe another option (some may call it a poor’s man option) would be to enforce at IR codegen time the dominators / post-dominators on the multiple paths (normal & EH control flows) by inserting basic blocks around each statement which is codegened. Those would be fall-thru most of the time, the llvm optimizers can remove them easily. The obvious drawback is that it will insert lots of small or fall-thru BBs. 2. How liveranges are represented in LLVM’s IR. I like the idea of pairing alloca / dealloca (and removing the lifetime markers, at least for stack coloring) and I think it could even ease / improve some analysis. Currently, allocas have to be located in the entry BB in order to get a chance to be promoted to registers by the Mem2Reg pass. Allocas in other BBs are considered to be dynamic. I have no idea how difficult it would be to teach Mem2Reg to consider alloca/dealloca in other basic blocks. With the alloca / dealloca solution, in order to do stack colouring, the alloca must _not_ be in the entry block, because all allocas defined there are alive at the same time and cannot be merged. All LLVM passes would need to be teached that those alloca / dealloca pairs correspond to stack slots --- as the alloca in the entry. The pairing would also have to be preserved across transformations (same as lifetime.start/end). Cheers, Arnaud From: Reid Kleckner [mailto:rnk at google.com] Sent: 04 November 2014 19:35 To: Arnaud De Grandmaison; Nick Lewycky; Rafael Ávila de Espíndola Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] lifetime.start/end clarification Short version: I think Clang needs some analysis capabilities to widen the lifetime. --- I think we need a new approach. These intrinsics were designed to be general between heap and stack, and we don't need that extra generality for the simple problem of stack coloring that we're trying to solve here. See for example the size parameter, which I bet stack coloring doesn't need. If we have a bitcast alloca, we already know the size. Rafael had an idea at the dev meeting that maybe the IR needs a stack deallocation instruction to pair with alloca. Then we could teach LLVM to consider allocas that are post-dominated by a deallocation instruction to be static, and fold them into the entry block. He pointed out that the Swift IL actually has such a construct. It would be the responsibility of the frontend to ensure that each alloca is post-dominated by its "dealloca", so in this example with labels, we'd just have to hoist the allocation to the nearest dominating block, or just give up and go to the function entry block. Similarly the deallocation has to be moved to post-dominate the allocation, to handle cases like: void foo(int x) { if (x > 10) { // alloca y goto lbl; while (x) { int y; lbl: y = bar(); x -= y; } // dealloca y } } This representation would support more aggressive stack coloring. Furthermore, it supports a much more efficient lowering for inalloca, which is why I'm somewhat interested in it. If we don't want to do this, we can do something less drastic and either add new intrinsics or modify the current ones with the same rules proposed above. We'd have the alloca in the entry block, the lifetime start at the first block that dominates all uses, and the deallocation at the first block that post-dominates all that stuff. One other thing to think about is EH. We can often get into a situation where uses of y are statically reachable from cleanup code while being dynamically unreachable. This can happen when cleanups are not simply ordered in a stack-like manner. I think if we can teach clang to do this kind of domination analysis, then we can probably detect this case and give up on it by allocating in the entry block. On Tue, Nov 4, 2014 at 3:59 AM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote: The LRM (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic) essentially states that: - ptr is dead before a call to “lifetime.start size, ptr” - ptr is dead after a call to “lifetime.end size, ptr” This is all good and fine, and the expected use case is that all “lifetime.end size, ptr” markers are matched with a preceding “lifetime.start size, ptr” in the CFG. What is supposed to happen when a “lifetime.end size, ptr” is not matched with a “lifetime.start size, ptr” ? I think there are a few possible answers: - the memory area pointed to by ptr is assumed to be alive since function entry - the memory area pointed to by ptr is assumed to be dead since function entry, as it has not been marked alive - this is an unexpected situation I think this ambiguity should be cleared in the LRM, because today’s implicit assumption may be broken at any time. This is not a theoretical question: clang can generate such cases. For example, the following testcase: struct X { void doSomething(); char b[33]; }; void bar(X &); void baz(); void test(int i) { if (i==9) { X x; x.doSomething(); label: bar(x); } else { baz(); if (i==0) goto label; } } Produces: %struct.X = type { [33 x i8] } define void @_Z4testi(i32 %i) { entry: %x = alloca %struct.X, align 1 %cmp = icmp eq i32 %i, 9 br i1 %cmp, label %if.then, label %if.else if.then: ; preds = %entry %0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 call void @llvm.lifetime.start(i64 33, i8* %0) call void @_ZN1X11doSomethingEv(%struct.X* %x) br label %label label: ; preds = %if.else.label_crit_edge, %if.then %.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ] call void @_Z3barR1X(%struct.X* dereferenceable(33) %x) call void @llvm.lifetime.end(i64 33, i8* %.pre-phi) br label %if.end3 if.else: ; preds = %entry tail call void @_Z3bazv() %cmp1 = icmp eq i32 %i, 0 br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3 if.else.label_crit_edge: ; preds = %if.else %.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 br label %label if.end3: ; preds = %if.else, %label ret void } Note that the path thru if.else.label_crit_edge has no lifetime start. Cheers, -- Arnaud _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/a943b444/attachment.html>
On 11/05/2014 08:47 AM, Arnaud A. de Grandmaison wrote:> > Here are some comments. > > It seems to me there are 2 (mostly separate) aspects: > > 1. Teaching clang how to do domination / post-domination analysis, so > that the lifetime information (alloca/dealloca, or lifetime markers) > can be set in the right locations in the IR. Such an analysis should > be able to understand the language scopes, as well as labels / gotos > and EH. The results of this analysis would be used at IR codegen. > > Caveat: I have no expertise in this area of Clang, so take everything > with a grain of salt and please feel free to correct me. > > Where should this analysis be run ? Presumably at the beginning of > each function’s codegen’s time. > > This analysis looks a bit special (at least to me), as it will work > over the AST, but it also requires a pretty good understanding of > LLVM’s IR, which sets it apart from other clang analyses. > > Maybe another option (some may call it a poor’s man option) would be > to enforce at IR codegen time the dominators / post-dominators on the > multiple paths (normal & EH control flows) by inserting basic blocks > around each statement which is codegened. Those would be fall-thru > most of the time, the llvm optimizers can remove them easily. The > obvious drawback is that it will insert lots of small or fall-thru BBs. > > 2. How liveranges are represented in LLVM’s IR. > > I like the idea of pairing alloca / dealloca (and removing the > lifetime markers, at least for stack coloring) and I think it could > even ease / improve some analysis. > > Currently, allocas have to be located in the entry BB in order to get > a chance to be promoted to registers by the Mem2Reg pass. Allocas in > other BBs are considered to be dynamic. I have no idea how difficult > it would be to teach Mem2Reg to consider alloca/dealloca in other > basic blocks. > > With the alloca / dealloca solution, in order to do stack colouring, > the alloca must _/not/_ be in the entry block, because all allocas > defined there are alive at the same time and cannot be merged. All > LLVM passes would need to be teached that those alloca / dealloca > pairs correspond to stack slots --- as the alloca in the entry. The > pairing would also have to be preserved across transformations (same > as lifetime.start/end). >Would one of you mind taking a step back and explaining what you believe the "stack colouring problem" to be? I'm familiar with the general meaning of the term and even some of LLVM's implementation; I'm just not sure what specific issue you're referring to. Having the context would make it much easier to assess your proposals.> > Cheers, > > Arnaud > > *From:*Reid Kleckner [mailto:rnk at google.com] > *Sent:* 04 November 2014 19:35 > *To:* Arnaud De Grandmaison; Nick Lewycky; Rafael Ávila de Espíndola > *Cc:* LLVM Developers Mailing List > *Subject:* Re: [LLVMdev] lifetime.start/end clarification > > Short version: I think Clang needs some analysis capabilities to widen > the lifetime. > > --- > > I think we need a new approach. These intrinsics were designed to be > general between heap and stack, and we don't need that extra > generality for the simple problem of stack coloring that we're trying > to solve here. See for example the size parameter, which I bet stack > coloring doesn't need. If we have a bitcast alloca, we already know > the size. > > Rafael had an idea at the dev meeting that maybe the IR needs a stack > deallocation instruction to pair with alloca. Then we could teach LLVM > to consider allocas that are post-dominated by a deallocation > instruction to be static, and fold them into the entry block. He > pointed out that the Swift IL actually has such a construct. > > It would be the responsibility of the frontend to ensure that each > alloca is post-dominated by its "dealloca", so in this example with > labels, we'd just have to hoist the allocation to the nearest > dominating block, or just give up and go to the function entry block. > Similarly the deallocation has to be moved to post-dominate the > allocation, to handle cases like: > > void foo(int x) { > > if (x > 10) { > > // alloca y > > goto lbl; > > while (x) { > > int y; > > lbl: > > y = bar(); > > x -= y; > } > > // dealloca y > > } > > } > > This representation would support more aggressive stack coloring. > Furthermore, it supports a much more efficient lowering for inalloca, > which is why I'm somewhat interested in it. > > If we don't want to do this, we can do something less drastic and > either add new intrinsics or modify the current ones with the same > rules proposed above. We'd have the alloca in the entry block, the > lifetime start at the first block that dominates all uses, and the > deallocation at the first block that post-dominates all that stuff. > > One other thing to think about is EH. We can often get into a > situation where uses of y are statically reachable from cleanup code > while being dynamically unreachable. This can happen when cleanups are > not simply ordered in a stack-like manner. I think if we can teach > clang to do this kind of domination analysis, then we can probably > detect this case and give up on it by allocating in the entry block. > > On Tue, Nov 4, 2014 at 3:59 AM, Arnaud A. de Grandmaison > <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> > wrote: > > The LRM > (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic) > essentially states that: > > - ptr is dead before a call to “lifetime.start size, ptr” > > - ptr is dead after a call to “lifetime.end size, ptr” > > This is all good and fine, and the expected use case is that all > “lifetime.end size, ptr” markers are matched with a preceding > “lifetime.start size, ptr” in the CFG. > > What is supposed to happen when a “lifetime.end size, ptr” is not > matched with a “lifetime.start size, ptr” ? I think there are a few > possible answers: > > - the memory area pointed to by ptr is assumed to be alive since > function entry > > - the memory area pointed to by ptr is assumed to be dead since > function entry, as it has not been marked alive > > - this is an unexpected situation > > I think this ambiguity should be cleared in the LRM, because today’s > implicit assumption may be broken at any time. > > This is not a theoretical question: clang can generate such cases. For > example, the following testcase: > > struct X { > > void doSomething(); > > char b[33]; > > }; > > void bar(X &); > > void baz(); > > void test(int i) { > > if (i==9) { > > X x; > > x.doSomething(); > > label: > > bar(x); > > } else { > > baz(); > > if (i==0) > > goto label; > > } > > } > > Produces: > > %struct.X = type { [33 x i8] } > > define void @_Z4testi(i32 %i) { > > entry: > > %x = alloca %struct.X, align 1 > > %cmp = icmp eq i32 %i, 9 > > br i1 %cmp, label %if.then, label %if.else > > if.then: ; preds = %entry > > %0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 > > call void @llvm.lifetime.start(i64 33, i8* %0) > > call void @_ZN1X11doSomethingEv(%struct.X* %x) > > br label %label > > label: ; preds = %if.else.label_crit_edge, %if.then > > %.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ] > > call void @_Z3barR1X(%struct.X* dereferenceable(33) %x) > > call void @llvm.lifetime.end(i64 33, i8* %.pre-phi) > > br label %if.end3 > > if.else: ; preds = %entry > > tail call void @_Z3bazv() > > %cmp1 = icmp eq i32 %i, 0 > > br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3 > > if.else.label_crit_edge: ; preds = %if.else > > %.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 > > br label %label > > if.end3: ; preds = %if.else, %label > > ret void > > } > > Note that the path thru if.else.label_crit_edge has no lifetime start. > > Cheers, > > -- > > Arnaud > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/7b2544c2/attachment.html>
On Wed, Nov 5, 2014 at 10:01 AM, Philip Reames <listmail at philipreames.com> wrote:> Would one of you mind taking a step back and explaining what you believe > the "stack colouring problem" to be? I'm familiar with the general meaning > of the term and even some of LLVM's implementation; I'm just not sure what > specific issue you're referring to. Having the context would make it much > easier to assess your proposals. >The goal of stack coloring is to reduce stack usage by storing user data with non-overlapping lifetimes in the same stack memory. C/C++ source programs usually have a naturally scoped structure, where the lifetime of data is bounded by curly braces. This information reduces the lifetime that stack coloring sees, so it saves stack memory. When we go to LLVM IR, we lose all that information. We currently try to recapture it with calls to @lifetime.start / end intrinsic calls at the point of declaration and when the variable falls out of scope. Basically we're trying to figure out how to put that scoping information back into the IR without turning it back into a tree. Furthermore, if we had better information about this in the IR, we could augment ASan to detect use-after-scope. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/79547fa9/attachment.html>