Here are some comments.
It seems to me there are 2 (mostly separate) aspects:
1. Teaching clang how to do domination / post-domination analysis, so that the
lifetime information (alloca/dealloca, or lifetime markers) can be set in the
right locations in the IR. Such an analysis should be able to understand the
language scopes, as well as labels / gotos and EH. The results of this analysis
would be used at IR codegen.
Caveat: I have no expertise in this area of Clang, so take everything with a
grain of salt and please feel free to correct me.
Where should this analysis be run ? Presumably at the beginning of each
function’s codegen’s time.
This analysis looks a bit special (at least to me), as it will work over the
AST, but it also requires a pretty good understanding of LLVM’s IR, which sets
it apart from other clang analyses.
Maybe another option (some may call it a poor’s man option) would be to enforce
at IR codegen time the dominators / post-dominators on the multiple paths
(normal & EH control flows) by inserting basic blocks around each statement
which is codegened. Those would be fall-thru most of the time, the llvm
optimizers can remove them easily. The obvious drawback is that it will insert
lots of small or fall-thru BBs.
2. How liveranges are represented in LLVM’s IR.
I like the idea of pairing alloca / dealloca (and removing the lifetime markers,
at least for stack coloring) and I think it could even ease / improve some
analysis.
Currently, allocas have to be located in the entry BB in order to get a chance
to be promoted to registers by the Mem2Reg pass. Allocas in other BBs are
considered to be dynamic. I have no idea how difficult it would be to teach
Mem2Reg to consider alloca/dealloca in other basic blocks.
With the alloca / dealloca solution, in order to do stack colouring, the alloca
must _not_ be in the entry block, because all allocas defined there are alive at
the same time and cannot be merged. All LLVM passes would need to be teached
that those alloca / dealloca pairs correspond to stack slots --- as the alloca
in the entry. The pairing would also have to be preserved across transformations
(same as lifetime.start/end).
Cheers,
Arnaud
From: Reid Kleckner [mailto:rnk at google.com]
Sent: 04 November 2014 19:35
To: Arnaud De Grandmaison; Nick Lewycky; Rafael Ávila de Espíndola
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] lifetime.start/end clarification
Short version: I think Clang needs some analysis capabilities to widen the
lifetime.
---
I think we need a new approach. These intrinsics were designed to be general
between heap and stack, and we don't need that extra generality for the
simple problem of stack coloring that we're trying to solve here. See for
example the size parameter, which I bet stack coloring doesn't need. If we
have a bitcast alloca, we already know the size.
Rafael had an idea at the dev meeting that maybe the IR needs a stack
deallocation instruction to pair with alloca. Then we could teach LLVM to
consider allocas that are post-dominated by a deallocation instruction to be
static, and fold them into the entry block. He pointed out that the Swift IL
actually has such a construct.
It would be the responsibility of the frontend to ensure that each alloca is
post-dominated by its "dealloca", so in this example with labels,
we'd just have to hoist the allocation to the nearest dominating block, or
just give up and go to the function entry block. Similarly the deallocation has
to be moved to post-dominate the allocation, to handle cases like:
void foo(int x) {
if (x > 10) {
// alloca y
goto lbl;
while (x) {
int y;
lbl:
y = bar();
x -= y;
}
// dealloca y
}
}
This representation would support more aggressive stack coloring. Furthermore,
it supports a much more efficient lowering for inalloca, which is why I'm
somewhat interested in it.
If we don't want to do this, we can do something less drastic and either add
new intrinsics or modify the current ones with the same rules proposed above.
We'd have the alloca in the entry block, the lifetime start at the first
block that dominates all uses, and the deallocation at the first block that
post-dominates all that stuff.
One other thing to think about is EH. We can often get into a situation where
uses of y are statically reachable from cleanup code while being dynamically
unreachable. This can happen when cleanups are not simply ordered in a
stack-like manner. I think if we can teach clang to do this kind of domination
analysis, then we can probably detect this case and give up on it by allocating
in the entry block.
On Tue, Nov 4, 2014 at 3:59 AM, Arnaud A. de Grandmaison
<arnaud.degrandmaison at arm.com> wrote:
The LRM (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic)
essentially states that:
- ptr is dead before a call to “lifetime.start size, ptr”
- ptr is dead after a call to “lifetime.end size, ptr”
This is all good and fine, and the expected use case is that all “lifetime.end
size, ptr” markers are matched with a preceding “lifetime.start size, ptr” in
the CFG.
What is supposed to happen when a “lifetime.end size, ptr” is not matched with a
“lifetime.start size, ptr” ? I think there are a few possible answers:
- the memory area pointed to by ptr is assumed to be alive since function entry
- the memory area pointed to by ptr is assumed to be dead since function entry,
as it has not been marked alive
- this is an unexpected situation
I think this ambiguity should be cleared in the LRM, because today’s implicit
assumption may be broken at any time.
This is not a theoretical question: clang can generate such cases. For example,
the following testcase:
struct X {
void doSomething();
char b[33];
};
void bar(X &);
void baz();
void test(int i) {
if (i==9) {
X x;
x.doSomething();
label:
bar(x);
} else {
baz();
if (i==0)
goto label;
}
}
Produces:
%struct.X = type { [33 x i8] }
define void @_Z4testi(i32 %i) {
entry:
%x = alloca %struct.X, align 1
%cmp = icmp eq i32 %i, 9
br i1 %cmp, label %if.then, label %if.else
if.then: ; preds = %entry
%0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0
call void @llvm.lifetime.start(i64 33, i8* %0)
call void @_ZN1X11doSomethingEv(%struct.X* %x)
br label %label
label: ; preds =
%if.else.label_crit_edge, %if.then
%.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ]
call void @_Z3barR1X(%struct.X* dereferenceable(33) %x)
call void @llvm.lifetime.end(i64 33, i8* %.pre-phi)
br label %if.end3
if.else: ; preds = %entry
tail call void @_Z3bazv()
%cmp1 = icmp eq i32 %i, 0
br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3
if.else.label_crit_edge: ; preds = %if.else
%.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0
br label %label
if.end3: ; preds = %if.else, %label
ret void
}
Note that the path thru if.else.label_crit_edge has no lifetime start.
Cheers,
--
Arnaud
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/a943b444/attachment.html>
On 11/05/2014 08:47 AM, Arnaud A. de Grandmaison wrote:> > Here are some comments. > > It seems to me there are 2 (mostly separate) aspects: > > 1. Teaching clang how to do domination / post-domination analysis, so > that the lifetime information (alloca/dealloca, or lifetime markers) > can be set in the right locations in the IR. Such an analysis should > be able to understand the language scopes, as well as labels / gotos > and EH. The results of this analysis would be used at IR codegen. > > Caveat: I have no expertise in this area of Clang, so take everything > with a grain of salt and please feel free to correct me. > > Where should this analysis be run ? Presumably at the beginning of > each function’s codegen’s time. > > This analysis looks a bit special (at least to me), as it will work > over the AST, but it also requires a pretty good understanding of > LLVM’s IR, which sets it apart from other clang analyses. > > Maybe another option (some may call it a poor’s man option) would be > to enforce at IR codegen time the dominators / post-dominators on the > multiple paths (normal & EH control flows) by inserting basic blocks > around each statement which is codegened. Those would be fall-thru > most of the time, the llvm optimizers can remove them easily. The > obvious drawback is that it will insert lots of small or fall-thru BBs. > > 2. How liveranges are represented in LLVM’s IR. > > I like the idea of pairing alloca / dealloca (and removing the > lifetime markers, at least for stack coloring) and I think it could > even ease / improve some analysis. > > Currently, allocas have to be located in the entry BB in order to get > a chance to be promoted to registers by the Mem2Reg pass. Allocas in > other BBs are considered to be dynamic. I have no idea how difficult > it would be to teach Mem2Reg to consider alloca/dealloca in other > basic blocks. > > With the alloca / dealloca solution, in order to do stack colouring, > the alloca must _/not/_ be in the entry block, because all allocas > defined there are alive at the same time and cannot be merged. All > LLVM passes would need to be teached that those alloca / dealloca > pairs correspond to stack slots --- as the alloca in the entry. The > pairing would also have to be preserved across transformations (same > as lifetime.start/end). >Would one of you mind taking a step back and explaining what you believe the "stack colouring problem" to be? I'm familiar with the general meaning of the term and even some of LLVM's implementation; I'm just not sure what specific issue you're referring to. Having the context would make it much easier to assess your proposals.> > Cheers, > > Arnaud > > *From:*Reid Kleckner [mailto:rnk at google.com] > *Sent:* 04 November 2014 19:35 > *To:* Arnaud De Grandmaison; Nick Lewycky; Rafael Ávila de Espíndola > *Cc:* LLVM Developers Mailing List > *Subject:* Re: [LLVMdev] lifetime.start/end clarification > > Short version: I think Clang needs some analysis capabilities to widen > the lifetime. > > --- > > I think we need a new approach. These intrinsics were designed to be > general between heap and stack, and we don't need that extra > generality for the simple problem of stack coloring that we're trying > to solve here. See for example the size parameter, which I bet stack > coloring doesn't need. If we have a bitcast alloca, we already know > the size. > > Rafael had an idea at the dev meeting that maybe the IR needs a stack > deallocation instruction to pair with alloca. Then we could teach LLVM > to consider allocas that are post-dominated by a deallocation > instruction to be static, and fold them into the entry block. He > pointed out that the Swift IL actually has such a construct. > > It would be the responsibility of the frontend to ensure that each > alloca is post-dominated by its "dealloca", so in this example with > labels, we'd just have to hoist the allocation to the nearest > dominating block, or just give up and go to the function entry block. > Similarly the deallocation has to be moved to post-dominate the > allocation, to handle cases like: > > void foo(int x) { > > if (x > 10) { > > // alloca y > > goto lbl; > > while (x) { > > int y; > > lbl: > > y = bar(); > > x -= y; > } > > // dealloca y > > } > > } > > This representation would support more aggressive stack coloring. > Furthermore, it supports a much more efficient lowering for inalloca, > which is why I'm somewhat interested in it. > > If we don't want to do this, we can do something less drastic and > either add new intrinsics or modify the current ones with the same > rules proposed above. We'd have the alloca in the entry block, the > lifetime start at the first block that dominates all uses, and the > deallocation at the first block that post-dominates all that stuff. > > One other thing to think about is EH. We can often get into a > situation where uses of y are statically reachable from cleanup code > while being dynamically unreachable. This can happen when cleanups are > not simply ordered in a stack-like manner. I think if we can teach > clang to do this kind of domination analysis, then we can probably > detect this case and give up on it by allocating in the entry block. > > On Tue, Nov 4, 2014 at 3:59 AM, Arnaud A. de Grandmaison > <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> > wrote: > > The LRM > (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic) > essentially states that: > > - ptr is dead before a call to “lifetime.start size, ptr” > > - ptr is dead after a call to “lifetime.end size, ptr” > > This is all good and fine, and the expected use case is that all > “lifetime.end size, ptr” markers are matched with a preceding > “lifetime.start size, ptr” in the CFG. > > What is supposed to happen when a “lifetime.end size, ptr” is not > matched with a “lifetime.start size, ptr” ? I think there are a few > possible answers: > > - the memory area pointed to by ptr is assumed to be alive since > function entry > > - the memory area pointed to by ptr is assumed to be dead since > function entry, as it has not been marked alive > > - this is an unexpected situation > > I think this ambiguity should be cleared in the LRM, because today’s > implicit assumption may be broken at any time. > > This is not a theoretical question: clang can generate such cases. For > example, the following testcase: > > struct X { > > void doSomething(); > > char b[33]; > > }; > > void bar(X &); > > void baz(); > > void test(int i) { > > if (i==9) { > > X x; > > x.doSomething(); > > label: > > bar(x); > > } else { > > baz(); > > if (i==0) > > goto label; > > } > > } > > Produces: > > %struct.X = type { [33 x i8] } > > define void @_Z4testi(i32 %i) { > > entry: > > %x = alloca %struct.X, align 1 > > %cmp = icmp eq i32 %i, 9 > > br i1 %cmp, label %if.then, label %if.else > > if.then: ; preds = %entry > > %0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 > > call void @llvm.lifetime.start(i64 33, i8* %0) > > call void @_ZN1X11doSomethingEv(%struct.X* %x) > > br label %label > > label: ; preds = %if.else.label_crit_edge, %if.then > > %.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ] > > call void @_Z3barR1X(%struct.X* dereferenceable(33) %x) > > call void @llvm.lifetime.end(i64 33, i8* %.pre-phi) > > br label %if.end3 > > if.else: ; preds = %entry > > tail call void @_Z3bazv() > > %cmp1 = icmp eq i32 %i, 0 > > br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3 > > if.else.label_crit_edge: ; preds = %if.else > > %.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0 > > br label %label > > if.end3: ; preds = %if.else, %label > > ret void > > } > > Note that the path thru if.else.label_crit_edge has no lifetime start. > > Cheers, > > -- > > Arnaud > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/7b2544c2/attachment.html>
On Wed, Nov 5, 2014 at 10:01 AM, Philip Reames <listmail at philipreames.com> wrote:> Would one of you mind taking a step back and explaining what you believe > the "stack colouring problem" to be? I'm familiar with the general meaning > of the term and even some of LLVM's implementation; I'm just not sure what > specific issue you're referring to. Having the context would make it much > easier to assess your proposals. >The goal of stack coloring is to reduce stack usage by storing user data with non-overlapping lifetimes in the same stack memory. C/C++ source programs usually have a naturally scoped structure, where the lifetime of data is bounded by curly braces. This information reduces the lifetime that stack coloring sees, so it saves stack memory. When we go to LLVM IR, we lose all that information. We currently try to recapture it with calls to @lifetime.start / end intrinsic calls at the point of declaration and when the variable falls out of scope. Basically we're trying to figure out how to put that scoping information back into the IR without turning it back into a tree. Furthermore, if we had better information about this in the IR, we could augment ASan to detect use-after-scope. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/79547fa9/attachment.html>