thr3ads.net - llvm dev - [llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated [Jun 2020]

If this information is useful, please help other people find it:
Share via:

Arthur Eubanks via llvm-dev

2020-Jun-25 20:42 UTC

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated

Bringing this back up for discussion on handling exceptions.

According to the inalloca design <https://llvm.org/docs/InAlloca.html>,
there should be a stackrestore after an invoke in both the non-exceptional
and exceptional case (that doesn't seem to be happening in some cases as
we've seen, but that's beside the point).

Does it make sense to model a preallocated call as handling the cleanup of
the stack in normal control flow (e.g. always for a normal call, and in the
non-exceptional path for an invoke)? Then @llvm.call.preallocated.teardown
is only necessary in the exceptional path to cleanup the stack.

On Thu, Apr 16, 2020 at 1:40 PM Eli Friedman <efriedma at quicinc.com>
wrote:
>
>
> *From:* Reid Kleckner <rnk at google.com>
> *Sent:* Thursday, April 16, 2020 1:06 PM
> *To:* Eli Friedman <efriedma at quicinc.com>
> *Cc:* Arthur Eubanks <aeubanks at google.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with
> llvm.call.setup and preallocated
>
>
>
> On Sat, Mar 28, 2020 at 2:20 PM Eli Friedman <efriedma at
quicinc.com> wrote:
>
> This would specifically be for cases where we try to rewrite the
> signature?  I would assume we should forbid rewriting the signature of a
> call with an operand bundle.  And once some optimization drops the bundle
> and preallocated marking, to allow such rewriting, the signature doesn’t
> need to match anymore.
>
>
>
> Yes, I really would like to enable DAE and other signature rewriting IPO
> transforms. Maybe today DAE doesn't run on calls with bundles, but this
> feature is designed to allow the non-preallocated arguments to be removed
> or expanded into multiple arguments without disturbing the preallocated
> argument numbering.
>
>
>
> This is sort of besides the point.  I think we get the best of everything
> if we just say the bundle and preallocated attribute have to be dropped
> before any other IPO transforms.  (If we control the function and all the
> callers, this transform should be relatively simple: we just remove the
> attribute and bundle from the function and all the callers, and insert an
> llvm.call.teardown after each call where we dropped a bundle.)
>
>
>
> Even if we can potentially rewrite signatures in some cases without
> dropping the preallocated attribute, there isn’t really any incentive to do
> that, as far as I know.
>
> Good.  It might be a good idea to try to write out an algorithm for this
> to ensure this works out.  In particular, I’m concerned about cases where
> two predecessors of a basic block appear to have a different stack size (an
> if-then-else, or a loop backedge).  We need to make sure such cases are
> either invalid, or UB on entry to the block.
>
>
>
> I spent a little time thinking, and I’m not sure what rules we need to
> make this work out.  For example, should we forbid tail-merging multiple
> calls to abort()?  IF we should, how would we write a rule which restricts
> that?
>
>
>
> This is actually a big open problem, and it came up again in the SEH
> discussion. It seems to be my fate to struggle against the LLVM IR design
> decision to not have scopes.
>
>
>
> Without introducing new IR constructs, we could define a list of
> instructions that set the current region. We could write an algorithm for
> assigning regions to each block. The region is implicit in the IR. These
> are the things that could create regions:
>
> - call.preallocated.setup
>
> - catchpad
>
> - cleanuppad
>
> - lifetime.start? unclear
>
>
>
> stacksave/stackrestore.
>
>
>
> I don’t think there’s any reason to make lifetime intrinsics properly
> nest; nothing really cares, as far as I know.  (The current lifetime
> intrinsics have problems, but I don’t think that’s one of them.)
>
>
>
> Passes are required to ensure that each BB belongs to exactly one region.
> Each region belongs to its parent, and ending a region returns to the
> parent of the ending region.
>
>
> I don't think this idea is ready to be added to LangRef, but it is a
good
> future direction, perhaps with new supporting IR constructs.
>
>
>
> I think for now we have to live with the possibility that the analysis
> which assigns SP adjustment levels to MBBs may fail to find a unique SP
> level, in which case we must use a frame pointer.
>
>
>
> Okay.  Can we at least list out the cases where we expect this might
> happen?
>
>
>
> OTOH, we can easily establish the invariant at the MIR level. We should
> always be able to assign each MBB a unique most recently active call site
> and an SP adjustment level. We can easily teach BranchFolding to preserve
> this invariant. We already do it for funclets.
>
>
>
> I’m not really concerned with funny usage of calls to alloca() in call
> arguments, or anything like that. I’m happy to pick whatever rule is
> easiest for us.  I’m more concerned with ensuring nothing blows up if we
> inline a call to a function that contains a VLA, or something like that.
>
>
>
> Sounds good. Inlining dynamic allocas and VLAs should already just work.
> The inliner places stacksave/stackrestore calls around the original call
> site, if dynamic allocas were present in the inlined code.
>
>
>
> So stacksave/stackrestore regions properly nest with
> call.preallocated.setup regions?  Sure, that makes sense.
>
>
>
> -Eli
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200625/c9fac8ac/attachment.html>

Eli Friedman via llvm-dev

2020-Jun-25 21:15 UTC

head link

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated

It makes sense to say that the instruction with the preallocated bundle frees
the allocation, I think, on both the normal and exceptional codepaths of an
invoke.  inalloca has to use a stackrestore to match the stacksave, but there
isn’t any reason to match that here.  And there isn’t any reason to distinguish
between the normal and unwind paths of the invoke.

The llvm.call.preallocated.teardown, then, is only necessary if we manage to
throw an exception before we reach the call with the preallocated bundle.  (For
example, if the constructor for a function argument throws an exception.)

The one part that’s a little tricky is I’m not sure how
llvm.call.preallocated.teardown is supposed to work on the exceptional path.  If
you try to take stack memory allocated outside a funclet, and free it inside a
funclet, we can’t actually free the memory at the point of the call, so there
isn’t any simple lowering.  And in general, the llvm.call.preallocated.setup
doesn’t dominate the point where we exit the funclet.  So I’m not sure what the
best way to represent that is.  I guess we could stick the teardown call in the
funclet, say we “free” the memory at that point, and let the backend do whatever
magic is necessary to ensure we actually adjust the stack pointer correctly when
the funclet returns.

-Eli

From: Arthur Eubanks <aeubanks at google.com>
Sent: Thursday, June 25, 2020 1:42 PM
To: Eli Friedman <efriedma at quicinc.com>
Cc: Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at
lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and
preallocated

Bringing this back up for discussion on handling exceptions.

According to the inalloca design<https://llvm.org/docs/InAlloca.html>,
there should be a stackrestore after an invoke in both the non-exceptional and
exceptional case (that doesn't seem to be happening in some cases as
we've seen, but that's beside the point).

Does it make sense to model a preallocated call as handling the cleanup of the
stack in normal control flow (e.g. always for a normal call, and in the
non-exceptional path for an invoke)? Then @llvm.call.preallocated.teardown is
only necessary in the exceptional path to cleanup the stack.

On Thu, Apr 16, 2020 at 1:40 PM Eli Friedman <efriedma at
quicinc.com<mailto:efriedma at quicinc.com>> wrote:

From: Reid Kleckner <rnk at google.com<mailto:rnk at google.com>>
Sent: Thursday, April 16, 2020 1:06 PM
To: Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>
Cc: Arthur Eubanks <aeubanks at google.com<mailto:aeubanks at
google.com>>; llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev
at lists.llvm.org>>
Subject: [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and
preallocated

On Sat, Mar 28, 2020 at 2:20 PM Eli Friedman <efriedma at
quicinc.com<mailto:efriedma at quicinc.com>> wrote:
This would specifically be for cases where we try to rewrite the signature?  I
would assume we should forbid rewriting the signature of a call with an operand
bundle.  And once some optimization drops the bundle and preallocated marking,
to allow such rewriting, the signature doesn’t need to match anymore.

Yes, I really would like to enable DAE and other signature rewriting IPO
transforms. Maybe today DAE doesn't run on calls with bundles, but this
feature is designed to allow the non-preallocated arguments to be removed or
expanded into multiple arguments without disturbing the preallocated argument
numbering.

This is sort of besides the point.  I think we get the best of everything if we
just say the bundle and preallocated attribute have to be dropped before any
other IPO transforms.  (If we control the function and all the callers, this
transform should be relatively simple: we just remove the attribute and bundle
from the function and all the callers, and insert an llvm.call.teardown after
each call where we dropped a bundle.)

Even if we can potentially rewrite signatures in some cases without dropping the
preallocated attribute, there isn’t really any incentive to do that, as far as I
know.
Good.  It might be a good idea to try to write out an algorithm for this to
ensure this works out.  In particular, I’m concerned about cases where two
predecessors of a basic block appear to have a different stack size (an
if-then-else, or a loop backedge).  We need to make sure such cases are either
invalid, or UB on entry to the block.

I spent a little time thinking, and I’m not sure what rules we need to make this
work out.  For example, should we forbid tail-merging multiple calls to abort()?
IF we should, how would we write a rule which restricts that?

This is actually a big open problem, and it came up again in the SEH discussion.
It seems to be my fate to struggle against the LLVM IR design decision to not
have scopes.

Without introducing new IR constructs, we could define a list of instructions
that set the current region. We could write an algorithm for assigning regions
to each block. The region is implicit in the IR. These are the things that could
create regions:
- call.preallocated.setup
- catchpad
- cleanuppad
- lifetime.start? unclear

stacksave/stackrestore.

I don’t think there’s any reason to make lifetime intrinsics properly nest;
nothing really cares, as far as I know.  (The current lifetime intrinsics have
problems, but I don’t think that’s one of them.)

Passes are required to ensure that each BB belongs to exactly one region. Each
region belongs to its parent, and ending a region returns to the parent of the
ending region.

I don't think this idea is ready to be added to LangRef, but it is a good
future direction, perhaps with new supporting IR constructs.

I think for now we have to live with the possibility that the analysis which
assigns SP adjustment levels to MBBs may fail to find a unique SP level, in
which case we must use a frame pointer.

Okay.  Can we at least list out the cases where we expect this might happen?

OTOH, we can easily establish the invariant at the MIR level. We should always
be able to assign each MBB a unique most recently active call site and an SP
adjustment level. We can easily teach BranchFolding to preserve this invariant.
We already do it for funclets.

I’m not really concerned with funny usage of calls to alloca() in call
arguments, or anything like that. I’m happy to pick whatever rule is easiest for
us.  I’m more concerned with ensuring nothing blows up if we inline a call to a
function that contains a VLA, or something like that.

Sounds good. Inlining dynamic allocas and VLAs should already just work. The
inliner places stacksave/stackrestore calls around the original call site, if
dynamic allocas were present in the inlined code.

So stacksave/stackrestore regions properly nest with call.preallocated.setup
regions?  Sure, that makes sense.

-Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200625/e1b39413/attachment-0001.html>

Arthur Eubanks via llvm-dev

2020-Jul-06 18:56 UTC

head link

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated

On Thu, Jun 25, 2020 at 2:15 PM Eli Friedman <efriedma at quicinc.com>
wrote:
> It makes sense to say that the instruction with the preallocated bundle
> frees the allocation, I think, on both the normal and exceptional codepaths
> of an invoke.  inalloca has to use a stackrestore to match the stacksave,
> but there isn’t any reason to match that here.  And there isn’t any reason
> to distinguish between the normal and unwind paths of the invoke.
>Ah I see, the stack cleanup would be implicit after the preallocated call,
whether the normal or exceptional path, makes sense. I assume that's what
invoke + byval must do.
> The llvm.call.preallocated.teardown, then, is only necessary if we manage
> to throw an exception before we reach the call with the preallocated
> bundle.  (For example, if the constructor for a function argument throws an
> exception.)
>Does it make sense to say that in order to avoid stack leaks, exactly one
of either the preallocated call or the teardown must be called? And if both
are called it's UB?
>
>
> The one part that’s a little tricky is I’m not sure how
> llvm.call.preallocated.teardown is supposed to work on the exceptional
> path.  If you try to take stack memory allocated outside a funclet, and
> free it inside a funclet, we can’t actually free the memory at the point of
> the call, so there isn’t any simple lowering.
>I assume you mean the exceptional path of e.g. a previous constructor, not
the preallocated call. I don't understand how it would be any different
from a typical stackrestore of an inalloca in a funclet.
> And in general, the llvm.call.preallocated.setup doesn’t dominate the
> point where we exit the funclet.  So I’m not sure what the best way to
> represent that is.  I guess we could stick the teardown call in the
> funclet, say we “free” the memory at that point, and let the backend do
> whatever magic is necessary to ensure we actually adjust the stack pointer
> correctly when the funclet returns.
>
>
> -Eli
>
>
>
> *From:* Arthur Eubanks <aeubanks at google.com>
> *Sent:* Thursday, June 25, 2020 1:42 PM
> *To:* Eli Friedman <efriedma at quicinc.com>
> *Cc:* Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at
lists.llvm.org>
> *Subject:* [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with
> llvm.call.setup and preallocated
>
>
>
> Bringing this back up for discussion on handling exceptions.
>
>
>
> According to the inalloca design
<https://llvm.org/docs/InAlloca.html>,
> there should be a stackrestore after an invoke in both the non-exceptional
> and exceptional case (that doesn't seem to be happening in some cases
as
> we've seen, but that's beside the point).
>
>
>
> Does it make sense to model a preallocated call as handling the cleanup of
> the stack in normal control flow (e.g. always for a normal call, and in the
> non-exceptional path for an invoke)? Then @llvm.call.preallocated.teardown
> is only necessary in the exceptional path to cleanup the stack.
>
>
>
> On Thu, Apr 16, 2020 at 1:40 PM Eli Friedman <efriedma at
quicinc.com> wrote:
>
>
>
> *From:* Reid Kleckner <rnk at google.com>
> *Sent:* Thursday, April 16, 2020 1:06 PM
> *To:* Eli Friedman <efriedma at quicinc.com>
> *Cc:* Arthur Eubanks <aeubanks at google.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with
> llvm.call.setup and preallocated
>
>
>
> On Sat, Mar 28, 2020 at 2:20 PM Eli Friedman <efriedma at
quicinc.com> wrote:
>
> This would specifically be for cases where we try to rewrite the
> signature?  I would assume we should forbid rewriting the signature of a
> call with an operand bundle.  And once some optimization drops the bundle
> and preallocated marking, to allow such rewriting, the signature doesn’t
> need to match anymore.
>
>
>
> Yes, I really would like to enable DAE and other signature rewriting IPO
> transforms. Maybe today DAE doesn't run on calls with bundles, but this
> feature is designed to allow the non-preallocated arguments to be removed
> or expanded into multiple arguments without disturbing the preallocated
> argument numbering.
>
>
>
> This is sort of besides the point.  I think we get the best of everything
> if we just say the bundle and preallocated attribute have to be dropped
> before any other IPO transforms.  (If we control the function and all the
> callers, this transform should be relatively simple: we just remove the
> attribute and bundle from the function and all the callers, and insert an
> llvm.call.teardown after each call where we dropped a bundle.)
>
>
>
> Even if we can potentially rewrite signatures in some cases without
> dropping the preallocated attribute, there isn’t really any incentive to do
> that, as far as I know.
>
> Good.  It might be a good idea to try to write out an algorithm for this
> to ensure this works out.  In particular, I’m concerned about cases where
> two predecessors of a basic block appear to have a different stack size (an
> if-then-else, or a loop backedge).  We need to make sure such cases are
> either invalid, or UB on entry to the block.
>
>
>
> I spent a little time thinking, and I’m not sure what rules we need to
> make this work out.  For example, should we forbid tail-merging multiple
> calls to abort()?  IF we should, how would we write a rule which restricts
> that?
>
>
>
> This is actually a big open problem, and it came up again in the SEH
> discussion. It seems to be my fate to struggle against the LLVM IR design
> decision to not have scopes.
>
>
>
> Without introducing new IR constructs, we could define a list of
> instructions that set the current region. We could write an algorithm for
> assigning regions to each block. The region is implicit in the IR. These
> are the things that could create regions:
>
> - call.preallocated.setup
>
> - catchpad
>
> - cleanuppad
>
> - lifetime.start? unclear
>
>
>
> stacksave/stackrestore.
>
>
>
> I don’t think there’s any reason to make lifetime intrinsics properly
> nest; nothing really cares, as far as I know.  (The current lifetime
> intrinsics have problems, but I don’t think that’s one of them.)
>
>
>
> Passes are required to ensure that each BB belongs to exactly one region.
> Each region belongs to its parent, and ending a region returns to the
> parent of the ending region.
>
>
> I don't think this idea is ready to be added to LangRef, but it is a
good
> future direction, perhaps with new supporting IR constructs.
>
>
>
> I think for now we have to live with the possibility that the analysis
> which assigns SP adjustment levels to MBBs may fail to find a unique SP
> level, in which case we must use a frame pointer.
>
>
>
> Okay.  Can we at least list out the cases where we expect this might
> happen?
>
>
>
> OTOH, we can easily establish the invariant at the MIR level. We should
> always be able to assign each MBB a unique most recently active call site
> and an SP adjustment level. We can easily teach BranchFolding to preserve
> this invariant. We already do it for funclets.
>
>
>
> I’m not really concerned with funny usage of calls to alloca() in call
> arguments, or anything like that. I’m happy to pick whatever rule is
> easiest for us.  I’m more concerned with ensuring nothing blows up if we
> inline a call to a function that contains a VLA, or something like that.
>
>
>
> Sounds good. Inlining dynamic allocas and VLAs should already just work.
> The inliner places stacksave/stackrestore calls around the original call
> site, if dynamic allocas were present in the inlined code.
>
>
>
> So stacksave/stackrestore regions properly nest with
> call.preallocated.setup regions?  Sure, that makes sense.
>
>
>
> -Eli
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200706/c2eebcc9/attachment-0001.html>

llvm dev - Jun 2020 - [RFC] Replacing inalloca with llvm.call.setup and preallocated

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated

[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated