thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints [Jul 2019]

Bill Wendling via llvm-dev

2019-Jun-28 21:53 UTC

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <jyknight at google.com> wrote:
> On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <isanbard at gmail.com> wrote:
>
>> On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com>
>> wrote:
>>
>>> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at google.com>
>>> wrote:
>>>
>>>> I think this is fine, except that it stops at the point where things
>>>> actually start to get interesting and tricky.
>>>>
>>>> How will you actually handle the flow of values from the callbr into
>>>> the error blocks? A callbr can specify requirements on where its outputs
>>>> live. So, what if two callbr, in different branches of code, specify
>>>> _different_ constraints for the same output, and list the same block as a
>>>> possible error successor? How can the resulting phi be codegened?
>>>>
>>> This is where I fall back on the statement about how "the programmer
>>> knows what they're doing". Perhaps I'm being too cavalier here? My concern,
>>> if you want to call it that, is that we don't be too restrictive on the new
>>> behavior. For example, the "asm goto" may set a register to an error value
>>> (made up on the spot; may not be a common use). But, if there's no real
>>> reason to have the value be valid on the abnormal path, then sure we can
>>> declare that it's not valid on the abnormal path.
>>>
>> I think I should explain my "programmer knows what they're doing"
>> statement a bit better. I'm specifically referring to inline asm here. The
>> more general "callbr" case may still need to be considered (see Reid's
>> reply).
>>
>> When a programmer uses inline asm, they're implicitly telling the
>> compiler that they *do* know what they're doing (I know this is common
>> knowledge, but I wanted to reiterate it.). In particular, either they need
>> to reference an instruction not readily available from the compiler (e.g.
>> "cpuid") or the compiler isn't able to give them the needed performance in
>> a critical section. I'm extending this sentiment to callbr with output
>> constraints. Let's take your example below and write it as "normal" asm
>> statements one on each branch of an if-then-else (please ignore any syntax
>> errors):
>>
>> if:
>>   br i1 %cmp, label %true, label %false
>>
>> true:
>>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1", "={r8},={r9}" ()
>>   br label %end
>>
>> false:
>>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1", "={r10},={r11}" ()
>>   br label %end
>>
>> end:
>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>
>> How is this handled in codegen? Is it an error or does the back-end
>> handle it? Whatever's done today for "normal" inline asm is what I *think*
>> should be the behavior for the inline asm callbr variant. If this doesn't
>> seem sensible (and I realize that I may be thinking of an "in a perfect
>> world" scenario), then we'll need to come up with a more sensible solution
>> which may be to disallow the values on the error block until we can think
>> of a better way to handle them.
>>
>
> This example is no problem, because instructions can be emitted between
> what's emitted by "call asm" and the end of the block (be it a fallthrough,
> or a jump instruction). What gets emitted there is a move of the output
> register to another location -- either a register or to the stack. And
> therefore at the beginning of the "end" block, "%vals" is always in a
> consistent location, no matter how you got to that block.
>
> But in the callbr case, there is not a location at which those moves can
> be emitted, after the callbr, before the jump to "error".
>
I see what you mean. Let's say we create a pseudo-instruction (similar to
landingpad, et al) that needs to be lowered by the backend in a reasonable
manner. The EH stuff has an external process/library that performs the
actual unwinding and which sets the values accordingly. We won't have this.
What we could do instead is split the edges and insert the
copy-to-<where ever> statements there. So something like:
>>>
bb1:
  callbr ... [label %asm.goto.dest]

bb2:
  callbr ... [label %asm.goto.dest]

asm.goto.dest:
  ...
<<<

converted to something like:
>>>
bb1:
  callbr ... [label %asm.goto.dest.bb1]

bb2:
  callbr ... [label %asm.goto.dest.bb2]

asm.goto.dest.bb1:
  %v.bb1 = extractvalue ...
  br label %asm.goto.dest

asm.goto.dest.bb2:
  %v.bb2 = extractvalue ...
  br label %asm.goto.dest

asm.goto.dest:
  %v = phi [%v.bb1, %asm.goto.dest.bb1], [%v.bb2, %asm.goto.dest.bb2]
  ...
<<<

It's not 100% not barfy, but it's what the compiler does in similar
situations.
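
A minimal sketch of the edge-splitting transform above, written as a toy
Python model rather than LLVM code. The CFG is assumed to be a plain dict from
block name to successor list, and `split_edges_into` is a hypothetical helper
invented for this illustration. It only shows the resulting CFG shape, not how
the copies themselves would be emitted.

```python
# Toy CFG: block name -> list of successor block names.
# Illustration only; this is not LLVM's edge-splitting utility.

def split_edges_into(cfg, target, sources):
    """Return a new CFG in which every edge source->target is routed
    through a fresh landing block named '<target>.<source>'."""
    new_cfg = {name: list(succs) for name, succs in cfg.items()}
    for src in sources:
        landing = f"{target}.{src}"  # e.g. "asm.goto.dest.bb1"
        # Redirect the edge from the callbr block to its own landing block.
        new_cfg[src] = [landing if s == target else s for s in new_cfg[src]]
        # The landing block would hold the copy, then branch to the target.
        new_cfg[landing] = [target]
    return new_cfg


cfg = {
    "bb1": ["fallthrough1", "asm.goto.dest"],
    "bb2": ["fallthrough2", "asm.goto.dest"],
    "fallthrough1": [],
    "fallthrough2": [],
    "asm.goto.dest": [],
}
split = split_edges_into(cfg, "asm.goto.dest", ["bb1", "bb2"])
```

Each callbr block ends up with its own landing block, which is where the
per-predecessor copy would go before control reaches the shared target.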

-bw

James Y Knight via llvm-dev

2019-Jun-28 22:35 UTC

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 5:53 PM Bill Wendling <isanbard at gmail.com>
wrote:
> I see what you mean. Let's say we create a pseudo-instruction (similar to
> landingpad, et al) that needs to be lowered by the backend in a reasonable
> manner. The EH stuff has an external process/library that performs the
> actual unwinding and which sets the values accordingly. We won't have this.
>
> What we could do instead is split the edges and insert the
> copy-to-<where ever> statements there.
>
Exactly -- except that doing that is potentially an invalid transform,
because the address is being used as a value, not simply a jump target. The
label list is just a list of _possible_ jump targets, changing those won't
actually affect anything. You'd instead need to change the blockaddress
constant, but in the general case you don't know where that address came
from -- (and it may therefore be required that you have the same address
for two separate callbr instructions).

I guess this kinda touches on some of the same issues as in the other
discussion about the handling of the blockaddress in callbr and inlining,
etc...

I wonder if we could put some validity restrictions on the IR structure,
rather than trying to fix things up after the fact by attempting to split
blocks. E.g., we could state that it's invalid to have a phi which uses the
value defined by a callbr, if it's conditioned on that same block as
predecessor.  That is: it's valid to use _other_ values defined in the
block ending in callbr, because they can be moved prior to the callbr. It's
also valid to use the value defined by the callbr in a phi conditioned on
some other intermediate block as predecessor, because then any required
moves can happen in the intermediate block.

I believe such an IR restriction should be sufficient to make it possible
to emit valid code from the IR in all cases, but I'm a bit afraid of how
badly adding such odd edge-cases might screw up the rest of the compiler
and optimizer.
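
The restriction described above can be phrased as a simple check. The sketch
below is a toy Python model, not an LLVM verifier pass: `Phi`, `callbr_defs`,
and `violations` are names made up for this illustration, and values and
blocks are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phi:
    block: str       # block that contains the phi
    incoming: tuple  # tuple of (value_name, predecessor_block) pairs


def violations(phi, callbr_defs):
    """Return the incoming pairs that break the proposed rule.

    callbr_defs maps the name of a callbr result to the block that ends in
    that callbr. A pair is invalid when the phi takes a callbr result
    directly from the callbr's own block, since no move can be placed there.
    """
    return [(value, pred) for value, pred in phi.incoming
            if callbr_defs.get(value) == pred]


callbr_defs = {"%x.bb1": "bb1", "%x.bb2": "bb2"}

# Invalid: callbr results flow straight from the callbr blocks into the phi.
direct = Phi("goto.target", (("%x.bb1", "bb1"), ("%x.bb2", "bb2")))
# Valid: the results arrive through intermediate blocks that can hold moves.
via_mid = Phi("join", (("%x.bb1", "mid1"), ("%x.bb2", "mid2")))
# Valid: other values from the callbr blocks can be moved before the callbr.
other = Phi("goto.target", (("%a", "bb1"), ("%b", "bb2")))
```

Only `direct` yields violations under this reading of the rule; the other two
phis leave room for any required moves.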

Finkel, Hal J. via llvm-dev

2019-Jun-29 00:39 UTC

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On 6/28/19 5:35 PM, James Y Knight via llvm-dev wrote:

> Exactly -- except that doing that is potentially an invalid transform,
> because the address is being used as a value, not simply a jump target. The
> label list is just a list of _possible_ jump targets, changing those won't
> actually affect anything. You'd instead need to change the blockaddress
> constant, but in the general case you don't know where that address came
> from -- (and it may therefore be required that you have the same address
> for two separate callbr instructions).
>
> I guess this kinda touches on some of the same issues as in the other
> discussion about the handling of the blockaddress in callbr and inlining,
> etc...
>
> I wonder if we could put some validity restrictions on the IR structure,
> rather than trying to fix things up after the fact by attempting to split
> blocks. E.g., we could state that it's invalid to have a phi which uses the
> value defined by a callbr, if it's conditioned on that same block as
> predecessor. That is: it's valid to use _other_ values defined in the
> block ending in callbr, because they can be moved prior to the callbr. It's
> also valid to use the value defined by the callbr in a phi conditioned on
> some other intermediate block as predecessor, because then any required
> moves can happen in the intermediate block.
>
> I believe such an IR restriction should be sufficient to make it possible
> to emit valid code from the IR in all cases, but I'm a bit afraid of how
> badly adding such odd edge-cases might screw up the rest of the compiler
> and optimizer.

I think that your fear is justified.

In any case, if we're going to support forming this kind of callbr in Clang,
then Clang still needs a place to put the stack stores after the inline asm in
order to represent the output constraints - which are specified in terms of
source-level variables and those are always in stack locations when Clang is
generating IR. I think that we can make all of this work if we say that the
output constraints, and thus the outputs of the callbr, dominate only uses on
the normal "fallthrough" branch. Then the compiler has a single place
to put the stores (and, later, a place to put register copies, etc.).
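
One possible reading of this rule is a dominance check: a use of a callbr
output is accepted only if the fallthrough block dominates the block holding
the use. The Python sketch below is a toy model under that assumption (and
assuming the fallthrough block is entered only from the callbr); the function
names and the example CFG are invented for illustration and are not LLVM APIs.

```python
def dominators(cfg, entry):
    """Iteratively compute dominator sets for a CFG given as
    {block: [successors]}. Every block must appear as a key."""
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)
    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def output_use_is_valid(cfg, entry, fallthrough, use_block):
    """Accept a use of the callbr output only where the fallthrough
    block dominates the using block."""
    return fallthrough in dominators(cfg, entry)[use_block]


# 'entry' ends in the callbr; it either falls through or jumps to goto.target.
cfg = {
    "entry": ["fallthrough", "goto.target"],
    "fallthrough": ["after"],
    "after": ["exit"],
    "goto.target": ["exit"],
    "exit": [],
}
```

In this model, uses in `fallthrough` and `after` are accepted, while uses in
`goto.target` and in the join block `exit` are rejected, so the stores have a
single place to live.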

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Bill Wendling via llvm-dev

2019-Jul-01 18:38 UTC

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 3:35 PM James Y Knight <jyknight at google.com>
wrote:
> Exactly -- except that doing that is potentially an invalid transform,
> because the address is being used as a value, not simply a jump target. The
> label list is just a list of _possible_ jump targets, changing those won't
> actually affect anything. You'd instead need to change the blockaddress
> constant, but in the general case you don't know where that address came
> from -- (and it may therefore be required that you have the same address
> for two separate callbr instructions).
>
> I guess this kinda touches on some of the same issues as in the other
> discussion about the handling of the blockaddress in callbr and inlining,
> etc...
>
> I wonder if we could put some validity restrictions on the IR structure,
> rather than trying to fix things up after the fact by attempting to split
> blocks. E.g., we could state that it's invalid to have a phi which uses the
> value defined by a callbr, if it's conditioned on that same block as
> predecessor. That is: it's valid to use _other_ values defined in the
> block ending in callbr, because they can be moved prior to the callbr. It's
> also valid to use the value defined by the callbr in a phi conditioned on
> some other intermediate block as predecessor, because then any required
> moves can happen in the intermediate block.
>
> I believe such an IR restriction should be sufficient to make it possible
> to emit valid code from the IR in all cases, but I'm a bit afraid of how
> badly adding such odd edge-cases might screw up the rest of the compiler
> and optimizer.
>
I want to revisit this. Here are the situations we're confronted with:


   1. The goto-target can be jumped to by 1 callbr instruction,
   2. The goto-target can be jumped to by N callbr instructions, which
   don't need a PHI node, and
   3. The goto-target can be jumped to by N callbr instructions, which *do*
   need a PHI node.

I'm going to plug the instruction I created out of thin air a few emails
back, but explain it better. (I'm using an instruction instead of an intrinsic
because we want that instruction to be right after all non-PHI instructions
in the goto-target block.) I'm _not_ suggesting we need this instruction.
It's just for demonstration purposes.

Situations (1) and (2) don't encounter a problem. Any value used in the
goto-target can be handled by inserting the code to extract that value in
the goto-target block:


bb1:
  ...
  %x.bb1 = callbr i32 asm sideeffect "...", "=r,X"(i32 %x,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough1 [label %goto.target]

fallthrough1:
  ...

bb2:
  ...
  %y.bb2 = callbr i32 asm sideeffect "...", "=r,X"(i32 %y,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough2 [label %goto.target]

fallthrough2:
  ...

goto.target:
  %x.goto = <extract value from %x.bb1>
  %y.goto = <extract value from %y.bb2>
  ... <uses of %x.goto and %y.goto> ...


This leaves situation (3), which is far more complex, as we've seen. To
reiterate, the issue here is that we need to extract the values returned by
callbr. This would typically be done by using a PHI node, but LLVM may want
to split critical edges or push the calculation back to the predecessor
block. That won't work with the callbr asm, because it could branch out of
the asm at any point, thus skipping the extraction. So we can't use PHI
nodes for these values. There are three classes of solutions to this:


   1. Don't allow the values to be used in goto-targets, or
   2. Allow them, but with significant restrictions, or
   3. Allow them without using PHI nodes.

Each has its benefits and drawbacks. As I've stated before, I think that
(1) is too restrictive, but if we can't come up with a good solution, it
may be our only option. Solution (2) could be a good compromise. However, I
want to propose a potential solution to (3).

The core of my proposal is to replace the PHI node with code that will
replicate its behavior without code lowering trying to modify the CFG (at
least not in ways that may invalidate the asm). Here is example code:


bb1:
  store i8* blockaddress(@bar, %bb1), i8** %src
  %x.bb1 = callbr i32 asm sideeffect "...", "=r,X"(i32 %x,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough1 [label %goto.target]

fallthrough1:
  ...

bb2:
  store i8* blockaddress(@bar, %bb2), i8** %src
  %x.bb2 = callbr i32 asm sideeffect "...", "=r,X"(i32 %x,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough2 [label %goto.target]

fallthrough2:
  ...

goto.target:
  %x1 = indirectval i8** %src, i32 [%x.bb1, %bb1], [%x.bb2, %bb2]
  <extract values from %x1>
  ...


This can be lowered to this:

bb1:
  store i8* blockaddress(@bar, %bb1), i8** %src
  %x.bb1 = callbr i32 asm sideeffect "...", "=r,X"(i32 %x,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough1 [label %goto.target]

fallthrough1:
  ...

bb2:
  store i8* blockaddress(@bar, %bb2), i8** %src
  %x.bb2 = callbr i32 asm sideeffect "...", "=r,X"(i32 %x,
              i8* blockaddress(@bar, %goto.target))
          to label %fallthrough2 [label %goto.target]

fallthrough2:
  ...

goto.target:
  %src.bb = load i8*, i8** %src
  %src.int = ptrtoint i8* %src.bb to i64
  switch i64 %src.int, label %goto.target.body [ ; or if-then-else blocks
      i64 ptrtoint (i8* blockaddress(@bar, %bb1) to i64), label %goto.target.bb1
      i64 ptrtoint (i8* blockaddress(@bar, %bb2) to i64), label %goto.target.bb2
  ]

goto.target.bb1:
  %x1 = <extract value from %x.bb1>
  br label %goto.target.body

goto.target.bb2:
  %x2 = <extract value from %x.bb2>
  br label %goto.target.body

goto.target.body:
  %x.merge = phi i32 [%x1, %goto.target.bb1], [%x2, %goto.target.bb2]
  ...


With this, we don't change any values used by the callbr instructions, and
the return values are extracted correctly. This has the unsavory issue of
using stores and loads, but this may be the price we need to pay.
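
The dispatch idea can be mimicked in a few lines of Python. This is only a
toy simulation of the control flow sketched above, not LLVM code: `frame`
stands in for the `%src` slot plus the per-site outputs, and `callbr_site`,
`goto_target`, and `simulate` are hypothetical names. The toy asm is assumed
to always take the goto edge.

```python
def callbr_site(site, asm_output, frame):
    """Model one callbr block: record which block we are in, then run the asm."""
    frame["src"] = site              # models: store blockaddress(@bar, %site) to %src
    frame["out"][site] = asm_output  # models: the callbr's output register
    return "goto.target"             # this toy asm always branches to the goto-target


def goto_target(frame):
    """Model goto.target: dispatch on the recorded source instead of a PHI
    on the callbr edges."""
    site = frame["src"]              # models: load from %src
    if site not in ("bb1", "bb2"):   # the switch only knows these two sources
        raise ValueError(f"unknown source block: {site}")
    # goto.target.<site> extracts that site's output; goto.target.body merges.
    return frame["out"][site]


def simulate(site, asm_output):
    """Run one callbr site and return the value seen in goto.target.body."""
    frame = {"src": None, "out": {}}
    destination = callbr_site(site, asm_output, frame)
    if destination != "goto.target":
        raise RuntimeError("toy asm did not take the goto edge")
    return goto_target(frame)
```

For example, `simulate("bb1", 11)` yields the output produced in `bb1`, and
`simulate("bb2", 22)` yields the one produced in `bb2`, without any value
having to be merged on the edge leaving the callbr itself.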


Thoughts?


-bw
