thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints [Jun 2019]

If this information is useful, please help other people find it:
Share via:

Bill Wendling via llvm-dev

2019-Jun-28 19:00 UTC

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com>
wrote:
> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at
google.com>
> wrote:
>
>> I think this is fine, except that it stops at the point where things
>> actually start to get interesting and tricky.
>>
>> How will you actually handle the flow of values from the callbr into
the
>> error blocks? A callbr can specify requirements on where its outputs
live.
>> So, what if two callbr, in different branches of code, specify
_different_
>> constraints for the same output, and list the same block as a possible
>> error successor? How can the resulting phi be codegened?
>>
>> This is where I fall back on the statement about how "the
programmer
> knows what they're doing". Perhaps I'm being too cavalier
here? My concern,
> if you want to call it that, is that we don't be too restrictive on the
new
> behavior. For example, the "asm goto" may set a register to an
error value
> (made up on the spot; may not be a common use). But, if there's no real
> reason to have the value be valid on the abnormal path, then sure we can
> declare that it's not valid on the abnormal path.
>
> I think I should explain my "programmer knows what they're
doing"statement a bit better. I'm specifically referring to inline asm here. The
more general "callbr" case may still need to be considered (see
Reid's
reply).

When a programmer uses inline asm, they're implicitly telling the compiler
that they *do* know what they're doing  (I know this is common knowledge,
but I wanted to reiterate it.). In particular, either they need to
reference an instruction not readily available from the compiler (e.g.
"cpuid") or the compiler isn't able to give them the needed
performance in
a critical section. I'm extending this sentiment to callbr with output
constraints. Let's take your example below and write it as
"normal" asm
statements one on each branch of an if-then-else (please ignore any syntax
errors):

if:
  br i1 %cmp, label %true, label %false

true:
  %0 = call { i32, i32 } asm sideeffect "poetry $0, $1",
"={r8},={r9}" ()
  br label %end

false:
  %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1",
"={r10},={r11}" ()
  br label %end

end:
  %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]

How is this handled in codegen? Is it an error or does the back-end handle
it? Whatever's done today for "normal" inline asm is what I
*think* should
be the behavior for the inline asm callbr variant. If this doesn't seem
sensible (and I realize that I may be thinking of an "in a perfect
world"
scenario), then we'll need to come up with a more sensible solution which
may be to disallow the values on the error block until we can think of a
better way to handle them.

-bw

> It'd sure be a whole lot easier to not have the values valid on the
>> secondary exit blocks. Can you present examples where preserving the
values
>> on the branches is be a requirement? (I feel like I've seen some
before,
>> but it'd be good to be reminded).
>>
>> E.g., imagine code like this:
>>
>> <<
>> entry:
>>   br i1 %cmp, label %true, label %false
>> true:
>>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0, $1",
>> "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to label
>> %asm.fallthrough [label %error]
>> false:
>>   %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0, $1",
>> "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to label
>> %asm.fallthrough [label %error]
>>
>> error:
>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>> >>
>>
>> Normally, if a common register cannot be found to use across relevant
>> block transitions, it can simply fall back on storing values on the
stack.
>> But, that's not possible with callbr, since the location is fixed
by the
>> asm, and no code can be inserted after the values are written, before
the
>> branch (as both value writes and the branch are inside the asm blob).
So
>> what can be done, in that case?
>>
>> One thing you might be able to do is to duplicate the error block so
you
>> have a different target for every callbr, but I'd consider that an
invalid
>> transform (because the address of the block is potentially being used
as a
>> value in the asm too).
>>
>> Another thing you could perhaps do is reify the source-block-number as
an
>> actual value -- storing a "1" before the callbr in true, and
storing a "2"
>> before the callbr in "false". Then conditional-branch based
on that...but
>> that's real ugly...
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190628/e08ca2dd/attachment.html>

Reid Kleckner via llvm-dev

2019-Jun-28 20:01 UTC

head link

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 12:00 PM Bill Wendling via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I think I should explain my "programmer knows what they're
doing"
> statement a bit better. I'm specifically referring to inline asm here.
The
> more general "callbr" case may still need to be considered (see
Reid's
> reply).
>
> When a programmer uses inline asm, they're implicitly telling the
compiler
> that they *do* know what they're doing  (I know this is common
knowledge,
> but I wanted to reiterate it.). In particular, either they need to
> reference an instruction not readily available from the compiler (e.g.
> "cpuid") or the compiler isn't able to give them the needed
performance in
> a critical section. I'm extending this sentiment to callbr with output
> constraints. Let's take your example below and write it as
"normal" asm
> statements one on each branch of an if-then-else (please ignore any syntax
> errors):
>
> if:
>   br i1 %cmp, label %true, label %false
>
> true:
>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1",
"={r8},={r9}" ()
>   br label %end
>
> false:
>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1",
"={r10},={r11}"
> ()
>   br label %end
>
> end:
>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>
> How is this handled in codegen? Is it an error or does the back-end handle
> it? Whatever's done today for "normal" inline asm is what I
*think* should
> be the behavior for the inline asm callbr variant. If this doesn't seem
> sensible (and I realize that I may be thinking of an "in a perfect
world"
> scenario), then we'll need to come up with a more sensible solution
which
> may be to disallow the values on the error block until we can think of a
> better way to handle them.
>
I guess distinguishing between callbr and asm goto is reasonable. We can
tolerate optionally initialized outputs for inline asm. It's just the same
as having an output constraint register that you forget to write in the asm
blob. However, it would be good if callbr had some way to represent whether
the returned value is alive along any particular outgoing edge.

I mentioned that we could look to landingpad for inspiration here. I
mention it because it is, essentially, the alternate exceptional return
value of a possibly throwing call. The values it produces are carried in
the usual X86 return registers, RAX:RDX, so they really are kind of an
alternate return value. However, with asm goto, it's not possible to have
different output constraints along different edges, so after thinking about
it some more, I think this is overkill. It's just one way we could
implement that live value indication, and I think it's probably not as good
as changing callbr itself.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190628/c17b6f62/attachment.html>

James Y Knight via llvm-dev

2019-Jun-28 20:48 UTC

head link

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <isanbard at gmail.com>
wrote:
> On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at gmail.com>
wrote:
>
>> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at
google.com>
>> wrote:
>>
>>> I think this is fine, except that it stops at the point where
things
>>> actually start to get interesting and tricky.
>>>
>>> How will you actually handle the flow of values from the callbr
into the
>>> error blocks? A callbr can specify requirements on where its
outputs live.
>>> So, what if two callbr, in different branches of code, specify
_different_
>>> constraints for the same output, and list the same block as a
possible
>>> error successor? How can the resulting phi be codegened?
>>>
>>> This is where I fall back on the statement about how "the
programmer
>> knows what they're doing". Perhaps I'm being too cavalier
here? My concern,
>> if you want to call it that, is that we don't be too restrictive on
the new
>> behavior. For example, the "asm goto" may set a register to
an error value
>> (made up on the spot; may not be a common use). But, if there's no
real
>> reason to have the value be valid on the abnormal path, then sure we
can
>> declare that it's not valid on the abnormal path.
>>
>> I think I should explain my "programmer knows what they're
doing"
> statement a bit better. I'm specifically referring to inline asm here.
The
> more general "callbr" case may still need to be considered (see
Reid's
> reply).
>
> When a programmer uses inline asm, they're implicitly telling the
compiler
> that they *do* know what they're doing  (I know this is common
knowledge,
> but I wanted to reiterate it.). In particular, either they need to
> reference an instruction not readily available from the compiler (e.g.
> "cpuid") or the compiler isn't able to give them the needed
performance in
> a critical section. I'm extending this sentiment to callbr with output
> constraints. Let's take your example below and write it as
"normal" asm
> statements one on each branch of an if-then-else (please ignore any syntax
> errors):
>
> if:
>   br i1 %cmp, label %true, label %false
>
> true:
>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1",
"={r8},={r9}" ()
>   br label %end
>
> false:
>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1",
"={r10},={r11}"
> ()
>   br label %end
>
> end:
>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>
> How is this handled in codegen? Is it an error or does the back-end handle
> it? Whatever's done today for "normal" inline asm is what I
*think* should
> be the behavior for the inline asm callbr variant. If this doesn't seem
> sensible (and I realize that I may be thinking of an "in a perfect
world"
> scenario), then we'll need to come up with a more sensible solution
which
> may be to disallow the values on the error block until we can think of a
> better way to handle them.
>
This example is no problem, because instructions can be emitted between
what's emitted by "call asm" and the end of the block (be it a
fallthrough,
or a jump instruction. What gets emitted there is a move of the output
register to another location -- either a register or to the stack. And
therefore at the beginning of the "end" block, "%vals" is
always in a
consistent location, no matter how you got to that block.

But in the callbr case, there is not a location at which those moves can be
emitted, after the callbr, before the jump to "error".

>
-bw>
>
>> It'd sure be a whole lot easier to not have the values valid on the
>>> secondary exit blocks. Can you present examples where preserving
the values
>>> on the branches is be a requirement? (I feel like I've seen
some before,
>>> but it'd be good to be reminded).
>>>
>>> E.g., imagine code like this:
>>>
>>> <<
>>> entry:
>>>   br i1 %cmp, label %true, label %false
>>> true:
>>>   %0 = callbr { i32, i32 } asm sideeffect "poetry $0,
$1",
>>> "={r8},={r9},X" (i8* blockaddress(@vogon, %error)) to
label
>>> %asm.fallthrough [label %error]
>>> false:
>>>   %1 = callbr { i32, i32 } asm sideeffect "poetry2 $0,
$1",
>>> "={r10},={r11},X" (i8* blockaddress(@vogon, %error)) to
label
>>> %asm.fallthrough [label %error]
>>>
>>> error:
>>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>> >>
>>>
>>> Normally, if a common register cannot be found to use across
relevant
>>> block transitions, it can simply fall back on storing values on the
stack.
>>> But, that's not possible with callbr, since the location is
fixed by the
>>> asm, and no code can be inserted after the values are written,
before the
>>> branch (as both value writes and the branch are inside the asm
blob). So
>>> what can be done, in that case?
>>>
>>> One thing you might be able to do is to duplicate the error block
so you
>>> have a different target for every callbr, but I'd consider that
an invalid
>>> transform (because the address of the block is potentially being
used as a
>>> value in the asm too).
>>>
>>> Another thing you could perhaps do is reify the source-block-number
as
>>> an actual value -- storing a "1" before the callbr in
true, and storing a
>>> "2" before the callbr in "false". Then
conditional-branch based on
>>> that...but that's real ugly...
>>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190628/8f90e9d6/attachment-0001.html>

Bill Wendling via llvm-dev

2019-Jun-28 21:53 UTC

head link

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

On Fri, Jun 28, 2019 at 1:48 PM James Y Knight <jyknight at google.com>
wrote:
> On Fri, Jun 28, 2019 at 3:00 PM Bill Wendling <isanbard at gmail.com>
wrote:
>
>> On Thu, Jun 27, 2019 at 1:44 PM Bill Wendling <isanbard at
gmail.com> wrote:
>>
>>> On Thu, Jun 27, 2019 at 1:29 PM James Y Knight <jyknight at
google.com>
>>> wrote:
>>>
>>>> I think this is fine, except that it stops at the point where
things
>>>> actually start to get interesting and tricky.
>>>>
>>>> How will you actually handle the flow of values from the callbr
into
>>>> the error blocks? A callbr can specify requirements on where
its outputs
>>>> live. So, what if two callbr, in different branches of code,
specify
>>>> _different_ constraints for the same output, and list the same
block as a
>>>> possible error successor? How can the resulting phi be
codegened?
>>>>
>>>> This is where I fall back on the statement about how "the
programmer
>>> knows what they're doing". Perhaps I'm being too
cavalier here? My concern,
>>> if you want to call it that, is that we don't be too
restrictive on the new
>>> behavior. For example, the "asm goto" may set a register
to an error value
>>> (made up on the spot; may not be a common use). But, if there's
no real
>>> reason to have the value be valid on the abnormal path, then sure
we can
>>> declare that it's not valid on the abnormal path.
>>>
>>> I think I should explain my "programmer knows what they're
doing"
>> statement a bit better. I'm specifically referring to inline asm
here. The
>> more general "callbr" case may still need to be considered
(see Reid's
>> reply).
>>
>> When a programmer uses inline asm, they're implicitly telling the
>> compiler that they *do* know what they're doing  (I know this is
common
>> knowledge, but I wanted to reiterate it.). In particular, either they
need
>> to reference an instruction not readily available from the compiler
(e.g.
>> "cpuid") or the compiler isn't able to give them the
needed performance in
>> a critical section. I'm extending this sentiment to callbr with
output
>> constraints. Let's take your example below and write it as
"normal" asm
>> statements one on each branch of an if-then-else (please ignore any
syntax
>> errors):
>>
>> if:
>>   br i1 %cmp, label %true, label %false
>>
>> true:
>>   %0 = call { i32, i32 } asm sideeffect "poetry $0, $1",
"={r8},={r9}" ()
>>   br label %end
>>
>> false:
>>   %1 = call { i32, i32 } asm sideeffect "poetry2 $0, $1",
"={r10},={r11}"
>> ()
>>   br label %end
>>
>> end:
>>   %vals = phi { i32, i32 } [ %0, %true ], [ %1, %false ]
>>
>> How is this handled in codegen? Is it an error or does the back-end
>> handle it? Whatever's done today for "normal" inline asm
is what I *think*
>> should be the behavior for the inline asm callbr variant. If this
doesn't
>> seem sensible (and I realize that I may be thinking of an "in a
perfect
>> world" scenario), then we'll need to come up with a more
sensible solution
>> which may be to disallow the values on the error block until we can
think
>> of a better way to handle them.
>>
>
> This example is no problem, because instructions can be emitted between
> what's emitted by "call asm" and the end of the block (be it
a fallthrough,
> or a jump instruction. What gets emitted there is a move of the output
> register to another location -- either a register or to the stack. And
> therefore at the beginning of the "end" block, "%vals"
is always in a
> consistent location, no matter how you got to that block.
>
> But in the callbr case, there is not a location at which those moves can
> be emitted, after the callbr, before the jump to "error".
>
I see what you mean. Let's say we create a pseudo-instruction (similar to
landingpad, et al) that needs to be lowered by the backend in a reasonable
manner. The EH stuff has an external process/library that performs the
actual unwinding and which sets the values accordingly. We won't have this.
What we could do instead is split the edges and insert the copy-to-<where
ever> statements there. So something like:
>>>
bb1:

  callbr ... [label %asm.goto.dest]


bb2:

  callbr ... [label %asm.goto.dest]


asm.goto.dest:

  ...

<<<

converted to something like:
>>>
bb1:

  callbr ... [label %asm.goto.dest.bb1]


bb2:

  callbr ... [label %asm.goto.dest.bb2]


asm.goto.dest.bb1:

  %v.bb1 = extractvalue ...

  br label %asm.goto.dest


asm.goto.dest.bb2:

  %v.bb2 = extractvalue ...

  br label %asm.goto.dest


asm.goto.dest:

  %v = phi [%v.bb1, label %asm.goto.dest.bb1], [%v.bb2, label %asm.goto.bb2]

  ...

  ...

<<<

It's not 100% not barfy, but it's what the compiler does in similar
situations.

-bw
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190628/5aa00255/attachment.html>

llvm dev - Jun 2019 - [cfe-dev] [RFC] ASM Goto With Output Constraints

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints

[llvm-dev] [cfe-dev] [RFC] ASM Goto With Output Constraints