thr3ads.net - llvm dev - [llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly [Apr 2017]


Matthias Braun via llvm-dev

2017-Apr-04 18:12 UTC

[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

My two cents:

- I think inline assembly should work even if the compiler cannot parse the
contents. This would rule out msvc inline assembly (or alternatively put all the
parsing and interpretation burden on the frontend), but would work with gcc asm
goto which specifies possible targets separately.
- Supporting control flow in inline assembly by allowing jumps out of an
assembly block seems natural to me.
- Jumping into an inline assembly block seems like an unnecessary feature to me.
- To have this working in lib/CodeGen we would need an alternative opcode with
the terminator flag set. (There should also be opportunities to remodel some
instruction flags in the backend, to be part of the MachineInstr instead of the
opcode, but that is an orthogonal discussion to this)
- I don't foresee big problems in CodeGen; we should take a look at how
computed goto is implemented to find ways to reference arbitrary basic
blocks.
- The register allocator fails when the terminator instruction also writes a
register which is subsequently spilled (none of the existing targets does that,
but you could specify this situation in inline assembly).
- I'd always prefer intrinsics over inline assembly. Hey, why don't we
add a -Wassembly that warns on inline assembly usage and is enabled by
default...
- I still think inline assembly is valuable for new architecture
bringup/experimentation situations.
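
To illustrate the first point above, here is a minimal sketch of GCC-style asm goto. It assumes a GCC-compatible compiler that supports the extension and a target where `nop` is a valid mnemonic; the names `maybe_trace` and `do_trace` are made up for this example. The possible jump target is listed in the fifth colon-separated section, so the compiler can learn about the extra control-flow edge without parsing the assembly string.

```c
#include <assert.h>

/* Returns 1 only if control reaches the label, 0 on the fall-through path. */
static int maybe_trace(void)
{
    /* The template is an opaque string to the compiler. What it needs to
       know is in the label list after the fourth colon: control may leave
       this statement by falling through or by jumping to do_trace. */
    __asm__ goto ("nop"
                  : /* no outputs */
                  : /* no inputs */
                  : /* no clobbers */
                  : do_trace);
    return 0;

do_trace:
    return 1;
}
```

Since the `nop` never branches, calling `maybe_trace()` takes the fall-through path; the label is reached only if something later rewrites the instruction into a jump.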

- Matthias
> On Apr 4, 2017, at 9:26 AM, Chandler Carruth via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> Asm goto feature was introduced to GCC in order to optimize the support for
tracepoints in the Linux kernel (it can be used for other things that do nop
patching).
> 
>  
> 
> GCC documentation describes their motivating example here:
> 
> 
> https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html
<https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html>
>  
>      #define TRACE1(NUM)                         \
>        do {                                      \
>          asm goto ("0: nop;"                     \
>                    ".pushsection trace_table;"   \
>                    ".long 0b, %l0;"              \
>                    ".popsection"                 \
>                    : : : : trace#NUM);           \
>          if (0) { trace#NUM: trace(); }          \
>        } while (0)
>      #define TRACE  TRACE1(__COUNTER__)
> In this example (which in fact inspired the asm goto feature) we want on
rare occasions to call the trace function; on other occasions we'd like to
keep the overhead to the absolute minimum. The normal code path consists of a
single nop instruction. However, we record the address of this nop together with
the address of a label that calls the trace function. This allows the nop
instruction to be patched at run time to be an unconditional branch to the
stored label. It is assumed that an optimizing compiler moves the labeled block
out of line, to optimize the fall through path from the asm.
> 
> Here is the Linux kernel RFC which discusses the old C way of implementing
it and the performance issues that were noticed.
> 
> It also states some performance numbers of the old C code vs. the asm goto:
> 
> https://lwn.net/Articles/350714/ <https://lwn.net/Articles/350714/>
>  
> This LTTng (Linux Trace Toolkit Next Generation) presentation talks about
using this feature as a way to optimize static tracepoints (slides 3-4)
>
https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf
<https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf>
> This presentation also mentions that a lot of other Linux applications use
this tracing mechanism.
> Thanks, this is exactly the kind of discussion that I think will help make
progress here.
> 
> I think this feature makes a lot of sense and is a really nice feature.
However, I think implementing it with inline assembly imposes a lot of really
unfortunate constraints on compilation -- it requires asm goto, pushsection and
popsection, etc.
> 
> I would much rather provide a much more direct way to represent a patchable
nop and the addresses of labels within a function. For example, I could imagine
something like:
> 
> ```
>   if (0) { trace_call: /* code to call the trace function */ }
>   patch: __builtin_patchable_nop()
>   __builtin_save_labels(trace_call, patch)
> ```
> 
> But someone can probably design a much better way to represent this in
Clang. The advantages I see here (admittedly, mostly for the implementation in
Clang and LLVM):
> 
> 1) It allows Clang and LLVM to model this without running an assembler over
anything.
> 2) It doesn't require new terminators in LLVM's IR
> 3) We already have intrinsics in LLVM's IR that could easily be
extended to produce a nop.
> 4) It would be portable -- each backend could select an appropriate sized
nop to patch a jump into
> 
> Would this make sense?
>  
> I believe we already have much of the infrastructure in place (using the
indirectbr instruction infrastructure).
> 
> We do need to make sure MachineBlockPlacement optimizes the fall through
path to make sure we can gain the performance for the nop patching.
> 
>  
> 
> Thanks,
> 
> Marina
> 
>  
> 
> From: Chandler Carruth [mailto:chandlerc at gmail.com <mailto:chandlerc
at gmail.com>]
> Sent: Thursday, March 30, 2017 23:22
> To: Yatsina, Marina <marina.yatsina at intel.com
<mailto:marina.yatsina at intel.com>>; llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>; rnk at google.com <mailto:rnk at
google.com>; jyknight at google.com <mailto:jyknight at google.com>;
ehsan at mozilla.com <mailto:ehsan at mozilla.com>; rjmccall at apple.com
<mailto:rjmccall at apple.com>; mehdi.amini at apple.com
<mailto:mehdi.amini at apple.com>; matze at braunis.de <mailto:matze at
braunis.de>; Tayree, Coby <coby.tayree at intel.com <mailto:coby.tayree
at intel.com>>
> 
> 
> Subject: Re: [llvm-dev] [inline-asm][asm-goto] Supporting "asm
goto" in inline assembly
> 
> 
>  
> 
> Just responding to the motivation stuff as that remains an open question:
> 
>  
> 
> On Thu, Mar 30, 2017 at 4:44 PM Yatsina, Marina <marina.yatsina at
intel.com <mailto:marina.yatsina at intel.com>> wrote:
> 
> Linux kernel is using the “asm goto” feature,
> 
>  
> 
> But your original email indicated they have an alternative code path for
compilers that don't support it?
> 
>  
> 
> What might be compelling would be if there are serious performance problems
when using the other code path that cannot be addressed by less invasive (and
more general) improvements to LLVM. If this is the *only* way to get comparable
performance from the Linux Kernel, then I think that might be an interesting
discussion. But it would take a very careful and detailed analysis of why IMO.
> 
>  
> 
> other projects probably use it as well.
> 
>  
> 
> This is entirely possible, but I'd like to understand which projects
and why they use it rather than any of the alternatives before we impose the
implementation complexity on LLVM. At least that's my two cents.
> 
>  
> 
> -Chandler
> 
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

John McCall via llvm-dev

2017-Apr-04 18:44 UTC

[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

> On Apr 4, 2017, at 2:12 PM, Matthias Braun <matze at braunis.de>
wrote:
> My two cents:
> 
> - I think inline assembly should work even if the compiler cannot parse the
contents. This would rule out msvc inline assembly (or alternatively put all the
parsing and interpretation burden on the frontend), but would work with gcc asm
goto which specifies possible targets separately.
> - Supporting control flow in inline assembly by allowing jumps out of an
assembly block seems natural to me.
> - Jumping into an inline assembly block seems like an unnecessary feature
to me.
> - To have this working in lib/CodeGen we would need an alternative opcode
with the terminator flag set. (There should also be opportunities to remodel
some instruction flags in the backend, to be part of the MachineInstr instead of
the opcode, but that is an orthogonal discussion to this)
> - I don't foresee big problems in CodeGen; we should take a look at how
computed goto is implemented to find ways to reference arbitrary basic
blocks.
> - The register allocator fails when the terminator instruction also writes
a register which is subsequently spilled (none of the existing targets does
that, but you could specify this situation in inline assembly).
> - I'd always prefer intrinsics over inline assembly. Hey, why don't
we add a -Wassembly that warns on inline assembly usage and is enabled by
default...
> - I still think inline assembly is valuable for new architecture
bringup/experimentation situations.

To me, this feels like a great example of "we really wanted a language
feature, but we figured out that we could hack it in using inline assembly in a
way that's ultimately significantly harder for the compiler to support than
a language feature, and now it's your problem."  I agree with Chandler
that we should just design and implement the language feature.

I would recommend:

  if (__builtin_patchable_branch("section name")) {
    trace();
  }

==>

  %0 = call i1 @llvm.patchable_branch(i8* @sectionNameString)
  br %0, ...

where @llvm.patchable_branch has the semantics of appending whatever patching
information is necessary to the given section such that, if you apply the patch,
it will change the result of the call from 0 to 1.  That can then typically be
pattern-matched in the backend to get the optimal codegen.
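
Neither `__builtin_patchable_branch` nor `@llvm.patchable_branch` exists; both are proposals in this message. As a rough, portable sketch that models only the described semantics (the call yields 0 until the patch is applied, then 1), one could write something like the following. Every name here is hypothetical, and a real implementation would rewrite an instruction rather than read a flag.

```c
#include <assert.h>

/* Hypothetical stand-in for one record in the patch section. */
struct patch_record {
    const char *section;  /* section the record would be appended to */
    int applied;          /* 0 = original code, 1 = patched */
};

/* Models the value of the proposed call at one site. */
static int patchable_branch_model(const struct patch_record *r)
{
    return r->applied;
}

/* Models applying the patch: the site flips from "not taken" to "taken". */
static void apply_patch(struct patch_record *r)
{
    r->applied = 1;
}

static int trace_calls;                  /* counts how often trace() ran */
static void trace(void) { ++trace_calls; }

/* The guarded call from the example above, written against the model. */
static void instrumented(const struct patch_record *site)
{
    if (patchable_branch_model(site))
        trace();
}
```

This flag-based model pays for a load and a compare on every execution, which is the kind of overhead the patched-nop scheme is meant to avoid; it is only useful for reasoning about the before/after behaviour.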

If I might recommend a better ABI for the patching information: consider using a
pair of relative pointers, one from the patching information to the patchable
instruction, and one from the patchable instruction to the new target.  That
would allow the patching information to be relocated at zero cost.
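
A minimal sketch of that layout, assuming both pointers are stored as 32-bit byte offsets; the struct, field, and function names are hypothetical, and this is not an existing LLVM or GCC format. The toy "image" is a byte buffer that holds the entry plus stand-ins for the patchable instruction and the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical patch-table entry: offsets instead of absolute addresses. */
struct patch_entry {
    int32_t site_off;    /* from this entry to the patchable instruction */
    int32_t target_off;  /* from the patchable instruction to the new target */
};

/* Address of the patchable instruction described by the entry at `entry`. */
static const unsigned char *patch_site(const unsigned char *entry)
{
    struct patch_entry e;
    memcpy(&e, entry, sizeof e);      /* avoids alignment/aliasing issues */
    return entry + e.site_off;
}

/* Address the instruction should branch to once patched. */
static const unsigned char *patch_target(const unsigned char *entry)
{
    struct patch_entry e;
    memcpy(&e, entry, sizeof e);
    return patch_site(entry) + e.target_off;
}

/* Builds a toy image, copies it to a different address, and checks that the
   untouched entry still resolves correctly. Returns 1 on success. */
static int relocation_demo(void)
{
    unsigned char image[64] = {0};
    unsigned char moved[64];
    struct patch_entry e = { 16, 24 };    /* site at +16, target at +40 */

    memcpy(image, &e, sizeof e);          /* entry lives at offset 0 */
    memcpy(moved, image, sizeof image);   /* "load" at another address */

    return patch_site(image)   == image + 16 &&
           patch_target(image) == image + 40 &&
           patch_site(moved)   == moved + 16 &&
           patch_target(moved) == moved + 40;
}
```

Because the entry stores only distances, the copied image needs no fix-ups: the same bytes resolve to the right places at the new address.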

The actual details of how to apply the patch, and what the inline
patchable-instruction sequence needs to be in order to accept the patch, would
be target-specific.  The documented motivating example seems to assume that a
single nop is always big enough, which is pretty questionable.

This feature could be made potentially interesting to e.g. JIT authors by
allowing the patching information to be embellished with additional information
to identify the source branch.

John.



James Y Knight via llvm-dev

2017-Apr-04 19:27 UTC

[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

On Tue, Apr 4, 2017 at 2:12 PM, Matthias Braun <matze at braunis.de>
wrote:
> - The register allocator fails when the terminator instruction also writes
> a register which is subsequently spilled (none of the existing targets does
> that, but you could specify this situation in inline assembly).
>
You can't actually have outputs from an asm goto in the GCC implementation
(and I'd suggest leaving that restriction in the LLVM implementation too if
it makes the implementation easier).
From GCC docs <https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Extended-Asm.html>:
"This form of asm is restricted to not have outputs. This is due to a
internal restriction in the compiler that control transfer instructions
cannot have outputs. This restriction on asm goto may be lifted in some
future version of the compiler. In the meantime, asm goto may include a
memory clobber, and so leave outputs in memory."

You can still have register clobbers, which I suppose might trigger the
same failure case?
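
A sketch of the workaround the quoted documentation describes, assuming an x86 target and AT&T assembler syntax for the assembly path; other targets fall back to plain C so the function behaves the same way. The names `result_slot`, `produce`, and `done` are illustrative only.

```c
#include <assert.h>

static int result_slot;   /* the "output" lives in memory, not in a register */

static int produce(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* No output operands are allowed, so pass the address as an input,
       store through it, and declare a "memory" clobber so the compiler
       does not keep a stale copy of result_slot in a register. */
    __asm__ goto ("movl $42, (%0)\n\t"
                  "jmp %l[done]"
                  : /* no outputs */
                  : "r"(&result_slot)
                  : "memory"
                  : done);
    return -1;            /* fall-through path; this template always jumps */
#else
    result_slot = 42;     /* plain C fallback for other targets */
    goto done;
#endif

done:
    return result_slot;
}
```

The value produced by the assembly reaches the C code through memory, which is why the restriction on register outputs does not get in the way here.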

Matthias Braun via llvm-dev

2017-Apr-04 20:09 UTC

[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

> On Apr 4, 2017, at 12:27 PM, James Y Knight via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Tue, Apr 4, 2017 at 2:12 PM, Matthias Braun <matze at braunis.de
<mailto:matze at braunis.de>> wrote:
> - The register allocator fails when the terminator instruction also writes
a register which is subsequently spilled (none of the existing targets does
that, but you could specify this situation in inline assembly).
> 
> You can't actually have outputs from an asm goto in the GCC
implementation (and I'd suggest leaving that restriction in the LLVM
implementation too if it makes the implementation easier).
> 
> From GCC docs
<https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Extended-Asm.html>:
> "This form of asm is restricted to not have outputs. This is due to a
internal restriction in the compiler that control transfer instructions cannot
have outputs. This restriction on asm goto may be lifted in some future version
of the compiler. In the meantime, asm goto may include a memory clobber, and so
leave outputs in memory."

Ah that is convenient :)

> You can still have register clobbers, which I suppose might trigger the
same failure case?

Clobbers are less of a problem as there is no reason to spill a register that is
just clobbered by a regmask.

- Matthias

Matthias Braun via llvm-dev

2017-Apr-04 20:13 UTC

[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

> On Apr 4, 2017, at 11:44 AM, John McCall via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> To me, this feels like a great example of "we really wanted a language
feature, but we figured out that we could hack it in using inline assembly in a
way that's ultimately significantly harder for the compiler to support than
a language feature, and now it's your problem."  I agree with Chandler
that we should just design and implement the language feature.
> 
> I would recommend:
> 
>   if (__builtin_patchable_branch("section name")) {
>     trace();
>   }
> 
> ==>
> 
>   %0 = call i1 @llvm.patchable_branch(i8* @sectionNameString)
>   br %0, ...
> 
> where @llvm.patchable_branch has the semantics of appending whatever
patching information is necessary to the given section such that, if you apply
the patch, it will change the result of the call from 0 to 1.  That can then
typically be pattern-matched in the backend to get the optimal codegen.
> 
> If I might recommend a better ABI for the patching information: consider
using a pair of relative pointers, one from the patching information to the
patchable instruction, and one from the patchable instruction to the new target.
That would allow the patching information to be relocated at zero cost.
> 
> The actual details of how to apply the patch, and what the inline
patchable-instruction sequence needs to be in order to accept the patch, would
be target-specific.  The documented motivating example seems to assume that a
single nop is always big enough, which is pretty questionable.
> 
> This feature could be made potentially interesting to e.g. JIT authors by
allowing the patching information to be embellished with additional information
to identify the source branch.

I completely agree that for this example we would rather have a proper
intrinsic. As a matter of fact, we already have a similar mechanism in CodeGen
to support the XRay feature.

- Matthias

