Yatsina, Marina via llvm-dev
2017-Apr-04 13:07 UTC
[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly
The asm goto feature was introduced to GCC in order to optimize the support for tracepoints in the Linux kernel (it can be used for other things that do nop patching).

GCC documentation describes their motivating example here:

https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/Extended-Asm.html

#define TRACE1(NUM)                         \
  do {                                      \
    asm goto ("0: nop;"                     \
              ".pushsection trace_table;"   \
              ".long 0b, %l0;"              \
              ".popsection"                 \
              : : : : trace#NUM);           \
    if (0) { trace#NUM: trace(); }          \
  } while (0)
#define TRACE  TRACE1(__COUNTER__)

In this example (which in fact inspired the asm goto feature) we want on rare occasions to call the trace function; on other occasions we'd like to keep the overhead to the absolute minimum. The normal code path consists of a single nop instruction. However, we record the address of this nop together with the address of a label that calls the trace function. This allows the nop instruction to be patched at run time to be an unconditional branch to the stored label. It is assumed that an optimizing compiler moves the labeled block out of line, to optimize the fall through path from the asm.

Here is the Linux kernel RFC which discusses the old C way of implementing it and the performance issues that were noticed. It also states some performance numbers of the old C code vs. the asm goto:

https://lwn.net/Articles/350714/

This LTTng (Linux Trace Toolkit Next Generation) presentation talks about using this feature as a way to optimize static tracepoints (slides 3-4):

https://www.computer.org/cms/ComputingNow/HomePage/2011/0111/rW_SW_UsingTracing.pdf

This presentation also mentions that a lot of other Linux applications use this tracing mechanism.

I believe we already have much of the infrastructure in place (using the indirectbr instruction infrastructure). We do need to make sure MachineBlockPlacement optimizes the fall through path to make sure we can gain the performance for the nop patching.
Thanks,
Marina

From: Chandler Carruth [mailto:chandlerc at gmail.com]
Sent: Thursday, March 30, 2017 23:22
To: Yatsina, Marina <marina.yatsina at intel.com>; llvm-dev at lists.llvm.org; rnk at google.com; jyknight at google.com; ehsan at mozilla.com; rjmccall at apple.com; mehdi.amini at apple.com; matze at braunis.de; Tayree, Coby <coby.tayree at intel.com>
Subject: Re: [llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly

Just responding to the motivation stuff as that remains an open question:

On Thu, Mar 30, 2017 at 4:44 PM Yatsina, Marina <marina.yatsina at intel.com> wrote:

> Linux kernel is using the “asm goto” feature,

But your original email indicated they have an alternative code path for compilers that don't support it?

What might be compelling would be if there are serious performance problems when using the other code path that cannot be addressed by less invasive (and more general) improvements to LLVM. If this is the *only* way to get comparable performance from the Linux Kernel, then I think that might be an interesting discussion. But it would take a very careful and detailed analysis of why IMO.

> other projects probably use it as well.

This is entirely possible, but I'd like to understand which projects and why they use it rather than any of the alternatives before we impose the implementation complexity on LLVM. At least that's my two cents.

-Chandler
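For readers who want to compile something close to the TRACE macro quoted from the GCC documentation, here is a minimal sketch. It is not the kernel's or GCC's code. It assumes a compiler that accepts asm goto and a target whose assembler has a `nop` mnemonic. It also makes two changes to the quoted text: the label is formed with the `##` token-pasting operator, and an extra macro level (`TRACE2`, a name introduced here) lets `__COUNTER__` expand before pasting. The `trace_table` section is left out so that the sketch stays independent of pointer width and relocation model; `trace_hits` and `sum_to` are illustrative names.

```c
#include <assert.h>

static int trace_hits;                  /* number of times trace() ran */
static void trace(void) { trace_hits++; }

/* NUM is already a plain number here, so ## forms trace0, trace1, ... */
#define TRACE1(NUM)                                   \
  do {                                                \
    __asm__ goto ("nop" : : : : trace##NUM);          \
    if (0) { trace##NUM: trace(); }                   \
  } while (0)
/* Extra level so __COUNTER__ is expanded before token pasting. */
#define TRACE2(NUM) TRACE1(NUM)
#define TRACE TRACE2(__COUNTER__)

/* Two trace points in one function get two distinct labels. */
int sum_to(int n) {
  assert(n >= 0);
  int total = 0;
  TRACE;
  for (int i = 1; i <= n; i++)
    total += i;
  TRACE;
  return total;
}
```

Because nothing patches the nop in this sketch, the labeled blocks are never entered: `sum_to(4)` returns 10 and `trace_hits` stays 0. The asm goto label list is what tells the compiler that the `if (0)` block is a possible successor of the asm statement and must be kept.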
Chandler Carruth via llvm-dev
2017-Apr-04 16:26 UTC
[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly
On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Asm goto feature was introduced to GCC in order to optimize the support
> for tracepoints in Linux kernel (it can be used for other things that do
> nop patching).
> [...]
> This presentation also mentions that a lot of other Linux applications
> use this tracing mechanism.

Thanks, this is exactly the kind of discussion that I think will help make progress here.
I think this feature makes a lot of sense and is a really nice feature. However, I think implementing it with inline assembly imposes a lot of really unfortunate constraints on compilation -- it requires asm goto, pushsection and popsection, etc.

I would much rather provide a much more direct way to represent a patchable nop and the addresses of labels within a function. For example, I could imagine something like:

```
if (0) { trace_call: /* code to call the trace function */ }
patch: __builtin_patchable_nop()
__builtin_save_labels(trace_call, patch)
```

But someone can probably design a much better way to represent this in Clang. The advantages I see here (admittedly, mostly for the implementation in Clang and LLVM):

1) It allows Clang and LLVM to model this without running an assembler over anything.
2) It doesn't require new terminators in LLVM's IR.
3) We already have intrinsics in LLVM's IR that could easily be extended to produce a nop.
4) It would be portable -- each backend could select an appropriately sized nop to patch a jump into.

Would this make sense?

> I believe we already have much of the infrastructure in place (using the
> indirectbr instruction infrastructure).
>
> We do need to make sure MachineBlockPlacement optimizes the fall through
> path to make sure we can gain the performance for the nop patching.
Matthias Braun via llvm-dev
2017-Apr-04 18:12 UTC
[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly
My two cents:

- I think inline assembly should work even if the compiler cannot parse the contents. This would rule out MSVC inline assembly (or alternatively put all the parsing and interpretation burden on the frontend), but would work with GCC asm goto, which specifies possible targets separately.
- Supporting control flow in inline assembly by allowing jumps out of an assembly block seems natural to me.
- Jumping into an inline assembly block seems like an unnecessary feature to me.
- To have this working in lib/CodeGen we would need an alternative opcode with the terminator flag set. (There should also be opportunities to remodel some instruction flags in the backend, to be part of the MachineInstr instead of the opcode, but that is an orthogonal discussion to this.)
- I don't foresee big problems in CodeGen; we should take a look at how computed goto is implemented to find ways to reference arbitrary basic blocks.
- The register allocator fails when the terminator instruction also writes a register which is subsequently spilled (none of the existing targets does that, but you could specify this situation in inline assembly).
- I'd always prefer intrinsics over inline assembly. Hey, why don't we add a -Wassembly that warns on inline assembly usage and is enabled by default...
- I still think inline assembly is valuable for new architecture bringup/experimentation situations.

- Matthias

> On Apr 4, 2017, at 9:26 AM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Apr 4, 2017 at 6:07 AM Yatsina, Marina via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Asm goto feature was introduced to GCC in order to optimize the support for tracepoints in Linux kernel (it can be used for other things that do nop patching).
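To make the point about "jumps out of an assembly block" with separately specified targets concrete, here is a small sketch, assuming GCC or a Clang version that supports asm goto. The function name `is_negative` and its checker are illustrative only. The x86 branch uses AT&T syntax; on other targets the sketch falls back to a plain C goto so that it still builds. The compiler never has to parse the template to learn the control flow: the only possible branch target is the one named in the label list.

```c
#include <assert.h>

/* Returns 1 if x is negative, 0 otherwise. */
int is_negative(int x) {
#if defined(__x86_64__) || defined(__i386__)
  /* %0 is the input operand x. Labels are numbered after the
     operands, so the single label below is referenced as %l1. */
  __asm__ goto ("testl %0, %0\n\t"
                "js %l1"
                : /* no outputs */
                : "r"(x)
                : "cc"
                : negative);
#else
  /* Fallback for targets this sketch has no assembly for. */
  if (x < 0)
    goto negative;
#endif
  return 0;
negative:
  return 1;
}

/* Small usage check. */
void check_is_negative(void) {
  assert(is_negative(-7) == 1);
  assert(is_negative(0) == 0);
  assert(is_negative(42) == 0);
}
```

The asm statement here acts as a block terminator with two successors (fall through and `negative`), which is the situation the CodeGen bullet about a terminator opcode describes.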
David Woodhouse via llvm-dev
2018-Feb-14 10:34 UTC
[llvm-dev] [inline-asm][asm-goto] Supporting "asm goto" in inline assembly
On Tue, 2017-04-04 at 16:26 +0000, Chandler Carruth via llvm-dev wrote:
> Thanks, this is exactly the kind of discussion that I think will help
> make progress here.
>
> I think this feature makes a lot of sense and is a really nice
> feature. However, I think implementing it with inline assembly
> imposes a lot of really unfortunate constraints on compilation -- it
> requires asm goto, pushsection and popsection, etc.
>
> I would much rather provide a much more direct way to represent a
> patchable nop and the addresses of labels within a function. For
> example, I could imagine something like:
>
> ```
> if (0) { trace_call: /* code to call the trace function */ }
> patch: __builtin_patchable_nop()
> __builtin_save_labels(trace_call, patch)
> ```
>
> But someone can probably design a much better way to represent this
> in Clang. The advantages I see here (admittedly, mostly for the
> implementation in Clang and LLVM):
>
> 1) It allows Clang and LLVM to model this without running an assembler
> over anything.
> 2) It doesn't require new terminators in LLVM's IR
> 3) We already have intrinsics in LLVM's IR that could easily be
> extended to produce a nop.
> 4) It would be portable -- each backend could select an appropriate
> sized nop to patch a jump into
>
> Would this make sense?

Let's not conflate the asm-goto part with the .pushsection/.popsection.

The latter ("0: .pushsection foo; .long 0b; .popsection") is used *all* over the kernel to build up tables of code locations: for exception handling of instructions which might fault, as well as for runtime patching of instructions like the above. It's not always a nop vs. call alternative.

It would be nice to have the compiler assist with that. We currently have code to trawl through all the built object files and find calls to __fentry__ so we can patch them in/out at runtime, for example. And we might consider doing the same for calls to the retpoline thunks.
But I think we would be best served right now by considering that out of scope, and looking *only* at the part which is handled by 'asm goto'.
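The table-building idiom mentioned above relies on the linker concatenating every contribution to a named section. As a rough C-level analogue, and not the kernel's actual mechanism, the sketch below records data entries (a name and a source line) rather than instruction addresses. It assumes an ELF target linked with GNU ld or lld, which define `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. The names `site_table`, `RECORD_SITE`, `site_count` and `find_site` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry; the kernel tables hold code addresses instead. */
struct site {
  const char *name;
  int line;
};

/* Place one entry into the "site_table" section. */
#define RECORD_SITE(id)                                 \
  static const struct site site_##id                    \
      __attribute__((section("site_table"), used)) =    \
      { #id, __LINE__ }

RECORD_SITE(probe_open);
RECORD_SITE(probe_close);

/* Section bounds, defined by the linker (assumption: GNU ld or lld). */
extern const struct site __start_site_table[];
extern const struct site __stop_site_table[];

/* Count the entries collected in the section. */
size_t site_count(void) {
  size_t n = 0;
  for (const struct site *s = __start_site_table; s < __stop_site_table; s++)
    n++;
  return n;
}

/* Look up an entry by name; returns NULL if it is not in the table. */
const struct site *find_site(const char *name) {
  assert(name != NULL);
  for (const struct site *s = __start_site_table; s < __stop_site_table; s++)
    if (strcmp(s->name, name) == 0)
      return s;
  return NULL;
}
```

What this analogue cannot do is record the address of a specific instruction, which is the part that needs `0:` and `.pushsection` inside the asm template, and it is independent of the control-flow part that asm goto adds.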