Serge Rogatch via llvm-dev
2016-Jul-29 19:07 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
Thanks for pointing this out, Tim. Then maybe this approach is not the best choice for x86, though ideally it should be measured. The point is that on ARM the current x86 approach is not applicable, because ARM doesn't have a single return instruction (such as RETQ on x86_64); furthermore, return instructions on ARM can be conditional.

I have another question: what happens if the instrumented function (or one of its callees) throws an exception and doesn't catch it? I understood that currently XRay will not report an exit from the function in that case, because the function doesn't return with RETQ; instead, the stack unwinder jumps through the functions, calling the destructors of their local objects.

If so, why not instrument functions by placing a tracing object as the first local variable, with its constructor calling __xray_FunctionEntry and its destructor calling __xray_FunctionExit? Perhaps this approach requires changes in the front-end (the C++ compiler, before emitting IR).

Cheers.

On 29 July 2016 at 21:00, Tim Northover <t.p.northover at gmail.com> wrote:
> On 28 July 2016 at 16:14, Serge Rogatch via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Can I ask you why you chose to patch both function entrances and exits,
> > rather than just patching the entrances and (in the patches) pushing on the
> > stack the address of __xray_FunctionExit, so that the user function returns
> > normally (with RETQ or POP RIP or whatever else instruction) rather than
> > jumping into __xray_FunctionExit?
> >
> > This approach should also be faster because smaller code better fits in CPU
> > cache, and patching itself should run faster (because there is less code to
> > modify).
>
> It may well be slower. Larger CPUs tend to track the call stack in
> hardware and returning to an address pushed manually is an inevitable
> branch mispredict in those cases.
>
> Cheers.
>
> Tim.
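The tracing-object idea raised in the message above can be sketched in plain C++. This is only an illustration, not XRay code: `TraceGuard`, `traceEntry`, `traceExit`, `inner`, `outer` and `runDemo` are hypothetical names, and the two trace functions stand in for __xray_FunctionEntry and __xray_FunctionExit by recording events in a vector. The sketch assumes the exception is eventually caught somewhere up the stack, so that unwinding (and therefore the guard destructors) actually runs.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log standing in for whatever a real handler would record.
std::vector<std::string> g_events;

// Hypothetical stand-ins for __xray_FunctionEntry / __xray_FunctionExit.
void traceEntry(int32_t funcId) { g_events.push_back("enter " + std::to_string(funcId)); }
void traceExit(int32_t funcId) { g_events.push_back("exit " + std::to_string(funcId)); }

// Guard object meant to be the first local variable of an instrumented function.
class TraceGuard {
public:
  explicit TraceGuard(int32_t funcId) : funcId_(funcId) {
    assert(funcId >= 0 && "this sketch assumes non-negative function ids");
    traceEntry(funcId_);
  }
  // Runs on a normal return and also while the stack is unwound by an exception.
  ~TraceGuard() { traceExit(funcId_); }
  TraceGuard(const TraceGuard &) = delete;
  TraceGuard &operator=(const TraceGuard &) = delete;

private:
  int32_t funcId_;
};

void inner() {
  TraceGuard guard(2);
  throw std::runtime_error("not caught in inner or outer");
}

void outer() {
  TraceGuard guard(1);
  inner();
}

// Calls outer(), catches the exception at the top, and returns the event log.
std::vector<std::string> runDemo() {
  g_events.clear();
  try {
    outer();
  } catch (const std::runtime_error &) {
    g_events.push_back("caught");
  }
  return g_events;
}
```

Calling `runDemo()` records an exit event for both functions before the handler body runs, even though neither function reached a return instruction. The sketch says nothing about the cost of such a guard, or about how it interacts with optimisations.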
Dean Michael Berris via llvm-dev
2016-Jul-30 06:45 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
> On 30 Jul 2016, at 05:07, Serge Rogatch via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Thanks for pointing this out, Tim. Then maybe this approach is not the best choice for x86, though ideally it should be measured. The point is that on ARM the current x86 approach is not applicable, because ARM doesn't have a single return instruction (such as RETQ on x86_64); furthermore, return instructions on ARM can be conditional.

I think this is a flaw in the current implementation. I long suspected that other platforms didn't have a single return instruction like x86, and I think it's worth changing this.

In an early version of the implementation that made it into LLVM, I made the determination of whether an instruction was a return instruction, and made that determination per-platform.

We should make that happen so we can more easily support ARM.

> I have another question: what happens if the instrumented function (or one of its callees) throws an exception and doesn't catch it? I understood that currently XRay will not report an exit from the function in that case, because the function doesn't return with RETQ; instead, the stack unwinder jumps through the functions, calling the destructors of their local objects.
>
> If so, why not instrument functions by placing a tracing object as the first local variable, with its constructor calling __xray_FunctionEntry and its destructor calling __xray_FunctionExit? Perhaps this approach requires changes in the front-end (the C++ compiler, before emitting IR).

That's right -- the approach I've been thinking about (but haven't gotten to implementing yet, because of other priorities) has been to have special instrumentation at the throw and catch points. Similarly, understanding things like tail call exits and sibling call optimisations, and instrumenting those exit points, is another thing that is yet to be implemented.

Unfortunately, changing the front-end to add an RAII or guard object will cause a whole lot of problems too. For instance, optimisations on the IR can move instructions before/after the guard object initialisation, and then we end up with less accurate instrumentation. This also introduces potentially unwanted semantics and changes the stack layout, affecting the performance of an application that has XRay sleds but does not have instrumentation enabled.
Serge Rogatch via llvm-dev
2016-Aug-03 20:27 UTC
[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.
Hi Dean,

I have a question about the following piece of code in compiler-rt/trunk/lib/xray/xray_trampoline_x86.S:

    movq _ZN6__xray19XRayPatchedFunctionE(%rip), %rax
    testq %rax, %rax
    je .Ltmp0

    // assume that %r10d has the function id.
    movl %r10d, %edi
    xor %esi,%esi
    callq *%rax

What happens if someone unsets the handler function (i.e. calls __xray_remove_handler()) or changes the handler (i.e. calls __xray_set_handler() with a different function pointer) between "movq _ZN6__xray19XRayPatchedFunctionE(%rip), %rax" and "callq *%rax"? I understood that the old handler will still be called, but isn't that a problem? What if the code that removed the handler then starts destructing some objects, so that a handler call would result in a crash or other bad things?

The same concern applies to the analogous code in __xray_FunctionExit.

Cheers,
Serge
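The window described in the question above can be modelled in C++ without any real concurrency. The sketch below is not the compiler-rt implementation: `Handler`, `g_handler`, `setHandler`, `removeHandler`, `loadHandler`, `callLoaded` and `callsAfterRemoval` are hypothetical names, and the load and the indirect call are split into two functions only so that a removal can be placed between them deterministically, where another thread could run in practice.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical handler type: function id plus an entry/exit tag (0 = entry here).
using Handler = void (*)(int32_t funcId, int entryType);

// Hypothetical model of the global handler slot read by the trampoline.
std::atomic<Handler> g_handler{nullptr};
int g_calls = 0;

void countingHandler(int32_t, int) { ++g_calls; }

// Hypothetical analogues of setting and removing the handler.
void setHandler(Handler h) { g_handler.store(h, std::memory_order_release); }
void removeHandler() { g_handler.store(nullptr, std::memory_order_release); }

// Step 1 of the trampoline: read the slot once (the movq in the assembly).
Handler loadHandler() { return g_handler.load(std::memory_order_acquire); }

// Step 2: null check and indirect call (testq / je / callq *%rax).
void callLoaded(Handler h, int32_t funcId) {
  if (h != nullptr)
    h(funcId, 0);
}

// Returns how many times the handler ran after it had been removed.
int callsAfterRemoval() {
  g_calls = 0;
  setHandler(countingHandler);
  assert(loadHandler() == countingHandler);

  Handler snapshot = loadHandler(); // value already sitting in the register
  removeHandler();                  // removal lands inside the window
  callLoaded(snapshot, 7);          // the old handler is still invoked
  callLoaded(loadHandler(), 7);     // a fresh load sees nullptr, no call
  return g_calls;
}
```

In this model `callsAfterRemoval()` returns 1: the call made through the stale snapshot still reaches the old handler. Making the slot atomic does not close the window by itself; code that removes a handler would still have to keep whatever the handler uses alive until in-flight calls have finished.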