thr3ads.net - llvm dev - [LLVMdev] MCJIT debugger registration interface. [Aug 2014]

Lang Hames

2014-Aug-02 01:10 UTC

[LLVMdev] MCJIT debugger registration interface.

Hi All,

I'd like to revisit the MCJIT debugger-registration system, as the existing
system has a few flaws, some of which are seriously problematic.

The 20,000 foot overview of the existing scheme (implemented in
llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I
understand it, is as follows:

We have two symbols in MCJIT that act as fixed points for the debugger to latch
on to:

 __jit_debug_register_code is a no-op function that the debugger can set a
breakpoint on.  MCJIT will call this function to notify the debugger when an
object file is loaded.

__jit_debug_descriptor is the head of a C linked list data structure that
contains pointers to in-memory object files. The ELF/MachO headers of the
in-memory object files will have had their vaddrs fixed up by the JIT to point
to where each of the linked sections resides in memory.
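For reference, the two symbols described above correspond to the interface documented in the GDB manual under "JIT Compilation Interface". The struct layouts, the version value of 1 and the action codes below follow that documentation. jit_register_object is a hypothetical helper that only illustrates the registration steps, and the noinline attribute and empty asm assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Action codes and struct layouts as documented for GDB's JIT interface. */
typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr; /* start of the in-memory object file */
  uint64_t symfile_size;    /* size of that object file in bytes */
};

struct jit_descriptor {
  uint32_t version;                      /* must be 1, set statically */
  uint32_t action_flag;                  /* one of jit_actions_t */
  struct jit_code_entry *relevant_entry; /* entry the action refers to */
  struct jit_code_entry *first_entry;    /* head of the doubly linked list */
};

/* The debugger puts a breakpoint here. noinline plus the empty asm keep the
   compiler from removing calls to this empty function (GCC/Clang extensions). */
void __attribute__((noinline)) __jit_debug_register_code(void) {
  __asm__ __volatile__("" ::: "memory");
}

/* The debugger finds this global by name; the version is initialized
   statically because the debugger may read it before any JIT code runs. */
struct jit_descriptor __jit_debug_descriptor = {1, JIT_NOACTION, NULL, NULL};

/* Hypothetical helper: push an entry onto the list and notify the debugger.
   Not thread-safe; a real JIT would serialize registrations. */
void jit_register_object(struct jit_code_entry *entry, const char *obj,
                         uint64_t size) {
  assert(entry != NULL && obj != NULL);
  entry->symfile_addr = obj;
  entry->symfile_size = size;
  entry->prev_entry = NULL;
  entry->next_entry = __jit_debug_descriptor.first_entry;
  if (entry->next_entry != NULL)
    entry->next_entry->prev_entry = entry;
  __jit_debug_descriptor.first_entry = entry;
  __jit_debug_descriptor.relevant_entry = entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
}
```

The helper itself never writes to the object bytes; in the existing scheme it is the headers reachable through symfile_addr that get patched with load addresses beforehand.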

There are a couple of problems with this system: (1) Modifying object-file
headers in-place violates some internal LLVM contracts. In particular, the
object files may be backed by read-only memory. This has caused crashes in the
JIT that have forced me to revert support for debugger registration on the MachO
side (we really want to replace this on the ELF side soon too). (2) The JIT has
no way of knowing whether a debugger is attached, which means it must keep object
files in memory even if they're not being used, just in case there is an
attached debugger that needs them.

We'd really like to come up with a system that doesn't have these
drawbacks. That is, a system where the object files remain unmodified, and the
JIT knows if/when a debugger attaches so that it can generate the relevant
information on the fly.

It would be great if the debugger experts (and particularly anyone who has
experience on both the debugger and the JIT side of things) could weigh in on
these issues. In particular:

(1) Is there a reason we bake the vmaddrs into the object file headers, or could
they just as easily be passed in a side-table so as to keep the object
untouched?
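One way to picture the side-table alternative in question (1) is the sketch below. Everything in it is hypothetical: the struct names and the lookup function are invented for illustration, and neither GDB nor LLDB understands this layout, so a debugger would need matching support before it could use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side-table record: where one section of an unmodified
   object file was placed in memory by the JIT's linker. */
struct jit_section_load {
  uint32_t section_index; /* index into the object's own section table */
  uint64_t load_addr;     /* address the section was linked at */
};

/* Hypothetical registration record: the object bytes stay untouched (and
   may live in read-only memory); the load addresses travel alongside. */
struct jit_object_info {
  const char *obj_addr;
  uint64_t obj_size;
  uint32_t num_sections;
  const struct jit_section_load *sections;
};

/* Returns the load address of a section, or 0 if the section was never
   loaded (this sketch uses 0 as its "not loaded" sentinel). */
uint64_t jit_section_load_addr(const struct jit_object_info *info,
                               uint32_t section_index) {
  assert(info != NULL);
  for (uint32_t i = 0; i < info->num_sections; ++i)
    if (info->sections[i].section_index == section_index)
      return info->sections[i].load_addr;
  return 0;
}
```

The design choice is that the debugger applies the section-to-address mapping itself rather than reading patched headers, so the JIT never needs write access to the object.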

(2) Is there a canonical way for the debugger to communicate to a JIT that
it's interested in inspecting the JIT's output? If we're going to
use breakpoints (or something like that) to signal to the debugger when objects
have been linked, is it reasonable to have an API that the debugger can call
to request the information it's looking for? If the JIT actually receives
such a call, it would give us a chance to lazily populate the necessary data
structures.
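A minimal sketch of the lazy scheme in question (2), assuming a hypothetical entry point jit_debugger_request_info() that a debugger would invoke in the inferior. Until that call arrives the JIT only records objects in a cheap pending list; publish_to_debugger is a placeholder that just counts, standing in for the real registration step. Note that this sketch still retains the raw objects; what it defers is building the registration structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 64

struct pending_object {
  const char *obj;
  size_t size;
};

static struct pending_object pending[MAX_PENDING];
static size_t num_pending;
static bool debugger_interested;
static size_t num_published;

/* Placeholder for the real registration step (e.g. linking a GDB-style
   jit_code_entry); here it only counts publications. */
static void publish_to_debugger(const struct pending_object *p) {
  (void)p;
  ++num_published;
}

/* Called by the JIT after linking each object. */
void jit_object_loaded(const char *obj, size_t size) {
  struct pending_object p = {obj, size};
  if (debugger_interested) {
    publish_to_debugger(&p);
    return;
  }
  assert(num_pending < MAX_PENDING);
  pending[num_pending++] = p;
}

/* Hypothetical entry point the debugger calls, e.g. through expression
   evaluation. Publishes the backlog; later objects are published eagerly. */
void jit_debugger_request_info(void) {
  debugger_interested = true;
  for (size_t i = 0; i < num_pending; ++i)
    publish_to_debugger(&pending[i]);
  num_pending = 0;
}
```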

Regards,
Lang.

Keno Fischer

2014-Aug-04 04:41 UTC

Thanks for writing up the summary. We'd like to come up with a simple
scheme that addresses the problems we've found in practice with the
current interface, but isn't unnecessarily complex. The current
interface is basically fine except for the points mentioned above
which make using it somewhat of a pain. This is also the chance to
address any other pain points we may have with this interface. For
example, the current scheme is disabled on OS X at the moment in LLDB
since looking for the symbol to set the breakpoint is too slow.
Perhaps we can come up with something better here as well (I guess
this is related to point 2 above). It would be good to get some input
from people more experienced with the debugger side on that point.

David Chisnall

2014-Aug-04 08:36 UTC

On 2 Aug 2014, at 02:10, Lang Hames <lhames at gmail.com> wrote:
> (2) Is there a canonical way for the debugger to communicate to a JIT that
> it's interested in inspecting the JIT's output? If we're going to
> use breakpoints (or something like that) to signal to the debugger when objects
> have been linked, is it reasonable to have an API that the debugger can call in
> to to request the information it's looking for? If the JIT actually receives
> a call then it would give us a chance to lazily populate the necessary data
> structures.

The problem with calling into the JIT is that it's very common to debug a
program that contains bugs.  If it contains bugs, then it's likely to be in
an undefined state when the debugger is attached.  We've all seen gdb and
lldb hit signals when executing code in the debugged process because they hit
some invalid memory and have to give up on something typed in the command line. 
It's even worse if you have to invoke the JIT.  Any memory-management errors
in JIT'd code have the potential to break the JIT and so break its ability
to provide the debug info (which is another reason why it's good to have the
debug info be read-only).

This is not an issue if your deployment model is to JIT the code in one process
and run it in another.  MCJIT was designed to allow this, and it might be a good
idea to encourage this usage pattern, but it's not always feasible.

It should be possible to have the debugger toggle a variable when it's
attached and inspecting a particular piece of code, to allow the JIT to unload
debug info later.  I'm somewhat hesitant to recommend even that, because
perturbing the memory layout depending on whether the debugger is attached is
likely to cause heisenbugs.
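The toggle-a-variable idea might look like the following. The variable name jit_debugger_attached and the release function are hypothetical; a debugger could set such a variable with, for example, GDB's `set var` command. The variable is volatile because the write comes from outside the program's own control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag written by the debugger: 1 while it is attached and
   inspecting JIT'd code, 0 otherwise. */
volatile int jit_debugger_attached = 0;

struct debug_object {
  const char *data;
  size_t size;
  bool retained; /* still held in memory for a possible debugger */
};

/* Called when the JIT wants to reclaim memory. Debug objects are dropped
   only while no debugger has announced itself; returns how many were
   released. */
size_t jit_maybe_release_debug_info(struct debug_object *objs, size_t n) {
  assert(objs != NULL || n == 0);
  if (jit_debugger_attached)
    return 0;
  size_t released = 0;
  for (size_t i = 0; i < n; ++i) {
    if (objs[i].retained) {
      objs[i].retained = false; /* a real JIT would free objs[i].data here */
      ++released;
    }
  }
  return released;
}
```

One consequence of this design: objects released before a debugger attaches are gone, so a late-attaching debugger only sees code whose debug info was still retained.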

I wonder if a better solution is to stream the debug info somewhere - to a pipe
or a file that a debugger can be responsible for managing.  Set an environment
variable before you launch the process to tell it either 'write debug info
to this file' or 'write debug info to file descriptor number N'.  If
this isn't set, don't bother generating debug info.  If it is, send
debug info there.  If it's a pipe with the debugger on the other end, then
the debugger gets a message as soon as there's more debug info to read and
can discard any of the information that it doesn't care about.
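The streaming proposal could be sketched as below, covering only the file-descriptor variant. The environment variable name JIT_DEBUG_INFO_FD and the record format (an 8-byte native-endian length followed by the object bytes) are assumptions for illustration; the proposal doesn't specify either. A JIT using this would skip generating debug info whenever jit_debug_stream_fd() returns -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variable name. */
#define JIT_DEBUG_FD_ENV "JIT_DEBUG_INFO_FD"

/* Returns the descriptor to stream debug info to, or -1 if the variable is
   unset or not a plain non-negative integer. */
int jit_debug_stream_fd(void) {
  const char *s = getenv(JIT_DEBUG_FD_ENV);
  if (s == NULL || *s == '\0')
    return -1;
  char *end;
  errno = 0;
  long fd = strtol(s, &end, 10);
  if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX)
    return -1;
  return (int)fd;
}

/* Writes all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len) {
  const char *p = buf;
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    p += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Sends one object as a length-prefixed record so the reader can split the
   stream back into individual objects. Returns 0 on success. */
int jit_stream_object(int fd, const char *obj, uint64_t size) {
  assert(obj != NULL || size == 0);
  if (write_all(fd, &size, sizeof size) != 0)
    return -1;
  return write_all(fd, obj, (size_t)size);
}
```

With a pipe, the debugger on the read end is woken as soon as a record arrives and can drop any object it doesn't care about, matching the behaviour described above.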

David

Filip Pizlo

2014-Aug-10 20:43 UTC

I think this ignores the real problem with the MCJIT debugging interface: it
doesn't give MCJIT clients any way of directly accessing and parsing the
debug metadata.

WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to
register anything with the system debugger. Non-C languages usually have a
different set of debugging interfaces and it's up to the client of LLVM to
arrange to glue the debugging information that the MCJIT knows about to the
debugging interface that the LLVM client knows about. MCJIT's current
architecture makes this extremely awkward.
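One shape a client-facing hook could take is a per-object callback, so that a GDB registrar and a language runtime's own debugger glue become just two subscribers. All names here are hypothetical, including count_bytes_listener, a toy client included only to show usage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: invoked once per linked object with the raw,
   unmodified object bytes and a client-owned context pointer. */
typedef void (*jit_object_callback)(void *client_ctx, const char *obj,
                                    uint64_t size);

#define MAX_LISTENERS 8

struct jit_listener {
  jit_object_callback fn;
  void *ctx;
};

static struct jit_listener listeners[MAX_LISTENERS];
static size_t num_listeners;

/* Clients subscribe here instead of the JIT hard-wiring one debugger
   interface. Returns 0 on success, -1 if fn is NULL or the table is full. */
int jit_add_object_listener(jit_object_callback fn, void *ctx) {
  if (fn == NULL || num_listeners == MAX_LISTENERS)
    return -1;
  listeners[num_listeners].fn = fn;
  listeners[num_listeners].ctx = ctx;
  ++num_listeners;
  return 0;
}

/* Called by the JIT after linking each object. */
void jit_notify_object_loaded(const char *obj, uint64_t size) {
  assert(obj != NULL || size == 0);
  for (size_t i = 0; i < num_listeners; ++i)
    listeners[i].fn(listeners[i].ctx, obj, size);
}

/* Toy client: accumulates the total number of object bytes seen. */
void count_bytes_listener(void *ctx, const char *obj, uint64_t size) {
  (void)obj;
  *(uint64_t *)ctx += size;
}
```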

This is part of a bigger problem in the MCJIT API: it is designed to work like
an execution engine for C programs despite the fact that the most compelling use
of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for
non-C languages. Is there some client of the MCJIT that actually benefits from
the MCJIT pretending to be an execution engine for C programs?  Is there a
reason why this client should get more attention than the seemingly more
compelling non-C use cases?

-Filip

Tim Northover

2014-Aug-10 20:55 UTC

> Is there a reason why this client should get more attention than the
> seemingly more compelling non-C use cases?

The presence of patches.

Tim.

Eric Christopher

2014-Aug-10 22:07 UTC

On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think this ignores the real problem with the MCJIT debugging interface:
> it doesn't give MCJIT clients any way of directly accessing and parsing the
> debug metadata.
>
Parsing the existing debug metadata isn't necessarily a good idea
anyhow. It's not a stable format and is quite large.

> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the
> MCJIT to register anything with the system debugger. Non-C languages usually
> have a different set of debugging interfaces and it's up to the client of
> LLVM to arrange to glue the debugging information that the MCJIT knows about to
> the debugging interface that the LLVM client knows about. The mcjit's
> current architecture makes this extremely awkward.
>
> This is part of a bigger problem in the MCJIT API: it is designed to work
> like an execution engine for C programs despite the fact that the most
> compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or
> tiered runtime for non-C languages. Is there some client of the MCJIT that
> actually benefits from the MCJIT pretending to be an execution engine for C
> programs?  Is there a reason why this client should get more attention than the
> seemingly more compelling non-C use cases?

The debug metadata is largely based around DWARF debug information,
but it isn't a C-language-based format. I think the assertion you make
is misleading.

Also, it's your most compelling use case, not the most compelling.

-eric