thr3ads.net - llvm dev - [LLVMdev] MCJIT debugger registration interface. [Aug 2014]

Eric Christopher

2014-Aug-10 22:07 UTC

[LLVMdev] MCJIT debugger registration interface.

On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think this ignores the real problem with the MCJIT debugging interface:
> it doesn't give MCJIT clients any way of directly accessing and parsing the
> debug metadata.

Parsing the existing debug metadata isn't necessarily a good idea
anyhow. It's not a stable format and is quite large.

> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the
> MCJIT to register anything with the system debugger. Non-C languages usually
> have a different set of debugging interfaces and it's up to the client of
> LLVM to arrange to glue the debugging information that the MCJIT knows about
> to the debugging interface that the LLVM client knows about. The mcjit's
> current architecture makes this extremely awkward.
>
> This is part of a bigger problem in the MCJIT API: it is designed to work
> like an execution engine for C programs despite the fact that the most
> compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or
> tiered runtime for non-C languages. Is there some client of the MCJIT that
> actually benefits from the MCJIT pretending to be an execution engine for C
> programs? Is there a reason why this client should get more attention than
> the seemingly more compelling non-C use cases?

The debug metadata is largely based around dwarf debug information,
but it isn't a C language based format. I think this is a misleading
assertion you make.

Also, it's your most compelling use case, not the most compelling.

-eric
>
> -Filip
>
>> On Aug 1, 2014, at 6:10 PM, Lang Hames <lhames at gmail.com> wrote:
>>
>> Hi All,
>>
>> I'd like to revisit the MCJIT debugger-registration system, as the
>> existing system has a few flaws, some of which are seriously problematic.
>>
>> The 20,000 foot overview of the existing scheme (implemented in
>> llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I
>> understand it, is as follows:
>>
>> We have two symbols in MCJIT that act as fixed points for the debugger
>> to latch on to:
>>
>> __jit_debug_register_code is a no-op function that the debugger can set
>> a breakpoint on. MCJIT will call this function to notify the debugger when
>> an object file is loaded.
>>
>> __jit_debug_descriptor is the head of a C linked list data structure
>> that contains pointers to in-memory object files. The ELF/MachO headers of
>> the in-memory object files will have had their vaddrs fixed up by the JIT to
>> point to where each of the linked sections resides in memory.
>>
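For readers unfamiliar with the two symbols described above, the following is a minimal C++ sketch of their layout, following the publicly documented GDB JIT compilation interface. It is not LLVM's GDBRegistrar code: the helper registerObject is a hypothetical name introduced here, and the noinline attribute and empty asm statement assume GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {
// Actions the JIT reports to the debugger.
enum jit_actions_t : uint32_t { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN };

// One node per in-memory object file.
struct jit_code_entry {
  jit_code_entry *next_entry;
  jit_code_entry *prev_entry;
  const char *symfile_addr;  // start of the in-memory object file
  uint64_t symfile_size;     // its size in bytes
};

// Head of the list; the debugger reads this after hitting the breakpoint.
struct jit_descriptor {
  uint32_t version;                // 1 in the documented interface
  uint32_t action_flag;            // a jit_actions_t value
  jit_code_entry *relevant_entry;  // entry the action applies to
  jit_code_entry *first_entry;     // head of the doubly linked list
};

// The debugger sets a breakpoint here, so the call must not be optimized out.
__attribute__((noinline)) void __jit_debug_register_code() {
  asm volatile("" ::: "memory");
}

jit_descriptor __jit_debug_descriptor = {1, JIT_NOACTION, nullptr, nullptr};
}

// Hypothetical helper: link a new entry at the head of the list, then notify.
void registerObject(jit_code_entry &entry, const char *object, uint64_t size) {
  assert(object != nullptr);
  entry.symfile_addr = object;
  entry.symfile_size = size;
  entry.prev_entry = nullptr;
  entry.next_entry = __jit_debug_descriptor.first_entry;
  if (entry.next_entry)
    entry.next_entry->prev_entry = &entry;
  __jit_debug_descriptor.first_entry = &entry;
  __jit_debug_descriptor.relevant_entry = &entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
}
```

Note that every registered entry keeps a pointer to the whole object file, which is why the objects have to stay alive for as long as a debugger might look at them.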
>> There are a couple of problems with this system: (1) Modifying
>> object-file headers in-place violates some internal LLVM contracts. In
>> particular, the object files may be backed by read-only memory. This has
>> caused crashes in the JIT that have forced me to revert support for debugger
>> registration on the MachO side (we really want to replace this on the ELF
>> side soon too). (2) The JIT has no way of knowing whether a debugger is
>> attached, which means keeping object files in memory even if they're not
>> being used, just in case there is an attached debugger that needs them.
>>
>> We'd really like to come up with a system that doesn't have these
>> drawbacks. That is, a system where the object files remain unmodified, and
>> the JIT knows if/when a debugger attaches so that it can generate the
>> relevant information on the fly.
>>
>> It would be great if the debugger experts (and particularly anyone who
>> has experience on both the debugger and the JIT side of things) could weigh
>> in on these issues. In particular:
>>
>> (1) Is there a reason we bake the vmaddrs into the object file headers,
>> or could they just as easily be passed in a side-table so as to keep the
>> object untouched?
>>
>> (2) Is there a canonical way for the debugger to communicate to a JIT
>> that it's interested in inspecting the JIT's output? If we're going to use
>> breakpoints (or something like that) to signal to the debugger when objects
>> have been linked, is it reasonable to have an API that the debugger can call
>> into to request the information it's looking for? If the JIT actually
>> receives a call then it would give us a chance to lazily populate the
>> necessary data structures.
>>
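The two questions above could be approached along the following lines. This is only a sketch under stated assumptions: SectionLoadAddress and LazyDebugRegistry are hypothetical names, not an existing LLVM or debugger API, and the builder callback stands in for whatever the JIT would do to compute load addresses. The object bytes are never written to, and the side table is built only when something asks for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: where one section of an unmodified object was loaded.
struct SectionLoadAddress {
  std::string sectionName;
  uint64_t loadAddress;
};

// Hypothetical registry kept by the JIT. Load addresses live beside the
// object instead of being patched into its headers.
class LazyDebugRegistry {
public:
  using SideTable = std::vector<SectionLoadAddress>;
  using Builder = std::function<SideTable()>;

  // Called at link time: cheap, stores only a pointer and a callback.
  void addObject(const uint8_t *objectBytes, Builder builder) {
    objects_.push_back({objectBytes, std::move(builder), {}, false});
  }

  // Called when a debugger requests object `index`: builds the table once.
  const SideTable &query(size_t index) {
    Entry &e = objects_.at(index);
    if (!e.built) {
      e.table = e.builder();
      e.built = true;
    }
    return e.table;
  }

private:
  struct Entry {
    const uint8_t *objectBytes;  // read-only view of the object file
    Builder builder;             // computes the side table on demand
    SideTable table;
    bool built;
  };
  std::vector<Entry> objects_;
};
```

If no debugger ever calls query, the builder never runs, so the cost of an idle registration is one stored callback per object.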
>> Regards,
>> Lang.
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Filip Pizlo

2014-Aug-10 22:37 UTC

[LLVMdev] MCJIT debugger registration interface.

> On Aug 10, 2014, at 3:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> I think this ignores the real problem with the MCJIT debugging
>> interface: it doesn't give MCJIT clients any way of directly accessing and
>> parsing the debug metadata.
>
> Parsing the existing debug metadata isn't necessarily a good idea
> anyhow. It's not a stable format and is quite large.

I agree. I suspect that a better solution is to have the smarts for grokking the
debug data inside LLVM, possibly borrowing logic from lldb. For starters clients
like WebKit will want a machine-offset-to-debug-info map, which ain't rocket
science - but currently parsing dwarf inside the LLVM client is the only way to
do this afaict.
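
A machine-offset-to-debug-info map of the kind mentioned above might look roughly like this. It is a sketch only: OffsetToDebugInfoMap and SourceLoc are hypothetical names, and a real DWARF line table carries more state (columns, statement flags, end-of-sequence markers) that is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical source position attached to a run of machine code.
struct SourceLoc {
  std::string file;
  unsigned line;
};

// Maps the starting offset of each run of machine code to its source
// position, in the spirit of a decoded line table.
class OffsetToDebugInfoMap {
public:
  void add(uint64_t startOffset, SourceLoc loc) {
    rows_[startOffset] = std::move(loc);
  }

  // Returns the row with the greatest start offset <= `offset`,
  // or nullptr if `offset` precedes every row.
  const SourceLoc *lookup(uint64_t offset) const {
    auto it = rows_.upper_bound(offset);  // first row starting after `offset`
    if (it == rows_.begin())
      return nullptr;
    return &std::prev(it)->second;
  }

private:
  std::map<uint64_t, SourceLoc> rows_;
};
```

A runtime could consult such a table when walking a stack, turning a return address minus the function's base address into a file and line.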
>
>> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the
>> MCJIT to register anything with the system debugger. Non-C languages usually
>> have a different set of debugging interfaces and it's up to the client of
>> LLVM to arrange to glue the debugging information that the MCJIT knows about
>> to the debugging interface that the LLVM client knows about. The mcjit's
>> current architecture makes this extremely awkward.
>>
>> This is part of a bigger problem in the MCJIT API: it is designed to work
>> like an execution engine for C programs despite the fact that the most
>> compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or
>> tiered runtime for non-C languages. Is there some client of the MCJIT that
>> actually benefits from the MCJIT pretending to be an execution engine for C
>> programs? Is there a reason why this client should get more attention than
>> the seemingly more compelling non-C use cases?
>
> The debug metadata is largely based around dwarf debug information,
> but it isn't a C language based format. I think this is a misleading
> assertion you make.

That would be a misleading assertion indeed, but it's not the one I'm
making. Let me restate.

Clients of optimizing JIT compilers are usually going to want to have some
finer-grained control over how that JIT presents debug data to the debugger.
Probably all that we want is: the JIT offers its debug data to its client, and
the client decides if, and how, this data is presented to any debugger (lldb,
gdb, or whatever). A reasonable default can of course be provided, if it leads
to a good API.

The MCJIT is currently ill suited to this kind of thing because it pretends to
be a black box execution engine for LLVM IR. This black box then makes further
assumptions that make sense for programs that target the C runtime. I believe
that life would be easier if the task of generating code and the task of linking
and executing it were better separated in the API.
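
One possible shape for the arrangement described above, where the JIT offers its debug data and the client decides what happens to it, is sketched below. Everything here is hypothetical: JitCompiler, CompiledObject and setDebugDataHandler are illustrative names, not MCJIT's API, and compile is a stand-in that takes ready-made bytes instead of generating code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical view of one compiled object handed to the client.
struct CompiledObject {
  const uint8_t *data;
  size_t size;
};

// Hypothetical JIT front end: it produces objects and offers them to
// whatever handler the client installed. It never talks to a debugger itself.
class JitCompiler {
public:
  using DebugDataHandler = std::function<void(const CompiledObject &)>;

  // The client decides if, and how, debug data is consumed.
  void setDebugDataHandler(DebugDataHandler handler) {
    handler_ = std::move(handler);
  }

  // Stand-in for code generation; a real JIT would emit an object file here.
  void compile(const std::vector<uint8_t> &objectBytes) {
    CompiledObject obj{objectBytes.data(), objectBytes.size()};
    if (handler_)
      handler_(obj);  // with no handler installed, nothing is registered
  }

private:
  DebugDataHandler handler_;
};
```

A default handler that registers objects with gdb or lldb could then be one implementation among several, installed only by clients that want it.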
>
> Also, it's your most compelling use case, not the most compelling.

If it isn't the most compelling, then can you provide an example of an MCJIT
client that benefits from the current design?

I suspect that most other MCJIT clients will do some similar things to what
WebKit does:

- custom runtime that doesn't behave like a C linker. 

- custom debugging infrastructure; even if lldb integration is provided, the
client's runtime will want lots of control.

- multiple compiler tiers or mixed-mode execution. 

- source language that is not like C. 

These four things apply to many systems and it would be cool if LLVM became
easier to use for those. If you believe that these things are not compelling,
then can you describe what kind of system you envision MCJIT being used for?

Kevin Modzelewski

2014-Aug-11 09:50 UTC

[LLVMdev] MCJIT debugger registration interface.

On Sun, Aug 10, 2014 at 3:37 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
> I agree. I suspect that a better solution is to have the smarts for
> grokking the debug data inside LLVM, possibly borrowing logic from lldb.
> For starters clients like WebKit will want a machine-offset-to-debug-info
> map, which ain't rocket science - but currently parsing dwarf inside the
> LLVM client is the only way to do this afaict.
>

I think what you're asking for is currently available in the C++ API.
I'm not familiar with the C API, but my guess is that the lack of a
JITEventListener equivalent is what's caused you guys to do some contortions
using the memory manager to inspect the MCJIT output (I think this also
applies to finding the stackmaps sections).

So far our use of debug info is limited (only for user tracebacks) but
we've been pretty happy with using a JITEventListener to call
DIContext::getDWARFContext on the output, which at least for line table
information, provides DWARF-parsing for us.  I guess it's inelegant to
re-parse the data that was just generated, but so far it seems fine.
https://github.com/dropbox/pyston/blob/master/src/codegen/unwinding.cpp#L111

> Clients of optimizing JIT compilers are usually going to want to have some
> finer-grained control over how that JIT presents debug data to the
> debugger. Probably all that we want is: the JIT offers its debug data to
> its client, and the client decides if, and how, this data is presented to
> any debugger (lldb, gdb, or whatever). A reasonable default can of course
> be provided, if it leads to a good API.
>
> The MCJIT is currently ill suited to this kind of thing because it
> pretends to be a black box execution engine for LLVM IR. This black box
> then makes further assumptions that make sense for programs that target the
> C runtime. I believe that life would be easier if the task of generating
> code and the task of linking and executing it were better separated in the
> API.

My impression is that the builtin gdb registration is just the default way
of consuming the debug information -- I agree that the default behavior
shouldn't come at the cost of flexibility for users who need something more
customized, but it seems like things are close to the point that the
GDB-registrar could be built using the C++ API, and it sounds like Lang's
proposed changes would make it more possible.  The situation sounds
different with the C API, but I think that might be an orthogonal issue of
C-api-vs-C++-api, rather than MCJIT-internals-vs-api?

Personally I find the default gdb registration to be helpful in debugging
of the JIT itself, even if it's not related to the task of providing our
language-specific debug functionality.  Maybe one thing that would be nice
to have is an API for disabling the GDB registration for performance
reasons, which we would potentially make use of in release builds.  I'm not
sure how much that would actually save, though, since I would assume the
registration cost is dwarfed (pun intended?) by the compile time.

kmod

Eric Christopher

2014-Aug-11 17:06 UTC

[LLVMdev] MCJIT debugger registration interface.

On Sun, Aug 10, 2014 at 3:37 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>> On Aug 10, 2014, at 3:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>> On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>> I think this ignores the real problem with the MCJIT debugging
>>> interface: it doesn't give MCJIT clients any way of directly accessing and
>>> parsing the debug metadata.
>>
>> Parsing the existing debug metadata isn't necessarily a good idea
>> anyhow. It's not a stable format and is quite large.
>
> I agree. I suspect that a better solution is to have the smarts for
> grokking the debug data inside LLVM, possibly borrowing logic from lldb. For
> starters clients like WebKit will want a machine-offset-to-debug-info map,
> which ain't rocket science - but currently parsing dwarf inside the LLVM
> client is the only way to do this afaict.

There's some support (originally forked from lldb) already in llvm to
do this. Look at lib/DebugInfo, it's what llvm-dwarfdump, etc are
based upon.

>>> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the
>>> MCJIT to register anything with the system debugger. Non-C languages
>>> usually have a different set of debugging interfaces and it's up to the
>>> client of LLVM to arrange to glue the debugging information that the MCJIT
>>> knows about to the debugging interface that the LLVM client knows about.
>>> The mcjit's current architecture makes this extremely awkward.
>>>
>>> This is part of a bigger problem in the MCJIT API: it is designed to work
>>> like an execution engine for C programs despite the fact that the most
>>> compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode
>>> or tiered runtime for non-C languages. Is there some client of the MCJIT
>>> that actually benefits from the MCJIT pretending to be an execution engine
>>> for C programs? Is there a reason why this client should get more attention
>>> than the seemingly more compelling non-C use cases?
>>
>> The debug metadata is largely based around dwarf debug information,
>> but it isn't a C language based format. I think this is a misleading
>> assertion you make.
>
> That would be a misleading assertion indeed, but it's not the one I'm
> making. Let me restate.
>
> Clients of optimizing JIT compilers are usually going to want to have some
> finer-grained control over how that JIT presents debug data to the debugger.
> Probably all that we want is: the JIT offers its debug data to its client,
> and the client decides if, and how, this data is presented to any debugger
> (lldb, gdb, or whatever). A reasonable default can of course be provided, if
> it leads to a good API.
>
> The MCJIT is currently ill suited to this kind of thing because it pretends
> to be a black box execution engine for LLVM IR. This black box then makes
> further assumptions that make sense for programs that target the C runtime. I
> believe that life would be easier if the task of generating code and the task
> of linking and executing it were better separated in the API.

I think there are two things here, dwarf level support for things like
line numbers, variable locations, and even some basic type
information. Then there's language support like you'd want to see
debugging a high level language that can't be fully described or has
run time effects - a debugging interface that can be called into for
that could be useful, but I'm not seeing that as necessarily something
that MCJIT would vend but something on top of it. I.e. how a debugger
would handle (bad example here, but...) something like Obj-C or Swift.

>> Also, it's your most compelling use case, not the most compelling.
>
> If it isn't the most compelling, then can you provide an example of an
> MCJIT client that benefits from the current design?
>
> I suspect that most other MCJIT clients will do some similar things to what
> WebKit does:
>
> - custom runtime that doesn't behave like a C linker.
>
> - custom debugging infrastructure; even if lldb integration is provided,
>   the client's runtime will want lots of control.
>
> - multiple compiler tiers or mixed-mode execution.
>
> - source language that is not like C.
>
> These four things apply to many systems and it would be cool if LLVM became
> easier to use for those. If you believe that these things are not compelling,
> then can you describe what kind of system you envision MCJIT being used for?

Oh, I agree they'd be cool to have as well, but there are also languages
like Swift and Julia that use the JIT. There are all of the
OpenGL/OpenCL/OpenACC accelerator type compilation uses, etc. Just
saying that the WebKit JavaScript compilation strategy isn't the only
compelling use case.

Mostly I think we're in agreement that this sort of functionality
would be useful; the open questions are just where it goes and whether
or not the existing information that we can vend is also useful.

-eric
