On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.

Parsing the existing debug metadata isn't necessarily a good idea anyhow. It's not a stable format and is quite large.

> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. The mcjit's current architecture makes this extremely awkward.
>
> This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs? Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?

The debug metadata is largely based around dwarf debug information, but it isn't a C language based format. I think this is a misleading assertion you make. Also, it's your most compelling use case, not the most compelling.

-eric

> -Filip
>
>> On Aug 1, 2014, at 6:10 PM, Lang Hames <lhames at gmail.com> wrote:
>>
>> Hi All,
>>
>> I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.
>>
>> The 20,000 foot overview of the existing scheme (implemented in llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I understand it, is as follows:
>>
>> We have two symbols in MCJIT that act as fixed points for the debugger to latch on to:
>>
>> __jit_debug_register_code is a no-op function that the debugger can set a breakpoint on. MCJIT will call this function to notify the debugger when an object file is loaded.
>>
>> __jit_debug_descriptor is the head of a C linked list data structure that contains pointers to in-memory object files. The ELF/MachO headers of the in-memory object files will have had their vaddrs fixed up by the JIT to point to where each of the linked sections resides in memory.
>>
>> There are a couple of problems with this system:
>>
>> (1) Modifying object-file headers in-place violates some internal LLVM contracts. In particular, the object files may be backed by read-only memory. This has caused crashes in the JIT that have forced me to revert support for debugger registration on the MachO side (We really want to replace this on the ELF side soon too).
>>
>> (2) The JIT has no way of knowing whether a debugger is attached, which means keeping object files in memory even if they're not being used, just in case there is an attached debugger that needs them.
>>
>> We'd really like to come up with a system that doesn't have these drawbacks. That is, a system where the object files remain unmodified, and the JIT knows if/when a debugger attaches so that it can generate the relevant information on the fly.
>>
>> It would be great if the debugger experts (and particularly anyone who has experience on both the debugger and the JIT side of things) could weigh in on these issues. In particular:
>>
>> (1) Is there a reason we bake the vmaddrs into the object file headers, or could they just as easily be passed in a side-table so as to keep the object untouched?
>>
>> (2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call into to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures.
>>
>> Regards,
>> Lang.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
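
For readers who have not seen the scheme Lang describes: the two fixed points follow the JIT compilation interface documented in the GDB manual. The sketch below restates those declarations in C++ and shows roughly what "registering" an object means. `registerObject` is a hypothetical helper written for illustration, not MCJIT's actual code, and it ignores locking and unregistration. The `noinline` attribute and the empty `asm` statement assume GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {
// Action codes the JIT stores in the descriptor before notifying the debugger.
enum jit_actions_t : uint32_t { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN };

// One node per in-memory object file; the nodes form a doubly linked list.
struct jit_code_entry {
  jit_code_entry *next_entry;
  jit_code_entry *prev_entry;
  const char *symfile_addr; // start of the in-memory object file
  uint64_t symfile_size;    // its size in bytes
};

struct jit_descriptor {
  uint32_t version;               // the interface defines version 1
  uint32_t action_flag;           // one of jit_actions_t
  jit_code_entry *relevant_entry; // entry the current action refers to
  jit_code_entry *first_entry;    // head of the list
};

// The debugger places a breakpoint on this function. The empty asm keeps the
// compiler from optimising the call away (GCC/Clang assumption).
__attribute__((noinline)) void __jit_debug_register_code() {
  asm volatile("" ::: "memory");
}

// The debugger finds the list of objects through this symbol.
jit_descriptor __jit_debug_descriptor = {1, JIT_NOACTION, nullptr, nullptr};
}

// Hypothetical helper: link `entry` at the head of the list and notify the
// debugger. Not thread-safe; a real JIT would guard this with a lock.
void registerObject(jit_code_entry &entry, const char *objStart, uint64_t objSize) {
  assert(objStart != nullptr);
  entry.symfile_addr = objStart;
  entry.symfile_size = objSize;
  entry.prev_entry = nullptr;
  entry.next_entry = __jit_debug_descriptor.first_entry;
  if (entry.next_entry)
    entry.next_entry->prev_entry = &entry;
  __jit_debug_descriptor.first_entry = &entry;
  __jit_debug_descriptor.relevant_entry = &entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code(); // breakpoint fires here if a debugger is attached
}
```

Note that nothing in this handshake tells the JIT whether anyone is listening, which is the second problem Lang raises: the call happens, and the object must stay alive, whether or not a debugger is attached.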
> On Aug 10, 2014, at 3:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.
>
> Parsing the existing debug metadata isn't necessarily a good idea anyhow. It's not a stable format and is quite large.

I agree. I suspect that a better solution is to have the smarts for grokking the debug data inside LLVM, possibly borrowing logic from lldb. For starters clients like WebKit will want a machine-offset-to-debug-info map, which ain't rocket science - but currently parsing dwarf inside the LLVM client is the only way to do this afaict.

>> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. The mcjit's current architecture makes this extremely awkward.
>>
>> This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs? Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?
>
> The debug metadata is largely based around dwarf debug information, but it isn't a C language based format. I think this is a misleading assertion you make.

That would be a misleading assertion indeed, but it's not the one I'm making. Let me restate.

Clients of optimizing JIT compilers are usually going to want to have some finer-grained control over how that JIT presents debug data to the debugger. Probably all that we want is: the JIT offers its debug data to its client, and the client decides if, and how, this data is presented to any debugger (lldb, gdb, or whatever). A reasonable default can of course be provided, if it leads to a good API.

The MCJIT is currently ill suited to this kind of thing because it pretends to be a black box execution engine for LLVM IR. This black box then makes further assumptions that make sense for programs that target the C runtime. I believe that life would be easier if the task of generating code and the task of linking and executing it were better separated in the API.

> Also, it's your most compelling use case, not the most compelling.

If it isn't the most compelling, then can you provide an example of an MCJIT client that benefits from the current design?

I suspect that most other MCJIT clients will do some similar things to what WebKit does:

- custom runtime that doesn't behave like a C linker.
- custom debugging infrastructure; even if lldb integration is provided, the client's runtime will want lots of control.
- multiple compiler tiers or mixed-mode execution.
- source language that is not like C.

These four things apply to many systems and it would be cool if LLVM became easier to use for those. If you believe that these things are not compelling, then can you describe what kind of system you envision MCJIT being used for?
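
To make the "machine-offset-to-debug-info map" Filip asks for concrete, here is a minimal sketch of what a client-side view of such a map could look like once something has decoded the line table. All names (`SourceLoc`, `OffsetToLocMap`, `addRow`, `lookup`) are hypothetical and invented for this illustration; this is not an LLVM API, and it assumes each row covers the bytes from its start offset up to the next row.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical source position attached to a range of machine code.
struct SourceLoc {
  std::string file;
  unsigned line;
};

// Hypothetical map from an offset inside one JIT-compiled function to the
// source location that produced the code at that offset.
class OffsetToLocMap {
  std::map<uint64_t, SourceLoc> rows_; // row start offset -> location
  uint64_t codeSize_;                  // offsets >= codeSize_ are out of range

public:
  explicit OffsetToLocMap(uint64_t codeSize) : codeSize_(codeSize) {}

  // Record that the code starting at `startOffset` came from `loc`.
  void addRow(uint64_t startOffset, SourceLoc loc) {
    assert(startOffset < codeSize_);
    rows_[startOffset] = std::move(loc);
  }

  // Return the location covering `offset`, or nullptr if there is none.
  const SourceLoc *lookup(uint64_t offset) const {
    if (offset >= codeSize_)
      return nullptr;
    // First row that starts after `offset`; the row before it covers `offset`.
    auto it = rows_.upper_bound(offset);
    if (it == rows_.begin())
      return nullptr;
    return &std::prev(it)->second;
  }
};
```

A runtime producing a traceback would subtract the function's load address from the return address and pass the result to `lookup`. The design question in this thread is who builds the rows: the client by parsing DWARF itself, or LLVM on the client's behalf.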
On Sun, Aug 10, 2014 at 3:37 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I agree. I suspect that a better solution is to have the smarts for grokking the debug data inside LLVM, possibly borrowing logic from lldb. For starters clients like WebKit will want a machine-offset-to-debug-info map, which ain't rocket science - but currently parsing dwarf inside the LLVM client is the only way to do this afaict.

I think what you're asking for is currently available in the C++ API. I'm not familiar with the C API, but my guess is a lack of a JITEventListener equivalent is what's caused you guys to do some contortions using the memory manager to inspect the MCJIT output (I think this also applies to finding the stackmaps sections). So far our use of debug info is limited (only for user tracebacks) but we've been pretty happy with using a JITEventListener to call DIContext::getDWARFContext on the output, which at least for line table information, provides DWARF-parsing for us. I guess it's inelegant to re-parse the data that was just generated, but so far it seems fine.

https://github.com/dropbox/pyston/blob/master/src/codegen/unwinding.cpp#L111

> Clients of optimizing JIT compilers are usually going to want to have some finer-grained control over how that JIT presents debug data to the debugger. Probably all that we want is: the JIT offers its debug data to its client, and the client decides if, and how, this data is presented to any debugger (lldb, gdb, or whatever). A reasonable default can of course be provided, if it leads to a good API.
>
> The MCJIT is currently ill suited to this kind of thing because it pretends to be a black box execution engine for LLVM IR. This black box then makes further assumptions that make sense for programs that target the C runtime.
> I believe that life would be easier if the task of generating code and the task of linking and executing it were better separated in the API.

My impression is that the builtin gdb registration is just the default way of consuming the debug information -- I agree that the default behavior shouldn't come at the cost of flexibility for users who need something more customized, but it seems like things are close to the point that the GDB-registrar could be built using the C++ API, and it sounds like Lang's proposed changes would make it more possible. The situation sounds different with the C API, but I think that might be an orthogonal issue of C-api-vs-C++-api, rather than MCJIT-internals-vs-api?

Personally I find the default gdb registration to be helpful in debugging of the JIT itself, even if it's not related to the task of providing our language-specific debug functionality. Maybe one thing that would be nice to have is an API for disabling the GDB registration for performance reasons, which we would potentially make use of in release builds. I'm not sure how much that would actually save, though, since I would assume the registration cost is dwarfed (pun intended?) by the compile time.

kmod
On Sun, Aug 10, 2014 at 3:37 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>> On Aug 10, 2014, at 3:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>> On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>> I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.
>>
>> Parsing the existing debug metadata isn't necessarily a good idea anyhow. It's not a stable format and is quite large.
>
> I agree. I suspect that a better solution is to have the smarts for grokking the debug data inside LLVM, possibly borrowing logic from lldb. For starters clients like WebKit will want a machine-offset-to-debug-info map, which ain't rocket science - but currently parsing dwarf inside the LLVM client is the only way to do this afaict.

There's some support (originally forked from lldb) already in llvm to do this. Look at lib/DebugInfo, it's what llvm-dwarfdump, etc are based upon.

>>> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. The mcjit's current architecture makes this extremely awkward.
>>>
>>> This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs? Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?
>>
>> The debug metadata is largely based around dwarf debug information, but it isn't a C language based format. I think this is a misleading assertion you make.
>
> That would be a misleading assertion indeed, but it's not the one I'm making. Let me restate.
>
> Clients of optimizing JIT compilers are usually going to want to have some finer-grained control over how that JIT presents debug data to the debugger. Probably all that we want is: the JIT offers its debug data to its client, and the client decides if, and how, this data is presented to any debugger (lldb, gdb, or whatever). A reasonable default can of course be provided, if it leads to a good API.
>
> The MCJIT is currently ill suited to this kind of thing because it pretends to be a black box execution engine for LLVM IR. This black box then makes further assumptions that make sense for programs that target the C runtime. I believe that life would be easier if the task of generating code and the task of linking and executing it were better separated in the API.

I think there are two things here. First, there's dwarf level support for things like line numbers, variable locations, and even some basic type information. Then there's the language support you'd want to see when debugging a high level language that can't be fully described or has run time effects. A debugging interface that can be called into for that could be useful, but I'm not seeing that as necessarily something that MCJIT would vend, rather something on top of it. I.e. how a debugger would handle (bad example here, but...) something like Obj-C or Swift.

>> Also, it's your most compelling use case, not the most compelling.
>
> If it isn't the most compelling, then can you provide an example of an MCJIT client that benefits from the current design?
>
> I suspect that most other MCJIT clients will do some similar things to what WebKit does:
>
> - custom runtime that doesn't behave like a C linker.
> - custom debugging infrastructure; even if lldb integration is provided, the client's runtime will want lots of control.
> - multiple compiler tiers or mixed-mode execution.
> - source language that is not like C.
>
> These four things apply to many systems and it would be cool if LLVM became easier to use for those. If you believe that these things are not compelling, then can you describe what kind system you envision MCJIT being used for?

Oh, I agree they'd be cool to have as well, but there's also languages like Swift and Julia that use the JIT. There are all of the OpenGL/OpenCL/OpenACC accelerator type compilation uses, etc. Just saying that the WebKit JavaScript compilation strategy isn't the only compelling use case.

Mostly I think we're in agreement that this sort of functionality would be useful, just where it goes and whether or not the existing information that we can vend is also useful.

-eric