Hi All,

I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.

The 20,000 foot overview of the existing scheme (implemented in llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I understand it, is as follows:

We have two symbols in MCJIT that act as fixed points for the debugger to latch on to:

__jit_debug_register_code is a no-op function that the debugger can set a breakpoint on. MCJIT will call this function to notify the debugger when an object file is loaded.

__jit_debug_descriptor is the head of a C linked-list data structure that contains pointers to in-memory object files. The ELF/MachO headers of the in-memory object files will have had their vaddrs fixed up by the JIT to point to where each of the linked sections resides in memory.

There are a couple of problems with this system:

(1) Modifying object-file headers in place violates some internal LLVM contracts. In particular, the object files may be backed by read-only memory. This has caused crashes in the JIT that have forced me to revert support for debugger registration on the MachO side (we really want to replace this on the ELF side soon too).

(2) The JIT has no way of knowing whether a debugger is attached, which means keeping object files in memory even if they're not being used, just in case there is an attached debugger that needs them.

We'd really like to come up with a system that doesn't have these drawbacks. That is, a system where the object files remain unmodified, and the JIT knows if/when a debugger attaches so that it can generate the relevant information on the fly.

It would be great if the debugger experts (and particularly anyone who has experience on both the debugger and the JIT side of things) could weigh in on these issues. In particular:

(1) Is there a reason we bake the vmaddrs into the object file headers, or could they just as easily be passed in a side-table so as to keep the object untouched?

(2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call into to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures.

Regards,
Lang.
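For readers who have not seen the interface described above, here is a minimal C sketch of the two fixed points, following the layout published in the GDB manual's JIT interface chapter. The helper register_object is hypothetical (it is not LLVM's GDBRegistrar code) and only illustrates the "link an entry, then call the no-op function" sequence. The noinline attribute and the empty asm statement assume a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Actions the JIT reports through the descriptor. */
typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

/* One list node per in-memory object file. */
struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr; /* start of the in-memory object file */
  uint64_t symfile_size;    /* size of that object file in bytes */
};

/* Head of the list; the debugger finds it by symbol name. */
struct jit_descriptor {
  uint32_t version;     /* 1 in the published interface */
  uint32_t action_flag; /* holds a jit_actions_t value */
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

/* The debugger sets a breakpoint here. The empty asm statement only
   discourages the compiler from optimizing calls away. */
void __attribute__((noinline)) __jit_debug_register_code(void) {
  __asm__ volatile("");
}

struct jit_descriptor __jit_debug_descriptor = {1, JIT_NOACTION, NULL, NULL};

/* Hypothetical helper: push one object file onto the list and notify
   the debugger. The caller owns `entry` and the object image. */
void register_object(struct jit_code_entry *entry, const char *obj,
                     uint64_t size) {
  assert(entry != NULL && obj != NULL);
  entry->symfile_addr = obj;
  entry->symfile_size = size;
  entry->prev_entry = NULL;
  entry->next_entry = __jit_debug_descriptor.first_entry;
  if (entry->next_entry != NULL)
    entry->next_entry->prev_entry = entry;
  __jit_debug_descriptor.first_entry = entry;
  __jit_debug_descriptor.relevant_entry = entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
}
```

Note that nothing in this handshake tells the JIT whether anyone is listening, which is problem (2) above: the entry and the object image it points to have to stay alive whether or not a debugger ever stops at the breakpoint.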
Thanks for writing up the summary.

We'd like to come up with a simple scheme that addresses the problems we've found in practice with the current interface, but isn't unnecessarily complex. The current interface is basically fine except for the points mentioned above, which make using it somewhat of a pain.

This is also a chance to address any other pain points we may have with this interface. For example, the current scheme is disabled on OS X at the moment in LLDB, since looking for the symbol to set the breakpoint is too slow. Perhaps we can come up with something better here as well (I guess this is related to point 2 above). It would be good to get some input from people more experienced with the debugger side on that point.

On Fri, Aug 1, 2014 at 9:10 PM, Lang Hames <lhames at gmail.com> wrote:
> Hi All,
>
> I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.
>
> [...]
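As an illustration of the side-table idea raised in question (1) of the original message, here is a minimal C sketch. Every name in it (section_load_addr, jit_object_record, lookup_load_addr) is hypothetical and is not an existing LLVM or debugger interface. It only shows that load addresses can travel next to a pointer to the object file instead of being written into its headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one row per section that the JIT placed in memory. */
struct section_load_addr {
  uint32_t section_index; /* index into the object's section table */
  uint64_t load_addr;     /* address where the JIT placed the section */
};

/* Hypothetical: the record handed to the debugger. The object image is
   const because it is never modified, so it may live in read-only memory. */
struct jit_object_record {
  const void *object;
  uint64_t object_size;
  const struct section_load_addr *sections;
  size_t num_sections;
};

/* Returns 1 and stores the load address in *out if the section has a row
   in the table; returns 0 if the section was never loaded. */
int lookup_load_addr(const struct jit_object_record *rec,
                     uint32_t section_index, uint64_t *out) {
  assert(rec != NULL && out != NULL);
  for (size_t i = 0; i < rec->num_sections; ++i) {
    if (rec->sections[i].section_index == section_index) {
      *out = rec->sections[i].load_addr;
      return 1;
    }
  }
  return 0;
}
```

Whether a debugger could consume such a table is exactly the open question in the thread. The sketch takes no position on that and only shows the data layout.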
On 2 Aug 2014, at 02:10, Lang Hames <lhames at gmail.com> wrote:
> (2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call in to to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures.

The problem with calling into the JIT is that it's very common to debug a program that contains bugs. If it contains bugs, then it's likely to be in an undefined state when the debugger is attached. We've all seen gdb and lldb hit signals when executing code in the debugged process because they hit some invalid memory and have to give up on something typed in the command line. It's even worse if you have to invoke the JIT. Any memory-management errors in JIT'd code have the potential to break the JIT and so break its ability to provide the debug info (which is another reason why it's good to have the debug info be read-only).

This is not an issue if your deployment model is to JIT the code in one process and run it in another. MCJIT was designed to allow this, and it might be a good idea to encourage this usage pattern, but it's not always feasible.

It should be possible to have the debugger toggle a variable when it's attached and inspecting a particular piece of code, to allow the JIT to unload debug info later. I'm somewhat hesitant to recommend even that, because perturbing the memory layout depending on whether the debugger is attached is likely to cause heisenbugs.

I wonder if a better solution is to stream the debug info somewhere - to a pipe or a file that a debugger can be responsible for managing. Set an environment variable before you launch the process to tell it either 'write debug info to this file' or 'write debug info to file descriptor number N'. If this isn't set, don't bother generating debug info. If it is, send debug info there. If it's a pipe with the debugger on the other end, then the debugger gets a message as soon as there's more debug info to read and can discard any of the information that it doesn't care about.

David
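The streaming proposal above could be sketched in C roughly as follows, for the 'file descriptor number N' variant only. The environment variable name JIT_DEBUG_INFO_FD and the length-prefixed record format are assumptions made for this illustration and are not part of any existing tool. The code assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variable name; the launcher sets it to a descriptor number. */
#define DEBUG_FD_ENV "JIT_DEBUG_INFO_FD"

/* Returns the descriptor named by the variable, or -1 if the variable is
   unset, empty, or not a non-negative decimal integer. */
int debug_info_fd(void) {
  const char *value = getenv(DEBUG_FD_ENV);
  if (value == NULL || *value == '\0')
    return -1;
  char *end = NULL;
  long fd = strtol(value, &end, 10);
  if (*end != '\0' || fd < 0 || fd > INT_MAX)
    return -1;
  return (int)fd;
}

/* Writes the whole buffer, retrying after short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len) {
  const char *p = buf;
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    p += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Emits one record: a 32-bit length in host byte order, then the payload.
   Returns 1 if written, 0 if streaming is disabled, -1 on a write error. */
int emit_debug_info(const void *data, uint32_t size) {
  assert(data != NULL || size == 0);
  int fd = debug_info_fd();
  if (fd < 0)
    return 0; /* variable not set: skip debug info entirely */
  if (write_all(fd, &size, sizeof size) != 0)
    return -1;
  if (write_all(fd, data, size) != 0)
    return -1;
  return 1;
}
```

A real JIT would check the variable once before generating any debug info rather than on every write, and would have to decide what to do when the reader goes away (for example, a closed pipe raising SIGPIPE). Both are left out to keep the sketch short.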
I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.

WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces, and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. MCJIT's current architecture makes this extremely awkward.

This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs? Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?

-Filip

> On Aug 1, 2014, at 6:10 PM, Lang Hames <lhames at gmail.com> wrote:
>
> Hi All,
>
> I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.
>
> [...]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?

The presence of patches.

Tim.
On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.

Parsing the existing debug metadata isn't necessarily a good idea anyhow. It's not a stable format and is quite large.

> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. The mcjit's current architecture makes this extremely awkward.
>
> This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs? Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?

The debug metadata is largely based around DWARF debug information, but it isn't a C-language-based format. I think this is a misleading assertion you make. Also, it's your most compelling use case, not the most compelling.

-eric