Lang Hames via llvm-dev
2016-Feb-11 02:58 UTC
[llvm-dev] [RFC] Error handling in LLVM libraries.
Hi All, Now that this thread has accumulated some feedback, I thought I'd try to summarize my thoughts on error handling in LLVM, and why I think this proposal is worth adopting: (1) Failure to check an error *is* a programmatic error, and our error system should reflect this. This makes it easy to spot mistakes in our error handling and correct them. (2) Error returns should describe all the information about an error that a client could reasonably use to recover from the error. This requirement goes beyond what a flat enum, or even an enum with a category (like std::error_code) is capable of expressing. For example, it is possible to express "missing_file" in an enum, but not "missing_file: /path/to/file". A library client could meaningfully respond to the latter by re-generating the file (if they knew how) and re-trying the operation. (3) There may be many levels of stack between the point where an error is raised, and the point where it can be handled. A generic error handling system needs to bridge this gap, bringing all the information needed to handle an error together in one place. In the process, the system should unwind through all levels of the stack that can't handle the error allowing resources to be freed, and ensuring execution does not continue into code that can't handle the error. (A corollary to this is that a generic error handling system should never respond to an error in user-supplied input by aborting). (4) Errors should be easy to break on in a debugger. (5) Errors should have minimal / no overhead in the success case. I think we have consensus that point (1) is a serious problem. My scheme addresses this problem by including a 'checked' bit that verifies that errors (and even success values) have been inspected before being discarded. However, we could do something similar with a wrapper around std::error_code without adopting the custom-error-class aspect of my proposal. Points (2) and (3) (which are closely related) can't be addressed with std::error_code. My scheme does address these points by allowing users to define custom error classes. Point (4) is addressed in my scheme by setting breakpoints on error class constructors. With a wrapper around std::error_code, the best you can do is to set a conditional breakpoint looking for construction with a particular enum value. Point (5) is a wash - both TypedError and std::error_code are cheap in the success case. One of the concerns that has been raised with my system is complexity. I don't think the system is overly complex conceptually (though the implementation does involve some non-trivial template goop). I presented the system in detail in my original email, but I'll try to summarize here: This system provides an owning "error pointer" class, TypedError, that is as cheap as std::error_code to move around, return from functions, etc. Success values are indicated by a null "error pointer" value that is extremely cheap to construct. Failure values are indicated by a non-null error-pointer. The cost of constructing a failure value is just the cost of constructing the pointee describing the error, plus the cost of handling the error (if that happens). The pointee must be a user-defined class that works with the typed-error RTTI system. We provide a utility class, TypedErrorInfo, to make it easy to construct such classes: class MyError : TypedErrorInfo<MyError> { ... }; User defined error classes must also implement a log method to render the error to a stream, however this can usually be done in a couple of lines. Since we expect the number of user-defined error classes to be small, this overhead should be minimal. For usage examples on all this, please see the error demo attached to my original email. So, given the benefits of this proposal, I think the complexity is well justified. An alternative scheme that has been floated is using a combination of error_code and diagnostic handlers, but I have two strong objections to this as a general solution: (1) Error returns are already required to ensure that code further up the stack does not continue past an error. Given the fundamental necessity of error returns, it makes sense to re-use them for diagnostics where possible, as it avoids the need to *also* thread a diagnostic handler through any function that could fail. If used as a general solution, diagnostic handlers would quickly become unwieldy, especially when calling between libraries. For example, if library A calls library B, which calls library C, and library C can produce an error, you are now forced thread a diagnostic handler through library A and B for library C's sake. This does not scale well. I recognize that diagnostic handlers have totally valid use cases. I am not proposing that we remove any of the existing diagnostic handlers from LLVM, and I'm not against introducing new ones where there is a use-case. However, in use-cases where TypedError is necessary to enable recovery and sufficient to produce good error messages, I think diagnostic handlers are superfluous. (2) By retaining std::error_code, the error_code + diagnostic-handler scheme still falls short on points (2) and (3) on my original list. That is, the combination of diagnostic handlers and std::error_code only works if the best "recovery" you can do is report an error on a stream and then bail out or continue, based on a coarse-grained error_code. That's good enough for a lot of command-line utilities, but not good enough for general library design. Responses to this proposal seem mostly positive so far, and I hope this summary addresses the concerns that have been raised, but I'm very happy to continue the conversation if not. That said, I'm really keen to start improving error handling in libObject and the JIT, so in the interest of maintaining some momentum on this proposal, here's how I plan to proceed if this proposal is adopted: (1) Write up a set of unit-tests for the TypedError system, and a new section of the LLVM Programmer's Manual describing its use. (2) Re-submit the original prototype (with minor tweaks) plus the unit-tests and programmers manual additions to llvm-commits fro review. (3) Begin threading TypedError and TypedErrorOr through MachOObjectFile in libObject, using ECError as a bridge between TypedError/TypedErrorOr and std::error_code/ErrorOr. The act of threading this through libObject will hit LLD, MCJIT and ORC, DebugInfo, LTO, ProfilingData and a host of command-line utilities. However, the ECError glue is fairly straightforward and idiomatic, and provides us with an explicit (grepable) surface area that can be chipped away at. Working with Kevin Enderby, my next step would be to plumb TypedError through the MachO-specific parts of llvm-nm and llvm-objdump. Based on our experiments with the prototype, we expect to be able to significantly improve the diagnostics produced by these tools for broken object files. In the long-term I expect many parts of LLVM may want to migrate from std::error_code to TypedError, but that's ultimately a decision for the maintainers of the respective parts. I'll be very happy to provide input and support, but I don't want to pre-empt anyone else's decision on whether to adopt this. Cheers, Lang. On Wed, Feb 10, 2016 at 1:36 PM, Lang Hames <lhames at gmail.com> wrote:> Hi Rafael, > > What prevents you from using a diag handler in the jit server that > sends errors/warnings over the RPCChannel? > > > Sorry - this is a non-trivial problem to think through, so to speed > things up, here are the issues: > > (1) A diagnostic handler can only exit or continue, neither of which are > sufficient to deal with an error in the general case. You need to be able > to stop and unwind to some point in the stack where the error can be > handled, meaningfully releasing / cleaning-up resources as you go. > > (2) Using a diagnostic handler for structured error serialization removes > the cohesion between error serialization / deserialization and the error > types themselves. It also removes the cohesion between serialization and > deserialization: serialization happens in the diagnostic handler, > deserialization somewhere else. All this makes it very easy to forget to > update serialization/deserialization code when error types change, and for > serialization/deserialization to get out of sync with one another. > Diagnostic handlers really aren't designed for this problem. > > In my design, anyone building a client/server system on top of ORC can > make any error reportable over the wire by inheriting from some > "Serializable" interface and writing their serialization/deserialization > code in the error type itself. The server can then contain a > checkTypedError block that boils down to: > > while (1) { > if (auto Err = handleServerCommand(Channel)) > if (Err is handleable by the server) > /* handle it */ > else if (Err.isA<Serializable>()) > Err->serialize(Channel); /* Report it to the client, let them > respond */ > else > return Err; /* Bail out of the server loop */ > } > > Cheers, > Lang. > > On Wed, Feb 10, 2016 at 10:55 AM, Lang Hames <lhames at gmail.com> wrote: > >> Hi Rafael, >> >> What prevents you from using a diag handler in the jit server that >> sends errors/warnings over the RPCChannel? >> >> >> What would you do with errors that can't reasonable be serialized when >> they reach the diagnostic handler? >> >> And what would you do with the serialized bytes on the client end? >> >> - Lang. >> >> Sent from my iPhone >> >> On Feb 10, 2016, at 10:31 AM, Rafael Espíndola < >> rafael.espindola at gmail.com> wrote: >> >> I recently added support for remote JITing to ORC. There are in-tree >> library >> >> facilities for JITing code into a process on the other end of an abstract >> >> "RPCChannel". What happens if something goes wrong at one end? We want to >> be >> >> able to communicate an error back across the RPCChannel (assuming it's >> still >> >> intact) so the other side can recover or fail gracefully. That means we >> need >> >> to be able to serialize an error with enough information to describe what >> >> went wrong. There's no practical way to maintain a serialization routine >> for >> >> all possible std::error_codes that might come up, even if they were >> powerful >> >> enough to describe everything that could go wrong (which again, being >> static >> >> kinds, they're not). With my proposal however, a JITError base class can >> be >> >> defined as: >> >> >> class JITError : public TypedErrorInfo<JITError> { >> >> public: >> >> virtual void serialize(RPCChannel &C) const = 0; >> >> }; >> >> >> >> What prevents you from using a diag handler in the jit server that >> sends errors/warnings over the RPCChannel? >> >> Cheers, >> Rafael >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160210/a0e16bd2/attachment.html>
Rafael Espíndola via llvm-dev
2016-Feb-11 03:19 UTC
[llvm-dev] [RFC] Error handling in LLVM libraries.
> In the long-term I expect many parts of LLVM may want to migrate from > std::error_code to TypedError, but that's ultimately a decision for the > maintainers of the respective parts. I'll be very happy to provide input and > support, but I don't want to pre-empt anyone else's decision on whether to > adopt this.That means we will forever have an incomplete transition. We still haven't finished moving to the "new" naming style. Moving to the current path handling took from Nov 2009 to Jun 2013. I am still not convinced that the new system is better than a asserting wrapper over std::error_code with diag handlers, but I don't have the time to change Orc to use to show it. Given that I also failed to convince you otherwise, we will probably have to go with the view of who can actually code this. But *please*, make an effort to use it everywhere. We still have a mix of bool+std::string, std::error_code and ErrorOr. Are each of these better than what you are proposing in one case or the other? Probably. Is it worth it having unique snow flakes? I don't think so. In particular, please don't change just the "MachO-specific part of X". In the past I have made an effort to keep MachO bits current when working in MC and lib/Object. Please do the corresponding effort. For example, if you change llvm-nm, please change all of it Cheers, Rafael
Lang Hames via llvm-dev
2016-Feb-11 04:40 UTC
[llvm-dev] [RFC] Error handling in LLVM libraries.
Hi Rafael,> > In the long-term I expect many parts of LLVM may want to migrate from > > std::error_code to TypedError, but that's ultimately a decision for the > > maintainers of the respective parts. I'll be very happy to provideinput and> > support, but I don't want to pre-empt anyone else's decision on whetherto> > adopt this.> That means we will forever have an incomplete transition.I included that caveat because it's not my place to insist on adoption, but it is my hope is that this system could eventually replace other error-return systems in LLVM.> But *please*, make an effort to use it everywhere. We still have a mix > of bool+std::string, std::error_code and ErrorOr. Are each of these > better than what you are proposing in one case or the other? Probably. > Is it worth it having unique snow flakes? I don't think so.I'm happy to take this approach, and I agree: having a consistent scheme for error returns everywhere is ideal.> In particular, please don't change just the "MachO-specific part of > X". In the past I have made an effort to keep MachO bits current when > working in MC and lib/Object. Please do the corresponding effort. For > example, if you change llvm-nm, please change all of itI'm also happy to do this. It's neither practical nor prudent to try to re-write the world in one go, so there's no escaping a transition period with some glue code. That's no reason not to do this though, and assuming the community's experience with the system is as positive as mine was with the prototype, I think we could aim to move to this as an essentially universal error-return solution in the future. Cheers, Lang. On Wed, Feb 10, 2016 at 7:19 PM, Rafael Espíndola < rafael.espindola at gmail.com> wrote:> > In the long-term I expect many parts of LLVM may want to migrate from > > std::error_code to TypedError, but that's ultimately a decision for the > > maintainers of the respective parts. I'll be very happy to provide input > and > > support, but I don't want to pre-empt anyone else's decision on whether > to > > adopt this. > > That means we will forever have an incomplete transition. > > We still haven't finished moving to the "new" naming style. Moving to > the current path handling took from Nov 2009 to Jun 2013. > > I am still not convinced that the new system is better than a > asserting wrapper over std::error_code with diag handlers, but I don't > have the time to change Orc to use to show it. Given that I also > failed to convince you otherwise, we will probably have to go with the > view of who can actually code this. > > But *please*, make an effort to use it everywhere. We still have a mix > of bool+std::string, std::error_code and ErrorOr. Are each of these > better than what you are proposing in one case or the other? Probably. > Is it worth it having unique snow flakes? I don't think so. > > In particular, please don't change just the "MachO-specific part of > X". In the past I have made an effort to keep MachO bits current when > working in MC and lib/Object. Please do the corresponding effort. For > example, if you change llvm-nm, please change all of it > > Cheers, > Rafael >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160210/eb7edd3d/attachment.html>