Vedant Kumar via llvm-dev
2017-Nov-06 22:19 UTC
[llvm-dev] [RFC] Setting the current debug loc when the insertion point changes
Hi everybody, I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior. # Correctness The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630 <http://llvm.org/PR25630>. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous. To fix the issue, clients of IRBuilder should be required to either: 1. Explicitly set the correct debug location before inserting instructions, or 2. Opt-in to the current behavior of propagating the debug loc from the insertion point. # Convenience The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect. If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior. # Solutions 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc. 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true. I have a mild preference for idea #1 because it's easier to read, IMO. Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior. I'd appreciate your thoughts on this! thanks, vedant -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171106/f40d2bb6/attachment.html>
Adrian Prantl via llvm-dev
2017-Nov-06 22:32 UTC
[llvm-dev] [RFC] Setting the current debug loc when the insertion point changes
> On Nov 6, 2017, at 2:19 PM, Vedant Kumar <vsk at apple.com> wrote: > > Hi everybody, > > I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior. > > # Correctness > > The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.I agree with your analysis.> > To fix the issue, clients of IRBuilder should be required to either: > > 1. Explicitly set the correct debug location before inserting instructions, or > 2. Opt-in to the current behavior of propagating the debug loc from the insertion point. > > # Convenience > > The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect. > > If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior. > > # Solutions > > 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc. > 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true. > > I have a mild preference for idea #1 because it's easier to read, IMO. > > Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case. I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization? We should also think about how to map this to the C API. -- adrian> > I'd appreciate your thoughts on this! > > thanks, > vedant
Davide Italiano via llvm-dev
2017-Nov-06 22:42 UTC
[llvm-dev] [RFC] Setting the current debug loc when the insertion point changes
On Mon, Nov 6, 2017 at 2:19 PM, Vedant Kumar <vsk at apple.com> wrote:> Hi everybody, > > I'm investigating some correctness issues with debug line table information > in optimized code. I've noticed something problematic in IRBuilder. Setting > the insertion point of an IRBuilder sets the builder’s current debug loc to > the one attached to the insertion point. IMO this isn't always a correct or > convenient thing to do, and it shouldn't be the default behavior. > > # Correctness > > The IRBuilder shouldn't assume that the debug loc at its insertion point > applies to any instructions it emits. It doesn't have enough information to > make that connection. This assumption leads to bugs like llvm.org/PR25630. > I.e it creates situations in which we silently push around incorrect > DILocations without realizing it, because the debug locs in the IR look nice > and contiguous. >I remember that one, and I got to your same conclusion at the time when analyzing the bug, so thanks for writing this down :)> To fix the issue, clients of IRBuilder should be required to either: > > 1. Explicitly set the correct debug location before inserting instructions, > or > 2. Opt-in to the current behavior of propagating the debug loc from the > insertion point. > > # Convenience > > The current behavior of IRBuilder can be convenient when it's correct. But > it can be *really* inconvenient when it's incorrect. > > If a client sets an IRBuilder's debug loc correctly and makes a call that > changes the builder's insertion point, all generated instructions will pick > up a bogus debug loc from the IP. E.g this is what happens with > SCEVExpander::expandCodeFor(). There's no way to fix this bug without > changing IRBuilder's behavior. > > # Solutions > > 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically > to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't > change the current debug loc. > 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which > defaults to false. Only change the current debug loc if UpdateDbgLoc is > true. > > I have a mild preference for idea #1 because it's easier to read, IMO. > > Whichever solution we pick, my plan is to first migrate all in-tree users of > SetInsertPoint() to the explicit API, and to then gradually fix passes which > rely on the current incorrect behavior. >My preference is as well for 1. I kind of dislike adding bool parameters to API as it becomes easier to get them wrong. Your plan makes sense to me, regardless. I'll be happy to review the patches. Thanks! -- Davide
Vedant Kumar via llvm-dev
2017-Nov-06 22:50 UTC
[llvm-dev] [RFC] Setting the current debug loc when the insertion point changes
> On Nov 6, 2017, at 2:32 PM, Adrian Prantl <aprantl at apple.com> wrote: > > >> On Nov 6, 2017, at 2:19 PM, Vedant Kumar <vsk at apple.com> wrote: >> >> Hi everybody, >> >> I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior. >> >> # Correctness >> >> The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous. > > I agree with your analysis. > >> >> To fix the issue, clients of IRBuilder should be required to either: >> >> 1. Explicitly set the correct debug location before inserting instructions, or >> 2. Opt-in to the current behavior of propagating the debug loc from the insertion point. >> >> # Convenience >> >> The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect. >> >> If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior. >> >> # Solutions >> >> 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc. >> 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true. >> >> I have a mild preference for idea #1 because it's easier to read, IMO. >> >> Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior. > > Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence. I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API. OTOH, if we added a method like SetInsertPointAndDebugLoc(), we could incrementally eliminate as many uses of that method as possible.> I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization?Sure :).> We should also think about how to map this to the C API.Wdyt of exposing SetInsertPointAndDebugLoc() via the C API, with a note explaining that it will eventually be deprecated? vedant> > -- adrian > >> >> I'd appreciate your thoughts on this! >> >> thanks, >> vedant >