Quentin Colombet via llvm-dev
2019-Feb-14 19:20 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Hi all, As much as possible I would rather we avoid any kind of metadata in MIR to express the semantic of instructions. Instead I would prefer that each back provides a way to interpret what an instruction is doing. What I have in mind is something that would generalize what we do in the peephole optimizer for instance (look for isRegSequenceLike/getRegSequenceInputs and co.) or what we have for analyzing branches. One way we could do that and that was discussed in the past would be to describe each instruction in terms of the generic mir operations. Ultimately we could get a lot of this semantic information automatically populated by TableGen using the ISel patterns, like dagger does (github.com/repzret/dagger). Anyway, for the most part, I believe we could implement the “interpreter” for just a handful of instruction and get 90% of the information right. Cheers, -Quentin> On Feb 14, 2019, at 9:20 AM, Adrian Prantl <aprantl at apple.com> wrote: > > > >> On Feb 13, 2019, at 8:44 PM, Francis Visoiu Mistrih <francisvm at yahoo.com> wrote: >> >> Hi, >> >> [+ Quentin] >> >> Sorry for the late reply. >> >>> On Feb 13, 2019, at 9:09 AM, Nikola Prica <nikola.prica at rt-rk.com> wrote: >>> >>> On 12.02.2019. 18:06, Adrian Prantl wrote: >>>> [+ some folks more knowledgable about the Machine layer than me.] >>>> >>> That would be useful for us too! :) >>> >>> >>>>> On Feb 12, 2019, at 5:07 AM, Nikola Prica <nikola.prica at rt-rk.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I am one of the authors of this feature. On Phabricator, we agreed to >>>>> take discussion whether encoding this in IR and threading it through the >>>>> compiler or performing a late MIR analysis is the better approach. >>>>> >>>>> Regarding late MIR pass approach, it would need to go back from call >>>>> instruction by recognizing parameter's forwarding instructions and >>>>> interpret them. We could interpret register moves, immediate moves and >>>>> some of stack object loadings. There would be need to write >>>>> interpretation of various architectures instructions. We were not >>>>> positive about completeness of such recognition. >>>> >>>> So you're saying that in late MIR, the target-specific MachineInstructions don't have enough generic meta information to understand where data was copied/loaded from in a target-independent way? Would it be easier in earlier pre-regalloc MIR, or does that have the same problem because the instructions are already target-specific? >>>> >>> >>> It has enough generic meta information for some kind of instructions but >>> not for all. MachineInstr has bits for isMoveImm, isMoveReg and mayLoad >>> that can be useful for recognizing some kind of parameter loading >>> instructions. But we're not quite sure whether it is enough for >>> recognizing all of them. For example there is no support for recognizing >>> X86::LEA instructions with such mechanism (and there is significant >>> number of such parameter loading instructions). This instruction made us >>> give up with late MIR pass approach because we were not sure about >>> various architectures complexities. As a result from tacking it from >>> ISEL phase we were able to get entry values from some LEA instructions. >>> This is very important for this approach. Currently, for various >>> architectures, this approach gives us more information about values >>> loaded into parameter forwarding registers than late MIR pass would give >>> use (because of lack of special instruction interpretation). >>> >> >> I discussed this with Jessica offline today, and all this seems correct. We don’t have enough generic information to gather this in MIR. I agree that taking the MIR approach sounds attractive since we don’t have to carry all the debug instructions around, but it seems to me that a lot of these would either need special handling for every instruction that has no metadata (in this case instructions like LEA). > > You seem to imply that this extra metadata is not a long-wanted missing feature of MIR that we've just been waiting for an excuse to implement? Because otherwise it might be the right technical solution to augment MIR to be able to capture the effect lea & friends. > >> >>> >>> But nevertheless, in the end, we lose some information through the >>> optimization pipeline, and in order to salvage some information we >>> implemented target specific machine instruction interpreter. For example >>> situations like: >>> >>> %vreg = LEA <smt> >>> $rdi = COPY %vreg >>> call foo >>> DBG_CALLSITE >>> *DBG_CALLSITEPARAM $rdi, , %vreg >>> >>> ==== replace %vreg with $rdi ===>>> >>> %rdi = LEA <smt> >>> $rdi = COPY %rdi >>> call foo >>> DBG_CALLSITE >>> *DBG_CALLSITEPARAM $rdi, , %rdi >>> >>> ==== delete redudant instruction identities ===>>> >>> >>> %rdi = LEA <smt> >>> call foo >>> DBG_CALLSITE >>> *DBG_CALLSITEPARAM $rdi, , %rdi >>> >>> >>> In order to salvage this we go backward from call instruction and try to >>> interpret instruction that loads value to $rdi. >>> >>> This salvaging part could be used for interpreting in late MIR pass but >>> it would need extension for other architecture specific instructions. >>> But with current approach that starts tracking parameter forwarding >>> instructions from ISEL phase we are able to cover some of them without >>> such interpretor. >> >> I don’t have an opinion on this, but it sounds like this is going to grow into the full-MIR solution that Adrian was suggesting from the beginning. > > Just to clarify the record here, they actually have posted a working implementation that gathers this information at the IR level and threads it all the way through MIR. I've been the one asking whether we couldn't do a very late analysis instead :-) > > My motivation being that teaching all IR and MIR passes about the additional debug metadata has a high maintenance cost and may break more easily as new passes or instruction selectors are added. It's quite possible that the conclusion will be that the approach taken by the patch set is the right trade-off, but I want to make sure that we're rejecting the alternatives for the right technical reasons. > >> >>> >>> ISEL phase is important for matching DICallSiteParam metadata to >>> respective DBG_CALLSITEPARAM. > > Who is responsible for generating DICallSiteParam annotations? Is it the frontend or is it the codegenprepare(?) pass that knows about the calling convention? > >>> It also recognizes COPY instructions that >>> are part of calling sequence but do not forward any argument (for >>> example for variadic, C calling convention, functions we have copy to AL >>> register). If we are able to dispatch such non transferring argument >>> copy instructions from calling convention and we potentially drop IR >>> metadata about call site parameters, we might be able to do all >>> necessary parameter's tracking in some separate MIR pass (not very late, >>> somewhere after ISEL). > > Since we have three instruction selectors, that sounds intriguing to me :-) > > If everybody agrees that adding DBG_CALLSITEPARAM MIR instructions is preferable over adding metadata to recognize LEA-style instructions, perhaps we can stage the patch reviews in way that we get to a working, but suboptimal implementation that skips the IR call site annotations, and then look at the question whether it's feasible to extract more information from the calling convention lowering. The one wrinkle I see is that the calling convention lowering presumably happens before ISEL, so we may still need to update all three ISEL implementations. That might still be better than also teaching all IR passes about new debug info. > > Here's a proposal for how we could proceed: > 1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to recognize LEA semantics and implement an analysis > 2. Land above MIR support for call site parameters > 3. if (a), land support for introducing DBG_CALLSITEPARAM either in calling convention lowering or post-ISEL > 4. if that isn't good enough discuss whether IR call site parameters are the best solution > > let me know if that makes sense. > > Thanks, > adrian > >>> >>>>> However, such analysis >>>>> might not be complete as current approach. It would not be able to >>>>> produce ‘DW_OP_entry_values’ in ‘DW_TAG_call_site_value’ expression of >>>>> call site parameter as a late pass. >>>>> >>>>> As example think of callee function that forwards its argument in >>>>> function call in entry block and never uses that argument again: >>>>> %vreg = COPY $rsi; -> >>>>> …. <no use of $rsi nor %vreg> -> …<no use of $rsi nor %vreg> >>>>> $rsi = COPY $vreg; -> call foo >>>>> call foo -> >>>>> >>>> >>>> I'm not sure I can follow this example (yet). Is this about creating a call site parameter for an argument of the call to foo, or is it meant to illustrate that the call to foo() makes it hard to write a backwards analysis for inserting a call site parameter for a call site below the call to foo()? >>>>>> This is the case from ‘VirtRegMap’ pass, but I think it can happen >>>>> elsewhere. >>>> >>>> As I'm not super familiar with the various MIR passes: Is VirtRegMap the pass inserting the vreg copy from $rsi here? Does it do something else? >>>> >>> >>> Oh, I didn't explain it fully. In virtual register rewriter, for >>> previous example %vreg gets replaced with $rsi and we get two 'identity >>> copies' ($rsi = COPY $rsi) that get deleted. Such situation is special >>> for call at entry block whose argument is callee's argument that is used >>> only at that place. For example like: >>> >>> baa(int a) { >>> foo(a); >>> <code that does not use 'a' variable>; >>> } >>> >>> Variable 'a' is dead after 'foo' call and there is no interest in >>> preserving it in further flow of function. At that point we lose >>> information about parameter forwarding instruction. No instruction, no >>> way to interpret it. In order to track such situations we need >>> DBG_CALLSITEPARAM instructions to track parameter transferring register >>> through the backend. For situations like this we use DICallSiteParam in >>> order to find it in previous frame. One late pass wont do the trick. >>> >>> Call at entry block that implicitly re-forwards argument is special >>> situation and can possibly be handled with emitting DW_OP_entry_value >>> for call site parameter value expression. >>> >>> Also, it is worth mentioning that for situations where we could have >>> call of 'foo(a)' nested at some machine block, parameter loading >>> instruction can be in different block than the call of 'foo(a)'. Such >>> situations would not be handled so easily by late pass. >>> >>>>> Recreation of this might be possible when function ‘foo’ is >>>>> in current compilation module, but we are not sure if it is possible for >>>>> external modules calls. In order to follow such cases we need >>>>> DBG_CALLSITEPARAM that can track such situation. >>>> >>>> What information exactly would you need about foo's implementation that you cannot get from just knowing the calling convention? >>>> >>> >>> Having in mind previous example where we have just call of fucntion foo, >>> we would need to know how many arguments does 'foo' have and through >>> which registers are they forwarded. I'm not sure how would we get such >>> information? >> >> I believe you would have to use the IR Function attached to the MachineFunction and the target calling convention generated from TableGen. This is what targets already implement in [Target]ISelLowering::LowerCall. >> >> Thanks, >> >> -- >> Francis >> >>> >>>>> >>>>> Since after ISEL phase we have explicit pseudo COPY instructions that >>>>> forward argument to another function frame, it came naturally to >>>>> recognize such instructions at this stage. There we can say with 100% >>>>> certainty that those instruction indeed forward function arguments. >>>> >>>> My main motivation for this discussion is to minimize the added complexity of the feature. For example, if we figure out that we can get by without introducing new IR constructs (that optimization authors would need to be taught about, that would need to be supported in all three instruction selectors) and can get by with only adding new MIR instructions, that would be a win. However, if we can prove that deferring the analysis to a later stage would result in inferior quality then the extra maintenance burden of new IR constructs may be the right tradeoff. >>>>> Thanks for taking the time to walk me through your thought process. >>>> -- adrian >>> >>> >>> In general we use DICallSiteParam as a backup solution (when we lose >>> track of location loaded int parameter forwarding register) for >>> representing value at entry point. As a consequence we are able to >>> produce 'DW_OP_entry_values' in 'DW_AT_call_site_value'(GCC generates >>> such expressions) which allow us to go 2 or more frames behind. I've >>> showed one example for calls at entry block that could also produce such >>> expressions without DICallSiteParam, but that is the only case that I >>> can think of now. But since it is a backup when something fails it could >>> at some point be removed once it is no longer needed as backup. >>> Currently it gives use more information about call site forwarding values. >>> >>> Thanks for time! It is our common goal to make this right! >>> --Nikola-------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20190214/9e6265c6/attachment.html>
Nikola Prica via llvm-dev
2019-Feb-18 16:37 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
On 14.02.2019. 20:20, Quentin Colombet wrote:> Hi all, > > As much as possible I would rather we avoid any kind of metadata in MIR > to express the semantic of instructions. > Instead I would prefer that each back provides a way to interpret what > an instruction is doing. What I have in mind is something that would > generalize what we do in the peephole optimizer for instance (look for > isRegSequenceLike/getRegSequenceInputs and co.) or what we have for > analyzing branches. > One way we could do that and that was discussed in the past would be to > describe each instruction in terms of the generic mir operations. > > Ultimately we could get a lot of this semantic information automatically > populated by TableGen using the ISel patterns, like dagger does > (github.com/repzret/dagger). > > Anyway, for the most part, I believe we could implement the > “interpreter” for just a handful of instruction and get 90% of the > information right. >This seems interesting. We will need to investigate this further if we decide to take interpreting approach.> Cheers, > -Quentin > >> On Feb 14, 2019, at 9:20 AM, Adrian Prantl <aprantl at apple.com >> <mailto:aprantl at apple.com>> wrote: >> >> >> >>> On Feb 13, 2019, at 8:44 PM, Francis Visoiu Mistrih >>> <francisvm at yahoo.com <mailto:francisvm at yahoo.com>> wrote: >>> >>> Hi, >>> >>> [+ Quentin] >>> >>> Sorry for the late reply. >>> >>>> On Feb 13, 2019, at 9:09 AM, Nikola Prica <nikola.prica at rt-rk.com >>>> <mailto:nikola.prica at rt-rk.com>> wrote: >>>> >>>> On 12.02.2019. 18:06, Adrian Prantl wrote: >>>>> [+ some folks more knowledgable about the Machine layer than me.] >>>>> >>>> That would be useful for us too! :) >>>> >>>> >>>>>> On Feb 12, 2019, at 5:07 AM, Nikola Prica <nikola.prica at rt-rk.com >>>>>> <mailto:nikola.prica at rt-rk.com>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I am one of the authors of this feature. On Phabricator, we agreed to >>>>>> take discussion whether encoding this in IR and threading it >>>>>> through the >>>>>> compiler or performing a late MIR analysis is the better approach. >>>>>> >>>>>> Regarding late MIR pass approach, it would need to go back from call >>>>>> instruction by recognizing parameter's forwarding instructions and >>>>>> interpret them. We could interpret register moves, immediate moves and >>>>>> some of stack object loadings. There would be need to write >>>>>> interpretation of various architectures instructions. We were not >>>>>> positive about completeness of such recognition. >>>>> >>>>> So you're saying that in late MIR, the target-specific >>>>> MachineInstructions don't have enough generic meta information to >>>>> understand where data was copied/loaded from in a >>>>> target-independent way? Would it be easier in earlier pre-regalloc >>>>> MIR, or does that have the same problem because the instructions >>>>> are already target-specific? >>>>> >>>> >>>> It has enough generic meta information for some kind of instructions but >>>> not for all. MachineInstr has bits for isMoveImm, isMoveReg and mayLoad >>>> that can be useful for recognizing some kind of parameter loading >>>> instructions. But we're not quite sure whether it is enough for >>>> recognizing all of them. For example there is no support for recognizing >>>> X86::LEA instructions with such mechanism (and there is significant >>>> number of such parameter loading instructions). This instruction made us >>>> give up with late MIR pass approach because we were not sure about >>>> various architectures complexities. As a result from tacking it from >>>> ISEL phase we were able to get entry values from some LEA instructions. >>>> This is very important for this approach. Currently, for various >>>> architectures, this approach gives us more information about values >>>> loaded into parameter forwarding registers than late MIR pass would give >>>> use (because of lack of special instruction interpretation). >>>> >>> >>> I discussed this with Jessica offline today, and all this seems >>> correct. We don’t have enough generic information to gather this in >>> MIR. I agree that taking the MIR approach sounds attractive since we >>> don’t have to carry all the debug instructions around, but it seems >>> to me that a lot of these would either need special handling for >>> every instruction that has no metadata (in this case instructions >>> like LEA). >> >> You seem to imply that this extra metadata is not a long-wanted >> missing feature of MIR that we've just been waiting for an excuse to >> implement? Because otherwise it might be the right technical solution >> to augment MIR to be able to capture the effect lea & friends. >> >>> >>>> >>>> But nevertheless, in the end, we lose some information through the >>>> optimization pipeline, and in order to salvage some information we >>>> implemented target specific machine instruction interpreter. For example >>>> situations like: >>>> >>>> %vreg = LEA <smt> >>>> $rdi = COPY %vreg >>>> call foo >>>> DBG_CALLSITE >>>> *DBG_CALLSITEPARAM $rdi, , %vreg >>>> >>>> ==== replace %vreg with $rdi ===>>>> >>>> %rdi = LEA <smt> >>>> $rdi = COPY %rdi >>>> call foo >>>> DBG_CALLSITE >>>> *DBG_CALLSITEPARAM $rdi, , %rdi >>>> >>>> ==== delete redudant instruction identities ===>>>> >>>> >>>> %rdi = LEA <smt> >>>> call foo >>>> DBG_CALLSITE >>>> *DBG_CALLSITEPARAM $rdi, , %rdi >>>> >>>> >>>> In order to salvage this we go backward from call instruction and try to >>>> interpret instruction that loads value to $rdi. >>>> >>>> This salvaging part could be used for interpreting in late MIR pass but >>>> it would need extension for other architecture specific instructions. >>>> But with current approach that starts tracking parameter forwarding >>>> instructions from ISEL phase we are able to cover some of them without >>>> such interpretor. >>> >>> I don’t have an opinion on this, but it sounds like this is going to >>> grow into the full-MIR solution that Adrian was suggesting from the >>> beginning. >> >> Just to clarify the record here, they actually have posted a working >> implementation that gathers this information at the IR level and >> threads it all the way through MIR. I've been the one asking whether >> we couldn't do a very late analysis instead :-) >> >> My motivation being that teaching all IR and MIR passes about the >> additional debug metadata has a high maintenance cost and may break >> more easily as new passes or instruction selectors are added. It's >> quite possible that the conclusion will be that the approach taken by >> the patch set is the right trade-off, but I want to make sure that >> we're rejecting the alternatives for the right technical reasons. >> >>> >>>> >>>> ISEL phase is important for matching DICallSiteParam metadata to >>>> respective DBG_CALLSITEPARAM. >> >> Who is responsible for generating DICallSiteParam annotations? Is it >> the frontend or is it the codegenprepare(?) pass that knows about the >> calling convention? >>It is fronted job. It basically call expression visitor that produces it. It basically tries to provide expression that needs to be printed in caller's frame. For now it is not able to provide function call expression or expression with more than one variable.>>>> It also recognizes COPY instructions that >>>> are part of calling sequence but do not forward any argument (for >>>> example for variadic, C calling convention, functions we have copy to AL >>>> register). If we are able to dispatch such non transferring argument >>>> copy instructions from calling convention and we potentially drop IR >>>> metadata about call site parameters, we might be able to do all >>>> necessary parameter's tracking in some separate MIR pass (not very late, >>>> somewhere after ISEL). >> >> Since we have three instruction selectors, that sounds intriguing to >> me :-)Algorithm is implemented only for SelectionDAG but it could be easily extended for FastISel. It basically works on chained call sequence of SDNodes. It tries to match copied SDNode to one of the input argument SDNode. It would certainly work better if this implementation is lowered to target specific part where call sequence is been generated, but we have looked for general solution.>> >> If everybody agrees that adding DBG_CALLSITEPARAM MIR instructions is >> preferable over adding metadata to recognize LEA-style instructions, >> perhaps we can stage the patch reviews in way that we get to a >> working, but suboptimal implementation that skips the IR call site >> annotations, and then look at the question whether it's feasible to >> extract more information from the calling convention lowering. The one >> wrinkle I see is that the calling convention lowering presumably >> happens before ISEL, so we may still need to update all three ISEL >> implementations. That might still be better than also teaching all IR >> passes about new debug info. >>Just to clarify, IR call site implementation does not give support for recognizing LEA-style instructions. Tracking locations loaded into parameter forwarding register from ISEL phase does this by tracking loaded location. But we agree that it could be best to defer adding of IR call site annotation for later. Call lowering happens at ISEL, but calling convention is used before ISEL to adjust forwarding arguments context? Am I right? post-ISEL MIR has pretty much generic look that could be processed in search of copy instructions that forward function arguments. It is still in SSA form. It looks something like: ADJCALLSTACKDOWN64 %8:gr64 = LEA64r %stack.0.local1, 1, $noreg, 0, $noreg %9:gr32 = MOV32ri 10 %10:gr32 = MOV32ri 15 $rdi = COPY %8, debug-location !25 $esi = COPY %1, debug-location !25 $edx = COPY %9, debug-location !25 $ecx = COPY %10, debug-location !25 $r8d = COPY %6, debug-location !25 $r9d = COPY %7, debug-location !25 CALL64pcrel32 @foo ADJCALLSTACKUP64 At MIR level, we are aware that we could extract calling convention from foo but we are not still sure how would we recognize COPY instructions that do not transfer call arguments. There are situations like loading of AL for functions with variable number of arguments, or like loading for global based registers for i686 compiled with PIC, etc. Currently we are trying to investigate how we could recognize such instruction on MIR level. Thanks, Nikola>> Here's a proposal for how we could proceed: >> 1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to >> recognize LEA semantics and implement an analysis >> 2. Land above MIR support for call site parameters >> 3. if (a), land support for introducing DBG_CALLSITEPARAM either in >> calling convention lowering or post-ISEL >> 4. if that isn't good enough discuss whether IR call site parameters >> are the best solution >> >> let me know if that makes sense. >> >> Thanks, >> adrian >> >>>> >>>>>> However, such analysis >>>>>> might not be complete as current approach. It would not be able to >>>>>> produce ‘DW_OP_entry_values’ in ‘DW_TAG_call_site_value’ expression of >>>>>> call site parameter as a late pass. >>>>>> >>>>>> As example think of callee function that forwards its argument in >>>>>> function call in entry block and never uses that argument again: >>>>>> %vreg = COPY $rsi; -> >>>>>> …. <no use of $rsi nor %vreg> -> …<no use of $rsi nor %vreg> >>>>>> $rsi = COPY $vreg; -> call foo >>>>>> call foo -> >>>>>> >>>>> >>>>> I'm not sure I can follow this example (yet). Is this about >>>>> creating a call site parameter for an argument of the call to foo, >>>>> or is it meant to illustrate that the call to foo() makes it hard >>>>> to write a backwards analysis for inserting a call site parameter >>>>> for a call site below the call to foo()? >>>>>>> This is the case from ‘VirtRegMap’ pass, but I think it can happen >>>>>> elsewhere. >>>>> >>>>> As I'm not super familiar with the various MIR passes: Is >>>>> VirtRegMap the pass inserting the vreg copy from $rsi here? Does it >>>>> do something else? >>>>> >>>> >>>> Oh, I didn't explain it fully. In virtual register rewriter, for >>>> previous example %vreg gets replaced with $rsi and we get two 'identity >>>> copies' ($rsi = COPY $rsi) that get deleted. Such situation is special >>>> for call at entry block whose argument is callee's argument that is used >>>> only at that place. For example like: >>>> >>>> baa(int a) { >>>> foo(a); >>>> <code that does not use 'a' variable>; >>>> } >>>> >>>> Variable 'a' is dead after 'foo' call and there is no interest in >>>> preserving it in further flow of function. At that point we lose >>>> information about parameter forwarding instruction. No instruction, no >>>> way to interpret it. In order to track such situations we need >>>> DBG_CALLSITEPARAM instructions to track parameter transferring register >>>> through the backend. For situations like this we use DICallSiteParam in >>>> order to find it in previous frame. One late pass wont do the trick. >>>> >>>> Call at entry block that implicitly re-forwards argument is special >>>> situation and can possibly be handled with emitting DW_OP_entry_value >>>> for call site parameter value expression. >>>> >>>> Also, it is worth mentioning that for situations where we could have >>>> call of 'foo(a)' nested at some machine block, parameter loading >>>> instruction can be in different block than the call of 'foo(a)'. Such >>>> situations would not be handled so easily by late pass. >>>> >>>>>> Recreation of this might be possible when function ‘foo’ is >>>>>> in current compilation module, but we are not sure if it is >>>>>> possible for >>>>>> external modules calls. In order to follow such cases we need >>>>>> DBG_CALLSITEPARAM that can track such situation. >>>>> >>>>> What information exactly would you need about foo's implementation >>>>> that you cannot get from just knowing the calling convention? >>>>> >>>> >>>> Having in mind previous example where we have just call of fucntion foo, >>>> we would need to know how many arguments does 'foo' have and through >>>> which registers are they forwarded. I'm not sure how would we get such >>>> information? >>> >>> I believe you would have to use the IR Function attached to the >>> MachineFunction and the target calling convention generated from >>> TableGen. This is what targets already implement in >>> [Target]ISelLowering::LowerCall. >>> >>> Thanks, >>> >>> -- >>> Francis >>> >>>> >>>>>> >>>>>> Since after ISEL phase we have explicit pseudo COPY instructions that >>>>>> forward argument to another function frame, it came naturally to >>>>>> recognize such instructions at this stage. There we can say with 100% >>>>>> certainty that those instruction indeed forward function arguments. >>>>> >>>>> My main motivation for this discussion is to minimize the added >>>>> complexity of the feature. For example, if we figure out that we >>>>> can get by without introducing new IR constructs (that optimization >>>>> authors would need to be taught about, that would need to be >>>>> supported in all three instruction selectors) and can get by with >>>>> only adding new MIR instructions, that would be a win. However, if >>>>> we can prove that deferring the analysis to a later stage would >>>>> result in inferior quality then the extra maintenance burden of new >>>>> IR constructs may be the right tradeoff. >>>>>> Thanks for taking the time to walk me through your thought process. >>>>> -- adrian >>>> >>>> >>>> In general we use DICallSiteParam as a backup solution (when we lose >>>> track of location loaded int parameter forwarding register) for >>>> representing value at entry point. As a consequence we are able to >>>> produce 'DW_OP_entry_values' in 'DW_AT_call_site_value'(GCC generates >>>> such expressions) which allow us to go 2 or more frames behind. I've >>>> showed one example for calls at entry block that could also produce such >>>> expressions without DICallSiteParam, but that is the only case that I >>>> can think of now. But since it is a backup when something fails it could >>>> at some point be removed once it is no longer needed as backup. >>>> Currently it gives use more information about call site forwarding >>>> values. >>>> >>>> Thanks for time! It is our common goal to make this right! >>>> --Nikola >
Nikola Prica via llvm-dev
2019-Feb-22 10:49 UTC
[llvm-dev] RFC: [DebugInfo] Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
Hi, We have done some investigation. Please find my comment inlined bellow.> > On 14.02.2019. 20:20, Quentin Colombet wrote: >> Hi all, >> >> As much as possible I would rather we avoid any kind of metadata in MIR >> to express the semantic of instructions. >> Instead I would prefer that each back provides a way to interpret what >> an instruction is doing. What I have in mind is something that would >> generalize what we do in the peephole optimizer for instance (look for >> isRegSequenceLike/getRegSequenceInputs and co.) or what we have for >> analyzing branches. >> One way we could do that and that was discussed in the past would be to >> describe each instruction in terms of the generic mir operations. >> >> Ultimately we could get a lot of this semantic information automatically >> populated by TableGen using the ISel patterns, like dagger does >> (github.com/repzret/dagger). >>> Anyway, for the most part, I believe we could implement the >> “interpreter” for just a handful of instruction and get 90% of the >> information right. >> > > This seems interesting. We will need to investigate this further if we > decide to take interpreting approach. > >> Cheers, >> -Quentin >> >>> On Feb 14, 2019, at 9:20 AM, Adrian Prantl <aprantl at apple.com >>> <mailto:aprantl at apple.com>> wrote: >>> >>> >>> >>>> On Feb 13, 2019, at 8:44 PM, Francis Visoiu Mistrih >>>> <francisvm at yahoo.com <mailto:francisvm at yahoo.com>> wrote: >>>> >>>> Hi, >>>> >>>> [+ Quentin] >>>> >>>> Sorry for the late reply. >>>> >>>>> On Feb 13, 2019, at 9:09 AM, Nikola Prica <nikola.prica at rt-rk.com >>>>> <mailto:nikola.prica at rt-rk.com>> wrote: >>>>> >>>>> On 12.02.2019. 18:06, Adrian Prantl wrote: >>>>>> [+ some folks more knowledgable about the Machine layer than me.] >>>>>> >>>>> That would be useful for us too! :) >>>>> >>>>> >>>>>>> On Feb 12, 2019, at 5:07 AM, Nikola Prica <nikola.prica at rt-rk.com >>>>>>> <mailto:nikola.prica at rt-rk.com>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am one of the authors of this feature. On Phabricator, we agreed to >>>>>>> take discussion whether encoding this in IR and threading it >>>>>>> through the >>>>>>> compiler or performing a late MIR analysis is the better approach. >>>>>>> >>>>>>> Regarding late MIR pass approach, it would need to go back from call >>>>>>> instruction by recognizing parameter's forwarding instructions and >>>>>>> interpret them. We could interpret register moves, immediate moves and >>>>>>> some of stack object loadings. There would be need to write >>>>>>> interpretation of various architectures instructions. We were not >>>>>>> positive about completeness of such recognition. >>>>>> >>>>>> So you're saying that in late MIR, the target-specific >>>>>> MachineInstructions don't have enough generic meta information to >>>>>> understand where data was copied/loaded from in a >>>>>> target-independent way? Would it be easier in earlier pre-regalloc >>>>>> MIR, or does that have the same problem because the instructions >>>>>> are already target-specific? >>>>>> >>>>> >>>>> It has enough generic meta information for some kind of instructions but >>>>> not for all. MachineInstr has bits for isMoveImm, isMoveReg and mayLoad >>>>> that can be useful for recognizing some kind of parameter loading >>>>> instructions. But we're not quite sure whether it is enough for >>>>> recognizing all of them. For example there is no support for recognizing >>>>> X86::LEA instructions with such mechanism (and there is significant >>>>> number of such parameter loading instructions). This instruction made us >>>>> give up with late MIR pass approach because we were not sure about >>>>> various architectures complexities. As a result from tacking it from >>>>> ISEL phase we were able to get entry values from some LEA instructions. >>>>> This is very important for this approach. Currently, for various >>>>> architectures, this approach gives us more information about values >>>>> loaded into parameter forwarding registers than late MIR pass would give >>>>> use (because of lack of special instruction interpretation). >>>>> >>>> >>>> I discussed this with Jessica offline today, and all this seems >>>> correct. We don’t have enough generic information to gather this in >>>> MIR. I agree that taking the MIR approach sounds attractive since we >>>> don’t have to carry all the debug instructions around, but it seems >>>> to me that a lot of these would either need special handling for >>>> every instruction that has no metadata (in this case instructions >>>> like LEA). >>> >>> You seem to imply that this extra metadata is not a long-wanted >>> missing feature of MIR that we've just been waiting for an excuse to >>> implement? Because otherwise it might be the right technical solution >>> to augment MIR to be able to capture the effect lea & friends. >>> >>>> >>>>> >>>>> But nevertheless, in the end, we lose some information through the >>>>> optimization pipeline, and in order to salvage some information we >>>>> implemented target specific machine instruction interpreter. For example >>>>> situations like: >>>>> >>>>> %vreg = LEA <smt> >>>>> $rdi = COPY %vreg >>>>> call foo >>>>> DBG_CALLSITE >>>>> *DBG_CALLSITEPARAM $rdi, , %vreg >>>>> >>>>> ==== replace %vreg with $rdi ===>>>>> >>>>> %rdi = LEA <smt> >>>>> $rdi = COPY %rdi >>>>> call foo >>>>> DBG_CALLSITE >>>>> *DBG_CALLSITEPARAM $rdi, , %rdi >>>>> >>>>> ==== delete redudant instruction identities ===>>>>> >>>>> >>>>> %rdi = LEA <smt> >>>>> call foo >>>>> DBG_CALLSITE >>>>> *DBG_CALLSITEPARAM $rdi, , %rdi >>>>> >>>>> >>>>> In order to salvage this we go backward from call instruction and try to >>>>> interpret instruction that loads value to $rdi. >>>>> >>>>> This salvaging part could be used for interpreting in late MIR pass but >>>>> it would need extension for other architecture specific instructions. >>>>> But with current approach that starts tracking parameter forwarding >>>>> instructions from ISEL phase we are able to cover some of them without >>>>> such interpretor. >>>> >>>> I don’t have an opinion on this, but it sounds like this is going to >>>> grow into the full-MIR solution that Adrian was suggesting from the >>>> beginning. >>> >>> Just to clarify the record here, they actually have posted a working >>> implementation that gathers this information at the IR level and >>> threads it all the way through MIR. I've been the one asking whether >>> we couldn't do a very late analysis instead :-) >>> >>> My motivation being that teaching all IR and MIR passes about the >>> additional debug metadata has a high maintenance cost and may break >>> more easily as new passes or instruction selectors are added. It's >>> quite possible that the conclusion will be that the approach taken by >>> the patch set is the right trade-off, but I want to make sure that >>> we're rejecting the alternatives for the right technical reasons. >>> >>>> >>>>> >>>>> ISEL phase is important for matching DICallSiteParam metadata to >>>>> respective DBG_CALLSITEPARAM. >>> >>> Who is responsible for generating DICallSiteParam annotations? Is it >>> the frontend or is it the codegenprepare(?) pass that knows about the >>> calling convention? >>> > > It is fronted job. It basically call expression visitor that produces > it. It basically tries to provide expression that needs to be printed in > caller's frame. For now it is not able to provide function call > expression or expression with more than one variable. > >>>>> It also recognizes COPY instructions that >>>>> are part of calling sequence but do not forward any argument (for >>>>> example for variadic, C calling convention, functions we have copy to AL >>>>> register). If we are able to dispatch such non transferring argument >>>>> copy instructions from calling convention and we potentially drop IR >>>>> metadata about call site parameters, we might be able to do all >>>>> necessary parameter's tracking in some separate MIR pass (not very late, >>>>> somewhere after ISEL). >>> >>> Since we have three instruction selectors, that sounds intriguing to >>> me :-) > > > Algorithm is implemented only for SelectionDAG but it could be easily > extended for FastISel. It basically works on chained call sequence of > SDNodes. It tries to match copied SDNode to one of the input argument > SDNode. It would certainly work better if this implementation is lowered > to target specific part where call sequence is been generated, but we > have looked for general solution. > >>> >>> If everybody agrees that adding DBG_CALLSITEPARAM MIR instructions is >>> preferable over adding metadata to recognize LEA-style instructions, >>> perhaps we can stage the patch reviews in way that we get to a >>> working, but suboptimal implementation that skips the IR call site >>> annotations, and then look at the question whether it's feasible to >>> extract more information from the calling convention lowering. The one >>> wrinkle I see is that the calling convention lowering presumably >>> happens before ISEL, so we may still need to update all three ISEL >>> implementations. That might still be better than also teaching all IR >>> passes about new debug info. >>> > > Just to clarify, IR call site implementation does not give support for > recognizing LEA-style instructions. Tracking locations loaded into > parameter forwarding register from ISEL phase does this by tracking > loaded location. But we agree that it could be best to defer adding of > IR call site annotation for later. > > Call lowering happens at ISEL, but calling convention is used before > ISEL to adjust forwarding arguments context? Am I right? > > post-ISEL MIR has pretty much generic look that could be processed in > search of copy instructions that forward function arguments. It is still > in SSA form. It looks something like: > > ADJCALLSTACKDOWN64 > %8:gr64 = LEA64r %stack.0.local1, 1, $noreg, 0, $noreg > %9:gr32 = MOV32ri 10 > %10:gr32 = MOV32ri 15 > $rdi = COPY %8, debug-location !25 > $esi = COPY %1, debug-location !25 > $edx = COPY %9, debug-location !25 > $ecx = COPY %10, debug-location !25 > $r8d = COPY %6, debug-location !25 > $r9d = COPY %7, debug-location !25 > CALL64pcrel32 @foo > ADJCALLSTACKUP64 > > At MIR level, we are aware that we could extract calling convention from > foo but we are not still sure how would we recognize COPY instructions > that do not transfer call arguments. There are situations like loading > of AL for functions with variable number of arguments, or like loading > for global based registers for i686 compiled with PIC, etc. Currently we > are trying to investigate how we could recognize such instruction on MIR > level. > > Thanks, > > Nikola > >>> Here's a proposal for how we could proceed: >>> 1. Decide whether to add (a) DBG_CALLSITEPARAM vs. (b) augment MIR to >>> recognize LEA semantics and implement an analysis >>> 2. Land above MIR support for call site parameters >>> 3. if (a), land support for introducing DBG_CALLSITEPARAM either in >>> calling convention lowering or post-ISEL >>> 4. if that isn't good enough discuss whether IR call site parameters >>> are the best solution >>> >>> let me know if that makes sense. >>> >>> Thanks, >>> adrian >>In order to use calling convention lowering at MIR pass level, for recognizing instructions that forward function arguments, we would need to implement calling convention interpreter. Only recognizing instructions and trying to see whether it is part of calling sequence, would not be enough. For example, we will not be able to properly handle cases when one 64bit argument is split on two 32bit registers. This could be handled if we know number and sizes of arguments of called function, but then we would end up calling similar process as one from ISEL phase. We can only know number and sizes of arguments for direct calls since we can find IR function declaration for it and extract such information. For indirect calls, we would not be able to perform such analysis since we cannot fetch function’s declaration. This means that we will not be able to support indirect calls (not without some trickery). If everybody agrees with stated, this might be the technical reason to give up with MIR pass that would collect call site parameter debug info. If we are wrong with our analysis, please advise us. Otherwise, we can go with approach with introducing DBG_CALLSITEPARAM and producing it from ISEL phase (with dispatched IR part). Thanks, Nikola>>>>> >>>>>>> However, such analysis >>>>>>> might not be complete as current approach. It would not be able to >>>>>>> produce ‘DW_OP_entry_values’ in ‘DW_TAG_call_site_value’ expression of >>>>>>> call site parameter as a late pass. >>>>>>> >>>>>>> As example think of callee function that forwards its argument in >>>>>>> function call in entry block and never uses that argument again: >>>>>>> %vreg = COPY $rsi; -> >>>>>>> …. <no use of $rsi nor %vreg> -> …<no use of $rsi nor %vreg> >>>>>>> $rsi = COPY $vreg; -> call foo >>>>>>> call foo -> >>>>>>> >>>>>> >>>>>> I'm not sure I can follow this example (yet). Is this about >>>>>> creating a call site parameter for an argument of the call to foo, >>>>>> or is it meant to illustrate that the call to foo() makes it hard >>>>>> to write a backwards analysis for inserting a call site parameter >>>>>> for a call site below the call to foo()? >>>>>>>> This is the case from ‘VirtRegMap’ pass, but I think it can happen >>>>>>> elsewhere. >>>>>> >>>>>> As I'm not super familiar with the various MIR passes: Is >>>>>> VirtRegMap the pass inserting the vreg copy from $rsi here? Does it >>>>>> do something else? >>>>>> >>>>> >>>>> Oh, I didn't explain it fully. In virtual register rewriter, for >>>>> previous example %vreg gets replaced with $rsi and we get two 'identity >>>>> copies' ($rsi = COPY $rsi) that get deleted. Such situation is special >>>>> for call at entry block whose argument is callee's argument that is used >>>>> only at that place. For example like: >>>>> >>>>> baa(int a) { >>>>> foo(a); >>>>> <code that does not use 'a' variable>; >>>>> } >>>>> >>>>> Variable 'a' is dead after 'foo' call and there is no interest in >>>>> preserving it in further flow of function. At that point we lose >>>>> information about parameter forwarding instruction. No instruction, no >>>>> way to interpret it. In order to track such situations we need >>>>> DBG_CALLSITEPARAM instructions to track parameter transferring register >>>>> through the backend. For situations like this we use DICallSiteParam in >>>>> order to find it in previous frame. One late pass wont do the trick. >>>>> >>>>> Call at entry block that implicitly re-forwards argument is special >>>>> situation and can possibly be handled with emitting DW_OP_entry_value >>>>> for call site parameter value expression. >>>>> >>>>> Also, it is worth mentioning that for situations where we could have >>>>> call of 'foo(a)' nested at some machine block, parameter loading >>>>> instruction can be in different block than the call of 'foo(a)'. Such >>>>> situations would not be handled so easily by late pass. >>>>> >>>>>>> Recreation of this might be possible when function ‘foo’ is >>>>>>> in current compilation module, but we are not sure if it is >>>>>>> possible for >>>>>>> external modules calls. In order to follow such cases we need >>>>>>> DBG_CALLSITEPARAM that can track such situation. >>>>>> >>>>>> What information exactly would you need about foo's implementation >>>>>> that you cannot get from just knowing the calling convention? >>>>>> >>>>> >>>>> Having in mind previous example where we have just call of fucntion foo, >>>>> we would need to know how many arguments does 'foo' have and through >>>>> which registers are they forwarded. I'm not sure how would we get such >>>>> information? >>>> >>>> I believe you would have to use the IR Function attached to the >>>> MachineFunction and the target calling convention generated from >>>> TableGen. This is what targets already implement in >>>> [Target]ISelLowering::LowerCall. >>>> >>>> Thanks, >>>> >>>> -- >>>> Francis >>>> >>>>> >>>>>>> >>>>>>> Since after ISEL phase we have explicit pseudo COPY instructions that >>>>>>> forward argument to another function frame, it came naturally to >>>>>>> recognize such instructions at this stage. There we can say with 100% >>>>>>> certainty that those instruction indeed forward function arguments. >>>>>> >>>>>> My main motivation for this discussion is to minimize the added >>>>>> complexity of the feature. For example, if we figure out that we >>>>>> can get by without introducing new IR constructs (that optimization >>>>>> authors would need to be taught about, that would need to be >>>>>> supported in all three instruction selectors) and can get by with >>>>>> only adding new MIR instructions, that would be a win. However, if >>>>>> we can prove that deferring the analysis to a later stage would >>>>>> result in inferior quality then the extra maintenance burden of new >>>>>> IR constructs may be the right tradeoff. >>>>>>> Thanks for taking the time to walk me through your thought process. >>>>>> -- adrian >>>>> >>>>> >>>>> In general we use DICallSiteParam as a backup solution (when we lose >>>>> track of location loaded int parameter forwarding register) for >>>>> representing value at entry point. As a consequence we are able to >>>>> produce 'DW_OP_entry_values' in 'DW_AT_call_site_value'(GCC generates >>>>> such expressions) which allow us to go 2 or more frames behind. I've >>>>> showed one example for calls at entry block that could also produce such >>>>> expressions without DICallSiteParam, but that is the only case that I >>>>> can think of now. But since it is a backup when something fails it could >>>>> at some point be removed once it is no longer needed as backup. >>>>> Currently it gives use more information about call site forwarding >>>>> values. >>>>> >>>>> Thanks for time! It is our common goal to make this right! >>>>> --Nikola >>