Reid Kleckner via llvm-dev
2017-Sep-07 00:37 UTC
[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
On Wed, Sep 6, 2017 at 5:01 PM, David Blaikie <dblaikie at gmail.com> wrote:> On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <rnk at google.com> wrote: > >> On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >>> I guess you described this already, but talking it through for >>> myself/maybe others will find this useful: >>> >>> So since we don't have DW_OP_regN for LLVM registers, we could sort of >>> assume the implicit first value on the stack is a pseudo-OP_regN of the >>> LLVM SSA register. >>> >> >> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the >> dbg.value and DBG_VALUE instructions are a register-like value that gets >> pushed onto the expression stack. The DWARF asmprinter does some expression >> folding to clean things up, but that's the model. >> >> >>> To support that, all existing uses would need no changes to match the >>> DWARF model of registers being implicitly direct values. >>> >>> Code that wanted to describe the register as containing the memory >>> address of the interesting thing would use DW_OP_stack_value to say "this >>> location description that is a register is really an address you should >>> follow to find the value, not a direct value itself"? >>> >>> But code that wanted to describe a variable as being 3 bytes ahead of a >>> pointer in an LLVM SSA register would only have "plus 3" in the expression >>> stack, since then it's no longer a direct value but is treated as a pointer >>> to the value. I guess this is where the ambiguity would come in - currently >>> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's >>> meant to describe reg value + 3 as being the immediate value of the >>> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added >>> somewhere in the DWARF backend?) >>> >> >> Our model today is inconsistent. >> > > Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's > some mismatch between the semantics, though I'm still having trouble > wrapping my head around it. >I mean LLVM's model is internally inconsistent. We have bugs like the RVO one that you filed (https://llvm.org/pr34513), where we forget if the debug value is an address or a value. In LLVM IR today, the SSA value of the dbg.value *is* the interesting>> value, it is not the address, and we typically use empty DIExpressions. If >> the value is ultimately register allocated and the DIExpression is empty, >> we will emit a DW_OP_regN location expression. If the value is spilled, we >> usually don't need to append DW_OP_stack_value because the location is now >> a memory location, which can be described by DW_OP_[f]breg. >> >> Today, passes that want to add "plus 3" to a DIExpression go out of their >> way to add DW_OP_stack_value to the DIExpression because the backend won't >> do it for us, even though dbg.value normally describes the value, not an >> address. >> >> To explore the alternative DW_OP_stack_value model, here's how I'd go >> about it: >> 1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the >> semantic change clear. It can express both an address or a value, depending >> on the DIExpression. >> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value >> to the DIExpression argument of the intrinsic. >> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression >> alone. The LHS of llvm.dbg.declare is already the address of the variable. >> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect >> DBG_VALUES are now expressed with a DIExpression that lacks >> DW_OP_stack_value at the end. >> 5. Teach our DWARF expression emitter to combine the new expressions as >> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs >> with physical register operands. They just use DW_OP_regN, which is >> implicitly a value location. >> 6. Teach all passes that spill virtual registers used by DBG_VALUE to >> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as >> appropriate. >> >> This should be equivalent to DW_OP_LLVM_memory, and more inline with >> DWARF location expression semantics, but it has a large migration cost. >> >> --- >> >> I think part of the reason I wanted to move in the DW_OP_LLVM_memory >> direction is that I originally wanted to add a memory offset operand to it. >> Our actual use cases for complex DWARF expressions typically come from >> things like safestack, ASan, and blocks. What these all have in common is >> that they gather up a number of variables and move them off into a struct >> in heap memory. This is very similar to what happens when we spill a >> virtual register: instead of describing a register, we modify the >> expression to load the value from some FP register with an offset. I think >> the right representation for these transforms is basically a "chain of >> loads". >> > > Don't think I've got any mental model of what you mean by this phrase > ('chain of loads') - could you provide an example or the like? >Suppose you have a captured variable with __block shared storage, and then suppose you compile it with ASan and safestack, and then the safestack pointer is spilled. To compute the value, the debugger starts from a register, goes to an offset, and loads a pointer, repeating the process until it finds the value. As we proceed through codegen, we effectively build up the chain. 1. To implement __block in Clang, we use dbg.declare(%block_descriptor, DW_OP_deref, DW_OP_constu_plus $offset) 2. Assuming the block descriptor lived in an alloca (which it doesn't, but assume it does for argument's sake), asan will move that alloca onto the heap to implement use-after-return detection. It will prepend "DW_OP_deref, DW_OP_constu, $offset" to the DIExpression. 3. If ASan put its value in an alloca and safestack wanted to move that alloca to the safe stack (again bear with me), it would do the same: prepend deref+offset. 4. Finally, spilling the safe stack pointer to the control stack would mean prepending deref+offset. This seems like a really common pattern. Right now this offsetting and loading has to be expressed as separate location expression opcodes. The DW_OP_deref opcode functions like a load sequencing operation that can only appear between two offsets, although an offset could be zero, in which case there would be no DW_OP_constu_plus opcode. I'm suggesting we move to a representation where the offset and the deref are one. Think "semicolon as sequencing operator" vs. "semicolon as statement terminator". When we use DW_OP_LLVM_memory or DW_OP_deref, the variable must always live in memory, we're always doing address calculation. We should try to make our representation more closely match the set of things we actually want to do and support. That's kind of the gist of what I had in mind. I didn't think it was worth it, which is why I pared the proposal down to just "the opposite of DW_OP_stack_value".> I was imagining that DW_OP_LLVM_memory with an offset would be that load >> chain link. >> >> The idea behind this representation is that it should make it easy for >> spilling transforms to prepend a load chain onto the expression, rather >> than them having to individually discover if DW_OP_deref is needed, or call >> some common helper like DIExpression::prepend. It should always be valid to >> push on a new load with an offset. >> > > When would that not be valid today/without LLVM_memory? Sorry, again - > it's all a bit fuzzy in my head. > > There'd be some canonicalization opportunities, but not seeing the > correctness issues with being able to prepend onto the location list - > seems like that might be true with LLVM_memory too... maybe? >The correctness issue with today's prepending of offsets and deref is that it's hard to know when to insert deref, because we don't know if an expression describes an address or a value. We have code like this in buildDbgValueForSpill: // If the DBG_VALUE already was a memory location, add an extra // DW_OP_deref. Otherwise just turning this from a register into a // memory/indirect location is sufficient. if (IsIndirect) Expr = DIExpression::prepend(Expr, DIExpression::WithDeref); We modify DBG_VALUEs for spills in several other places in codegen and they don't all correctly insert DW_OP_deref. The load chain representation should make it easy to just modify the offset on the front or add a new load depending on what's being done.> It also has the advantage that it will be easier to translate to CodeView >> than arbitrary DWARF expressions, which we are currently canonicalizing >> into a load chain and then attempting to emit. >> > > *nod* my worry is ending up with 3 different representations - DWARF, > CodeView, and the increasingly divergent IRDWARF (especially since it's > "sort of like DWARF" which makes the few divergences more costly/difficult). > > >> Does that make sense? I'm starting to feel like I should either pursue >> the more ambitious load chain design, >> > > What would that look like? >Just `DW_OP_LLVM_memory, 8, DW_OP_LLVM_memory, 20, ...` in DIExpression through IR. It's OK to insert more DWARF opcodes between the links, it's just non-canonical if they are pointer offsetting opcodes that could be folded into the memory opcode. The DWARF expression backend would fold it into the same location expressions we have today.> or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names >> welcome). >> > > Would have to think some more - maybe there's a way to avoid the rename? > But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd > match the new semantics better. >I don't think so. =/ I think the rename is the only safe way to maintain bitcode compatibility.> But really, your original proposal's probably OK/close enough to go ahead. > I don't feel that strongly. >Makes sense. That's basically where I ended up, but now I'm reconsidering the dbg.loc+DW_OP_stack_value thing, to bring DIExpressions closer to DWARF. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170906/aa0795a7/attachment.html>
David Blaikie via llvm-dev
2017-Sep-07 16:46 UTC
[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
On Wed, Sep 6, 2017 at 5:37 PM Reid Kleckner <rnk at google.com> wrote:> On Wed, Sep 6, 2017 at 5:01 PM, David Blaikie <dblaikie at gmail.com> wrote: > >> On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <rnk at google.com> wrote: >> > On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote: >>> >>>> I guess you described this already, but talking it through for >>>> myself/maybe others will find this useful: >>>> >>>> So since we don't have DW_OP_regN for LLVM registers, we could sort of >>>> assume the implicit first value on the stack is a pseudo-OP_regN of the >>>> LLVM SSA register. >>>> >>> >>> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the >>> dbg.value and DBG_VALUE instructions are a register-like value that gets >>> pushed onto the expression stack. The DWARF asmprinter does some expression >>> folding to clean things up, but that's the model. >>> >>> >>>> To support that, all existing uses would need no changes to match the >>>> DWARF model of registers being implicitly direct values. >>>> >>>> Code that wanted to describe the register as containing the memory >>>> address of the interesting thing would use DW_OP_stack_value to say "this >>>> location description that is a register is really an address you should >>>> follow to find the value, not a direct value itself"? >>>> >>>> But code that wanted to describe a variable as being 3 bytes ahead of a >>>> pointer in an LLVM SSA register would only have "plus 3" in the expression >>>> stack, since then it's no longer a direct value but is treated as a pointer >>>> to the value. I guess this is where the ambiguity would come in - currently >>>> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's >>>> meant to describe reg value + 3 as being the immediate value of the >>>> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added >>>> somewhere in the DWARF backend?) >>>> >>> >>> Our model today is inconsistent. >>> >> >> Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's >> some mismatch between the semantics, though I'm still having trouble >> wrapping my head around it. >> > > I mean LLVM's model is internally inconsistent. We have bugs like the RVO > one that you filed (https://llvm.org/pr34513), where we forget if the > debug value is an address or a value. >Feels to me like bugs rather than inconsistencies (I'd think of an inconsistency as "we do X over here intentionally and Y over here intentionally but they're in contradiction to one another")> > In LLVM IR today, the SSA value of the dbg.value *is* the interesting >>> value, it is not the address, and we typically use empty DIExpressions. If >>> the value is ultimately register allocated and the DIExpression is empty, >>> we will emit a DW_OP_regN location expression. If the value is spilled, we >>> usually don't need to append DW_OP_stack_value because the location is now >>> a memory location, which can be described by DW_OP_[f]breg. >>> >>> Today, passes that want to add "plus 3" to a DIExpression go out of >>> their way to add DW_OP_stack_value to the DIExpression because the backend >>> won't do it for us, even though dbg.value normally describes the value, not >>> an address. >>> >>> To explore the alternative DW_OP_stack_value model, here's how I'd go >>> about it: >>> 1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the >>> semantic change clear. It can express both an address or a value, depending >>> on the DIExpression. >>> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value >>> to the DIExpression argument of the intrinsic. >>> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression >>> alone. The LHS of llvm.dbg.declare is already the address of the variable. >>> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect >>> DBG_VALUES are now expressed with a DIExpression that lacks >>> DW_OP_stack_value at the end. >>> 5. Teach our DWARF expression emitter to combine the new expressions as >>> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs >>> with physical register operands. They just use DW_OP_regN, which is >>> implicitly a value location. >>> 6. Teach all passes that spill virtual registers used by DBG_VALUE to >>> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as >>> appropriate. >>> >>> This should be equivalent to DW_OP_LLVM_memory, and more inline with >>> DWARF location expression semantics, but it has a large migration cost. >>> >>> --- >>> >>> I think part of the reason I wanted to move in the DW_OP_LLVM_memory >>> direction is that I originally wanted to add a memory offset operand to it. >>> Our actual use cases for complex DWARF expressions typically come from >>> things like safestack, ASan, and blocks. What these all have in common is >>> that they gather up a number of variables and move them off into a struct >>> in heap memory. This is very similar to what happens when we spill a >>> virtual register: instead of describing a register, we modify the >>> expression to load the value from some FP register with an offset. I think >>> the right representation for these transforms is basically a "chain of >>> loads". >>> >> >> Don't think I've got any mental model of what you mean by this phrase >> ('chain of loads') - could you provide an example or the like? >> > > Suppose you have a captured variable with __block shared storage, and then > suppose you compile it with ASan and safestack, and then the safestack > pointer is spilled. To compute the value, the debugger starts from a > register, goes to an offset, and loads a pointer, repeating the process > until it finds the value. As we proceed through codegen, we effectively > build up the chain. > > 1. To implement __block in Clang, we use dbg.declare(%block_descriptor, > DW_OP_deref, DW_OP_constu_plus $offset) >Guessing it's the other way (constu_plus(offset), deref)? But in any case...> 2. Assuming the block descriptor lived in an alloca (which it doesn't, but > assume it does for argument's sake), asan will move that alloca onto the > heap to implement use-after-return detection. It will prepend "DW_OP_deref, > DW_OP_constu, $offset" to the DIExpression. > 3. If ASan put its value in an alloca and safestack wanted to move that > alloca to the safe stack (again bear with me), it would do the same: > prepend deref+offset. > 4. Finally, spilling the safe stack pointer to the control stack would > mean prepending deref+offset. > > This seems like a really common pattern. Right now this offsetting and > loading has to be expressed as separate location expression opcodes. The > DW_OP_deref opcode functions like a load sequencing operation that can only > appear between two offsets, although an offset could be zero, in which case > there would be no DW_OP_constu_plus opcode. I'm suggesting we move to a > representation where the offset and the deref are one. >Seems like a size optimization more than anything? Which I could get behind if there is data to support the optimization as being significantly valuable (& ideally I'd probably favor the general representation first, etc - but I understand if the special-cased representation has other side benefits (like reducing the cost to add support, etc) it might be valuable, but trades off with the "more special cases/non-DWARFian things in our pseudo-DWARF IR")> Think "semicolon as sequencing operator" vs. "semicolon as statement > terminator". When we use DW_OP_LLVM_memory or DW_OP_deref, the variable > must always live in memory, we're always doing address calculation. We > should try to make our representation more closely match the set of things > we actually want to do and support. > > That's kind of the gist of what I had in mind. I didn't think it was worth > it, which is why I pared the proposal down to just "the opposite of > DW_OP_stack_value". > > >> I was imagining that DW_OP_LLVM_memory with an offset would be that load >>> chain link. >>> >>> The idea behind this representation is that it should make it easy for >>> spilling transforms to prepend a load chain onto the expression, rather >>> than them having to individually discover if DW_OP_deref is needed, or call >>> some common helper like DIExpression::prepend. It should always be valid to >>> push on a new load with an offset. >>> >> >> When would that not be valid today/without LLVM_memory? Sorry, again - >> it's all a bit fuzzy in my head. >> >> There'd be some canonicalization opportunities, but not seeing the >> correctness issues with being able to prepend onto the location list - >> seems like that might be true with LLVM_memory too... maybe? >> > > The correctness issue with today's prepending of offsets and deref is that > it's hard to know when to insert deref, because we don't know if an > expression describes an address or a value. We have code like this in > buildDbgValueForSpill: > // If the DBG_VALUE already was a memory location, add an extra > // DW_OP_deref. Otherwise just turning this from a register into a > // memory/indirect location is sufficient. > if (IsIndirect) > Expr = DIExpression::prepend(Expr, DIExpression::WithDeref); >Sure - this goes back to my adding the "indirect" flag to support C++ non-trivial pass by value in C++, before we had all the more general expression support, etc. (hilariously, what was there before was even more awesome: we encoded the type of the parameter in the DWARF as T& instead of T... literally changing the signature of the function... ) Either LLVM_memory or stack_value approaches would remove the prepending issues & I agree getting rid of the ad-hoc/separate 'indirect' flag is good, just haggling over how best to do that.> We modify DBG_VALUEs for spills in several other places in codegen and > they don't all correctly insert DW_OP_deref. The load chain representation > should make it easy to just modify the offset on the front or add a new > load depending on what's being done. >But I /think/ a stack_value based approach would allow that too, right? I'm probably missing something.> > >> It also has the advantage that it will be easier to translate to CodeView >>> than arbitrary DWARF expressions, which we are currently canonicalizing >>> into a load chain and then attempting to emit. >>> >> >> *nod* my worry is ending up with 3 different representations - DWARF, >> CodeView, and the increasingly divergent IRDWARF (especially since it's >> "sort of like DWARF" which makes the few divergences more costly/difficult). >> >> >>> Does that make sense? I'm starting to feel like I should either pursue >>> the more ambitious load chain design, >>> >> >> What would that look like? >> > > Just `DW_OP_LLVM_memory, 8, DW_OP_LLVM_memory, 20, ...` in DIExpression > through IR. It's OK to insert more DWARF opcodes between the links, it's > just non-canonical if they are pointer offsetting opcodes that could be > folded into the memory opcode. The DWARF expression backend would fold it > into the same location expressions we have today. > > >> or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative >>> names welcome). >>> >> >> Would have to think some more - maybe there's a way to avoid the rename? >> But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd >> match the new semantics better. >> > > I don't think so. =/ I think the rename is the only safe way to maintain > bitcode compatibility. > > >> But really, your original proposal's probably OK/close enough to go >> ahead. I don't feel that strongly. >> > > Makes sense. That's basically where I ended up, but now I'm reconsidering > the dbg.loc+DW_OP_stack_value thing, to bring DIExpressions closer to DWARF. >*nod* I'm not sure either way, maybe need to come back to a summary of this again after going through these discussions. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170907/6dae4e2a/attachment.html>
Reid Kleckner via llvm-dev
2017-Sep-07 17:10 UTC
[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
On Thu, Sep 7, 2017 at 9:46 AM, David Blaikie <dblaikie at gmail.com> wrote:> Feels to me like bugs rather than inconsistencies (I'd think of an > inconsistency as "we do X over here intentionally and Y over here > intentionally but they're in contradiction to one another") >The DBG_VALUE MachineInstr actually already has a way to indicate that the computed value is an address, so I can accept that the backend design is logically consistent with some bugs. The middle-end, though, needs a new design for optimized debug info for variables that live in memory. Right now LowerDbgDeclare creates dbg.value instructions that use the *address* of a variable, rather than its value, when it encounters a call that directly uses the variable's alloca. This is inconsistent, and we need a way to say "address" not "value". I would like whatever we do between the backend and the middle end to be somewhat consistent. In LLVM IR today, the SSA value of the dbg.value *is* the interesting>>>> value, it is not the address, and we typically use empty DIExpressions. If >>>> the value is ultimately register allocated and the DIExpression is empty, >>>> we will emit a DW_OP_regN location expression. If the value is spilled, we >>>> usually don't need to append DW_OP_stack_value because the location is now >>>> a memory location, which can be described by DW_OP_[f]breg. >>>> >>>> Today, passes that want to add "plus 3" to a DIExpression go out of >>>> their way to add DW_OP_stack_value to the DIExpression because the backend >>>> won't do it for us, even though dbg.value normally describes the value, not >>>> an address. >>>> >>>> To explore the alternative DW_OP_stack_value model, here's how I'd go >>>> about it: >>>> 1. Replace llvm.dbg.value with new intrinsic, llvm.dbg.loc, to make the >>>> semantic change clear. It can express both an address or a value, depending >>>> on the DIExpression. >>>> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append >>>> DW_OP_stack_value to the DIExpression argument of the intrinsic. >>>> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the >>>> DIExpression alone. The LHS of llvm.dbg.declare is already the address of >>>> the variable. >>>> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect >>>> DBG_VALUES are now expressed with a DIExpression that lacks >>>> DW_OP_stack_value at the end. >>>> 5. Teach our DWARF expression emitter to combine the new expressions as >>>> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs >>>> with physical register operands. They just use DW_OP_regN, which is >>>> implicitly a value location. >>>> 6. Teach all passes that spill virtual registers used by DBG_VALUE to >>>> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as >>>> appropriate. >>>> >>>> This should be equivalent to DW_OP_LLVM_memory, and more inline with >>>> DWARF location expression semantics, but it has a large migration cost. >>>> >>>> --- >>>> >>>> I think part of the reason I wanted to move in the DW_OP_LLVM_memory >>>> direction is that I originally wanted to add a memory offset operand to it. >>>> Our actual use cases for complex DWARF expressions typically come from >>>> things like safestack, ASan, and blocks. What these all have in common is >>>> that they gather up a number of variables and move them off into a struct >>>> in heap memory. This is very similar to what happens when we spill a >>>> virtual register: instead of describing a register, we modify the >>>> expression to load the value from some FP register with an offset. I think >>>> the right representation for these transforms is basically a "chain of >>>> loads". >>>> >>> >>> Don't think I've got any mental model of what you mean by this phrase >>> ('chain of loads') - could you provide an example or the like? >>> >> >> Suppose you have a captured variable with __block shared storage, and >> then suppose you compile it with ASan and safestack, and then the safestack >> pointer is spilled. To compute the value, the debugger starts from a >> register, goes to an offset, and loads a pointer, repeating the process >> until it finds the value. As we proceed through codegen, we effectively >> build up the chain. >> >> 1. To implement __block in Clang, we use dbg.declare(%block_descriptor, >> DW_OP_deref, DW_OP_constu_plus $offset) >> > > Guessing it's the other way (constu_plus(offset), deref)? But in any > case... >We'd have a deref if %block_descriptor was actually an alloca containing a pointer, which is what I'm assuming for this example. Deref loads the pointer, then we go to some offset, which is the final address. This is a dbg.declare call, so this should compute an address.> 2. Assuming the block descriptor lived in an alloca (which it doesn't, but >> assume it does for argument's sake), asan will move that alloca onto the >> heap to implement use-after-return detection. It will prepend "DW_OP_deref, >> DW_OP_constu, $offset" to the DIExpression. >> 3. If ASan put its value in an alloca and safestack wanted to move that >> alloca to the safe stack (again bear with me), it would do the same: >> prepend deref+offset. >> 4. Finally, spilling the safe stack pointer to the control stack would >> mean prepending deref+offset. >> >> This seems like a really common pattern. Right now this offsetting and >> loading has to be expressed as separate location expression opcodes. The >> DW_OP_deref opcode functions like a load sequencing operation that can only >> appear between two offsets, although an offset could be zero, in which case >> there would be no DW_OP_constu_plus opcode. I'm suggesting we move to a >> representation where the offset and the deref are one. >> > > Seems like a size optimization more than anything? Which I could get > behind if there is data to support the optimization as being significantly > valuable (& ideally I'd probably favor the general representation first, > etc - but I understand if the special-cased representation has other side > benefits (like reducing the cost to add support, etc) it might be valuable, > but trades off with the "more special cases/non-DWARFian things in our > pseudo-DWARF IR") >It's supposed to be a canonicalization, a representation shift designed to make it easier to think about what it is that we're actually doing. We could replace LLVM IR GEP with something lower-level like ADD and MUL, but we have the high-level thing because it makes it easier for compiler writers to think about it. If this design isn't helping you understand what's happening, though, it's probably not useful. I'm coming at it from the angle of: DWARF location expressions seem terribly general and unstructured. Pattern matching them to codeview is hard. If we limited the representation to only represent the kinds of expressions we care about, it would be easier to translate and manipulate them. Anyway, I'm pretty convinced we aren't going to pursue this design. Think "semicolon as sequencing operator" vs. "semicolon as statement>> terminator". When we use DW_OP_LLVM_memory or DW_OP_deref, the variable >> must always live in memory, we're always doing address calculation. We >> should try to make our representation more closely match the set of things >> we actually want to do and support. >> >> That's kind of the gist of what I had in mind. I didn't think it was >> worth it, which is why I pared the proposal down to just "the opposite of >> DW_OP_stack_value". >> >> >>> I was imagining that DW_OP_LLVM_memory with an offset would be that load >>>> chain link. >>>> >>>> The idea behind this representation is that it should make it easy for >>>> spilling transforms to prepend a load chain onto the expression, rather >>>> than them having to individually discover if DW_OP_deref is needed, or call >>>> some common helper like DIExpression::prepend. It should always be valid to >>>> push on a new load with an offset. >>>> >>> >>> When would that not be valid today/without LLVM_memory? Sorry, again - >>> it's all a bit fuzzy in my head. >>> >>> There'd be some canonicalization opportunities, but not seeing the >>> correctness issues with being able to prepend onto the location list - >>> seems like that might be true with LLVM_memory too... maybe? >>> >> >> The correctness issue with today's prepending of offsets and deref is >> that it's hard to know when to insert deref, because we don't know if an >> expression describes an address or a value. We have code like this in >> buildDbgValueForSpill: >> // If the DBG_VALUE already was a memory location, add an extra >> // DW_OP_deref. Otherwise just turning this from a register into a >> // memory/indirect location is sufficient. >> if (IsIndirect) >> Expr = DIExpression::prepend(Expr, DIExpression::WithDeref); >> > > Sure - this goes back to my adding the "indirect" flag to support C++ > non-trivial pass by value in C++, before we had all the more general > expression support, etc. (hilariously, what was there before was even more > awesome: we encoded the type of the parameter in the DWARF as T& instead of > T... literally changing the signature of the function... ) >Awesome, Bob just did the same thing for CodeView for NRVO in r312034, because we don't have DW_OP_deref.> Either LLVM_memory or stack_value approaches would remove the prepending > issues & I agree getting rid of the ad-hoc/separate 'indirect' flag is > good, just haggling over how best to do that. > > >> We modify DBG_VALUEs for spills in several other places in codegen and >> they don't all correctly insert DW_OP_deref. The load chain representation >> should make it easy to just modify the offset on the front or add a new >> load depending on what's being done. >> > > But I /think/ a stack_value based approach would allow that too, right? > I'm probably missing something. >These designs should be equivalent in power, the difference is supposed to be in how easy they are to implement. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170907/1a6ee5f9/attachment.html>
Maybe Matching Threads
- RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
- RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
- RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
- RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value
- RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value