Keno Fischer via llvm-dev
2016-Jan-04 20:11 UTC
[llvm-dev] Proposal for multi location debug info support in LLVM IR
Thanks for your comments. Replies inline.

> The DWARF 5 standard says that
> "Address range entries in a range list may not overlap."
>
> The reasoning behind this is presumably that if a variable is in more than one
> location at a point all the values need to be identical, or the information is useless

Oh huh, for some reason I was under the impression that they could. No matter,
all we would have to do then is choose one in the backend. I think it makes
sense to maintain the notion of separate multiple locations until then.

> > - To add a location with the same value for the same variable, you pass the
> >   token of the FIRST llvm.dbg.value, as this llvm.dbg.value's first argument
> >   E.g. to add another location for the variable above:
> >
> >     %second = call token @llvm.dbg.value(token %first, metadata %val2,
> >                                          metadata !var, metadata !expr2)
>
> Does this invalidate the first location, or does this add an additional location
> to the set of locations for var at this point? If I want to add a third location,
> which token do I pass in? Can you explain a bit more what information the token
> allows us to express that is currently not possible?

It adds a second location. If you want to add a third location you pass in the
first token again. Thus the first call (key call) indicates a change of values,
and all locations that have the same value should use the key call's token.

> > - To indicate that a location will no longer hold a value, you do the
> >   following:
> >
> >     call token @llvm.dbg.value(token %second, metadata token undef,
> >                                metadata !var, metadata !())
> >
> > - The current set of locations for a variable at a given instruction are all
> >   those llvm.dbg.value instructions that dominate this location (
> >   equivalently all those llvm.dbg.value calls whose token you could use at
> >   that location without upsetting the Verifier), except that if more than
> >   one key call is dominating, only the most recent one and all calls
> >   associated to it by first argument count.
> >
> > I think that should encapsulate the semantics, but here are some consequences
> > of and comments on the above that I think would be useful to discuss:
> >
> > - The upgrade path for existing IR is very simple and just consists of
> >   adding token undef as the first argument to any call in the IR.
> >
> > - In general, if a value gets removed by an optimization, the corresponding
> >   llvm.dbg.value call can be removed, unless that call is a key call, in
> >   which case the value should be undefed out. This is necessary both to be
> >   able to keep it around as the first argument to the other calls, and more
> >   importantly to mark the end point of a previous set of locations.
>
> So if %val is optimized out in the following example:
>
>   %first = call token @llvm.dbg.value(token undef, metadata %val, metadata !var, metadata !expr)
>   ...
>   %second = call token @llvm.dbg.value(token %first, metadata %val2, metadata !var, metadata !expr2)
>
> Does this turn into:
>
>   call token @llvm.dbg.value(token undef, metadata %undef, metadata !var, metadata !expr)
>   %second = call token @llvm.dbg.value(token %undef, metadata %val2, metadata !var, metadata !expr2)
>
> Or do we still have a %first token, or does the key call get removed entirely, because
> the second one is now a key call?

I think the situation is the following: If %second is the only use of %first,
we can do that optimization. If not and %second dominates all uses of %first,
we could also do this optimization and replace all uses of %first with %second.
However, we cannot remove the actual first key call, because it denotes the end
location for the previous value of the same variable. Two exceptions I could
think of are if %first is the first call for that variable in the function (as
then there cannot be a previous range to terminate) or if there are no other
calls or memory operations in between %first and %second, in which case we
could hoist %second up and merge the two calls. Does that make sense?

> > - I think llvm.dbg.declare can be deprecated and its uses replaced by
> >   llvm.dbg.value with a DW_OP_deref. That would also clarify the semantics
> >   of the operation which have caused some confusion in the past.
>
> I think we could already remove it today without any loss of generality (by
> lifting any dbg.value whose first argument is an alloca into the MMI table).
> What I see this proposal adding is a way to mark the end of a range, which
> is important when a value is on the stack only for part of the function (as
> in the stack coloring example).

Agreed!

> > - We may want to add an extra pass that does debug info inference (some of
> >   which is done in InstCombine right now)
>
> What kind of inference does InstCombine do currently?

I was thinking of replacing llvm.dbg.declare by appropriate llvm.dbg.value at
each load/store. In the new design that would essentially be an inference pass
which would add those as locations, with the original one only removed if the
alloca actually gets lifted into registers.

> > Here are some of the invariants, the verifier would enforce (included in the
> > hope that they can clarify anything in the above):
> >
> >   1. If the first argument is not token undef, then
> >      a. If the second argument is not token undef,
> >         I. the first argument must be a call to llvm.dbg.value whose first
> >            argument is token undef
> >      b. If the second argument is token undef
> >         II. the first argument must be a call to llvm.dbg.value whose second
> >             argument is not token undef
> >         III. the expression argument must be empty
> >      c. In either case, the variable described must be the same as the one
> >         described by the call that is the first argument.
> >      d. There may not be another call to llvm.dbg.value with token undef
> >         that dominates this instruction, is not the one passed as the first
> >         argument and is dominated by the one passed as the first argument.
> >   2. All other invariants regarding calls to llvm.dbg.value carry over
> >      unchanged
>
> -- adrian
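The verifier invariants 1.a through 1.c quoted above can be sketched as a small checker. This is only an illustrative model in Python, not LLVM code: the `DbgValue` class, the `TOKEN_UNDEF` sentinel and the `verify` function are hypothetical names introduced here, and invariant 1.d is left out because it needs dominance information from a control-flow graph.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TOKEN_UNDEF = "token undef"  # stand-in for a `token undef` second argument


@dataclass(frozen=True)
class DbgValue:
    """Hypothetical model of one call to the proposed llvm.dbg.value."""
    token: Optional["DbgValue"]  # first argument; None models `token undef`
    location: str                # second argument, or TOKEN_UNDEF for a removal
    var: str                     # the variable being described
    expr: Tuple[str, ...] = ()   # expression; () models the empty !()


def verify(call: DbgValue) -> List[str]:
    """Return the violated invariants (1.a to 1.c only) for a single call."""
    if call.token is None:
        return []  # key call: invariant 1 does not apply
    parent = call.token
    errors = []
    if call.location != TOKEN_UNDEF:
        # 1.a.I: an added location must hang off a key call.
        if parent.token is not None:
            errors.append("1.a.I: first argument is not a key call")
    else:
        # 1.b.II: a removal must name a call that actually added a location.
        if parent.location == TOKEN_UNDEF:
            errors.append("1.b.II: first argument is itself a removal")
        # 1.b.III: a removal carries no expression.
        if call.expr:
            errors.append("1.b.III: expression must be empty")
    # 1.c: both calls must describe the same variable.
    if call.var != parent.var:
        errors.append("1.c: variable differs from the first argument's")
    return errors
```

Under this model, a key call, a location added with the key call's token, and a removal naming the adding call all verify cleanly, while an added location chained off a non-key call is rejected by 1.a.I.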
Robinson, Paul via llvm-dev
2016-Jan-04 20:45 UTC
[llvm-dev] Proposal for multi location debug info support in LLVM IR
Address ranges in a location list may overlap (section 2.6.2). The entries in a location list are not a range list (which is defined by section 2.17).

--paulr
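Paul's distinction between location lists and range lists can be illustrated with a short sketch. It assumes half-open `[begin, end)` address ranges and uses made-up addresses; the function names `overlapping_pairs` and `locations_at` are hypothetical helpers, and the location strings are just labels rather than encoded DWARF expressions.

```python
def overlapping_pairs(entries):
    """Return index pairs (i, j) whose half-open [begin, end) ranges overlap.

    Each entry is a tuple whose first two fields are (begin, end).
    """
    pairs = []
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            begin_i, end_i = entries[i][0], entries[i][1]
            begin_j, end_j = entries[j][0], entries[j][1]
            if begin_i < end_j and begin_j < end_i:
                pairs.append((i, j))
    return pairs


def locations_at(pc, location_list):
    """Return every location whose range covers the address pc."""
    return [loc for begin, end, loc in location_list if begin <= pc < end]


# Illustrative location list: the variable lives in a register for the whole
# range and is additionally available on the stack for part of it.
location_list = [
    (0x10, 0x40, "DW_OP_reg0"),
    (0x20, 0x30, "DW_OP_fbreg -8"),
]
```

With these entries, `overlapping_pairs(location_list)` reports the pair `(0, 1)`, which is acceptable for a location list and simply means `locations_at(0x28, location_list)` yields two places holding the same value. The same overlap in a range list would violate the rule Adrian quoted.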
Keno Fischer via llvm-dev
2016-Jan-04 20:54 UTC
[llvm-dev] Proposal for multi location debug info support in LLVM IR
Thanks for the clarification. That was my recollection as well, but I didn't know whether that was changed or not.

On Mon, Jan 4, 2016 at 9:45 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:

> Address ranges in a location list may overlap (section 2.6.2). The
> entries in a location list are not a range list (which is defined by
> section 2.17).
>
> --paulr
Adrian Prantl via llvm-dev
2016-Jan-05 17:59 UTC
[llvm-dev] Proposal for multi location debug info support in LLVM IR
Thanks for the clarification, Paul!
Keno, just a few more questions for my understanding:

> - Indicating that a value changed at source level (e.g. because an
>   assignment occurred)

This is done by a key call.

> - Indicating that the same value is now available in a new location

Additional, alternative locations with identical contents are added by passing in the token from a key call.

> - Indicating that a value is no longer available in some location

This is done by another key call (possibly with an %undef location).

> > > - To add a location with the same value for the same variable, you pass the
> > >   token of the FIRST llvm.dbg.value, as this llvm.dbg.value's first argument
> > >   E.g. to add another location for the variable above:
> > >
> > >     %second = call token @llvm.dbg.value(token %first, metadata %val2,
> > >                                          metadata !var, metadata !expr2)
> >
> > Does this invalidate the first location, or does this add an additional location
> > to the set of locations for var at this point? If I want to add a third location,
> > which token do I pass in? Can you explain a bit more what information the token
> > allows us to express that is currently not possible?
>
> It adds a second location. If you want to add a third location you pass in
> the first token again.
> Thus the first call (key call) indicates a change of values, and all
> locations that have the same value should use the key call's token.

Ok. Looks like this is going to be somewhat verbose for partial updates of SROA’ed aggregates as in the following example:

  // struct s { int i, j };
  // void foo(struct s) { s.j = 0; ... }

  define void @foo(i32 %i, i32 %j) {
    %token = call llvm.dbg.value(token %undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
    call llvm.dbg.value(token %token, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
    ...

    ; have to repeat %i here:
    %tok2 = call llvm.dbg.value(token %undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
    call llvm.dbg.value(token %tok2, metadata i32 0, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))

On the upside, having all this information explicit could simplify the code in DwarfDebug::buildLocationList().

Is there any information in the tokens that could not be recovered by a static analysis of the debug intrinsics?
Note that having redundant information available explicitly is not necessarily a bad thing.

The one difference I noticed so far is that alternative locations allow earlier locations to outlive locations that are dominated by them:

  %loc = dbg.value(%undef, var, ...)
  ...
  %alt = dbg.value(%loc, var, ...)
  ...
  ; alt becomes unavailable
  ...
  ; %loc is still available here.

Any other advantages that I missed?

-- adrian
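Adrian's verbosity concern for SROA'ed aggregates can be modelled in a few lines. This is a simplified sketch under stated assumptions: straight-line code only, one location per piece, and a key call discarding every earlier piece of the variable, which is how the example above reads. The function name `live_pieces` and the tuple encoding of calls are hypothetical.

```python
def live_pieces(calls):
    """Return {piece: location} after a straight-line sequence of calls.

    calls is a list of (is_key_call, location, piece) tuples for ONE variable,
    where piece is an (offset_in_bits, size_in_bits) pair. A key call starts
    a fresh set of locations; a non-key call adds to the current set.
    """
    current = {}
    for is_key_call, location, piece in calls:
        if is_key_call:
            current = {}  # the value changed: earlier locations are dropped
        current[piece] = location
    return current


# struct s { int i, j }; after SROA, then `s.j = 0`.
prologue = [
    (True, "%i", (0, 32)),    # key call describing the .i piece
    (False, "%j", (32, 32)),  # .j piece attached to that key call
]

# Repeating %i, as in the IR above, keeps both pieces described.
with_repeat = prologue + [
    (True, "%i", (0, 32)),
    (False, "i32 0", (32, 32)),
]

# Describing only the updated piece loses the .i piece in this model.
without_repeat = prologue + [
    (True, "i32 0", (32, 32)),
]
```

Here `live_pieces(with_repeat)` still maps both pieces, while `live_pieces(without_repeat)` maps only `(32, 32)`, which is why the unchanged `%i` has to be restated after each partial update.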
Keno Fischer via llvm-dev
2016-Jan-05 18:37 UTC
[llvm-dev] Proposal for multi location debug info support in LLVM IR
On Tue, Jan 5, 2016 at 6:59 PM, Adrian Prantl <aprantl at apple.com> wrote:

> Thanks for the clarification, Paul!
> Keno, just a few more questions for my understanding:
>
> > - Indicating that a value changed at source level (e.g. because an
> >   assignment occurred)
>
> This is done by a key call.

Correct

> > - Indicating that the same value is now available in a new location
>
> Additional, alternative locations with identical contents are added by
> passing in the token from a key call.

Correct

> > - Indicating that a value is no longer available in some location
>
> This is done by another key call (possibly with an %undef location).

Not quite. Another key call could be used if all locations are now invalid.
However, to just remove a single value, I was proposing

  ; This is the key call
  %first = call token @llvm.dbg.value(token undef, %someloc, metadata !var, metadata !())
  ; This adds a location
  %second = call token @llvm.dbg.value(token %first, %someotherloc, metadata !var, metadata !())
  ; This removes the (%second) location
  %third = call token @llvm.dbg.value(token %second, metadata token undef, metadata !var, metadata !())

Thus, to remove a location you always pass in the token of the call that added
the location. This is also the reason why I'm requiring the second argument to
be `token undef` because no valid location can be of type token, and I wanted
to avoid the situation in which a location gets replaced by undef everywhere,
accidentally turning into a removal of the location specified by the key call.

> > > > - To add a location with the same value for the same variable, you pass the
> > > >   token of the FIRST llvm.dbg.value, as this llvm.dbg.value's first argument
> > > >   E.g. to add another location for the variable above:
> > > >
> > > >     %second = call token @llvm.dbg.value(token %first, metadata %val2,
> > > >                                          metadata !var, metadata !expr2)
> > >
> > > Does this invalidate the first location, or does this add an additional location
> > > to the set of locations for var at this point? If I want to add a third location,
> > > which token do I pass in? Can you explain a bit more what information the token
> > > allows us to express that is currently not possible?
> >
> > It adds a second location. If you want to add a third location you pass in
> > the first token again.
> > Thus the first call (key call) indicates a change of values, and all
> > locations that have the same value should use the key call's token.
>
> Ok. Looks like this is going to be somewhat verbose for partial updates of
> SROA’ed aggregates as in the following example:
>
>   // struct s { int i, j };
>   // void foo(struct s) { s.j = 0; ... }
>
>   define void @foo(i32 %i, i32 %j) {
>     %token = call llvm.dbg.value(token %undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
>     call llvm.dbg.value(token %token, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
>     ...
>
>     ; have to repeat %i here:
>     %tok2 = call llvm.dbg.value(token %undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
>     call llvm.dbg.value(token %tok2, metadata i32 0, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
>
> On the upside, having all this information explicit could simplify the
> code in DwarfDebug::buildLocationList().

Yeah, this is true. We could potentially extend the semantics by allowing
separate key calls for pieces, i.e.

  %token = call llvm.dbg.value(token %undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
  call llvm.dbg.value(token undef, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))

  ; This now only invalidates the .j part
  %tok2 = call llvm.dbg.value(token %undef, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))

In that case we would probably have to require that all DW_OP_bit_pieces in
non-key-call expressions are a subrange of those in the associated key call.

> Is there any information in the tokens that could not be recovered by a
> static analysis of the debug intrinsics?
> Note that having redundant information available explicitly is not
> necessarily a bad thing.

I am not entirely sure what you are proposing. You somehow need to be able to
encode which dbg.values invalidate previous locations and which do not. Since
we're describing front-end variables this will generally depend on front-end
semantics, so I'm not sure what a generic analysis pass can do here without
requiring language-specific analysis.

> The one difference I noticed so far is that alternative locations allow
> earlier locations to outlive locations that are dominated by them:
>
>   %loc = dbg.value(%undef, var, ...)
>   ...
>   %alt = dbg.value(%loc, var, ...)
>   ...
>   ; alt becomes unavailable
>   ...
>   ; %loc is still available here.
>
> Any other advantages that I missed?
>
> -- adrian
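The three kinds of call discussed in this message (key call, added location, removal by the token of the adding call) can be modelled as a small state machine. This is a hedged sketch rather than anything from LLVM: it only handles straight-line code, so program order stands in for dominance, and the class name `LocationTracker` and its integer tokens are hypothetical.

```python
class LocationTracker:
    """Tracks the live locations of ONE variable along straight-line code."""

    def __init__(self):
        self.key = None        # token of the most recent key call
        self.live = {}         # token of the adding call -> location
        self._next_token = 0

    def dbg_value(self, token, location):
        """Model one call; returns the token the call would produce.

        token=None models `token undef` as the first argument (a key call).
        location=None models `metadata token undef` (a removal).
        """
        new_token = self._next_token
        self._next_token += 1
        if token is None:
            # Key call: the value changed, so all earlier locations end here.
            self.key = new_token
            self.live = {new_token: location}
        elif location is None:
            # Removal: drop the location added by the call that made `token`.
            self.live.pop(token, None)
        else:
            # Additional location for the same value as the current key call.
            if token != self.key:
                raise ValueError("additional locations must use the key call's token")
            self.live[new_token] = location
        return new_token
```

Usage follows the example in the message: `first = t.dbg_value(None, "%someloc")` is the key call, `second = t.dbg_value(first, "%someotherloc")` adds a location, and `t.dbg_value(second, None)` removes only that second location, leaving `%someloc` live until the next key call.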