David Blaikie via llvm-dev
2020-Jul-21 23:25 UTC
[llvm-dev] DW_OP_implicit_pointer design/implementation in general
Realized I didn't document the original reviews that motivated this thread: A stack of reviews, split off from here: https://reviews.llvm.org/D69787 Alok's posted a new patch (with smaller patches split off from the monolithic one) here: https://reviews.llvm.org/D84112 I haven't had a chance to page in all the old context, nor look at the new ones in detail yet. But probably worth keeping high level design review here, I think? Once the general direction seems good, we can go into the separate review threads for the implementation/mechanical details. On Tue, Jan 14, 2020 at 8:47 AM David Blaikie <dblaikie at gmail.com> wrote:> > > > On Mon, Jan 13, 2020 at 8:49 PM Alok Sharma <aloksharma.knit at gmail.com> wrote: >> >> Hi, >> >> Let me consolidate what we discussed with my opinion. >> >> * On the point of new intrinsic llvm.dbg.derefval: >> It (new intrinsic) was more a neater way than a needed way. The whole functionality can go ahead without it and using llvm.dbg.value instead. Though I liked it (new intrinsic), since most of us are against it, it should be fine for me to drop it. >> This is because the transformation was like >> llvm.dbg.value -> DBG_VALUE -> DWARF location-list >> llvm.dbg.derefval -> DBG_VALUE -> DWARF location-list >> Since it was just for better readability in LLVM IR only later it (new intrinsic) was sharing the same path with llvm.dbg.value. So it should be fine to drop it without any impact in later functionality. >> - On question of identify such cases we can anyway identify using the type of expression (DW_OP_implicit_pointer). >> - On question of ( DW_OP_implicit_pointer ) fitting to dbg.value intrinsic it perfectly does that as value in such case is metadata and prototype of dbg.value is dbg.value(metadata, metadata, metadata). >> So it should be fine to drop it and back to where it was started before introduction of new intrinsic. >> >> * On the point of handing of pointer pointing to temporary / unnamed variables (Lets call it Scope S1) >> As two proposed patches are there for bringing pointers referring to temporary / unnamed variable >> A) first patch uses (new proposed operator) DW_OP_LLVM_explicit_pointer(both in LLVM-IR and DWARF) >> B) Second patch uses artificial variable representing temporary (both in LLVM-IR and DWARF) >> https://reviews.llvm.org/D72055 [DebugInfo] Support for DW_OP_implicit_pointer (for temp references & dynamic allocations) >> If I understood David correctly, he wants LLVM-IR look like patch-A and DWARF look like patch-B (lets call it way C) >> Since patch-A is not desired because we don't support anything beyond DWARF-5 and patch proposes new DWARF operator. I want to clarify that patch-B can exist even without new intrinsic and can use dbg.value and fits perfectly in existing LLVM-IR template. if only reason to go way-C is to >> I would like to go way-B or way-C for the scope of unnamed variables. >> >> * For the cases when pointer points to named variable (Lets call it Scope S2): >> I would update the patches with replacing dbg.derefval to dbg.value and using DW_OP_implicit_pointer (to named variable) in both LLVM-IR and DWARF. >> >> In summary, >> Scope S1 can be solved with >> way-B) DW_OP_implicit_pointer with artificial variable and with intrinsic dbg.value in LLVM-IR and DWARF >> or >> way-C) DW_OP_LLVM_explicit_pointer with intrinsic dbg.value in LLVM-IR + DW_OP_implicit_pointer with artificial variable in DWARF >> >> Scope S2 can be solved with >> DW_OP_implicit_pointer with actual named variable with dbg.value in LLVM-IR and DWARF > > > Based on my current understanding, I'd rather not do this ^ it seems like added complexity to LLVM optimizations/middle end/IR representation with limited value. I think doing (C) above but using it to cover both S1 and S2 (ie: not special casing "pointing to an existing variable" - instead treating that the same as "pointing to an unnamed entity") initially, and then, maybe later, evaluating whether referencing named variables from dbg.value is worth the added benefit, would be the right way to go. > >> >> >> Regards, >> Alok >> >> >> Though >> >> >> >> lets >> >> On Tue, Jan 14, 2020 at 1:28 AM Jeremy Morse <jeremy.morse.llvm at gmail.com> wrote: >>> >>> Hi, >>> >>> David wrote: >>> > Are you referring to the possibility of implicit values to refer to other variables? I'm sort of interested in maybe not doing that - and only implementing a more general form (what's been talked about with the LLVM_implicit_value (or was it LLVM_explicit_value? I forget)) - and synthesizing artificial variables in the backend rather than trying to track which variable a pointer points to. I think this would keep the impact on optimizations smaller & would be more general. >>> >>> Adrian wrote: >>> > If it were possible to synthesize it in AsmPrinter, would that remove the motivation for the new intrinsic for you? >>> >>> Ah, yeah, those changes would avoid any need for a new intrinsic to my >>> mind, and sounds much more palatable. Thanks for explaining. >>> >>> -- >>> Thanks, >>> Jeremy
Jeremy Morse via llvm-dev
2020-Jul-30 11:59 UTC
[llvm-dev] DW_OP_implicit_pointer design/implementation in general
Hi, I've taken a look at the patches (thanks Alok) and will submit comments in a bit, David wrote:> I haven't had a chance to page in all the old context, nor look at the > new ones in detail yet. But probably worth keeping high level design > review here, I think? Once the general direction seems good, we can go > into the separate review threads for the implementation/mechanical > details.I think the latest patch series matches what came out of the discussion above, as you described it:> I would expect this to be handled with a general OP saying "hey, I'm > skipping one level of indirection indirection in the resulting value, > because that indirection is missing/not in the final program" and that this > would be encoded in a llvm.dbg.value/DIExpression as usual, without the > need for new IR intrinsics, though possibly with the need for an LLVM > extension DWARF OP (DW_OP_LLVM_explicit_pointer?)That's what's been implemented, whenever an alloca is promoted, variable locations that used the allocas address are transformed into promoted-value variable locations in the usual way, but with a DW_OP_LLVM_explicit_pointer at the front of the expression to indicate "the pointer is absent, but this is what it would have pointed at". Simple case: i32 *%foo = alloca i32 dbg.declare(%foo, !123, !DIExpression()) dbg.value(%foo, !456, !DIExpression()) store i32 0, i32 *%foo Where !123 is a plain i32 source variable, and !456 a pointer-to-i32 source variable. When %foo is promoted, these would become: dbg.value(i32 0, !123, !DIExpression()) dbg.value(i32 0, !456, !DIExpression(DW_OP_LLVM_explicit_pointer)) When it comes to the IR way of modelling these things, I think that this matches the discussion, and is a lightweight way of representing what's going on. I have some reservations about further down the compiler though: artificial variables get created at isel time, which seems early to me, and duplicates the work for each instruction selector. Is there a reason why it can't be done in the DWARF emitter? The artificial variables are also tracked with additional DBG_VALUE instructions, if we could push artificial variable creation back to emission time then we wouldn't have to answer questions such as "what is the lifetime of a DBG_VALUE of an artificial variable?" At promotion time: some of the handling of variable promotion appears to happen within Instruction::eraseFromParent, which seems out of place. I reckon you've missed the calls in PromoteMemoryToRegister.cpp to the ConvertDebugDeclareToDebugValue helpers -- shifting the promotion handling there would be better, and not dependent on the order that things are erased in. I think those ConvertDebug... helper functions and the two other functions you've instrumented in the same file should be sufficient to catch all promotions. Additionally, I believe that promoted allocas are getting DW_OP_LLVM_explicit_pointer dbg.values generated for any pointer that _ever_ points at it. You'll need to consider circumstances where pointer variables have multiple values, i.e.: int foo, bar, baz; int *qux = &foo; qux = &bar; qux = &baz; foo = 1; bar = 2; baz = 3; If I understood the code correctly, 'qux' will have implicit-pointer values for each of the assignments to foo / bar / baz, where it should only have a dbg.value for the assignment to 'baz'. (It might be alright to limit handling to scenarios where a pointer variable only ever has one value, and then expand what can be handled later). -- Thanks, Jeremy
David Blaikie via llvm-dev
2020-Sep-24 02:56 UTC
[llvm-dev] DW_OP_implicit_pointer design/implementation in general
[+CC some folks interested in optimized debug info variable locations] On Thu, Jul 30, 2020 at 5:00 AM Jeremy Morse <jeremy.morse.llvm at gmail.com> wrote:> > Hi, > > I've taken a look at the patches (thanks Alok) and will submit > comments in a bit,Thanks for that! Looks like you've hit a lot of the usual/important bits, for sure.> > David wrote: > > I haven't had a chance to page in all the old context, nor look at the > > new ones in detail yet. But probably worth keeping high level design > > review here, I think? Once the general direction seems good, we can go > > into the separate review threads for the implementation/mechanical > > details. > > I think the latest patch series matches what came out of the > discussion above, as you described it: > > > I would expect this to be handled with a general OP saying "hey, I'm > > skipping one level of indirection indirection in the resulting value, > > because that indirection is missing/not in the final program" and that this > > would be encoded in a llvm.dbg.value/DIExpression as usual, without the > > need for new IR intrinsics, though possibly with the need for an LLVM > > extension DWARF OP (DW_OP_LLVM_explicit_pointer?) > > That's what's been implemented, whenever an alloca is promoted, > variable locations that used the allocas address are transformed into > promoted-value variable locations in the usual way, but with a > DW_OP_LLVM_explicit_pointer at the front of the expression to indicate > "the pointer is absent, but this is what it would have pointed at". > Simple case: > > i32 *%foo = alloca i32 > dbg.declare(%foo, !123, !DIExpression()) > dbg.value(%foo, !456, !DIExpression()) > store i32 0, i32 *%foo > > Where !123 is a plain i32 source variable, and !456 a pointer-to-i32 > source variable. When %foo is promoted, these would become: > > dbg.value(i32 0, !123, !DIExpression()) > dbg.value(i32 0, !456, !DIExpression(DW_OP_LLVM_explicit_pointer)) > > When it comes to the IR way of modelling these things, I think that > this matches the discussion, and is a lightweight way of representing > what's going on.Awesome. I noticed there's also implicit_pointer as well as explicit_pointer - what's the distinction there and are the two concepts related enough to have some shared implementation concerns/merit shared review? Also, it looked like (at a glance) the LLVM_explicit (or implicit?) pointer thing took an integer parameter, which I think is the amount of indirection (so an int** collapsed down to an int would have a value of '2' for this int parameter?) - minor discussion about whether that's worthwhile, or whether we should stack LLVM_explicit_pointers on top of each other to represent this case?> I have some reservations about further down the compiler though: > artificial variables get created at isel time, which seems early to > me, and duplicates the work for each instruction selector.Is this consistent with how we do dynamic array bounds (void f1(int i) { int x[i]; }), which I think is one of the things that we modeled this idea from? Or perhaps it is done for array dimensions but isn't suitable to this new use case as much?> Is there a > reason why it can't be done in the DWARF emitter? The artificial > variables are also tracked with additional DBG_VALUE instructions, if > we could push artificial variable creation back to emission time then > we wouldn't have to answer questions such as "what is the lifetime of > a DBG_VALUE of an artificial variable?"This/these sort of questions I'd like to punt to Adrian and other folks who have had more investment in optimized debug info locations in the last couple of years... I hope that isn't a cop-out, it's just these particular aspects are probably not ones I have the most context on.> At promotion time: some of the handling of variable promotion appears > to happen within Instruction::eraseFromParent, which seems out of > place. I reckon you've missed the calls in PromoteMemoryToRegister.cpp > to the ConvertDebugDeclareToDebugValue helpers -- shifting the > promotion handling there would be better, and not dependent on the > order that things are erased in. I think those ConvertDebug... helper > functions and the two other functions you've instrumented in the same > file should be sufficient to catch all promotions. > > Additionally, I believe that promoted allocas are getting > DW_OP_LLVM_explicit_pointer dbg.values generated for any pointer that > _ever_ points at it. You'll need to consider circumstances where > pointer variables have multiple values, i.e.: > > int foo, bar, baz; > int *qux = &foo; > qux = &bar; > qux = &baz; > foo = 1; > bar = 2; > baz = 3; > > If I understood the code correctly, 'qux' will have implicit-pointer > values for each of the assignments to foo / bar / baz, where it should > only have a dbg.value for the assignment to 'baz'. (It might be > alright to limit handling to scenarios where a pointer variable only > ever has one value, and then expand what can be handled later).Sounds like something to be aware of/see some test cases for, for sure. Much thanks! - Dave