Robinson, Paul via llvm-dev
2020-Sep-15 22:07 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
Hi Adrian & Stephen, One thought here: But — not all memory locations are l-values. If we have a DWARF location list for variable "x" which points to a memory address for the first n instructions and the switches to a constant for the remainder of the scope, the memory address is not guaranteed to be an l-value, because writing the the memory address cannot affect the later part of the function where the variable is a constant. A very interesting point. I believe this is a concept that LLVM does not yet understand? that it needs to consider the l/r-value-ness of the variable throughout the function, before deciding on the form that the location-expressions will take. I would not want to spend a lot of time parsing expressions in order to make this kind of decision. And a second thought here: I suspect there may be disagreements over whether c should share an l-value location with a, since this means that a user could write to either c or a, and that doing so would assign to both of them. My personal belief is that even if it seems confusing, we shouldn't arbitrarily restrict write-access to variables on criteria that will not always be clear to a debug user; whether or not to apply such a restriction should be left to the debugger, rather than being baked into the information we produce for it. Ah — you've thought about this already :-) This is actually really important to me: My take is that we must always err on the safe side. If the compiler cannot prove that it is safe to write to an l-value (ie., that the semantics are the same as if the write were inserted into the source code at that source location) it must downgrade the l-value to an r-value. That's one thing I'm feeling really strongly about. The reason behind this is that if we allow debug info that is only maybe correct (or correct on some paths, or under some other circumstances that are opaque to the user) — if we allow potentially incorrect debug info to leak into the program, the user can not distinguish between information they can trust and information that is bogus. That means all information that they see in the debugger is potentially bogus. If the debugger is not trustworthy it's worthless. That is feedback I keep getting particularly from kernel developers, and I must say it resonates with me. I should say that Stephen and I had a discussion a while back, where I said IMO the compiler should emit debug info that correctly reflects the code as-it-is, and not arbitrarily decide that ‘c’ and/or ‘a’ should not be writeable. I believe I also said this was arguable, but that I felt pretty strongly about it. I have to say, it did not occur to me to consider emitting info that showed *neither* ‘c’ nor ‘a’ to be writeable! This is a kind of protect-the-debugger-from-the-user perspective that I’ve not encountered before. I would actually argue that showing both variables as writeable is in fact correct (truthful), because it reflects the instructions as-produced; if you modify one, you necessarily modify the other. That’s what the code as-produced does. I accept that is not what the code as-originally-written does… and that it is unlikely that the debugger would go to the trouble to notice that two variables share a storage location and caution the user about that unexpected interaction. (It’s hard enough to persuade the compiler to notice…) Given that engineering practicality, and that producing debug info that helps the user without confusing/dismaying the user is a valuable goal, then I can agree that describing both variables as r-values is more helpful. In short, I agree with your conclusion, although not for your reasons. 😊 --paulr -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200915/683d28a4/attachment.html>
Adrian Prantl via llvm-dev
2020-Sep-16 00:39 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> On Sep 15, 2020, at 3:07 PM, Robinson, Paul <paul.robinson at sony.com> wrote: > > Hi Adrian & Stephen, > > One thought here: > > But — not all memory locations are l-values. If we have a DWARF location list for variable "x" which points to a memory address for the first n instructions and the switches to a constant for the remainder of the scope, the memory address is not guaranteed to be an l-value, because writing the the memory address cannot affect the later part of the function where the variable is a constant. > > A very interesting point. I believe this is a concept that LLVM does not yet understand? that it needs to consider the l/r-value-ness of the variable throughout the function, before deciding on the form that the location-expressions will take. I would not want to spend a lot of time parsing expressions in order to make this kind of decision.That's right. Stephen correctly identified that we need some new concept at the LLVM IR level. My hypothesis is that DW_OP_stack_value — if applied correctly — is sufficient to represent this. Stephen is making a case that we could get out more by having a separate flag but I'm not (yet?) convinced that that is in the end-user's best interest. So if we can avoid having two flags for confusingly similar semantics I would like to go with the simpler IR. -- adrian> > And a second thought here: > > I suspect there may be disagreements over whether c should share an l-value location with a, since this means that a user could write to either c or a, and that doing so would assign to both of them. My personal belief is that even if it seems confusing, we shouldn't arbitrarily restrict write-access to variables on criteria that will not always be clear to a debug user; whether or not to apply such a restriction should be left to the debugger, rather than being baked into the information we produce for it. > > Ah — you've thought about this already :-) > > This is actually really important to me: My take is that we must always err on the safe side. If the compiler cannot prove that it is safe to write to an l-value (ie., that the semantics are the same as if the write were inserted into the source code at that source location) it must downgrade the l-value to an r-value. That's one thing I'm feeling really strongly about. > The reason behind this is that if we allow debug info that is only maybe correct (or correct on some paths, or under some other circumstances that are opaque to the user) — if we allow potentially incorrect debug info to leak into the program, the user can not distinguish between information they can trust and information that is bogus. That means all information that they see in the debugger is potentially bogus. If the debugger is not trustworthy it's worthless. That is feedback I keep getting particularly from kernel developers, and I must say it resonates with me. > > I should say that Stephen and I had a discussion a while back, where I said IMO the compiler should emit debug info that correctly reflects the code as-it-is, and not arbitrarily decide that ‘c’ and/or ‘a’ should not be writeable. I believe I also said this was arguable, but that I felt pretty strongly about it. > > I have to say, it did not occur to me to consider emitting info that showed *neither* ‘c’ nor ‘a’ to be writeable! This is a kind of protect-the-debugger-from-the-user perspective that I’ve not encountered before. I would actually argue that showing both variables as writeable is in fact correct (truthful), because it reflects the instructions as-produced; if you modify one, you necessarily modify the other. That’s what the code as-produced does. > I accept that is not what the code as-originally-written does… and that it is unlikely that the debugger would go to the trouble to notice that two variables share a storage location and caution the user about that unexpected interaction. (It’s hard enough to persuade the compiler to notice…) Given that engineering practicality, and that producing debug info that helps the user without confusing/dismaying the user is a valuable goal, then I can agree that describing both variables as r-values is more helpful. > > In short, I agree with your conclusion, although not for your reasons. 😊 > --paulr-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200915/8da5a5d9/attachment.html>
Tozer, Stephen via llvm-dev
2020-Sep-16 16:55 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values. > Why do we need DW_OP_LLVM_direct when we already have DW_OP_LLVM_stack_value? Can you give an example of something that is definitely not a stack value, but direct?The difference in definition is the intention: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any case where the value of the preceding expression is equal to an l-value. So for example: DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0 DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value Your point about the semantics of variable assignments in the debugger is useful, that clears up my misunderstandings. I believe that even with that in mind, LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If we don't have any reason to make a variable read-only other than that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex expression, but that doesn't need to be the case. The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but doesn't need to be: there is a single register that holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. The optimizations taking place in this code are analogous to if we had "a = bar() + 4 - 4;", but because we don't figure out that "a = bar()" in a single pass, we pre-emptively assume that "a" must be an r-value. To be able to emit an l-value we would first need the ability to optimize/simplify DIExpressions so that the expression becomes just (DW_OP_stack_value) - this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, then we would be free to emit the register location (DW_OP_reg0 RAX). // Compile with clang -O2 -g int baz(); int bar2(int arg) { return arg * 4; } int bar() { return bar2(1); } int foo() { int a = baz() + bar() - 4; return a * 2; } ; Eventually becomes the IR... %call = call i32 @_Z3bazv(), !dbg !25 %call1 = call i32 @_Z3barv(), !dbg !26 %add = add nsw i32 %call, %call1, !dbg !27 %sub = sub nsw i32 %add, 4, !dbg !28 call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29 %mul = mul nsw i32 %sub, 2, !dbg !30 ret i32 %mul, !dbg !31 ; Combine redundant instructions, "a" is salvaged... %call = call i32 @_Z3bazv(), !dbg !25 %call1 = call i32 @_Z3barv(), !dbg !26 %add = add nsw i32 %call, %call1, !dbg !27 call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28 %sub = shl i32 %add, 1, !dbg !29 %mul = add i32 %sub, -8, !dbg !29 ret i32 %mul, !dbg !30 ; bar() is found to always return 4 %call = call i32 @_Z3bazv(), !dbg !14 %add = add nsw i32 %call, 4, !dbg !15 call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16 %sub = shl i32 %add, 1, !dbg !17 %mul = add i32 %sub, -8, !dbg !17 ret i32 %mul, !dbg !18 ; %add is unused, optimize out and salvage... %call = call i32 @_Z3bazv(), !dbg !24 call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25 %add = shl i32 %call, 1, !dbg !26 ret i32 %add, !dbg !27 ; Final DWARF location for "a": DW_AT_location (0x00000000: [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200916/1d35bbd8/attachment-0001.html>