On Fri, Sep 8, 2017 at 9:00 AM Adrian Prantl <aprantl at apple.com> wrote:

> > Eventually, some tests will inevitably need to be Windows or Posix
> > specific, so you're going to have to have all this extra stuff (the new
> > substitutions, the different command lines, the custom output formats,
> > etc.). So I think something like this provides maximal encouragement of
> > sharing whenever possible (since you can almost always share source code),
> > while still allowing each format to test real input and real output.
>
> I understand the desire to allow for Windows-specific tests and I think it
> would be good to add them to the repository in a windows subdirectory.
>
> Looking at the example you posted, the two variants are so structurally
> similar that I believe it would be better to come up with a common
> abstraction from a readability / maintenance effort perspective. Basically,
> the only thing that the RUN lines do is compile and link executables from
> source code using the default target and run the test_debuginfo command. I
> think it would be better to define a new command substitution
> %clang-compile-link(?) in LIT that has different implementations on windows
> and posix. The set of debugger commands used by the tests is so tiny that
> it should not be a lot of work to implement a wrapper for the windows
> debugger (it took me about a day to write the python wrapper for LLDB
> including learning how to use the Python API) and it should also be
> possible to either do a sed-style massaging of the output or relax the
> CHECKs to work with both formats.
>
> I really want to avoid duplicating the debugger commands and checks, and I
> also want to maintain the ability to put the commands and CHECKs into the
> source code, since this makes the tests much easier to understand. Using a
> common abstraction will save us a lot of time in the long run, make
> maintenance and adding new tests cheaper, and won't prevent you from also
> having windows-specific tests that may use an expanded vocabulary.
>
> What do you think?
> -- adrian

I understand the desire to keep them as similar as possible, but I'm still not really sold that massaging fickle text output into a different text format is going to make things more scalable. I'd like there to be as few layers of text processing as possible. If someone files a bug report that includes a WinDbg command log, I'd like to be able to paste those statements into a test.

I also expect that on Windows we will end up having far more debug info tests than other platforms, specifically because we don't have the ability to write tests against the debugger (as it's proprietary / closed source). The language used in the current set of debug info tests is very simple, because GDB and LLDB already have test suites that test more complicated things. But we don't have those other test suites to fall back on, so we will need much more. For example, we may end up wanting a test that exercises custom debug visualizers, a test that a certain proprietary debugger feature works, or a test that the built-in debug visualizers for STL types work. To write a check for the latter, you need to know the layout of the type, which depends on the standard library implementation, so it's already going to be different on each platform.

If all we're doing is printing an integer, then I agree we can write a common test. But I don't think that is going to be the case outside of 1 or 2 trivial tests.

There's also the issue that we may want to test entirely different things. For example, in the hypothetical example I posted earlier, we compile once and link twice with 2 different linkers. But we might even want to compile twice and link twice (compile the same program with cl and clang-cl, then link both with lld). The fundamental difference here is that we have two different things that can emit debug info - the compiler and the linker - and we need the flexibility to test both independently of each other. On a posixy platform, you only care about the compiler and don't care what the linker is. On the other hand, that platform has its own set of unique aspects. You might decide to compile and link many times, so that you can test -gsplit-dwarf, -gdebug-info-kind=limited, -gdebug-info-kind=full, -gdb-index, etc. against a single program.

I don't see a useful abstraction that glosses over these differences that isn't a ton of work for minimal gain, given how often we'd need to fall back to a custom test anyway.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170908/2a9c5966/attachment.html>
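To make Adrian's proposal concrete, here is a minimal sketch of how a single %clang-compile-link substitution could pick a per-platform command line. Only the substitution name comes from the thread; the helper function and the exact compiler flags are illustrative assumptions, not what debuginfo-tests actually uses.

```python
import sys
from typing import Tuple


def compile_link_substitution(platform: str = sys.platform) -> Tuple[str, str]:
    """Hypothetical helper returning a lit-style (pattern, replacement) pair.

    The pattern is the %clang-compile-link substitution proposed in the
    thread; the command lines below are assumptions for illustration.
    """
    if platform.startswith("win"):
        # Assumed Windows variant: clang-cl puts CodeView in the object (/Z7)
        # and forwards /DEBUG to the linker so that a PDB is produced.
        command = "clang-cl /Z7 /Od %s /Fe%t.exe /link /DEBUG"
    else:
        # Assumed POSIX variant: clang emits DWARF and links with the
        # platform's default linker.
        command = "clang -g -O0 %s -o %t"
    # lit later replaces %s with the test source path and %t with a temp path.
    return ("%clang-compile-link", command)


if __name__ == "__main__":
    print(compile_link_substitution())
```

In a real lit.cfg.py the returned pair would be appended to config.substitutions. The sketch also makes Zachary's objection visible: the abstraction bakes in one compiler and one linker per platform, so a test that wants to compile once and link with two different linkers cannot be expressed through it.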
Let me say up front that I sympathize deeply with the problem; debug info is an interface, and it is frequently unclear whether the goal of some bit of work is to test the producer or test the consumer of the interface. In fact we end up using the producer to test the consumer, and (in the case at hand) using the consumer to test the producer. There are distinct analogies to testing compilers by seeing what the linker thinks, and testing linkers by seeing whether they can handle what the compiler produces.

> I understand the desire to keep them as similar as possible, but I'm
> still not really sold that massaging fickle text output into a different
> text format is going to make things more scalable. I'd like there to be
> as few layers of text processing as possible.

If the text output is fickle, then I'd think hiding the fickleness behind a wrapper we control would be preferable to updating dozens of tests when something changes. Or if a debugger changes its presentation in version N+1, but people are still running tests with version N, persuading the wrapper to handle both would be less overall work than making every test accommodate both formats.

> If someone files a bug
> report that includes a WinDbg command log, I'd like to be able to paste
> those statements into a test.

That sounds like your goal is to turn the bug report into a Windows-only test, and not a common test. Is that actually what you want? If you still want it to be a common test, then you still need to do the work to write the non-Windows side, and make sure it's actually still exercising the same thing; it's not clear you are saving anything by being able to copy-paste a WinDbg report.

> I also expect that on Windows we will end up having far more debug info
> tests than other platforms, specifically because we don't have the
> ability to write tests against the debugger (as it's proprietary /
> closed source).

I don't see how that follows. Sony runs the GDB test suite using clang as the compiler, and while that is certainly perverting a debugger test suite into being a compiler test suite, it has value in being a body of tests that exercise a variety of debug-info features. GDB being open source is completely irrelevant to this use-case. We treat it as closed. We have local changes to the test suite, but not to GDB. The expected results from the suite are based on GDB+GCC, which we treat as an oracle; then we don't bother with tests of debugger features that clearly don't depend on debug info, such as thread handling.

Whatever CodeView/PDB tests you want to write, you can use MS tools as your oracle. Maybe you can't leverage an existing test suite, but it doesn't mean you can't write tests.

> So the language used in the current set of debug info tests is very
> simple, because GDB and LLDB already have test suites that test more
> complicated things. But the problem is, we don't have those other test
> suites to fall back on, so we will need much more.

This is an argument in favor of a completely separate WinDbg-based executable test suite, rather than pumping up debuginfo-tests.

> There's also the issue that we may want to test entirely different
> things. For example, in the hypothetical example I posted earlier, we
> compile once and link twice with 2 different linkers. But we might even
> want to compile twice and link twice (compile same program with cl and
> clang-cl, then link both with lld). The fundamental difference here is
> that we have two different things that can emit debug info - the compiler
> and linker - and we need the flexibility to test both independently of
> each other.

Iterating over many combinations is a distinct testing problem. It helps to have the test suite designed to handle this up front. My experience is that different combinations will have slightly different pass/fail results, and you need to be ready for that as well.

> I don't see a useful abstraction that glosses over these differences
> that isn't a ton of work for minimal gain, given the frequency with
> which we'd need to fall back to a custom test anyway.

As I mentioned above, you seem to be heading in the direction of a completely separate project, rather than being able to usefully leverage anything from debuginfo-tests other than the basic idea.

--paulr
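Paul's point about iterating over combinations can be sketched as a small test-plan generator that crosses each compiler with each linker and records per-combination expected failures. The tool names are real (MSVC's cl and link, clang-cl, and lld's COFF driver lld-link), but the test names and the expected-failure table are purely hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, NamedTuple, Tuple

COMPILERS = ("cl", "clang-cl")
LINKERS = ("link", "lld-link")

# Hypothetical: tests known to fail for a particular (compiler, linker) pair.
EXPECTED_FAILURES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("cl", "lld-link"): frozenset({"stl_visualizers"}),
}


class Job(NamedTuple):
    compiler: str
    linker: str
    test: str
    expect_fail: bool


def plan(tests: List[str]) -> List[Job]:
    """Cross every compiler with every linker and every test source."""
    jobs = []
    for compiler, linker in product(COMPILERS, LINKERS):
        xfails = EXPECTED_FAILURES.get((compiler, linker), frozenset())
        for test in tests:
            jobs.append(Job(compiler, linker, test, test in xfails))
    return jobs


if __name__ == "__main__":
    for job in plan(["integer_locals", "stl_visualizers"]):
        status = "XFAIL" if job.expect_fail else "PASS"
        print(f"{job.compiler:8} + {job.linker:8} {job.test:16} expect {status}")
```

A real harness would compile each source once per compiler and reuse the object files for every linker, which is the "compile once, link twice" case from the thread; keeping the expected-failure table per combination is one way to be ready for the differing pass/fail results Paul mentions.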
On Fri, Sep 8, 2017 at 10:46 AM Robinson, Paul <paul.robinson at sony.com> wrote:

> Let me say up front that I sympathize deeply with the problem; debug
> info is an interface, and it is frequently unclear whether the goal of
> some bit of work is to test the producer or test the consumer of the
> interface. In fact we end up using the producer to test the consumer,
> and (in the case at hand) using the consumer to test the producer.
> There are distinct analogies to testing compilers by seeing what the
> linker thinks, and testing linkers by seeing whether they can handle
> what the compiler produces.
>
> > I understand the desire to keep them as similar as possible, but I'm
> > still not really sold that massaging fickle text output into a different
> > text format is going to make things more scalable. I'd like there to be
> > as few layers of text processing as possible.
>
> If the text output is fickle, then I'd think hiding the fickleness behind
> a wrapper we control would be preferable to updating dozens of tests when
> something changes. Or if a debugger changes its presentation in version
> N+1, but people are still running tests with version N, persuading the
> wrapper to handle both would be less overall work than making every test
> accommodate both formats.
>

But that's just my point. There are clearly going to be tests where a common format doesn't even make sense, because the test is exercising something specific to one debugger. What if I want to test that we output correct exception information, so I send a .exr command to the debugger and get back this:

    0:000> .exr -1
    ExceptionAddress: 77a6db8b (ntdll!LdrpDoDebuggerBreak+0x0000002b)
       ExceptionCode: 80000003 (Break instruction exception)
      ExceptionFlags: 00000000
    NumberParameters: 1
       Parameter[0]: 00000000

What if I want to test that the debugger can print a valid stack trace, so I send a kv command and get back this?

     # ChildEBP RetAddr  Args to Child
    00 0198fa4c 77a2f5ca 55fe0b87 00000000 00000000 ntdll!LdrpDoDebuggerBreak+0x2b (FPO: [Non-Fpo])
    01 0198fc8c 77a18a42 55fe0bef 00000000 00000000 ntdll!LdrpInitializeProcess+0x1967 (FPO: [Non-Fpo])
    02 0198fce4 77a1886c 00000000 bad81aba 00000000 ntdll!_LdrpInitialize+0x180 (FPO: [Non-Fpo])
    03 0198fcf4 00000000 0198fd08 779c0000 00000000 ntdll!LdrInitializeThunk+0x1c (FPO: [Non-Fpo])

Whereas GDB would print something like:

    #0  m4_traceon (obs=0x24eb0, argc=1, argv=0x2b8c8) at builtin.c:993
    #1  0x6e38 in expand_macro (sym=0x2b600) at macro.c:242
    #2  0x6840 in expand_token (obs=0x0, t=177664, td=0xf7fffb08) at macro.c:71
    (More stack frames follow...)

I really don't want to get into the business of trying to convert the first format into the second. Not only is it a recipe for disaster, but it leads to worse diagnostics. When my CHECK statement fails, I can't even see the original stack trace anymore, only a generic error message like "could not parse stack trace".

> > I also expect that on Windows we will end up having far more debug info
> > tests than other platforms, specifically because we don't have the
> > ability to write tests against the debugger (as it's proprietary /
> > closed source).
>
> I don't see how that follows. Sony runs the GDB test suite using clang as
> the compiler, and while that is certainly perverting a debugger test suite
> into being a compiler test suite, it has value in being a body of tests
> that exercise a variety of debug-info features. GDB being open source is
> completely irrelevant to this use-case. We treat it as closed. We have
> local changes to the test suite, but not to GDB. The expected results
> from the suite are based on GDB+GCC, which we treat as an oracle; then we
> don't bother with tests of debugger features that clearly don't depend on
> debug info, such as thread handling.
>

GDB being open source is very relevant to this case, because it means you *have* GDB's test suite. We don't have the test suite for WinDbg or the Visual Studio debugger.

>
> Whatever CodeView/PDB tests you want to write, you can use MS tools as
> your oracle. Maybe you can't leverage an existing test suite, but it
> doesn't mean you can't write tests.
>

Right, it just means we will end up writing plenty of tests that exercise specific features of the debugger, something that would normally be handled by a debugger test suite, which we don't have.

> > I don't see a useful abstraction that glosses over these differences
> > that isn't a ton of work for minimal gain, given the frequency with
> > which we'd need to fall back to a custom test anyway.
>
> As I mentioned above, you seem to be heading in the direction of a
> completely separate project, rather than being able to usefully
> leverage anything from debuginfo-tests other than the basic idea.
> --paulr

I don't entirely disagree with this assessment. On the other hand, I don't see any reason to call it something other than "debuginfo-tests" or to put it somewhere else, since conceptually both things are the same. Even in this case, though, reusing the source code of the tests seems like a clear win, since the high-level ideas behind a test case often transcend consumer boundaries, even when the implementation doesn't.

Plus, there is more to be gained from sharing than just the tests themselves. For example, I'm trying to get debuginfo-tests working properly with CMake in a more idiomatic LLVM style. If I go off and fork the tests into an entirely separate project, we wouldn't have that shared benefit.
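The disagreement over massaging output is easier to weigh with code in hand. Below is a minimal sketch of the kind of normalizer being debated. It assumes 32-bit WinDbg kv output and GDB bt output laid out like the examples in this message, and it reduces a frame line from either debugger to a (frame number, function name) pair. The pattern and function names are hypothetical. To address the diagnostics concern, it puts the raw line into its error message instead of a bare "could not parse stack trace".

```python
import re
from typing import Tuple

# Assumed 32-bit WinDbg "kv" frame: frame number (hex), ChildEBP, RetAddr,
# three args, then module!function+offset and an optional "(FPO: ...)" note.
WINDBG_KV_FRAME = re.compile(
    r"^\s*([0-9a-f]+)\s+(?:[0-9a-f]{8}\s+){5}(\S+?)(?:\+0x[0-9a-f]+)?(?:\s+\(.*\))?\s*$",
    re.IGNORECASE,
)

# Assumed GDB "bt" frame: "#N  [0xADDR in ]function (args) at file:line".
GDB_BT_FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s+\(", re.IGNORECASE)


def normalize_frame(line: str) -> Tuple[int, str]:
    """Reduce one stack frame line from either debugger to (number, function)."""
    match = WINDBG_KV_FRAME.match(line)
    if match:
        # WinDbg frame numbers are hex; drop the "module!" prefix.
        return int(match.group(1), 16), match.group(2).split("!", 1)[-1]
    match = GDB_BT_FRAME.match(line)
    if match:
        return int(match.group(1)), match.group(2)
    # Keep the raw text so a failing CHECK still shows what the debugger printed.
    raise ValueError(f"could not parse stack frame: {line!r}")
```

Even this small sketch shows the fragility both sides describe: 64-bit WinDbg prints addresses with a backtick separator (for example 00000000`0198fa4c), which these patterns reject, so every new output variant means another pattern to maintain.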