On 7/1/20 2:06 PM, Michael Kruse wrote:
> On Wed, Jul 1, 2020 at 11:37 AM Hal Finkel <hfinkel at anl.gov> wrote:
>> We can have different printing modes. There can be a more-human-friendly
>> mode and a more-FileCheck-friendly mode. Or modes customized for
>> different kinds of tests. I agree, however, that this does not solve the
>> fragility problems with CHECK-NOT.
> This would be similar to git's porcelain and plumbing modes. However,
> even with git, which has had this distinction from the beginning,
> scripts often use the porcelain output.

Fair point. In this case, however, we control all of the relevant
scripts. We can have a policy about which modes can be used in
regression tests and which are designed only for human consumption. We
can mark them as such.

> Another example is commands that change their output depending on
> whether stdout is a terminal or a pipe. However, I find such a
> distinction between modes more confusing than helpful.

I find that annoying :-)

>> That's the interesting question... it does add to the maintenance
>> burden. However, having textual outputs is also quite convenient when
>> debugging things (because I can change the input and see the output
>> quickly, without needing to create and compile another program).
>> Obviously, at some point, this becomes ridiculous. How much is too
>> much? I don't know. But it's also not clear to me that we're at that
>> point yet. We could add more textual analysis outputs and still have
>> that be a net benefit in many places.
>>
>> In cases where the requisite output would just be too specific, we do
>> have unit tests. Should we just add more? Maybe we're too happy to add
>> lit tests instead of unit tests for some of these cases.
> The RFC is not about replacing all uses of FileCheck; there are
> certainly cases where it is straightforward, simple, and robust, but
> for some things it would be nice to have another tool in the toolbox.
> The more workarounds, FileCheck features, test generators, etc. that
> are needed to author appropriate tests, the more I get the impression
> that FileCheck is the wrong tool.

FileCheck on raw IR is not the right approach in many cases. I think we
all agree about that. The question is: if our go-to solution in such
cases is to introduce an analysis pass with a textual output that
FileCheck can process, is that bad? Are there cases where such a pass
would not be applicable to multiple tests? Are there cases where such a
pass would not be useful for humans during development? No such cases
particularly come to mind, but I'm definitely interested in what
everyone else thinks.

When I teach my compilers class, I tell my students to liberally add
the ability to serialize all of their internal data structures to
interpretable text. It will seem like extra work at first, but when
they're trying to debug things later, it will be really helpful. I
think this is a key lesson that I, at least, have learned from LLVM. It
makes us all more productive in the end (in part because we often spend
much more time debugging our code than writing it in the first place).
Firing up an actual debugger is slow and (despite our best efforts)
fragile; changing a textual input and running it through something that
produces textual output is fast.

> As an example, take `clang -verify` tests. It is certainly possible to
> check diagnostic output using FileCheck, so why does clang have a
> -verify option?

AFAIK, to make it easy to write tests that verify that particular
messages appear at specific lines relative to the code in the test,
without hard-coding the particular line numbers (including with
offsets, etc.). FileCheck probably could have been enhanced to do this
directly (especially with all of its recent enhancements).

 -Hal

> Michael

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
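For concreteness, a minimal sketch of what such a -verify test looks
like (the example itself is illustrative, not taken from the clang test
suite; the RUN line, the expected-error comment syntax, and the @-1
relative-line form are the actual mechanism being discussed):

  // RUN: %clang_cc1 -fsyntax-only -verify %s

  void f(void) {
    int x = ;  // expected-error {{expected expression}}
  }

  // A diagnostic can also be matched against a nearby line using a
  // relative offset, so the test never hard-codes absolute line numbers:
  int y = ;
  // expected-error@-1 {{expected expression}}

The -verify machinery fails the test if a diagnostic is emitted without
a matching expected-* comment, or vice versa, which is roughly what one
would otherwise have to approximate with CHECK/CHECK-NOT pairs.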
Michael Kruse via llvm-dev
2020-Jul-02 05:44 UTC
[llvm-dev] [RFC] Compiled regression tests.
On Wed, Jul 1, 2020 at 2:36 PM Hal Finkel <hfinkel at anl.gov> wrote:
> When I teach my compilers class, I tell my students to liberally add
> the ability to serialize all of their internal data structures to
> interpretable text. It will seem like extra work at first, but when
> they're trying to debug things later, it will be really helpful. I
> think this is a key lesson that I, at least, have learned from LLVM.
> It makes us all more productive in the end (in part because we often
> spend much more time debugging our code than writing it in the first
> place). Firing up an actual debugger is slow and (despite our best
> efforts) fragile; changing a textual input and running it through
> something that produces textual output is fast.

One of the first things I write for my data structures is indeed a dump
function. However, the output is not stable, since I regularly
change/remove/add information that is dumped, depending on whether the
information is relevant or adds too much noise, or because I have found
a better textual representation of the same thing.

Michael
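For illustration, a minimal sketch of the kind of dump function in
question (the LoopNestInfo structure and its fields are hypothetical):

  #include "llvm/Support/raw_ostream.h"

  // Hypothetical analysis result; the fields are illustrative only.
  struct LoopNestInfo {
    unsigned Depth;
    bool IsPerfectNest;

    // Print a human-readable summary. Which fields get printed, and in
    // what format, tends to change as the analysis evolves, which is
    // exactly what makes this output fragile as a FileCheck interface.
    void dump(llvm::raw_ostream &OS) const {
      OS << "LoopNest: depth=" << Depth
         << " perfect=" << (IsPerfectNest ? "yes" : "no") << "\n";
    }
  };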
On 7/2/20 12:44 AM, Michael Kruse wrote:
> On Wed, Jul 1, 2020 at 2:36 PM Hal Finkel <hfinkel at anl.gov> wrote:
>> When I teach my compilers class, I tell my students to liberally add
>> the ability to serialize all of their internal data structures to
>> interpretable text. It will seem like extra work at first, but when
>> they're trying to debug things later, it will be really helpful. I
>> think this is a key lesson that I, at least, have learned from LLVM.
>> It makes us all more productive in the end (in part because we often
>> spend much more time debugging our code than writing it in the first
>> place). Firing up an actual debugger is slow and (despite our best
>> efforts) fragile; changing a textual input and running it through
>> something that produces textual output is fast.
> One of the first things I write for my data structures is indeed a
> dump function. However, the output is not stable, since I regularly
> change/remove/add information that is dumped, depending on whether the
> information is relevant or adds too much noise, or because I have
> found a better textual representation of the same thing.

I think that, to a large extent, we're on the same page on this aspect.
It's a question of reuse and stability. If there's a principled way to
design an output that will be reused across many tests and that can
reasonably be expected to remain relatively stable, then we should do
that. If not, then unit tests are better. The question is: do we have
so many such unit tests that we want a special way to construct them
from IR files (instead of, I suppose, just having the IR in a string in
the code)? I don't know.

 -Hal

> Michael

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
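For what it's worth, a sketch of what "the IR in a string in the code"
looks like as a unit test (using the existing parseAssemblyString API;
the test name, the IR, and the assertions are illustrative):

  #include "llvm/AsmParser/Parser.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/SourceMgr.h"
  #include "gtest/gtest.h"
  #include <memory>

  using namespace llvm;

  TEST(ExampleAnalysisTest, IRFromString) {
    LLVMContext Ctx;
    SMDiagnostic Err;
    // The IR under test lives in the unit test itself, rather than in
    // a separate .ll file matched by FileCheck.
    std::unique_ptr<Module> M = parseAssemblyString(
        "define i32 @f(i32 %x) {\n"
        "entry:\n"
        "  %y = add i32 %x, 1\n"
        "  ret i32 %y\n"
        "}\n",
        Err, Ctx);
    ASSERT_TRUE(M) << Err.getMessage().str();

    // A real test would run the analysis over @f and assert on its
    // result data structure directly, instead of matching dumped text.
    Function *F = M->getFunction("f");
    ASSERT_NE(F, nullptr);
    EXPECT_EQ(F->arg_size(), 1u);
  }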