Troy Johnson via llvm-dev
2018-Jun-15 16:51 UTC
[llvm-dev] Commit module to Git after each Pass
> FWIW: We could also just have a mode that dumps 1 file per pass. That is
> enough to make it convenient/easy to run diff between passes. (And if you
> wanted to you could still make a git repository out of it with an external
> script).
>
> - Matthias

I have done this before and would strongly encourage this approach as opposed to direct emission to std[out|err] or directly involving a source control system. The most convenient way was to add an additional option, -print-to-files, which modified the behavior of -print-after-all, -print-before-all, etc. The filename was constructed by massaging the pass name to comply with file system naming conventions and prepending a monotonically increasing integer (with suitable leading zeros) plus "bef" or "aft" to indicate sequencing. The only awkward part was modifying createPrinterPass to accept a filename, which had to be done because otherwise you end up having to keep each stream open from the time you set up the pass pipeline until the printing pass actually runs.

-Troy

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of mbraun via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Thursday, June 14, 2018 3:48 PM
To: Alexandre Isoard
Cc: llvm-dev
Subject: Re: [llvm-dev] Commit module to Git after each Pass

FWIW: We could also just have a mode that dumps 1 file per pass. That is enough to make it convenient/easy to run diff between passes. (And if you wanted to you could still make a git repository out of it with an external script).

- Matthias

On Jun 14, 2018, at 10:49 AM, Alexandre Isoard via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hello,

Just an update on that. I am personally using -git-commit-after-all *as-is* extremely frequently (combined with "git filter-branch" and "opt -S -instnamer" it is extremely useful).

I unfortunately won't have time to write a better implementation of that, and I agree "git fast-import" seems the way to go.
If anybody is motivated enough to do so, feel free.

Best regards!

On Thu, Mar 22, 2018 at 10:38 AM Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Obviously, we do not want all stderr output to be buffered. However, I think it would be great to change Function::print and Module::print to call raw_ostream::SetBuffered / raw_ostream::SetUnbuffered before and after printing. I guess if the original stream was buffered we don't want to mark it unbuffered, so we may need to tweak the raw_ostream interface. Looks easy, though.

On Thu, Mar 22, 2018 at 8:06 AM Fedor Sergeev via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Oh, well... as usual, the answer appears to be pretty obvious. 99% of the time is spent inside the plain write.

-print-after-all prints into llvm::errs(), which is an *unbuffered* raw_fd_ostream. And -git-commit-after-all opens a *buffered* raw_fd_ostream.

As soon as I hacked -print-after-all to use a buffered stream to stderr, performance went up to the normal expected values:

] time bin/opt -O1 big-ir.ll -disable-output -print-after-all -print-module-scope 2>&1 | grep -c "^; ModuleID"
526

real 0m2.363s
user 0m2.373s
sys 0m0.271s
]

So, the moral of this story is: we should not be printing module IR into dbgs()/errs(). And then the idea of streaming IR module dumps into a buffered stream and then postprocessing seems to be the right one.

regards,
  Fedor.

On 03/21/2018 01:08 PM, Fedor Sergeev via llvm-dev wrote:
> On 03/16/2018 01:21 AM, Fedor Sergeev via llvm-dev wrote:
> > git-commit-after-all solution has one serious issue - it has a hardcoded git handling which
> > makes it look problematic from many angles (picking a proper git,
> > selecting exact way of storing information, creating repository, replacing the file etc etc).
> >
> > Just dumping information in a way that allows easy subsequent machine processing
> > seems to be a more flexible, less cluttered and overall clean solution that allows to avoid
> > making any of "user interface" decisions mentioned above.
> >
> > We need to understand why git-commit-after-all works faster than print-after-all.
>
> Made an interesting experiment today and extended your git-commit-after-all to avoid issuing
> any git commands if git-repo starts with "/dev/".
>
> With git-repo=/dev/stderr it becomes functionally equivalent to print-after-all+print-module-scope,
> dumping module into stderr after each pass.
>
> On my testcase:
>
> # first normal git-commit-after-all execution
> ] rm -rf test-git; time $RR/bin/opt -O1 some-ir.ll -disable-output -git-commit-after-all -git-repo=./test-git
>
> real 0m7.172s
> user 0m6.303s
> sys 0m0.902s
> # then "printing" git-commit-after-all execution
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output -git-commit-after-all -git-repo=/dev/stderr 2>&1 | grep -c '^; ModuleID'
> 615
>
> real 0m2.893s
> user 0m2.859s
> sys 0m0.356s
> # and finally print-after-all
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output -print-after-all -print-module-scope 2>&1 | grep -c "^; ModuleID"
> 526
>
> real 2m8.024s
> user 0m55.933s
> sys 3m19.253s
> ]
>
> Ugh... 60x???
>
> Now, I'm set to analyze this astonishing difference that threatens my
> sanity (while I'm still sane ... hopefully).
>
> regards,
>   Fedor.
>
> PS btw, I checked /dev/null - and it works faster than /dev/stderr as expected :)
>
> > I don't believe in magic... yet :)
> >
> > And, btw, thanks for both the idea and the patch.
> >
> > regards,
> >   Fedor.
> >
> > On 03/16/2018 12:03 AM, Alexandre Isoard wrote:
> >> If this is faster than -print-after-all we may actually consider pushing that in the code base then?
> >> (after diligent code review of course)
> >>
> >> Note that it uses the same printing method as -print-after-all:
> >> - create a pass of the same pass kind as the pass we just ran
> >> - use Module::print(raw_ostream) to print (except -print-after-all only prints the concerned part and into stdout)
> >>
> >> If there is improvement to be done to print-after-all it might also improve git-commit-after-all. (unless that only improves speed when printing constructs smaller than module)
> >>
> >> In any case, it is, to me, much more usable (and extensible) than -print-after-all. But requires git to be in PATH (I'm curious if that works on Windows).
> >>
> >> On Thu, Mar 15, 2018 at 1:35 PM, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
> >>
> >>     Does https://reviews.llvm.org/D44132 help at all?
> >>
> >>> On 15 Mar 2018, at 09:16, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> The most likely answer is that the printer used by print-after-all is slow. I know there were some changes made around passing in some form of state cache (metadata related?) and that running printers without doing so work, but are dog slow. I suspect the print-after-all support was never updated. Look at what we do for the normal IR emission "-S" and see if print-after-all is out of sync.
> >>>
> >>> Philip
> >>>
> >>> On 03/15/2018 08:45 AM, Alexandre Isoard via llvm-dev wrote:
> >>>> Huh. Great! 😁
> >>>>
> >>>> I don't believe my poor excuse from earlier (else we should map all pipes into files!), but I'm curious why we spend less time in system mode when going through file than pipe. Maybe /dev/null is not as efficient as we might think? I can't believe I'm saying that...
> >>>>
> >>>> On Thu, Mar 15, 2018, 08:25 Fedor Sergeev <fedor.sergeev at azul.com> wrote:
> >>>>
> >>>>     Well, git by itself is so focused on performance, so it's not surprising
> >>>>     to me that even using git add/git commit does not cause
> >>>>     performance penalties.
> >>>>
> >>>> Sure, but still, I write more stuff (entire module) into a slower destination (file). Even ignoring git execution time it's counterintuitive.
> >>>>
> >>>> The only difference is that while I write more, it overwrites itself continuously, instead of being a long linear stream. I was thinking of mmap'ing the file instead of going through our raw_ostream, but maybe that's unnecessary then...
> >>>>
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> llvm-dev at lists.llvm.org
> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >> --
> >> Alexandre Isoard
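The buffering effect Fedor reports in the quoted thread (an unbuffered llvm::errs() versus a buffered stream) can be illustrated outside LLVM. The following is a minimal Python sketch, offered as an analogy rather than as LLVM code: CountingSink is a hypothetical stand-in for a file descriptor, and each call that reaches it plays the role of one write system call. The chunk contents, the chunk count, and the 64 KiB buffer size are arbitrary assumptions.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw sink: counts how many write calls reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.nbytes = 0

    def writable(self):
        return True

    def write(self, data):
        # Each call here stands in for one write(2) on a file descriptor.
        self.calls += 1
        self.nbytes += len(data)
        return len(data)


def dump(stream, chunks):
    """Write many small pieces, the way a printer emits IR line by line."""
    for chunk in chunks:
        stream.write(chunk)
    stream.flush()


# Many small writes, roughly one per printed instruction (assumed workload).
chunks = [b"  %v = add i32 %a, %b\n"] * 10_000

# Unbuffered: every small write goes straight to the sink.
unbuffered = CountingSink()
dump(unbuffered, chunks)

# Buffered: small writes are coalesced before they reach the sink.
raw = CountingSink()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
dump(buffered, chunks)

print("unbuffered sink calls:", unbuffered.calls)
print("buffered sink calls:  ", raw.calls)
```

With the unbuffered sink every small write reaches the descriptor, whereas the buffered writer coalesces them into a handful of large writes. That is consistent with the drop in system time in Fedor's measurements, although the sketch says nothing about the actual raw_ostream implementation.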
Alexandre Isoard via llvm-dev
2018-Jun-15 17:49 UTC
[llvm-dev] Commit module to Git after each Pass
On Fri, Jun 15, 2018 at 9:52 AM Troy Johnson via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> > FWIW: We could also just have a mode that dumps 1 file per pass. That is
> > enough to make it convenient/easy to run diff between passes. (And if you
> > wanted to you could still make a git repository out of it with an external
> > script).
> >
> > - Matthias
>
> I have done this before and would strongly encourage this approach as
> opposed to direct emission to std[out|err] or directly involving a source
> control system. The most convenient way was to add an additional option,
> -print-to-files, which modified the behavior of -print-after-all,
> -print-before-all, etc. The filename was constructed by massaging the pass
> name to comply with file system naming conventions and prepending a
> monotonically increasing integer (with suitable leading zeros) plus "bef"
> or "aft" to indicate sequencing. The only awkward part was modifying
> createPrinterPass to accept a filename, which had to be done because
> otherwise you end up having to keep each stream open from the time you
> setup the pass pipeline until the printing pass actually runs.
>
> -Troy

That was the exact implementation we had, and it produced far too many files for our file system; we would have had to create a subfolder every ~100 passes. Additionally, this took a lot of disk space and the only metadata we could store was in the file name.

Do you skip passes that don't change the module? How do you store the missed optimization opportunities messages?

On the other hand, with git, I can store much more in the commit message (I actually extended the thing to allow a pass to tag a commit, and I am planning to allow passes to print into the commit message itself).
Yesterday, I wanted to see where the compiler diverges when I tweak SCEV reduction rules, so I ran the compiler once, switched the branch back to the beginning, and did a second run with my modification; the git history then automatically identifies identical commits. That is, I directly get, in the git history tree, the divergence point between the two versions.

And that's just the tip of the iceberg. Git is designed to be a version control system, true, but it can also be repurposed into a tremendous toolbox.

I would seriously encourage going in the "git fast-import" direction, or a semantically equivalent output format that we post-process, because I think it would simplify the implementation (especially to allow a pass to dump anything into the commit message). But don't pass on the actual benefits of having a version control system backend.

--
Alexandre Isoard
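For readers unfamiliar with it, "git fast-import" reads a plain-text stream describing commits on standard input, so a tool could emit one commit per pass without invoking git for each of them. A minimal Python sketch of building such a stream follows. It is not the -git-commit-after-all patch: the function name, the ref refs/heads/passes, the file name module.ll, the committer identity, and the fixed timestamp are all assumptions made for illustration.

```python
def fast_import_stream(snapshots,
                       ref="refs/heads/passes",
                       path="module.ll",
                       committer="opt <opt@example.invalid>",
                       when="1529020800 +0000"):
    """Build a git fast-import stream with one commit per snapshot.

    snapshots is an iterable of (pass_name, ir_text) pairs. All default
    argument values are hypothetical placeholders.
    """
    out = bytearray()
    for pass_name, ir_text in snapshots:
        message = f"After {pass_name}\n".encode("utf-8")
        blob = ir_text.encode("utf-8")
        # Start a commit on the branch; later commits in the same stream
        # use the branch's current tip as their parent.
        out += f"commit {ref}\n".encode("utf-8")
        out += f"committer {committer} {when}\n".encode("utf-8")
        # Commit message, length-prefixed in bytes.
        out += b"data %d\n" % len(message) + message
        # Replace the tracked file with the new module text, given inline.
        out += f"M 100644 inline {path}\n".encode("utf-8")
        out += b"data %d\n" % len(blob) + blob
        out += b"\n"
    return bytes(out)


stream = fast_import_stream([
    ("instcombine", "; ModuleID = 'm'\n"),
    ("sroa", "define void @f() {\n  ret void\n}\n"),
])
```

The resulting bytes could be piped into `git fast-import` inside a repository created with `git init`. Keeping the committer and timestamp fixed makes commit IDs deterministic, so two runs that produce identical IR up to some pass would share the same commits up to that point. This is one plausible way to obtain the divergence point described above, not necessarily how the original patch does it.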
Troy Johnson via llvm-dev
2018-Jun-15 18:13 UTC
[llvm-dev] Commit module to Git after each Pass
It's only a huge number of files if you're running over a set of input files and are using the -print-*-all options, which was not my use-case. Typically the use-case is debugging a problem in a single input file with -print-*-all, where generating a few hundred files is fine, or debugging a specific pass with -print-*= for some set of files, which similarly might generate a few hundred files. In other words, you usually know which input file is experiencing a problem or you know which pass is causing a problem. If you don't know either, then, well, you are kind of stuck until you narrow your scope further, but there are other tools to help with that.

I was not skipping any passes. Storing optimization messages was not of interest. Storing additional metadata was not of interest. As I said, -print-to-files only modified where the -print-* options sent their output. That's it.

I use git, and I like git, but would rather leave separate tools as separate tools. When printing to files, you are totally free to add them to a git repository if you want, but committing them directly forces others to use git just to see the data.

Given that at least two people have implemented virtually the same thing, it seems like -print-to-files would be generally useful. Others may not need so many files or have your file system constraint. Would others find it useful?

-Troy
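The -print-to-files naming scheme described earlier in the thread (sanitized pass name, zero-padded sequence number, "bef" or "aft") might look like the following Python sketch. The exact layout is an assumption: the thread does not give the separator, the padding width, the file extension, or the sanitizing rule, and dump_filename is a hypothetical helper, not an LLVM function.

```python
import re


def dump_filename(index, pass_name, when, width=4, ext=".ll"):
    """Build a file name for one IR dump.

    index is a counter that increases with every dump, when is "bef" or
    "aft". The layout (counter, marker, sanitized name) is an assumption.
    """
    if when not in ("bef", "aft"):
        raise ValueError("when must be 'bef' or 'aft'")
    # Collapse every run of characters outside a conservative safe set
    # into a single '-', so the name is valid on common file systems.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", pass_name).strip("-").lower()
    # Zero-padding keeps lexicographic order equal to execution order.
    return f"{index:0{width}d}-{when}-{safe or 'unnamed'}{ext}"


dumps = [("Simplify the CFG", "bef"), ("Simplify the CFG", "aft"), ("SROA", "bef")]
names = [dump_filename(i, name, when) for i, (name, when) in enumerate(dumps)]
print("\n".join(names))
```

Because the counter is zero-padded and comes first, a plain directory listing shows the dumps in pipeline order, and any two adjacent files can be compared with an ordinary diff tool.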