Fedor Sergeev via llvm-dev
2018-Mar-22 15:04 UTC
[llvm-dev] Commit module to Git after each Pass
Oh, well... as usual, the answer turns out to be pretty obvious.
99% of the time is spent inside the plain write.

-print-after-all prints into llvm::errs(), which is an *unbuffered* raw_fd_ostream.
And -git-commit-after-all opens a *buffered* raw_fd_ostream.

As soon as I hacked -print-after-all to use a buffered stream to stderr, performance went
back to the normal expected values:

] time bin/opt -O1 big-ir.ll -disable-output -print-after-all -print-module-scope 2>&1 | grep -c "^; ModuleID"
526

real 0m2.363s
user 0m2.373s
sys 0m0.271s
]

So, the moral of this story is: we should not be printing module IR into dbgs()/errs().

And so the idea of streaming IR module dumps into a buffered stream and then postprocessing them
seems to be the right one.

regards,
  Fedor.

On 03/21/2018 01:08 PM, Fedor Sergeev via llvm-dev wrote:
> On 03/16/2018 01:21 AM, Fedor Sergeev via llvm-dev wrote:
> > git-commit-after-all solution has one serious issue - it has a hardcoded git handling which
> > makes it look problematic from many angles (picking a proper git,
> > selecting exact way of storing information, creating repository, replacing the file etc etc).
> >
> > Just dumping information in a way that allows easy subsequent machine processing
> > seems to be a more flexible, less cluttered and overall clean solution that allows to avoid
> > making any of "user interface" decisions mentioned above.
> >
> > We need to understand why git-commit-after-all works faster than print-after-all.
>
> Made an interesting experiment today and extended your git-commit-after-all to avoid issuing
> any git commands if git-repo starts with "/dev/".
>
> With git-repo=/dev/stderr it becomes functionally equivalent to print-after-all+print-module-scope,
> dumping module into stderr after each pass.
>
> On my testcase:
>
> # first normal git-commit-after-all execution
> ] rm -rf test-git; time $RR/bin/opt -O1 some-ir.ll -disable-output -git-commit-after-all -git-repo=./test-git
>
> real 0m7.172s
> user 0m6.303s
> sys 0m0.902s
> # then "printing" git-commit-after-all execution
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output -git-commit-after-all -git-repo=/dev/stderr 2>&1 | grep -c '^; ModuleID'
> 615
>
> real 0m2.893s
> user 0m2.859s
> sys 0m0.356s
> # and finally print-after-all
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output -print-after-all -print-module-scope 2>&1 | grep -c "^; ModuleID"
> 526
>
> real 2m8.024s
> user 0m55.933s
> sys 3m19.253s
> ]
>
> Ugh... 60x???
> Now, I'm set to analyze this astonishing difference that threatens my sanity
> (while I'm still sane ... hopefully).
>
> regards,
>   Fedor.
>
> PS btw, I checked /dev/null - and it works faster than /dev/stderr as expected :)
>
> > I don't believe in magic... yet :)
> >
> > And, btw, thanks for both the idea and the patch.
> >
> > regards,
> >   Fedor.
> >
> > On 03/16/2018 12:03 AM, Alexandre Isoard wrote:
> >> If this is faster than -print-after-all we may actually consider pushing that in the
> >> code base then? (after diligent code review of course)
> >>
> >> Note that it uses the same printing method as -print-after-all:
> >> - create a pass of the same pass kind as the pass we just ran
> >> - use Module::print(raw_ostream) to print (except -print-after-all only prints the
> >>   concerned part and into stdout)
> >>
> >> If there is improvement to be done to print-after-all it might also improve
> >> git-commit-after-all. (unless that only improves speed when printing constructs
> >> smaller than module)
> >>
> >> In any case, it is, to me, much more usable (and extensible) than -print-after-all.
> >> But requires git to be in PATH (I'm curious if that works on Windows).
> >>
> >> On Thu, Mar 15, 2018 at 1:35 PM, Daniel Sanders <daniel_l_sanders at apple.com> wrote:
> >>
> >>     Does https://reviews.llvm.org/D44132 help at all?
> >>
> >>> On 15 Mar 2018, at 09:16, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> The most likely answer is that the printer used by print-after-all is slow. I know
> >>> there were some changes made around passing in some form of state cache (metadata
> >>> related?) and that running printers without doing so works, but is dog slow. I suspect
> >>> the print-after-all support was never updated. Look at what we do for the normal IR
> >>> emission "-S" and see if print-after-all is out of sync.
> >>>
> >>> Philip
> >>>
> >>> On 03/15/2018 08:45 AM, Alexandre Isoard via llvm-dev wrote:
> >>>> Huh. Great! 😁
> >>>>
> >>>> I don't believe my poor excuse from earlier (else we should map all pipes into
> >>>> files!), but I'm curious why we spend less time in system mode when going through
> >>>> file than pipe. Maybe /dev/null is not as efficient as we might think? I can't
> >>>> believe I'm saying that...
> >>>>
> >>>> On Thu, Mar 15, 2018, 08:25 Fedor Sergeev <fedor.sergeev at azul.com> wrote:
> >>>>
> >>>>     Well, git by itself is so focused on performance, so it's not surprising
> >>>>     to me that even using git add/git commit does not cause
> >>>>     performance penalties.
> >>>>
> >>>> Sure, but still, I write more stuff (entire module) into a slower destination
> >>>> (file). Even ignoring git execution time it's counterintuitive.
> >>>>
> >>>> The only difference is that while I write more, it overwrites itself continuously,
> >>>> instead of being a long linear stream. I was thinking of mmap'ing the file instead
> >>>> of going through our raw_ostream, but maybe that's unnecessary then...
> >>
> >> --
> >> Alexandre Isoard
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
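
To make the buffered versus unbuffered difference in the message above concrete, here is a minimal standalone C++ sketch. It does not use LLVM at all: CountingSink, TinyStream and countSinkCalls are hypothetical names invented for this illustration, and the sink only counts calls where a real unbuffered stream would issue one write system call each. It also assumes that a pretty-printer emits every line as several small fragments, which is what makes the unbuffered case expensive. The 4096-byte capacity used in the note below is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a file descriptor. Each call to write() models
// one system call; we only count calls and bytes.
struct CountingSink {
  std::size_t calls = 0;
  std::size_t bytes = 0;
  void write(const std::string &data) {
    ++calls;
    bytes += data.size();
  }
};

// Toy output stream. capacity == 0 models an unbuffered stream: every
// fragment is forwarded to the sink immediately.
class TinyStream {
public:
  TinyStream(CountingSink &sink, std::size_t capacity)
      : sink_(sink), capacity_(capacity) {}
  ~TinyStream() { flush(); }

  TinyStream &operator<<(const std::string &fragment) {
    if (fragment.size() >= capacity_) {
      // Unbuffered stream, or fragment too large to buffer: write through.
      flush();
      sink_.write(fragment);
      return *this;
    }
    if (buffer_.size() + fragment.size() > capacity_)
      flush();
    buffer_.append(fragment);
    return *this;
  }

  void flush() {
    if (buffer_.empty())
      return;
    sink_.write(buffer_);
    buffer_.clear();
  }

private:
  CountingSink &sink_;
  std::size_t capacity_;
  std::string buffer_;
};

// Prints `lines` lines of pseudo-IR, four fragments per line, and returns
// how many times the sink was called.
std::size_t countSinkCalls(std::size_t capacity, std::size_t lines) {
  CountingSink sink;
  {
    TinyStream os(sink, capacity);
    for (std::size_t i = 0; i < lines; ++i)
      os << "  %v" << std::to_string(i) << " = add i32 %a, %b" << "\n";
  } // the destructor flushes whatever is still buffered
  assert(sink.bytes > 0 || lines == 0);
  return sink.calls;
}
```

In this model, countSinkCalls(0, 1000) reaches the sink once per fragment, while countSinkCalls(4096, 1000) reaches it only a handful of times for the same bytes. The real timings depend on the cost of the actual system call, which this sketch does not measure.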
Reid Kleckner via llvm-dev
2018-Mar-22 17:37 UTC
[llvm-dev] Commit module to Git after each Pass
Obviously, we do not want all stderr output to be buffered. However, I think it would be great to change Function::print and Module::print to call raw_ostream::SetBuffered / raw_ostream::SetUnbuffered before and after printing. I guess if the original stream was buffered we don't want to mark it unbuffered, so we may need to tweak the raw_ostream interface. Looks easy, though.

On Thu, Mar 22, 2018 at 8:06 AM Fedor Sergeev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 99% of the time is spent inside the plain write.
>
> -print-after-all prints into llvm::errs(), which is an *unbuffered* raw_fd_ostream.
> And -git-commit-after-all opens a *buffered* raw_fd_ostream.
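
The "buffer only while printing, then restore the previous mode" idea from the message above can be sketched as a small scope guard. This is a standalone model, not a patch: MockStream, ScopedBuffering, Observed and printWithGuard are hypothetical names, and MockStream only tracks a buffering flag rather than wrapping the real raw_ostream class. The point it illustrates is the caveat raised above, namely that a stream which was already buffered must not be switched to unbuffered on exit.

```cpp
#include <cassert>

// Hypothetical stand-in for an output stream. Only the buffering mode is
// modelled; this is not the LLVM raw_ostream class.
class MockStream {
public:
  explicit MockStream(bool buffered) : buffered_(buffered) {}
  bool isBuffered() const { return buffered_; }
  void setBuffered() { buffered_ = true; }
  void setUnbuffered() { buffered_ = false; } // a real stream would flush here

private:
  bool buffered_;
};

// Scope guard: turn buffering on for the lifetime of the guard and restore
// the previous mode afterwards. A stream that was already buffered is left
// untouched.
class ScopedBuffering {
public:
  explicit ScopedBuffering(MockStream &os)
      : os_(os), wasBuffered_(os.isBuffered()) {
    if (!wasBuffered_)
      os_.setBuffered();
  }
  ~ScopedBuffering() {
    if (!wasBuffered_)
      os_.setUnbuffered();
  }
  ScopedBuffering(const ScopedBuffering &) = delete;
  ScopedBuffering &operator=(const ScopedBuffering &) = delete;

private:
  MockStream &os_;
  bool wasBuffered_;
};

// Buffering mode seen while the guard is alive and after it is destroyed.
struct Observed {
  bool during;
  bool after;
};

Observed printWithGuard(MockStream &os) {
  Observed seen{};
  {
    ScopedBuffering guard(os);
    seen.during = os.isBuffered(); // the module would be printed here
  }
  seen.after = os.isBuffered();
  return seen;
}
```

Recording the previous mode in the guard is one way to avoid changing the stream interface itself; whether the real classes expose enough state to do this is left open here.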
Alexandre Isoard via llvm-dev
2018-Jun-14 17:49 UTC
[llvm-dev] Commit module to Git after each Pass
Hello,

Just an update on that. I am personally using -git-commit-after-all *as-is* extremely frequently (combined with "git filter-branch" and "opt -S -instnamer" it is extremely useful).

I unfortunately won't have time to write a better implementation of it, and I agree that "git fast-import" seems the way to go. If anybody is motivated enough to do so, feel free.

Best regards!

On Thu, Mar 22, 2018 at 10:38 AM Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Obviously, we do not want all stderr output to be buffered. However, I
> think it would be great to change Function::print and Module::print to call
> raw_ostream::SetBuffered / raw_ostream::SetUnbuffered before and after
> printing. I guess if the original stream was buffered we don't want to mark
> it unbuffered, so we may need to tweak the raw_ostream interface. Looks
> easy, though.

--
*Alexandre Isoard*
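
As a rough illustration of the "git fast-import" direction mentioned above, the following standalone C++ sketch builds the text of one fast-import commit record per pass. Everything specific in it is an assumption made for the example: the function names dataBlock and fastImportCommit, the branch refs/heads/passes, the committer identity, and the file name module.ll. It is not the -git-commit-after-all patch and it does not run git; it only produces a string.

```cpp
#include <cassert>
#include <string>

// A fast-import "data" block: byte count, newline, payload, newline.
std::string dataBlock(const std::string &payload) {
  return "data " + std::to_string(payload.size()) + "\n" + payload + "\n";
}

// One commit record that stores the module text as module.ll.
// Branch name, committer identity and file name are hypothetical choices.
std::string fastImportCommit(const std::string &passName,
                             const std::string &moduleText,
                             long long timestamp) {
  std::string out;
  out += "commit refs/heads/passes\n";
  out += "committer opt <opt@example.invalid> " +
         std::to_string(timestamp) + " +0000\n";
  out += dataBlock("After " + passName); // commit message
  out += "M 100644 inline module.ll\n";  // file content follows inline
  out += dataBlock(moduleText);
  out += "\n"; // blank line terminating the record
  return out;
}
```

Concatenating one such record per pass and piping the result into a single `git fast-import` process inside an initialised repository should give one commit per pass on that branch, instead of spawning git commands after every pass. Since successive commits in one stream target the same ref, each one is expected to become the child of the previous one.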