thr3ads.net - llvm dev - [llvm-dev] Commit module to Git after each Pass [Jun 2018]

Troy Johnson via llvm-dev

2018-Jun-15 16:51 UTC

[llvm-dev] Commit module to Git after each Pass

> FWIW: We could also just have a mode that dumps 1 file per pass. That is
> enough to make it convenient/easy to run diff between passes. (And if you
> wanted to you could still make a git repository out of it with an external
> script).
>
> - Matthias

I have done this before and would strongly encourage this approach as opposed to
direct emission to std[out|err] or directly involving a source control system. 
The most convenient way was to add an additional option, -print-to-files, which
modified the behavior of -print-after-all, -print-before-all, etc.  The filename
was constructed by massaging the pass name to comply with file system naming
conventions and prepending a monotonically increasing integer (with suitable
leading zeros) plus "bef" or "aft" to indicate sequencing. 
The only awkward part was modifying createPrinterPass to accept a filename,
which had to be done because otherwise you end up having to keep each stream
open from the time you set up the pass pipeline until the printing pass
actually runs.
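
The naming scheme described above can be sketched as follows. This is a
minimal illustration in Python, not the original patch (which is not shown
in this thread): the function name dump_filename, the five-digit padding,
the ".ll" extension and the exact sanitizing rule are all assumptions.

```python
import re


def dump_filename(seq: int, when: str, pass_name: str, width: int = 5) -> str:
    """Build a per-pass dump filename such as '00042-aft-Simplify_the_CFG.ll'.

    seq       -- monotonically increasing counter of the dump
    when      -- 'bef' or 'aft' (printed before or after the pass ran)
    pass_name -- human-readable pass name, e.g. 'Simplify the CFG'
    width     -- digits used for zero padding (assumed, not from the thread)
    """
    if when not in ("bef", "aft"):
        raise ValueError("when must be 'bef' or 'aft'")
    # Collapse every run of characters other than letters, digits, dot,
    # underscore or hyphen into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", pass_name).strip("_")
    return f"{seq:0{width}d}-{when}-{safe}.ll"
```

With zero padding, a plain lexicographic sort of the directory listing
matches pipeline order, which is what makes diffing neighbouring files
convenient.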


-Troy

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of mbraun
via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Thursday, June 14, 2018 3:48 PM
To: Alexandre Isoard
Cc: llvm-dev
Subject: Re: [llvm-dev] Commit module to Git after each Pass

FWIW: We could also just have a mode that dumps 1 file per pass. That is enough
to make it convenient/easy to run diff between passes.
(And if you wanted to you could still make a git repository out of it with an
external script).

- Matthias

On Jun 14, 2018, at 10:49 AM, Alexandre Isoard via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

Hello,

Just an update on that. I am personally using -git-commit-after-all *as-is*
extremely frequently (combined with "git filter-branch" and
"opt -S -instnamer" it is extremely useful).
I unfortunately won't have time to write a better implementation of that,
and I agree "git fast-import" seems the way to go. If anybody is
motivated enough to do so, feel free.

Best regards!

On Thu, Mar 22, 2018 at 10:38 AM Reid Kleckner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
Obviously, we do not want all stderr output to be buffered. However, I think it
would be great to change Function::print and Module::print to call
raw_ostream::SetBuffered / raw_ostream::SetUnbuffered before and after printing.
I guess if the original stream was buffered we don't want to mark it
unbuffered, so we may need to tweak the raw_ostream interface. Looks easy,
though.


On Thu, Mar 22, 2018 at 8:06 AM Fedor Sergeev via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
Oh, well... as usual the answer appears to be pretty obvious.
99% of the time is spent inside the plain write.

-print-after-all prints into llvm::errs(), which is an *unbuffered*
raw_fd_ostream.
And -git-commit-after-all opens a *buffered* raw_fd_ostream.
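
To make the effect concrete, here is a small Python illustration (not LLVM
code). CountingSink, emit and count_writes are hypothetical names; each call
that reaches the sink stands in for one write system call, and the 64 KiB
buffer size is an arbitrary assumption.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw sink that counts the write calls reaching it (a stand-in for write(2))."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


def emit(stream, lines: int) -> None:
    """Write many small chunks, the way a printer emits IR piece by piece."""
    for i in range(lines):
        stream.write(b"  %")
        stream.write(str(i).encode())
        stream.write(b" = add i32 %a, %b\n")
    stream.flush()


def count_writes(lines: int, buffered: bool) -> int:
    """Return how many writes reach the sink for the given number of IR lines."""
    sink = CountingSink()
    stream = io.BufferedWriter(sink, buffer_size=64 * 1024) if buffered else sink
    emit(stream, lines)
    return sink.calls
```

Unbuffered, every small chunk becomes its own write; buffered, the chunks
are coalesced and only reach the sink when the buffer fills or is flushed.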

As soon as I hacked -print-after-all to use a buffered stream to stderr,
performance went up to the normal expected values:

] time bin/opt -O1 big-ir.ll -disable-output -print-after-all
-print-module-scope 2>&1 | grep -c "^; ModuleID"
526

real    0m2.363s
user    0m2.373s
sys     0m0.271s
]

So, the moral of this story is - we should not be printing module IR
into dbgs/errs().

And then the idea of streaming IR module dumps into a buffered stream
and then postprocessing seems to be the right one.
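
As a rough sketch of what such postprocessing could look like, the Python
below splits one combined dump into per-pass chunks. It assumes the banner
lines have the shape "*** IR Dump After <pass name> ***"; check that against
the output of your own opt build. split_dumps is a hypothetical helper, not
an existing tool.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed banner shape, e.g. "*** IR Dump After Simplify the CFG ***".
# Verify against the output of your own opt build before relying on it.
BANNER = re.compile(r"^\*\*\* IR Dump (Before|After) (.+?) \*\*\*")


def split_dumps(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (when, pass_name, ir_text) for each banner-delimited chunk."""
    header = None
    body: List[str] = []
    for line in lines:
        match = BANNER.match(line)
        if match:
            # A new banner closes the previous chunk, if there was one.
            if header is not None:
                yield header[0], header[1], "".join(body)
            header = (match.group(1), match.group(2))
            body = []
        elif header is not None:
            body.append(line)
    if header is not None:
        yield header[0], header[1], "".join(body)
```

The resulting chunks can then be written to numbered files, diffed, or fed
to a version control system, without the compiler knowing about any of it.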

regards,
   Fedor.

On 03/21/2018 01:08 PM, Fedor Sergeev via llvm-dev wrote:
> On 03/16/2018 01:21 AM, Fedor Sergeev via llvm-dev wrote:
> > git-commit-after-all solution has one serious issue - it has a
> > hardcoded git handling which makes it look problematic from many
> > angles (picking a proper git, selecting exact way of storing
> > information, creating repository, replacing the file etc etc).
> >
> > Just dumping information in a way that allows easy subsequent
> > machine processing seems to be a more flexible, less cluttered and
> > overall clean solution that allows to avoid making any of "user
> > interface" decisions mentioned above.
> >
> > We need to understand why git-commit-after-all works faster than
> > print-after-all.
> Made an interesting experiment today and extended your
> git-commit-after-all to avoid issuing
> any git commands if git-repo starts with "/dev/".
>
> With git-repo=/dev/stderr it becomes functionally equivalent to
> print-after-all+print-module-scope,
> dumping module into stderr after each pass.
>
> On my testcase:
>
> # first normal git-commit-after-all execution
> ] rm -rf test-git; time $RR/bin/opt -O1 some-ir.ll -disable-output
> -git-commit-after-all -git-repo=./test-git
>
> real    0m7.172s
> user    0m6.303s
> sys     0m0.902s
> # then "printing" git-commit-after-all execution
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output
> -git-commit-after-all -git-repo=/dev/stderr 2>&1 | grep -c '^; ModuleID'
> 615
>
> real    0m2.893s
> user    0m2.859s
> sys     0m0.356s
> # and finally print-after-all
> ] time $RR/bin/opt -O1 some-ir.ll -disable-output -print-after-all
> -print-module-scope 2>&1 | grep -c "^; ModuleID"
> 526
>
> real    2m8.024s
> user    0m55.933s
> sys     3m19.253s
> ]
> Ugh... 60x???
> Now, I'm set to analyze this astonishing difference that threatens my
> sanity (while I'm still sane ... hopefully).
>
> regards,
>   Fedor.
> PS btw, I checked /dev/null - and it works faster than /dev/stderr as
> expected :)
>
> > I don't believe in magic... yet :)
> >
> > And, btw, thanks for both the idea and the patch.
> >
> > regards,
> >   Fedor.
> >
> > On 03/16/2018 12:03 AM, Alexandre Isoard wrote:
> >> If this is faster than -print-after-all we may actually consider
> >> pushing that in the code base then? (after diligent code review of
> >> course)
> >>
> >> Note that it uses the same printing method as -print-after-all:
> >> - create a pass of the same pass kind as the pass we just ran
> >> - use Module::print(raw_ostream) to print (except -print-after-all
> >>   only prints the concerned part and into stdout)
> >>
> >> If there is improvement to be done to print-after-all it might also
> >> improve git-commit-after-all. (unless that only improves speed when
> >> printing constructs smaller than module)
> >>
> >> In any case, it is, to me, much more usable (and extensible) than
> >> -print-after-all. But requires git to be in PATH (I'm curious if that
> >> works on Windows).
> >>
> >> On Thu, Mar 15, 2018 at 1:35 PM, Daniel Sanders
> >> <daniel_l_sanders at apple.com> wrote:
> >>
> >>     Does https://reviews.llvm.org/D44132 help at all?
> >>
> >>
> >>>     On 15 Mar 2018, at 09:16, Philip Reames via llvm-dev
> >>>     <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>>     The most likely answer is that the printer used by
> >>>     print-after-all is slow.  I know there were some changes made
> >>>     around passing in some form of state cache (metadata related?)
> >>>     and that running printers without doing so works, but is dog
> >>>     slow.  I suspect the print-after-all support was never updated.
> >>>     Look at what we do for the normal IR emission "-S" and see if
> >>>     print-after-all is out of sync.
> >>>
> >>>     Philip
> >>>
> >>>     On 03/15/2018 08:45 AM, Alexandre Isoard via llvm-dev wrote:
> >>>>     Huh. Great! 😁
> >>>>
> >>>>     I don't believe my poor excuse from earlier (else we should
> >>>>     map all pipes into files!), but I'm curious why we spend less
> >>>>     time in system mode when going through file than pipe. Maybe
> >>>>     /dev/null is not as efficient as we might think? I can't
> >>>>     believe I'm saying that...
> >>>>
> >>>>     On Thu, Mar 15, 2018, 08:25 Fedor Sergeev
> >>>>     <fedor.sergeev at azul.com> wrote:
> >>>>
> >>>>         Well, git by itself is so focused on performance, so it's
> >>>>         not surprising to me that even using git add/git commit
> >>>>         does not cause performance penalties.
> >>>>
> >>>>
> >>>>     Sure, but still, I write more stuff (entire module) into a
> >>>>     slower destination (file). Even ignoring git execution time
> >>>>     it's counterintuitive.
> >>>>
> >>>>     The only difference is that while I write more, it overwrites
> >>>>     itself continuously, instead of being a long linear stream. I
> >>>>     was thinking of mmapping the file instead of going through our
> >>>>     raw_ostream, but maybe that's unnecessary then...
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>>     LLVM Developers mailing list
> >>>>     llvm-dev at lists.llvm.org
> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Alexandre Isoard


--
Alexandre Isoard

Alexandre Isoard via llvm-dev

2018-Jun-15 17:49 UTC

On Fri, Jun 15, 2018 at 9:52 AM Troy Johnson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> > FWIW: We could also just have a mode that dumps 1 file per pass. That is
> > enough to make it convenient/easy to run diff between passes. (And if you
> > wanted to you could still make a git repository out of it with an
> > external script).
> >
> > - Matthias
>
> I have done this before and would strongly encourage this approach as
> opposed to direct emission to std[out|err] or directly involving a source
> control system.  The most convenient way was to add an additional option,
> -print-to-files, which modified the behavior of -print-after-all,
> -print-before-all, etc.  The filename was constructed by massaging the pass
> name to comply with file system naming conventions and prepending a
> monotonically increasing integer (with suitable leading zeros) plus "bef"
> or "aft" to indicate sequencing.  The only awkward part was modifying
> createPrinterPass to accept a filename, which had to be done because
> otherwise you end up having to keep each stream open from the time you
> set up the pass pipeline until the printing pass actually runs.
>
>
> -Troy
>
That was the exact implementation we had, and it produced way too many files
for our file system: we would have had to create a subfolder every ~100
passes. Additionally, this took a lot of disk space, and the only metadata we
could store was in the file name. Do you skip passes that don't change the
module? How do you store the messages about missed optimization opportunities?

On the other hand, with git, I can store much more in the commit message (I
actually extended the thing to allow a pass to tag a commit, and I am
planning to allow passes to print into the commit message itself).
Yesterday, I wanted to see where the compiler diverges when I tweak SCEV
reduction rules, so I ran the compiler once, switched the branch back to
the beginning, and did a second run with my modification; the git history
then automatically identifies identical commits. That is, I directly get,
in the git history tree, the divergence point between the two versions.
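
The same comparison can be sketched outside git by hashing each dump, as
below. This Python snippet is only an illustration of the idea and not how
-git-commit-after-all works internally; digest and first_divergence are
hypothetical names. In git itself, two commit ids coincide only when the
commit metadata matches as well as the tree content, a detail this thread
does not go into.

```python
import hashlib
from typing import Optional, Sequence


def digest(ir_text: str) -> str:
    """Content hash of one module dump."""
    return hashlib.sha1(ir_text.encode("utf-8")).hexdigest()


def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[int]:
    """Return the index of the first dump that differs between two runs.

    Returns None when both runs produced identical sequences of dumps.
    """
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if digest(a) != digest(b):
            return index
    # One run is a strict prefix of the other: they diverge where it ends.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None
```

Each sequence holds the module text after pass 0, 1, 2, ... of one compiler
run, so the returned index names the first pass whose output changed.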

And that's just the tip of the iceberg. Git is designed to be a
version control system, true, but it can also be repurposed into a
tremendous toolbox.

I would seriously encourage going in the "git fast-import" direction, or
a semantically equivalent output format that we post-process, because I
think it would simplify the implementation (especially to allow a pass to
dump anything into the commit message). But don't pass on the actual
benefits of having a version control system backend.
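
For readers unfamiliar with the format, here is a minimal Python sketch that
builds such a stream, one commit per pass, with the module stored inline.
It illustrates the stream syntax only and is not an existing LLVM feature.
The branch name, file path, committer identity and starting timestamp are
placeholder assumptions.

```python
from typing import Iterable, Tuple


def data_block(payload: bytes) -> bytes:
    """Encode one fast-import 'data' command using an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def fast_import_stream(
    dumps: Iterable[Tuple[str, str]],
    ref: str = "refs/heads/passes",               # placeholder branch name
    path: str = "module.ll",                      # placeholder file name
    committer: str = "opt <opt@example.invalid>",  # placeholder identity
    start_time: int = 1500000000,                 # arbitrary fixed epoch
) -> bytes:
    """Build a stream with one commit per (message, ir_text) pair."""
    out = []
    for offset, (message, ir_text) in enumerate(dumps):
        out.append(b"commit " + ref.encode() + b"\n")
        # Fixed, increasing timestamps keep the history reproducible.
        out.append(
            b"committer %s %d +0000\n" % (committer.encode(), start_time + offset)
        )
        out.append(data_block(message.encode()))
        # Replace the whole file content in every commit.
        out.append(b"M 100644 inline " + path.encode() + b"\n")
        out.append(data_block(ir_text.encode()))
        out.append(b"\n")
    return b"".join(out)
```

The resulting bytes are meant to be piped into "git fast-import" inside an
initialized repository. Since the commit message is just another data
block, anything a pass wants to report could be appended to it.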

-- 
*Alexandre Isoard*

Troy Johnson via llvm-dev

2018-Jun-15 18:13 UTC

It's only a huge number of files if you're running over a set of input
files and are using the -print-*-all options, which was not my use-case. 
Typically the use-case is debugging a problem in a single input file with
-print-*-all, where generating a few hundred files is fine, or debugging a
specific pass with -print-*= for some set of files, which similarly might
generate a few hundred files.  In other words, you usually know which input file
is experiencing a problem or you know which pass is causing a problem.  If you
don't know either, then, well, you are kind of stuck until you narrow your
scope further, but there are other tools to help with that.

I was not skipping any passes.  Storing optimization messages was not of
interest.  Storing additional metadata was not of interest.  As I said,
-print-to-files only modified where the -print-* options sent their output. 
That's it.

I use git, and I like git, but would rather leave separate tools as separate
tools.  Printing to files, you are totally free to add them to a git repository
if you want, but committing them directly forces others to use git just to see
the data.

Given that at least two people have implemented virtually the same thing, it
seems like -print-to-files would be generally useful.  Others may not need so
many files or have your file system constraint.  Would others find it useful?

-Troy
