Björn Pettersson A via llvm-dev
2019-Mar-27 13:23 UTC
[llvm-dev] monorepo: bad performance when using gitk / git log
Hi!

Anyone else experiencing performance problems when using the new monorepo?

My experience is that performance of gitk (and git log) sometimes is really bad when working in the monorepo.

I've mainly seen it when using gitk on specific files/directories, but since gitk seems to be using "git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- <file>" it is possible to observe the same thing when using git log.

The problem can be seen when creating a brand new commit (with a new file):

bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
bash-4.1$ cd llvm-project
bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master 6539b74dd0e] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 llvm/dummy
bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- dummy > /dev/null
198.37user 0.40system 3:18.67elapsed 100%CPU (0avgtext+0avgdata 696456maxresident)k
0inputs+0outputs (0major+175765minor)pagefaults 0swaps

But also when examining older files, here are some tests using the monorepo:

bash-4.1$ git clone https://github.com/llvm/llvm-project.git llvm-project
bash-4.1$ cd llvm-project

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
5.15user 0.26system 0:05.42elapsed 99%CPU (0avgtext+0avgdata 220344maxresident)k
0inputs+0outputs (0major+56131minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- README.md > /dev/null
155.20user 0.34system 2:35.45elapsed 100%CPU (0avgtext+0avgdata 636744maxresident)k
0inputs+0outputs (0major+160862minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- llvm/CODE_OWNERS.TXT > /dev/null
55.48user 0.34system 0:55.80elapsed 100%CPU (0avgtext+0avgdata 690124maxresident)k
0inputs+0outputs (0major+174196minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
192.97user 0.33system 3:13.19elapsed 100%CPU (0avgtext+0avgdata 696496maxresident)k
0inputs+0outputs (0major+176003minor)pagefaults 0swaps

Same tests when using the old llvm repo (there is no README.md so I skipped that test here):

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD > /dev/null
2.72user 0.12system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 136628maxresident)k
0inputs+0outputs (0major+36354minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- CODE_OWNERS.TXT > /dev/null
2.74user 0.19system 0:02.93elapsed 99%CPU (0avgtext+0avgdata 344756maxresident)k
0inputs+0outputs (0major+88975minor)pagefaults 0swaps

bash-4.1$ /usr/bin/time git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- test/CodeGen/Generic/bswap.ll > /dev/null
3.76user 0.19system 0:03.96elapsed 99%CPU (0avgtext+0avgdata 380416maxresident)k
0inputs+0outputs (0major+98218minor)pagefaults 0swaps

The example with test/CodeGen/Generic/bswap.ll indicates that it can take 193/4=48 times longer to open gitk (or run git log) on a file when using the monorepo(!?!?).

I'm not so familiar with the inner details of git. Could this be a bad repack of the llvm-projects repo or something?

Or is it just that we now squeeze so many commits into the same repo that I should expect the performance to be even worse in the future?

The figures above are from git 2.14.1, but I've also tried 2.20.0 with similar results.

Regards,
Björn
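As a rough cross-check of the "193/4=48" estimate, the elapsed times reported above can be converted to seconds and divided. The sketch below is only arithmetic on numbers already in this message. The helper names to_seconds and slowdown are made up for illustration, and it assumes the M:SS.ss elapsed format seen in the /usr/bin/time output (runs shorter than one hour).

```shell
# to_seconds M:SS.ss -> seconds with two decimals.
# Assumes the elapsed format printed by /usr/bin/time for runs under an hour.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# slowdown SLOW FAST -> ratio of the two elapsed times, one decimal.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# Monorepo elapsed time vs. old llvm repo elapsed time, taken from the runs above.
slowdown 3:13.19 0:03.96   # bswap.ll        -> 48.8
slowdown 0:55.80 0:02.93   # CODE_OWNERS.TXT -> 19.0
slowdown 0:05.42 0:02.84   # no path filter  -> 1.9
```

The unfiltered log is only about twice as slow, which fits the monorepo simply holding more commits. The path-filtered runs are slower by a much larger factor.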
David Greene via llvm-dev
2019-Mar-27 16:20 UTC
[llvm-dev] monorepo: bad performance when using gitk / git log
Björn Pettersson A via llvm-dev <llvm-dev at lists.llvm.org> writes:

> I’m not so familiar with the inner details of git. Could this be a bad
> repack of the llvm-projects repo or something?
>
> Or is it just that we now squeeze so many commits into the same repo
> that I should expect the performance to be even worse in the future?

All of your log commands log the entire history of the repository. Since the monorepo contains the history of all projects, it's a lot more than the individual project repositories used to contain.

I don't know what gitk does in terms of logging. If it insists on logging the entire history, then yes, it's going to be slower with the monorepo.

Personally, I rarely have the need to log further back than a couple of years of history and the monorepo has been all right for that. On the rare occasion I need to look back much further, the extra time hasn't been burdensome. But then I never use gitk.

-David
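One way to try the "last couple of years" approach is to date-limit the log with git log's standard --since option. This is a minimal sketch, assuming it is run inside a local clone; recent_log is a hypothetical helper name. Nothing in this thread establishes whether date limiting avoids the slow case, so treat it as something to experiment with.

```shell
# recent_log SINCE PATH: one-line log for PATH, limited to commits newer than SINCE.
# SINCE is any date string git accepts, e.g. "2 years ago".
# (recent_log is an illustrative name, not an existing git command.)
recent_log() {
    git log --oneline --since="$1" -- "$2"
}

# Example, run inside a clone of the monorepo:
#   recent_log "2 years ago" llvm/CODE_OWNERS.TXT
```

The gitk manual page lists --since among the options it accepts, so the same limit can presumably be tried there as well.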
Alexander Benikowski via llvm-dev
2019-Mar-27 16:44 UTC
[llvm-dev] monorepo: bad performance when using gitk / git log
I use GitExtension and have no performance issues. However, I noticed that GitExt only visualizes the last 3 years in the overview. When looking at a specific file's history or blame, it shows the entire history of it (the last 10 years).
James Y Knight via llvm-dev
2019-Mar-27 19:37 UTC
[llvm-dev] monorepo: bad performance when using gitk / git log
The problem here seems to be due to the combination of specifying --parents, and specifying a pathname to filter by. I can certainly reproduce a _remarkable_ slowness with that combination from git....

On my machine:

$ time git log --parents --oneline origin/master > /dev/null
real 0m4.001s

$ time git log origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
real 0m5.332s

$ time git log --parents --oneline origin/master -- llvm/test/CodeGen/Generic/bswap.ll > /dev/null
real 2m48.944s

That said, I use gitk frequently, and had not noticed performance issues. But, I'd never tried invoking it with a path on the command-line, only with ref names, so it's not hitting the bad case. Nor have I noted issues with git log, but again, I'd never have run it with --parents, so I don't hit this bad case.

Maybe worth reporting as a possible bug to git? Surely whatever algorithm it's using shouldn't be _this_ slow.
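The three-way comparison in this message can be scripted so that each variant is timed the same way. This is a sketch only: elapsed_seconds and the LLVM_REPO environment variable are names invented here, timing has whole-second resolution via date +%s, and the git options are the ones used in the message.

```shell
# elapsed_seconds CMD [ARGS...]: run CMD with its output discarded and
# print the wall-clock time in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Run the comparison only when LLVM_REPO (a hypothetical variable) points
# at a clone of the monorepo.
if [ -n "${LLVM_REPO:-}" ]; then
    cd "$LLVM_REPO" || exit 1
    f=llvm/test/CodeGen/Generic/bswap.ll
    echo "--parents only:   $(elapsed_seconds git log --parents --oneline origin/master)s"
    echo "path only:        $(elapsed_seconds git log origin/master -- "$f")s"
    echo "--parents + path: $(elapsed_seconds git log --parents --oneline origin/master -- "$f")s"
fi
```

Timing each ingredient alone and then the combination is what isolates the slow case: only the run with both --parents and a path filter should stand out.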
Björn Pettersson A via llvm-dev
2019-Apr-02 15:16 UTC
[llvm-dev] monorepo: bad performance when using gitk / git log
I asked about this on git at vger.kernel.org:

https://public-inbox.org/git/20190402132756.GB13141@sigill.intra.peff.net/T/#m1fd5da534d39f967a8ce8b3361bc2e00b9214f31

I’ve already got an answer that we seem to be unlucky with some access patterns when doing “git log --parents” in the monorepo, and that we hit some quadratic analysis of the commit history. Hopefully something they can fix (Jeff King already had some ideas).