Justin Lebar via llvm-dev
2016-Jul-27 17:21 UTC
[llvm-dev] [RFC] One or many git repositories?
Thanks for your thoughts, Chris.

> As supporting evidence of this, I was discussing this thread around the office yesterday and had quite a few people responding something along the lines of “they’re proposing what?”.

I hope they'll join us in this thread.

Ultimately a survey is going to be strongly biased in favor of "don't change anything". There is a strong psychological bias to weight losses more than gains, so if one doesn't engage with the issue, it's only natural to conclude "keep it as similar as possible to what it is today -- that is safe." But that line of thinking does not necessarily lead us to the best outcome.

We've heard in thread from a lot of developers about how a monorepo would improve their workflow. I would love to hear from some developers who are actually affected in the way you describe, rather than just considering the hypothetical.

My expectation is that the effect of the monorepo on said developers would be relatively small -- we're talking about 1 GB of disk space. I understand that there's a "yuck" factor to this, but inasmuch as there aren't other concrete effects, this is just change aversion. And essentially all of the other effects of the monorepo can be hidden via sparse checkouts, as we've discussed.

Maybe I am wrong. But I don't think we're going to get to the bottom of it without actually engaging with people who are actually affected in the way you posit.

> While admittedly you do get a linear history with using the mono-repository, that isn’t the only way to solve the problem, and I don’t really think that the benefit (not needing to write some tooling) justifies the increased burden applied to contributors that don’t use the full LLVM family of projects.

I think the trade-off you're considering here (cost to developers who use llvm plus a version-locked subrepo vs. cost to developers who don't want an llvm clone) is the right one. But as someone who has extensively used git submodules and repo (a wrapper script), I strongly disagree with the judgement that a monorepo would not be a significant improvement.

Our primary disagreement, I think, is over how much cost there is to "writing some tooling". To me, this is a significant barrier standing in the way of developer productivity. Here at Google I did a quick survey, and more than half of us don't have scripts of the sort that Justin Bogner described. We are all just floundering around rebasing clang and llvm until it compiles. It *sucks*.

I suggest that saying that all of these developers are "doing it wrong" is not helpful. Not everyone has the git and python/bash chops to write the necessary scripts. Not everyone has the personality to obsessively script around stuff, or the desire to maintain said scripts. Not everyone works on llvm/clang so much that it's worth adopting a special-snowflake workflow. And some of us -- myself included -- have extensive git scripts which work with the standard git workflow but would be completely broken by adding a custom level of indirection around git.

When put this way, maybe it's clear that it's actually a niche set of people for whom "script around the brokenness" is a good solution.

As I've said a bunch of times above, we have to weigh a cost paid by all of us every time we type a command that starts with "git" -- something we do tens or hundreds of times a day -- versus the one-time cost of asking people to download 1 GB of data.

On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I’m just now catching up on this massive thread after being on vacation last week, and I have a few thoughts I’d like to share.
>
> First and foremost please don’t consider lack of dissent on the thread as presence of consensus. The various git-related threads on LLVM-dev lately have been so active and contentious that I think a lot of people are zoning out on the conversations. As supporting evidence of this, I was discussing this thread around the office yesterday and had quite a few people responding something along the lines of “they’re proposing what?”.
>
> I think it would be great for us to have several different proposals for how the git-transition could work, and have a survey to get people’s opinions. I know this has been discussed repeatedly, and I want to put in my vote in favor of having a survey that takes into account multiple different approaches.
>
> WRT the actual proposal in this thread, I’m strongly opposed to a mono-repository. While I understand the argument that the full clone’s cost on disk space is minimal compared to an LLVM object directory, what about contributors that contribute to the smaller runtimes projects but *not* to LLVM or Clang? A contributor that only contributes to libcxx or compiler-rt being forced to do a full clone of all the LLVM projects in order to push a patch kinda sucks.
>
> I want to point out a few workflows people may not be considering.
>
> Clang can be built against an installed LLVM. I know this workflow is used by some people because I’ve broken it in the past and had to fix it. With a mono-repo this workflow gets a bit more complicated because you’d need to do sparse checkouts, and it probably means we should just nuke the workflow entirely because there is no real value added by having it.
>
> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the common use case maintaining sparse repository mirrors would limit impact of this on users, should any GCC user want to contribute to Compiler-RT, you’re forcing them to clone a much larger repository than necessary.
>
> The same problem with Compiler-RT’s sanitizers also applies to libcxx, libcxxabi, libunwind, and potentially any other runtime library projects that we may create in the future.
>
> Beyond all that I want to point out that the git multi-repository story is basically the same thing we have today with SVN except for the absence of a monotonically increasing number that corresponds across repositories. While admittedly you do get a linear history with using the mono-repository, that isn’t the only way to solve the problem, and I don’t really think that the benefit (not needing to write some tooling) justifies the increased burden applied to contributors that don’t use the full LLVM family of projects.
>
> I think we have some pretty strong evidence in the form of the github fork counts (https://github.com/llvm-mirror/) that most people aren’t using all of the LLVM projects. In fact, by that evidence Clang (the second most popular project) is forked less than 2/3 as many times as LLVM.
>
> -Chris
>
> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Even if it were possible, I would still keep my upstream checkout separate just as a safety measure, to keep from sending private stuff upstream by accident.
>
> Just FYI, this is our (Azul's) workflow as well, and for similar reasons.
>
> Same here.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
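
Editorial sketch: the message above leans on sparse checkouts to hide the parts of a monorepo a contributor does not need. As a rough illustration only, the following writes the pattern file that git's sparse-checkout mechanism reads. The top-level directory names (`libcxx`, `compiler-rt`) and the helper names are assumptions for illustration; the thread does not specify a monorepo layout.

```python
import os


def sparse_patterns(subprojects):
    """Build patterns that keep only the named top-level directories."""
    # A leading slash anchors the pattern at the repository root;
    # a trailing slash restricts it to directories.
    return ["/{}/".format(name.strip("/")) for name in subprojects]


def write_sparse_checkout(git_dir, subprojects):
    """Write the patterns to <git_dir>/info/sparse-checkout; return the path."""
    info_dir = os.path.join(git_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    path = os.path.join(info_dir, "sparse-checkout")
    with open(path, "w") as handle:
        handle.write("\n".join(sparse_patterns(subprojects)) + "\n")
    return path
```

After writing the file, one would still need to run `git config core.sparseCheckout true` and `git read-tree -mu HEAD` in the checkout. Note that this only trims the working tree; the clone still fetches the full history, which is the 1 GB cost being debated.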
Justin Bogner via llvm-dev
2016-Jul-27 18:30 UTC
[llvm-dev] [RFC] One or many git repositories?
Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Thanks for your thoughts, Chris.
>
>> As supporting evidence of this, I was discussing this thread around the office yesterday and had quite a few people responding something along the lines of “they’re proposing what?”.
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't change anything". There is a strong psychological bias to weight losses more than gains, so if one doesn't engage with the issue, it's only natural to conclude "keep it as similar as possible to what it is today -- that is safe." But that line of thinking does not necessarily lead us to the best outcome.
>
> We've heard in thread from a lot of developers about how a monorepo would improve their workflow. I would love to hear from some developers who are actually affected in the way you describe, rather than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers would be relatively small -- we're talking about 1 GB of disk space. I understand that there's a "yuck" factor to this, but inasmuch as there aren't other concrete effects, this is just change aversion. And essentially all of the other effects of the monorepo can be hidden via sparse checkouts, as we've discussed.
>
> Maybe I am wrong. But I don't think we're going to get to the bottom of it without actually engaging with people who are actually affected in the way you posit.

Well, I'm one of those people. I'm still convinced that, for me, switching to a monorepo is a few weeks or maybe a couple of months of disruption to my life[1] and we end up in a state that isn't any better, just arbitrarily different.

[1]: re-cloning tens of repos across a few machines and migrating branches from them, adjusting my workflow to deal with the new layout, blowing away all of my existing build trees, arguing about how to handle legacy branches that now need to merge between a multi-repo layout and a monorepo layout, asking people to update bot configs, figuring out how the downstream clones of clang-without-llvm that I have to deal with will work, etc.

>> While admittedly you do get a linear history with using the mono-repository, that isn’t the only way to solve the problem, and I don’t really think that the benefit (not needing to write some tooling) justifies the increased burden applied to contributors that don’t use the full LLVM family of projects.
>
> I think the trade-off you're considering here (cost to developers who use llvm plus a version-locked subrepo vs. cost to developers who don't want an llvm clone) is the right one. But as someone who has extensively used git submodules and repo (a wrapper script), I strongly disagree with the judgement that a monorepo would not be a significant improvement.
>
> Our primary disagreement, I think, is over how much cost there is to "writing some tooling". To me, this is a significant barrier standing in the way of developer productivity. Here at Google I did a quick survey, and more than half of us don't have scripts of the sort that Justin Bogner described. We are all just floundering around rebasing clang and llvm until it compiles. It *sucks*.

Note that the only tooling I have is a single script that just loops over all of the git directories and runs a single git commit. Here's what I actually use, but it started as literally a single loop over the results of $(find . -name .git), which is good enough.

[Attachment scrubbed from the archive: git-llvm (text/x-shell, 494 bytes). URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160727/fc9dc51c/attachment.bin>]

> I suggest that saying that all of these developers are "doing it wrong" is not helpful. Not everyone has the git and python/bash chops to write the necessary scripts. Not everyone has the personality to obsessively script around stuff, or the desire to maintain said scripts. Not everyone works on llvm/clang so much that it's worth adopting a special-snowflake workflow. And some of us -- myself included -- have extensive git scripts which work with the standard git workflow but would be completely broken by adding a custom level of indirection around git.
>
> When put this way, maybe it's clear that it's actually a niche set of people for whom "script around the brokenness" is a good solution.
>
> As I've said a bunch of times above, we have to weigh a cost paid by all of us every time we type a command that starts with "git" -- something we do tens or hundreds of times a day -- versus the one-time cost of asking people to download 1 GB of data.

It's important to take into account that the cost of migrating to a radically different layout has really been glossed over in this thread. It's certainly a one-time cost, but it's *a lot* more than "downloading 1 GB of data". Every downstream project will need to change workflows, and every downstream developer will need to adjust how they do things. I expect everyone to lose at least a day of work from this. Maybe that's worth it for you to up your productivity on daily tasks, I don't know, but please take this into consideration.

Anyways, I need to drop out of this thread again. I've decided I can live with whatever we do, and I don't want to spend any more time on this.
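
Editorial sketch: the attached git-llvm script is not reproduced in the archive, so what follows is only a guess at the simple starting point described above, namely a loop over every directory that has a `.git` entry, running the same command in each. The function names are hypothetical and this is not the author's script.

```python
import os
import subprocess


def find_git_checkouts(root):
    """Return the sorted directories under root that contain a .git entry."""
    checkouts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # .git is a directory in a normal clone and a file in a worktree.
        if ".git" in dirnames or ".git" in filenames:
            checkouts.append(dirpath)
        if ".git" in dirnames:
            # Skip the repository metadata, but keep walking the working
            # tree so nested checkouts (e.g. clang inside llvm) are found.
            dirnames.remove(".git")
    return sorted(checkouts)


def run_in_each(root, command):
    """Run command (an argument list) in each checkout; map path to exit code."""
    results = {}
    for checkout in find_git_checkouts(root):
        print("== " + checkout, flush=True)
        results[checkout] = subprocess.call(command, cwd=checkout)
    return results
```

For example, `run_in_each(".", ["git", "status", "--short"])` would print the status of every checkout below the current directory. Keeping all repositories at mutually compatible revisions is a separate problem that this loop does not address.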
Jonathan Roelofs via llvm-dev
2016-Jul-27 19:18 UTC
[llvm-dev] [RFC] One or many git repositories?
On 7/27/16 12:30 PM, Justin Bogner via llvm-dev wrote:
> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>
>> Maybe I am wrong. But I don't think we're going to get to the bottom of it without actually engaging with people who are actually affected in the way you posit.
>
> Well, I'm one of those people. I'm still convinced that, for me, switching to a monorepo is a few weeks or maybe a couple of months of disruption to my life[1] and we end up in a state that isn't any better, just arbitrarily different.
>
> [1]: re-cloning tens of repos across a few machines and migrating branches from them, adjusting my workflow to deal with the new layout, blowing away all of my existing build trees, arguing about how to handle legacy branches that now need to merge between a multi-repo layout and a monorepo layout, asking people to update bot configs, figuring out how the downstream clones of clang-without-llvm that I have to deal with will work, etc.

Would this be mitigated for you if the existing git mirrors continued to work as-is, with their history at some point coming from the monorepo instead of from git-svn? (i.e. use a monorepo, but reconstruct the commits to the individual git repositories with some cron job, or hooks, or otherwise)

Jon

--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
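
Editorial sketch: one small piece of the cron-job idea above is deciding which per-project mirrors a given monorepo commit affects. The following assumes, purely for illustration, that each mirrored project lives in its own top-level directory of the monorepo; the function name and directory names are hypothetical, and the actual history reconstruction (for instance with something like `git subtree split --prefix=<dir>`) is not shown.

```python
def touched_mirrors(changed_paths, mirrored_dirs):
    """Return the mirrored top-level directories that the paths fall under."""
    touched = set()
    for path in changed_paths:
        # The first path component names the subproject in this layout.
        top_level = path.split("/", 1)[0]
        if top_level in mirrored_dirs:
            touched.add(top_level)
    return sorted(touched)
```

A commit touching `llvm/lib/IR/Value.cpp` and `clang/lib/Sema/Sema.cpp` would then need to be replayed into both the llvm and clang mirrors, while a change to a top-level file outside any mirrored directory would affect none of them.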
Sean Silva via llvm-dev
2016-Jul-28 02:29 UTC
[llvm-dev] [RFC] One or many git repositories?
On Wed, Jul 27, 2016 at 10:21 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Thanks for your thoughts, Chris.
>
> > As supporting evidence of this, I was discussing this thread around the office yesterday and had quite a few people responding something along the lines of “they’re proposing what?”.
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't change anything". There is a strong psychological bias to weight losses more than gains, so if one doesn't engage with the issue, it's only natural to conclude "keep it as similar as possible to what it is today -- that is safe." But that line of thinking does not necessarily lead us to the best outcome.
>
> We've heard in thread from a lot of developers about how a monorepo would improve their workflow.

I'd just like to point out that having LLVM upstream use a monorepo would probably not hugely help my workflow (or it might hinder it for some period of time, as I adapt). This is even though earlier in this thread I was saying that a monorepo is actually a big improvement for certain things. My main use of a monorepo is for internal development, where we merge private patches daily and a unified history to bisect is important.

The current situation is that llvm-project is available, and if you need its benefits, you can use it. That does not necessarily mean that those benefits are needed / are an improvement for everyday development.

The same goes for vendors' private workflow repositories. They can base them off of llvm-project. What upstream does is pretty much irrelevant. Our official toolchain repository for PS4 does not use llvm-project, but instead a subtree merged monorepo with the upstream git repos in its history, so there is a layer of indirection here anyway.

Moving to git as the repository of record is also something that, even as someone who uses exclusively git, I am not necessarily in favor of. In my personal experience (and I'm sure others disagree!), it seems like the current thing is working well enough. So, broadly speaking, if it came to a vote for "should LLVM move to git as the repository of record" I would probably vote no.

For example, my personal preference, all things said and done, is that if we move to git, we just continue the history of the existing git mirrors (e.g. http://llvm.org/git/llvm.git, http://llvm.org/git/clang.git, etc.) and set up some sort of automation that adds a thing in the same format as git-svn does with a monotonically increasing "fake" SVN number. Each developer with commit access would be able to push to a particular special branch on the server, and then automation cherry picks the commit into the repo adding the "fake git-svn" info. The server-side automation is the only thing that can commit to the official `master` branch of each git repo. The result is that:
- people have to habituate themselves to `git push <remote> master:myusername/master` instead of `git svn dcommit`
- nothing else changes

Making a community-wide change to workflow is no small task. Nor should we expect that, even with our best intentions and best efforts, any proposed change will actually end up being an overall improvement for the community.

I think Phabricator is a good example where the community has largely changed workflow: compatibility with the existing workflow is paramount to gaining traction without making a *very* expensive change. Even then, we have not "made the leap" to obsolete the existing workflow -- there is no need.

Another example is svn -> git-svn. At some point in the past, everybody was using SVN. That number has dwindled enough that now we can consider making git the repository of record.

My personal preference would be for somebody to stand up and implement an alternative that is compatible with the existing thing (e.g. people can use it and it will commit into the current SVN on their behalf, or whatever). If it turns out to be better, then people will gradually move to it and eventually the fact that it commits to SVN will just be of historical interest.

My suspicion is that if somebody sets up a monorepo with perfect bidirectional integration with the existing workflow and adds it as an option to the getting started page (as with the git-svn workflow), people will generally not move to it.

The discussion of whether a monorepo (or any of the other changes discussed in this thread) is worth it is essentially equivalent to whether people would move of their own accord to the alternative if it had perfect bidirectional integration with the current state of things. Do we really think we can determine this with a discussion or survey? (I consider new contributors coming into the project and just going with the new stuff as a form of "moving" to the alternative.)

That is, I really think that the onus should be on the people proposing the change to make it work in practice. Discussion of alternatives should be done with the aim of exploring the design space so that somebody can step up and implement things, rather than trying to make a final decision or come to a proposal to be voted on.

One key factor that can doom *any* proposal is that it is unimplementable, that nobody steps up to implement it, or that it is poorly implemented. In that sense, a proposal in the absence of an implementation is only a partial indication of what we'll actually get.

-- Sean Silva
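
Editorial sketch: the server-side automation proposed in this message would stamp each accepted commit with a monotonically increasing "fake" SVN number in the same format git-svn uses, that is, a `git-svn-id: <URL>@<revision> <UUID>` line appended to the commit message. The following shows only the message-stamping step under that assumption; the class and function names are hypothetical, the URL and UUID in the usage note are placeholders, and the cherry-pick and branch protection are not shown.

```python
import itertools


class RevisionCounter:
    """Hands out strictly increasing revision numbers, one per accepted commit."""

    def __init__(self, last_revision):
        self._numbers = itertools.count(last_revision + 1)

    def allocate(self):
        return next(self._numbers)


def add_fake_svn_id(message, revision, url, uuid):
    """Append a git-svn style trailer carrying the server-assigned revision."""
    trailer = "git-svn-id: {}@{} {}".format(url, revision, uuid)
    return message.rstrip("\n") + "\n\n" + trailer + "\n"


def stamp_queue(messages, counter, url, uuid):
    """Stamp pushed commit messages in order, one new revision each."""
    return [add_fake_svn_id(m, counter.allocate(), url, uuid) for m in messages]
```

Because only the server allocates numbers, two developers pushing at the same time still end up with distinct, ordered revisions; the order is simply whatever order the automation processes the pushes in. A call such as `stamp_queue(["Fix typo"], RevisionCounter(100), "https://example.invalid/svn/trunk", "00000000-0000-0000-0000-000000000000")` yields a message ending in a trailer with revision 101.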