LLVM Community,

> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041738.html

This was extraordinarily valuable in helping me understand the situation - thank you, David Blaikie, for pointing me to it. A few key snippets:

"Because I optimize for the code reviewer, not the patch submitter." - Chris Lattner

"Forcing transitioning to git makes no sense for a lot of us - for example, we have lots of scripts that depend on svn revision numbers." - Jason Kim

"Let me say this again: We are not fundamentally changing the development policy around LLVM." - Chris Lattner

My interpretations, which I'll assume later in this long email as premises for a recommended action:

* Chris finds code reviewers to be exceptionally rare and the community's most valuable participants. My previous "spork" suggestion would be a decision made by maintainers, not influenced by patch contributors, and would only happen if the maintainers felt the transition made it easier for them to review and/or commit patches.

* Dropping SVN would be expensive for some. Instead of dropping SVN, it is more reasonable to make git the central repo and have SVN mirror git.

* A linear history is highly valued by Chris and many members of the community.

My input (or, from my perspective, my output):

In my humble opinion, there is one big problem with git-svn and svn. It requires the maintainer to rebase before committing, and in git, this changes the patch's unique ID. Changing the ID creates a serious problem, one which forces the private fork to make an early decision about contributing back to the community. The private fork must decide, "Do we want this patch today, or would we rather wait for it to come in through a 'fast-forward' of the community's repository?" If we choose to accept the patch locally, we have another decision to make: "Do we want to deal with merge conflicts after the patch makes it through the community's review process, or should we just keep it private and enjoy easy automatic merges until the community eventually finds the same bug and redundantly makes a similar fix?"

I hope you see this as not a Good Thing for the community. The policy of rebasing gives private forks an incentive *not* to contribute patches. Please, oh please, do not reply saying "but that's just selfish." The point I am hoping to illustrate is only that this incentive exists, and that it is a consequence of policy.

However, one could argue that the same policy, to always rebase, provides an incentive not to fork at all. That is, it is easier to contribute to the community than to make a private patch and risk merge conflicts. Indeed, but there is one problem, a fact of software: a private fork of any project will always and only exist as a mechanism to meet functional requirements and/or a schedule that do not align with the official "mainline". More concretely, if I have an upcoming release planned and a bug fix that affects the correctness of the compiler, I will most certainly add it to my private fork and not wait on a community review. At this point, I actually have an incentive to stop the code review process and hope the community never finds and fixes my bug. My life is easier when I choose not to contribute, and this is a direct consequence of the policy decision to rebase instead of merge.

But rebasing is fundamental in providing a linear history, right? I question the validity of this popular argument, and argue this is just a tooling issue.
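To make the ID problem above concrete, here is a tiny demonstration (the hashes are obviously made up, and upstream is assumed to have gained commits in the meantime):

    $ git checkout -b fix-bug origin/master
    $ ...hack, hack, hack...
    $ git commit -a -m "Fix the bug"
    $ git rev-parse HEAD             # the patch's ID, as my private fork knows it
    1111111111111111111111111111111111111111
    $ git fetch origin               # meanwhile, mainline has moved forward
    $ git rebase origin/master       # the rebase-before-commit policy
    $ git rev-parse HEAD             # the very same diff, a brand new ID
    2222222222222222222222222222222222222222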
The very fact that a rebase can often be achieved automatically and without conflicts should send a strong signal that it is indeed a tooling issue. As it turns out, the git object tree does encode a linear history. But this is not obvious! "git log" makes an awkward design decision in ordering commits by date. Instead, I think it should be ordered by merge, or specifically, by a pre-order, depth-first traversal of the commit tree. I believe people care more about when a patch entered their own repository than about when the author committed it to his or hers.

Proposal: a slow, multistep, backward-compatible transition to remove the disincentive to contribute patches from private forks:

Step 1: Demonstrate that "git log" or a similar tool can produce a linear history in the presence of merging. This may already be possible.

Step 2: Swap the roles of git and svn. Make svn the mirror and git the central repository, and update the online documentation accordingly. In this step, do not change any policies; continue to require anyone with commit access to maintain a linear history. This restriction is necessary for the svn mirror, but the swap aims to give everyone with svn dependencies a strong hint that LLVM's use of svn is on its way out.

Step 3: Once all svn automation dependencies have been dropped, discontinue the svn mirror. Relax the "always rebase" policy and ask code owners to start preferring merges to rebases.

If the community is willing to make this transition, I commit to coordinating a worldwide decentralized party celebrating our successful move to decentralized version control.

Thank you for your time,
Greg Fitzgerald

P.S. tl;dr, right?
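P.P.S. Regarding Step 1: git's existing flags may already get most of the way there. A rough, untested sketch:

    $ git log --oneline --graph --topo-order
        # keeps each branch's commits grouped together instead of
        # interleaving them by commit date
    $ git log --oneline --first-parent
        # follows only the first parent of each merge, which reads as a
        # single, linear mainline history

Something like --topo-order is roughly the "ordered by merge" traversal I described, and --first-parent already gives a strictly linear view of the mainline.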
On Fri, Nov 16, 2012 at 01:53:12PM -0800, Greg Fitzgerald wrote:
> In my humble opinion, there is one big problem with git-svn and
> svn. It requires the maintainer to rebase before committing, and in
> git, this changes the patch's unique ID. Changing the ID creates
> a serious problem, one which forces the private fork to make an early
> decision about contributing back to the community. The private fork
> must decide, "Do we want this patch today, or would we rather wait for
> it to come in through a 'fast-forward' of the community's repository?"

I fully agree with your analysis, and to me it sounds very much like "we should switch to git because git rebase doesn't work properly", or however else you want to put it.

Joerg
Greg Fitzgerald <garious at gmail.com> writes:
> In my humble opinion, there is one big problem with git-svn and
> svn. It requires the maintainer to rebase before committing, and in
> git, this changes the patch's unique ID.

I didn't totally follow your argument, so I'm sure I missed something.

However, I don't think rebase is really the issue here. Linux kernel developers rebase all the time. It's required before merging to mainline. Of course, at the point of the merge your feature branch should go away, so the rebase is moot.

I _think_ the "rebase" problem you describe has more to do with an inappropriate use of git - keeping branches around through multiple merges. I totally understand the attraction of doing that, but with a rebase policy (common in the git world) it's going to cause issues of the type you described. Still, the merge often works just fine even in the presence of the rebase - git is often smart enough to recognize when a commit has already been applied locally.

The git-svn argument was settled a while ago, so best not to stir that pot right now. But moving to git should _not_ mean we drop linear history. There is no reason it must.

-David
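P.S. One way to see that recognition in action is "git cherry", which compares patch content rather than commit IDs (the branch names and output below are made up):

    $ git fetch origin
    $ git cherry -v origin/master my-feature
    - 1111111 Fix the bug            # an equivalent change already exists upstream
    + 2222222 Add shiny feature      # still unique to my branch

git rebase applies the same patch-equivalence test and silently drops commits that upstream already has.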
David A. Green wrote:
> However, I don't think rebase is really the issue here.

Thanks, David. After a bit of experimentation, I see you are quite right: my "rebase" problem is actually a "squash" problem. As it turns out, every time I rebased and a later pull then caused a manual merge, it was because I had also squashed in changes from a code review. So here's the problematic workflow:

1) fork mainline
2) add patch to fork
3) submit patch to mainline for review
4) patch the patch as part of the review process
5) squash "patch of patch" into "patch"
6) mainline accepts the squashed patch
7) merge into the private fork, and kaboom!

In this workflow, git can't possibly know whether I want the patch or the patched patch. They represent different solutions to the same problem. An automation-friendly workflow adds one step (see the command sketch in the P.S. below):

4.5) submit "patch of patch" to the private fork

Then, when you merge, the history will include your original patch, the patched patch, the squashed patch, and a 'merge' commit that tells git that little mess is no problem.

I retract my proposal and apologize for the noise. I guess I'll have to find some other reason to throw a big party. Suggestions welcome. :-)

-Greg
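P.S. In commands, the automation-friendly version looks roughly like this (branch and remote names are made up):

    # in the private fork: keep the review fixup as its own commit; no squashing
    $ git checkout private-master
    $ git merge my-patch-branch        # brings in both "patch" and "patch of patch"

    # later, mainline accepts the squashed version of the same change
    $ git fetch community
    $ git merge community/master       # both sides end up with the same content,
                                       # so this should resolve as a clean merge commit

The squashed commit still arrives, but via the merge from mainline rather than by rewriting the fork's own history.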
On Fri, Nov 16, 2012 at 1:53 PM, Greg Fitzgerald <garious at gmail.com> wrote:
> My interpretations, which I'll assume later in this long email as
> premises for a recommended action:
>
> * Chris finds code reviewers to be exceptionally rare and the
> community's most valuable participants. My previous "spork"
> suggestion would be a decision made by maintainers, not influenced by
> patch contributors, and would only happen if the maintainers felt the
> transition made it easier for them to review and/or commit patches.
>
> * Dropping SVN would be expensive for some. Instead of dropping SVN,
> it is more reasonable to make git the central repo and have SVN mirror
> git.
>
> * A linear history is highly valued by Chris and many members of the community.

You missed what is (IMO) the most important point: LLVM's development process and VCS optimize for active developers working in the open on mainline LLVM. They don't optimize for private forks or other development processes. This isn't an accident, and it helps incentivize members of the community to contribute early and in small, incremental patches.

> Changing the ID creates
> a serious problem, one which forces the private fork to make an early
> decision about contributing back to the community.

The fact that this (and all of the related and restated problems you and others have outlined since this email) is predicated on a private fork is why it isn't a priority for the process. If you instead make the early decision to contribute small, incremental patches, then this is not a problem.

I'm not making this claim as a hypothetical which I haven't tested. There are numerous groups (including mine) working with LLVM without any problem due to this. There are even several that *do* have some code which doesn't go upstream, and they also are not thwarted by this.

<snip>

> Proposal: a slow, multistep, backward-compatible transition to remove
> the disincentive to contribute patches from private forks:

I strongly doubt that this is the primary barrier to the contribution of such patches. Code review, the fact that these patches have accreted for long periods of time outside the view of the community, and a lack (or broken nature) of incremental development processes seem likely to cost much more.

> P.S. tl;dr, right?

Actually, yes. This entire conversation is too long to read.

Many are claiming there is something wrong with the use of Subversion. And yet, despite these "problems", LLVM and Clang are among the fastest growing and most active open source projects I have had the pleasure of working on. Also, the most active members of the community (who should be hitting these problems most often) are never the ones crying for change.

I suggest folks instead work to demonstrate the scaling problems of Subversion by contributing ever more rapidly to LLVM. Perhaps we will discover the problem, but either way LLVM will improve by leaps and bounds. ;] Everybody wins.
Chandler Carruth <chandlerc at google.com> writes:
> You missed what is (IMO) the most important point: LLVM's development
> process and VCS optimize for active developers working in the open on
> mainline LLVM. They don't optimize for private forks or other
> development processes. This isn't an accident, and it helps
> incentivize members of the community to contribute early and in small,
> incremental patches.

Is it desirable to incentivize members to contribute early and in small, incremental patches? The impression I got from all those years using and (occasionally) watching the development of LLVM is that it is prone to incorporating non-functional or deficient code coming from half-done or tentative projects. That code is removed sooner or later, but it creates noise for both developers and users. Sometimes it is destabilizing too.

It is also a bit hard to understand the preference some people show for reviewing the introduction of a feature as a series of patches of non-definitive code, spread over a significant period of time and mixed in the same queue with dozens of unrelated patches, instead of a cohesive, well-defined, ready-to-run series of patches that clearly implements the advertised feature. The only explanation I can think of is a mindset created by old development styles, when CVS-style tools (like svn) were the only option and accepting any sizable code contribution was a nightmare.

[snip]