Renato Golin via llvm-dev
2016-Jul-22 12:56 UTC
[llvm-dev] [RFC] One or many git repositories?
On 22 July 2016 at 13:40, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
> I don't see it as a step backwards but rather as a way of making people
> comfortable with the switch. I think opinions may gradually shift towards
> a more conventional git model after the switch but that doesn't necessarily
> detract from the value of a more svn-ish model if having one helps people
> switch.

The original idea was to change one thing at a time: SVN to Git, keeping everything else the same. But that has proven harder than we imagined. So, maybe the best way forward is not to do one step at a time, but to understand where we are and what we need and take the "right" (tm) step forwards. Even if it requires multiple steps, we can combine them into fewer, larger steps.

>> * public and downstream forks that *rely* on linear history
>
> Do you have an example in mind? I'd expect them to rely on each 'master' being
> an improvement on 'master^'. I wouldn't expect them to be interested in how
> 'master^' became 'master'.

Paul Robinson was outlining some of the issues he had with git history. I don't know their setup, so I'll let him describe the issues (or he may have done so already in some thread, but I haven't read it all).

> Assuming the goal is to preserve what we have rather than improve it, buildbot
> will be fine without any changes (beyond switching the source steps from svn to
> git of course) whichever model we pick. It would just check out the latest 'master'
> on each build like it currently does for trunk.

I meant Zorg and the like. Buildbot itself can handle Git, but we may have assumptions in the builders that the repos are linked and linear. But we have been discussing pre-commit testing for a while, and it's clear that Buildbots, the way they're set up now, are not the answer.
For the sake of the argument, here is the list of things we found:

* buildbots can have pre-commit testing via patch submission, but controlling security and load is not trivial if we want people to actually use it
* buildbots tracking non-master branches have the load problems if we allow people to create branches, but not the security problems
* having a mirror so that bots track that mirror would solve the security and load problems, but remove the ability for other people to use it

In essence, buildbots are single-purpose and hard to configure (much of it needs a master restart).

OTOH, Jenkins can have configurable build scripts, with parameters and customisations, that allow us to pick up pull requests and build them as they come in. It also scales independently, per architecture, from the number of configurations, if you can use something like containers. So, in the long term, it's cheaper and more robust to maintain.

However, it's a big change and will require another massive change in how we do things, and the repository migration is already a big enough change.

cheers,
--renato
Robinson, Paul via llvm-dev
2016-Jul-22 17:50 UTC
> >> * public and downstream forks that *rely* on linear history
> >
> > Do you have an example in mind? I'd expect them to rely on each 'master' being
> > an improvement on 'master^'. I wouldn't expect them to be interested in how
> > 'master^' became 'master'.
>
> Paul Robinson was outlining some of the issues he had with git
> history. I don't know their setup, so I'll let him describe the issues
> (or he may have done so already in some thread, but I haven't read it
> all).

Since you asked...

The key point is that a (basically) linear upstream history makes it feasible to do bisection on a downstream branch that mixes in a pile of local changes, because the (basically) linear upstream history can be merged into the downstream branch commit-by-commit which retains the crucial linearity property.

We have learned through experience that a bulk merge from upstream is a Bad Idea(tm). Suppose we have a test that fails; it does not repro with an upstream compiler; we try to bisect it; we discover that it started after a bulk merge of 1000 commits from upstream. But we can't bisect down the second-parent line of history, because that turns back into a straight upstream compiler and the problem fails to repro.

If instead we had rolled the 1000 commits into our repo individually, we'd have a linear history mixing upstream with our stuff and we would be able to bisect naturally. But that relies on the *upstream* history being basically linear, because we can't pick apart an upstream commit that is itself a big merge of lots of commits. At least I don't know how.

Now, I do say "basically" linear because the important thing is to have small increments of change each time. It doesn't mean we have to have everything be ff-only, and we can surely tolerate the merge commits that wrap individual commits in a pull-request kind of workflow. But merges that bring in long chains of commits are not what we want.
--paulr
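The bisection argument can be checked with a toy model. This is a sketch under stated assumptions, not a description of any real downstream setup: each downstream commit is represented only by the set of upstream commits it contains, `first_bad` is a plain binary search standing in for `git bisect`, and the culprit number 637 and the `fails` predicate are hypothetical. The test is assumed to fail only when the culprit and the local changes are both present.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

UPSTREAM = 1000  # upstream commits pulled in, as in the example above
CULPRIT = 637    # hypothetical upstream commit that breaks only together with local changes


def first_bad(history: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Binary search in the style of `git bisect`.

    Assumes history[0] is good, history[-1] is bad, and that once a commit
    is bad every later one is bad too. Returns the index of the first bad entry.
    """
    lo, hi = 0, len(history) - 1  # invariant: history[lo] good, history[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid
    return hi


def fails(applied: frozenset[int], has_local_changes: bool = True) -> bool:
    """Hypothetical test: fails only if the culprit AND the local changes are present."""
    return has_local_changes and CULPRIT in applied


# Commit-by-commit: downstream commit k holds upstream commits 0..k-1 plus local changes.
linear = [frozenset(range(k)) for k in range(UPSTREAM + 1)]
# Bulk merge: a single downstream commit jumps from none to all 1000 upstream commits.
bulk = [frozenset(), frozenset(range(UPSTREAM))]

if __name__ == "__main__":
    k = first_bad(linear, fails)
    print("linear history: culprit is upstream commit", k - 1)
    print("bulk merge: first bad downstream index", first_bad(bulk, fails))
    # The second-parent (pure upstream) line never reproduces the failure.
    print("fails upstream-only:", any(fails(s, has_local_changes=False) for s in linear))
```

With commit-by-commit merging the search lands on the single upstream commit that introduced the problem; with the bulk merge it can only report the merge itself, and because the pure-upstream history never fails, bisecting down the second parent finds nothing.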
Daniel Sanders via llvm-dev
2016-Jul-25 13:55 UTC
> -----Original Message-----
> From: Robinson, Paul [mailto:paul.robinson at sony.com]
> Sent: 22 July 2016 18:50
> To: Renato Golin; Daniel Sanders
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC] One or many git repositories?
>
> [...]
>
> If instead we had rolled the 1000 commits into our repo individually,
> we'd have a linear history mixing upstream with our stuff and we would
> be able to bisect naturally. But that relies on the *upstream* history
> being basically linear, because we can't pick apart an upstream commit
> that is itself a big merge of lots of commits. At least I don't know how.

I know of a way but it's not very nice.
The gist of it is to check out the downstream branch just before the bad merge and then merge the first 100 commits from upstream. If the result is good, merge the next 100; but if it's bad, 'git reset --hard' and merge 10 instead. You'll eventually find the commit that made it bad. Essentially, the idea is to make a throwaway branch that merges more frequently. I do something similar to rebase my work onto master, since gradually rebasing often causes all the conflicts to go away.

> Now, I do say "basically" linear because the important thing is to have
> small increments of change each time. It doesn't mean we have to have
> everything be ff-only, and we can surely tolerate the merge commits that
> wrap individual commits in a pull-request kind of workflow. But merges
> that bring in long chains of commits are not what we want.
> --paulr

I agree that we should probably keep the history as close to linear as possible (mostly because I find the Linux kernel's history difficult to follow), but it sounds like the issue is more about the content of the merge than the linearity of the history. A long-lived branch with a complex history sounds like it would be ok in your scenario if the eventual merge was a small change to master.
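The incremental-merge procedure can be sketched as a coarse-to-fine search over how many upstream commits have been merged. Assumptions: `find_culprit` and `merge_is_bad` are hypothetical names, the step sizes (100, 10, 1) follow the description above, and each call to `merge_is_bad(k)` stands for "merge the first k upstream commits into a throwaway branch cut just before the bad merge, then build and test". A bad trial is discarded, which is the `git reset --hard` step, and the search resumes from the last good point with a smaller step.

```python
from __future__ import annotations

from typing import Callable, Sequence


def find_culprit(
    n_commits: int,
    merge_is_bad: Callable[[int], bool],
    steps: Sequence[int] = (100, 10, 1),
) -> tuple[int, int]:
    """Coarse-to-fine search for the upstream commit that broke a bulk merge.

    merge_is_bad(k) models "merge the first k upstream commits into a throwaway
    branch, then build and test". Assumes merge_is_bad(0) is False,
    merge_is_bad(n_commits) is True, and badness persists once it appears.
    Returns (0-based index of the culprit commit, number of trial merges).
    """
    if not steps or steps[-1] != 1:
        raise ValueError("the last step size must be 1 to pin down a single commit")
    good = 0    # number of upstream commits known to merge cleanly
    trials = 0
    for step in steps:
        # Merge `step` more commits at a time while the result stays good.
        # A bad trial is thrown away (the 'git reset --hard' step) and the
        # search continues from `good` with the next, finer step size.
        while good + step <= n_commits:
            trials += 1
            if merge_is_bad(good + step):
                break
            good += step
    # Merging `good` commits is fine and merging `good + 1` is not, so the
    # culprit is the commit at 0-based index `good`.
    return good, trials


if __name__ == "__main__":
    # Hypothetical failure: any merge that includes upstream commit 637 is bad.
    print(find_culprit(1000, lambda k: k > 637))
```

For example, `find_culprit(1000, lambda k: k > 637)` returns `(637, 19)`: the culprit is found after 19 trial merges, whereas bisecting a linear history of 1000 commits takes about log2(1000), roughly 10 builds. In real git, trial k would merge the k-th commit of the upstream first-parent list (for instance as listed by `git rev-list --reverse --first-parent`), and that brings in exactly the first k commits only if the upstream history is linear.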