thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Renato Golin via llvm-dev

2016-Jul-22 12:56 UTC

[llvm-dev] [RFC] One or many git repositories?

On 22 July 2016 at 13:40, Daniel Sanders <Daniel.Sanders at imgtec.com> wrote:
> I don't see it as a step backwards but rather as a way of making people
> comfortable with the switch. I think opinions may gradually shift towards
> a more conventional git model after the switch but that doesn't necessarily
> detract from the value of a more svn-ish model if having one helps people
> switch.

The original idea was to change one thing at a time: SVN to Git, keeping
everything else the same.

But that has proven harder than we imagined. So, maybe the best way
forward is not to do one step at a time, but to understand where we
are and what we need and take the "right" (tm) step forwards. Even if
it requires multiple steps, we can combine them into larger, fewer
steps.

>> * public and downstream forks that *rely* on linear history
>
> Do you have an example in mind? I'd expect them to rely on each 'master'
> being an improvement on 'master^'. I wouldn't expect them to be
> interested in how 'master^' became 'master'.

Paul Robinson was outlining some of the issues he had with git
history. I don't know their setup, so I'll let him describe the issues
(or he may have done so already in some thread, but I haven't read it
all).

> Assuming the goal is to preserve what we have rather than improve it,
> buildbot will be fine without any changes (beyond switching the source
> steps from svn to git of course) whichever model we pick. It would just
> check out the latest 'master' on each build like it currently does for
> trunk.

I meant Zorg and the like. Buildbot itself can handle Git, but we may
have assumptions that the repos are linked and linear in the builders.

But we have been discussing pre-commit testing for a while and it's
clear that Buildbots, in the way they're set up now, are not the
answer.

For the sake of argument, here is the list of things we found:
 * buildbots can have pre-commit testing via patch submission, but
controlling security and load is not trivial if we want people to
actually use it
 * buildbots tracking non-master branches have the same load problems if
we allow people to create branches, but not the security problems
 * having a mirror so that bots track that mirror would solve the
security and load problems, but remove the ability for other people to
use it.

In essence, buildbots are single purpose and hard to configure (much
of it needs a master restart).

OTOH, Jenkins can have configurable build scripts, with parameters and
customisations, that allow us to pick up pull requests and build
them as they come.

It also scales per architecture, independently of the number of
configurations, if you can use something like containers. So, in the
long term, it's cheaper and more robust to maintain.

However, it's a big change that would require another massive shift in
how we do things, and the repository move is already big enough.

cheers,
--renato

Robinson, Paul via llvm-dev

2016-Jul-22 17:50 UTC

> >> * public and downstream forks that *rely* on linear history
> >
> > Do you have an example in mind? I'd expect them to rely on each 'master'
> > being an improvement on 'master^'. I wouldn't expect them to be
> > interested in how 'master^' became 'master'.
>
> Paul Robinson was outlining some of the issues he had with git
> history. I don't know their setup, so I'll let him describe the issues
> (or he may have done so already in some thread, but I haven't read it
> all).

Since you asked...

The key point is that a (basically) linear upstream history makes it
feasible to do bisection on a downstream branch that mixes in a pile
of local changes, because the (basically) linear upstream history can
be merged into the downstream branch commit-by-commit, which retains
the crucial linearity property.

We have learned through experience that a bulk merge from upstream is
a Bad Idea(tm).  Suppose we have a test that fails; it does not repro
with an upstream compiler; we try to bisect it; we discover that it
started after a bulk merge of 1000 commits from upstream.  But we can't
bisect down the second-parent line of history, because that turns back
into a straight upstream compiler and the problem fails to repro.

If instead we had rolled the 1000 commits into our repo individually,
we'd have a linear history mixing upstream with our stuff and we would
be able to bisect naturally.  But that relies on the *upstream* history
being basically linear, because we can't pick apart an upstream commit
that is itself a big merge of lots of commits. At least I don't know how.
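The difference between the two downstream histories can be illustrated with a small simulation. This is only a sketch under assumed conditions: the commit names (`u0` to `u999`, `local-a`, `local-b`), the culprit `u637`, and the helper `first_bad_commit` are all hypothetical, and the "test" is modelled as a bug that appears only when one local change and one upstream change are both present.

```python
def first_bad_commit(history, is_bad):
    """Binary search for the first commit whose inclusion makes the tree bad.

    `history` is a list of commits in first-parent order; each commit is the
    set of changes it brings in. Assumes the tip of the history is bad and
    that badness is monotonic (once bad, it stays bad).
    """
    lo, hi = 0, len(history) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[: mid + 1]):
            hi = mid
        else:
            lo = mid + 1
    return history[lo]


# Hypothetical data: 1000 upstream commits and two local changes.
upstream = [f"u{i}" for i in range(1000)]
CULPRIT = "u637"


def is_bad(commits):
    # The failure needs both a local change and one upstream change,
    # so it does not reproduce with a pure upstream compiler.
    changes = set().union(*commits)
    return "local-a" in changes and CULPRIT in changes


# Downstream history with one bulk merge of all 1000 upstream commits.
bulk_history = [{"local-a"}, {"local-b"}, set(upstream)]

# Downstream history where each upstream commit was merged individually.
linear_history = [{"local-a"}, {"local-b"}] + [{u} for u in upstream]

bulk_result = first_bad_commit(bulk_history, is_bad)
linear_result = first_bad_commit(linear_history, is_bad)

print(len(bulk_result))   # size of the suspect set after the bulk merge
print(linear_result)      # suspect set with commit-by-commit merges
```

In the bulk-merge case the search can only narrow the problem down to the merge commit itself, which still covers every upstream change; in the commit-by-commit case it lands on a single upstream commit.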

Now, I do say "basically" linear because the important thing is to have
small increments of change each time.  It doesn't mean we have to have
everything be ff-only, and we can surely tolerate the merge commits that
wrap individual commits in a pull-request kind of workflow.  But merges
that bring in long chains of commits are not what we want.
--paulr

Daniel Sanders via llvm-dev

2016-Jul-25 13:55 UTC

> -----Original Message-----
> From: Robinson, Paul [mailto:paul.robinson at sony.com]
> Sent: 22 July 2016 18:50
> To: Renato Golin; Daniel Sanders
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC] One or many git repositories?
> 
> > >> * public and downstream forks that *rely* on linear history
> > >
> > > Do you have an example in mind? I'd expect them to rely on each
> > > 'master' being an improvement on 'master^'. I wouldn't expect them
> > > to be interested in how 'master^' became 'master'.
> >
> > Paul Robinson was outlining some of the issues he had with git
> > history. I don't know their setup, so I'll let him describe the issues
> > (or he may have done so already in some thread, but I haven't read it
> > all).
>
> Since you asked...
> 
> The key point is that a (basically) linear upstream history makes it
> feasible to do bisection on a downstream branch that mixes in a pile
> of local changes, because the (basically) linear upstream history can
> be merged into the downstream branch commit-by-commit which retains
> the crucial linearity property.
> 
> We have learned through experience that a bulk merge from upstream is
> a Bad Idea(tm).  Suppose we have a test that fails; it does not repro
> with an upstream compiler; we try to bisect it; we discover that it
> started after a bulk merge of 1000 commits from upstream.  But we can't
> bisect down the second-parent line of history, because that turns back
> into a straight upstream compiler and the problem fails to repro.
> 
> If instead we had rolled the 1000 commits into our repo individually,
> we'd have a linear history mixing upstream with our stuff and we would
> be able to bisect naturally.  But that relies on the *upstream* history
> being basically linear, because we can't pick apart an upstream commit
> that is itself a big merge of lots of commits. At least I don't know how.

I know of a way but it's not very nice. The gist of it is to check out the
downstream branch just before the bad merge and then merge the first
100 commits from upstream. If the result is good then merge the next
100, but if it's bad then 'git reset --hard' and merge 10 instead. You'll
eventually find the commit that made it bad. Essentially, the idea is to
make a throwaway branch that merges more frequently. I do something
similar to rebase my work to master since gradually rebasing often
causes all the conflicts to go away.
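The stepwise search described above can be sketched as a simulation. The function name `find_bad_upstream_commit`, the callback `is_bad_after_merging`, and the final step size of 1 are assumptions added for illustration; the message itself only mentions steps of 100 and 10. In a real repository the callback would stand for merging that many upstream commits into a throwaway branch and running the failing test, and a `break` would correspond to 'git reset --hard' back to the last good state.

```python
def find_bad_upstream_commit(upstream, is_bad_after_merging, steps=(100, 10, 1)):
    """Locate the upstream commit that breaks the downstream branch.

    `upstream` lists the upstream commits pending since the last good merge.
    `is_bad_after_merging(n)` reports whether the downstream branch is bad
    once the first `n` upstream commits are merged (hypothetical callback).
    `steps` must end in 1 for the answer to be a single commit.
    Returns the offending commit, or None if nothing ever goes bad.
    """
    merged = 0  # number of upstream commits merged and known to be good
    for step in steps:
        while merged < len(upstream):
            candidate = min(merged + step, len(upstream))
            if is_bad_after_merging(candidate):
                # Bad result: discard this merge and retry with a smaller step.
                break
            merged = candidate
        else:
            # Everything merged without the test ever failing.
            return None
    return upstream[merged]


# Hypothetical example: 1000 pending commits, the 638th one (u637) is bad.
upstream = [f"u{i}" for i in range(1000)]
culprit = find_bad_upstream_commit(upstream, lambda n: n > 637)
print(culprit)
```

A plain binary search over the same range would need fewer trial merges; the fixed 100/10/1 steps are kept here only because they mirror the manual procedure.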

> Now, I do say "basically" linear because the important thing is to have
> small increments of change each time.  It doesn't mean we have to have
> everything be ff-only, and we can surely tolerate the merge commits that
> wrap individual commits in a pull-request kind of workflow.  But merges
> that bring in long chains of commits are not what we want.
> --paulr

I agree that we should probably keep the history as close to linear as
possible (mostly because I find the Linux kernel's history difficult to
follow) but it sounds like the issue is more about the content of the merge
than the linearity of the history. A long-lived branch with a complex
history sounds like it would be ok in your scenario if the eventual merge
was a small change to master.
