thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

If this information is useful, please help other people find it:
Share via:

David Chisnall via llvm-dev

2016-Jul-25 11:40 UTC

[llvm-dev] [RFC] One or many git repositories?

On 25 Jul 2016, at 12:09, Renato Golin <renato.golin at linaro.org>
wrote:> 
> A pull request is a request for *comment*, and we currently express
> that via a Phab review. If I'm comfortable with the patch, I commit
> directly, if I want others opinions, I create a review. If we move as
> you propose, I'd lose the second option.
A pull request is a mechanism.  The community can decide what the policy is.
> I also don't think pre-commit review committers should be just relying
> on the fast bots passing. There are many other issues like code
> quality, formatting, tests, etc. that need to be done properly.
I agree, but it’s much harder to automate that (though having clang-format
automatically applied might be nice).
> If we go this route, we'll be effectively flattening out everyone that
> has commit access into one big pool, and that would be a massive
> change in how our community operates (for better or worse).
That’s not what I’m suggesting.  Pull requests provide a convenient staging
mechanism.  On the back end, they’re simply git branches in a namespace that
won’t be automatically fetched by clients that do a clone.
> Unless you're also proposing that only the "core team" (code
owners,
> high committers, etc) get commit access in the new style.
I’d suggest that no one should have direct push access (or, at least, the no one
should *use* the direct access - presumably some people in what GitHub calls the
admin groups would retain access for dealing with problems).
>> The workflow would be exactly the same for committers and
non-committers, except that committers would be able to tag their own pull
requests as ready to merge.
> 
> Can this be controlled via a server hook? One that would be supported by
GitHub?
GitHub supports everything that is needed to implement this suggestion.  From
the developers’ perspective, the workflow is:

- User sends pull request.

- User from a list of people that correspond to our current set of committers
labels the request as ‘approved’ (GitHub labels can only be applied by members
of the organisation).

In the case of committers who are committing to areas where they are comfortable
with post-commit review, these two users can be the same person, so you’d send a
pull request and add the tag yourself.

The buildbot master then:

- Collects all of the pull requests (git fetch)

- Uses the GitHub API to filter the ones that have been approved

- Merges these, in turn, to its local master branch

- Sends those changes to the builders

- If they succeed, pushes master to GitHub

In idle time (when not busy with approved patches), it should also try other
pull requests and add a label saying that they’ve passed the tests.

There are some obvious refinements to this, for example only kicking the fast
buildbots when there are several patches outstanding and then running the slower
ones with the combined master, then backtracking and applying them one at a time
if this fails.

Longer term, it would be nice to integrate things like running the static
analyser into this workflow.

I don’t have particularly strong feelings about this, I just wanted to point out
that it’s possible with GitHub and a little bit of client code talking to the
GitHub APIs.  There’s a slightly more complex workflow for people to push code,
but in exchange for that we’d never be in a state where anything pushed to
master introduces regressions for which tests exist.

David

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3719 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160725/dc741896/attachment.bin>

Renato Golin via llvm-dev

2016-Jul-25 12:34 UTC

head link

[llvm-dev] [RFC] One or many git repositories?

On 25 July 2016 at 12:40, David Chisnall <david.chisnall at cl.cam.ac.uk>
wrote:> I’d suggest that no one should have direct push access
That's what I meant by "massive change". :)

I don't think that's a bad model, but it's a very different model
than
we currently have, so would need more discussion on that side before
we can even think about the technical details.

> - User sends pull request.
> - User from a list of people that correspond to our current set of
committers labels the request as ‘approved’ (GitHub labels can only be applied
by members of the organisation).
> The buildbot master then:
> - Collects all of the pull requests (git fetch)
> - Uses the GitHub API to filter the ones that have been approved
> - Merges these, in turn, to its local master branch
> - Sends those changes to the builders
> - If they succeed, pushes master to GitHub
This sounds a lot like the Gerrit model.

Can buildbots do that? How much code would be needed?

Wouldn't it be easier to do this in Jenkins (push based)?

> In idle time (when not busy with approved patches), it should also try
other pull requests and add a label saying that they’ve passed the tests.
This sounds like an added complexity to the buildbot code. (priority
queues?) Not impossible, maybe not even hard, but yet another small
thing to get right...

> There are some obvious refinements to this, for example only kicking the
fast buildbots when there are several patches outstanding and then running the
slower ones with the combined master, then backtracking and applying them one at
a time if this fails.
If this is going to work, we'll need to run a single pull request on
top of the correct "master" for every change.

IIUC, GitHub will "trigger" a CI change for every *update* on every
pull request, too. So we'll end up with a lot more half-baked patches
than usually go through the current buildbots.

This will mean more builders per bot (I'm expecting at least 3) and a
silent bots (we can use 8014 for that).

> I don’t have particularly strong feelings about this, I just wanted to
point out that it’s possible with GitHub and a little bit of client code talking
to the GitHub APIs.  There’s a slightly more complex workflow for people to push
code, but in exchange for that we’d never be in a state where anything pushed to
master introduces regressions for which tests exist.
I'm a big advocate of pre-commit tests and the Gerrit model, I think
it makes developers more aware of their own failures (which bbots do)
without upsetting the rest of the community (which bbots don't).

For instance, in this model, I only need to review code that has been
marked by the pre-commit bots as "good", so it also saves a lot of
time from code reviewers, in addition to buildbot owners, and it will
make the releases, back-porting, bisecting, *much* easier by the
massive reduction in silly reverts.

If I understand correctly, the model you propose is:

1. Admins: people with direct commit access used only for fixing the
repo, fork releases, etc.
2. Pre-review committers: people with the right to create a
"auto-merge" tag on pull reviews, but not commit directly.
3. Post-review committers: just pull request access, need to wait for
someone in (2) to approve
4. Everyone else: just pull request access, need to wait for someone
in (2) to approve

A few problems with this:

A. How do we divide the current (fuzzy) "groups" into those well
defined groups?
B. (2) folks will have the choice to get reviews, (3) won't have the
choice to quickly fix their commits (which some do today).
C. The migration between (3) to (2) is not defined *at all* today, and
this is where the most radical cut will be.

Post-review commit access is, as I understand, a social recognition
for those striving to become more active committers in the future. By
flattening (3) and (4), any threshold we pick will be "wrong" in some
form, and may discourage people from contributing changes, or striving
to become pre-review committers.

I'm not saying you're wrong, or that I don't agree with you, but
this
won't be an easy sell...

cheers,
--renato

David Chisnall via llvm-dev

2016-Jul-25 13:29 UTC

head link

[llvm-dev] [RFC] One or many git repositories?

> On 25 Jul 2016, at 13:34, Renato Golin <renato.golin at linaro.org>
wrote:
> 
> On 25 July 2016 at 12:40, David Chisnall <david.chisnall at
cl.cam.ac.uk> wrote:
>> I’d suggest that no one should have direct push access
> 
> That's what I meant by "massive change". :)
> 
> I don't think that's a bad model, but it's a very different
model than
> we currently have, so would need more discussion on that side before
> we can even think about the technical details.
> 
> 
>> - User sends pull request.
>> - User from a list of people that correspond to our current set of
committers labels the request as ‘approved’ (GitHub labels can only be applied
by members of the organisation).
>> The buildbot master then:
>> - Collects all of the pull requests (git fetch)
>> - Uses the GitHub API to filter the ones that have been approved
>> - Merges these, in turn, to its local master branch
>> - Sends those changes to the builders
>> - If they succeed, pushes master to GitHub
> 
> This sounds a lot like the Gerrit model.
> 
> Can buildbots do that? How much code would be needed?
> 
> Wouldn't it be easier to do this in Jenkins (push based)?
These are both good questions for someone more familiar with the Buildbot
infrastructure than me.
> 
> 
>> In idle time (when not busy with approved patches), it should also try
other pull requests and add a label saying that they’ve passed the tests.
> 
> This sounds like an added complexity to the buildbot code. (priority
> queues?) Not impossible, maybe not even hard, but yet another small
> thing to get right…
Not hard - there are two queues (stuff that’s ready to merge, stuff awaiting
review) and you only start on the second queue when the first one is empty.
>> There are some obvious refinements to this, for example only kicking
the fast buildbots when there are several patches outstanding and then running
the slower ones with the combined master, then backtracking and applying them
one at a time if this fails.
> 
> If this is going to work, we'll need to run a single pull request on
> top of the correct "master" for every change.
> 
> IIUC, GitHub will "trigger" a CI change for every *update* on
every
> pull request, too. So we'll end up with a lot more half-baked patches
> than usually go through the current buildbots.
Yes, GitHub can do this on a push model, but the first filter would be ‘does
this have the correct label’, and if not then it goes on the low-priority pile.
> 
> This will mean more builders per bot (I'm expecting at least 3) and a
> silent bots (we can use 8014 for that).
> 
> 
>> I don’t have particularly strong feelings about this, I just wanted to
point out that it’s possible with GitHub and a little bit of client code talking
to the GitHub APIs.  There’s a slightly more complex workflow for people to push
code, but in exchange for that we’d never be in a state where anything pushed to
master introduces regressions for which tests exist.
> 
> I'm a big advocate of pre-commit tests and the Gerrit model, I think
> it makes developers more aware of their own failures (which bbots do)
> without upsetting the rest of the community (which bbots don't).
> 
> For instance, in this model, I only need to review code that has been
> marked by the pre-commit bots as "good", so it also saves a lot
of
> time from code reviewers, in addition to buildbot owners, and it will
> make the releases, back-porting, bisecting, *much* easier by the
> massive reduction in silly reverts.
These are the advantages.  The disadvantages are:

- Someone needs to write the Buildbot integration code

- Committers need to adopt a slightly more complex workflow (though not more
complex than going via phabricator)
> If I understand correctly, the model you propose is:
> 
> 1. Admins: people with direct commit access used only for fixing the
> repo, fork releases, etc.
Yup.
> 2. Pre-review committers: people with the right to create a
> "auto-merge" tag on pull reviews, but not commit directly.
Yup.
> 3. Post-review committers: just pull request access, need to wait for
> someone in (2) to approve
Yup.
> 4. Everyone else: just pull request access, need to wait for someone
> in (2) to approve
There’s not really a difference between 3 and 4 in this model.
> 
> A few problems with this:
> 
> A. How do we divide the current (fuzzy) "groups" into those well
defined groups?
Group 4 is easy, that’s everyone that we don’t actively put in a group.

I have no idea about group 1.  This set roughly corresponds today to the set of
people who are Chris, and should probably be half a dozen people for a bit more
redundancy (admins are also the people who can add people to the GitHub group,
create new repos, and so on).  It might be something overseen by the foundation.

For groups 2/3 as they map to current committers, the same way that we do now:

Give people membership in the group and let them know what the community expects
them to have reviewed pre-commit.  If they start committing things that should
have been reviewed without proper review and they don’t change their behaviour,
then move them back into group 4.
> B. (2) folks will have the choice to get reviews, (3) won't have the
> choice to quickly fix their commits (which some do today).
> C. The migration between (3) to (2) is not defined *at all* today, and
> this is where the most radical cut will be.
If the difference is purely cultural and not mechanically enforced, then (3)
will be able to fix their commits by labelling them as approved.

I don’t see a need for a mechanical separation between groups 2 and 3.  In
FreeBSD we have a concept of mentorship, where new committers must have
‘Approved by: {name} (mentor)’ in all of their commits and their mentor is
responsible for ensuring that they are following the expectations of the
community.  This is similarly not enforced by the revision control system
(though, again, people who don’t do it may find that they can no longer commit).
> Post-review commit access is, as I understand, a social recognition
> for those striving to become more active committers in the future. By
> flattening (3) and (4), any threshold we pick will be "wrong" in
some
> form, and may discourage people from contributing changes, or striving
> to become pre-review committers.
I’m not proposing flattening 3 and 4.
> I'm not saying you're wrong, or that I don't agree with you,
but this
> won't be an easy sell…
It’s also worth pointing out that this is in no way required to happen at the
same time as the GitHub migration (or at all), so should not delay that.  Just
something for people to think about...

David

llvm dev - Jul 2016 - [RFC] One or many git repositories?

[llvm-dev] [RFC] One or many git repositories?

[llvm-dev] [RFC] One or many git repositories?

[llvm-dev] [RFC] One or many git repositories?