David Greene via llvm-dev
2019-Feb-01  16:01 UTC
[llvm-dev] [cfe-dev] [Github] RFC: linear history vs merge commits
Oh, I'm completely in favor of making bad commits much less likely.  I
simply think there is a decent solution between "let everything in"
and
"don't let everything in unless its proven to work everywhere"
that gets
90% of the improvement.  The complexity of guaranteeing a buildable
branch is high.  If someone wants to take that on, great!  I just don't
think we should reasonably expect a group of volunteers to do it.  :)
                        -David
Jeremy Lakeman <Jeremy.Lakeman at gmail.com> writes:
> I realise that llvm trunk can move fairly quickly. 
> So my original, but brief, suggestion was to merge the current set of
> approved patches together rather than attempting them one at a time.
> Build on a set of fast smoke test bots. If something breaks, it should
> be possible to bisect it to reject a PR and make forward progress.
> Occasionally bisecting a large set of PR's should still be less bot
> time than attempting to build each of them individually.
> Blocking the PR's due to target specific and or slow bot failures
> would be a different question.
> You could probably do this with a linear history, so long as the final
> tree is the same as the merge commit, it should still build.
> I'm just fond of the idea of trying to prevent bad commits from ever
> being merged. Since they sometimes waste everyone's time.
>
> On Fri, 1 Feb 2019 at 04:02, David Greene <dag at cray.com> wrote:
>
>     <paul.robinson at sony.com> writes:
>     
>     > Systems that I've seen will funnel all submitted PRs into a
>     single queue
>     > which *does* guarantee that the trial builds are against HEAD
>     and there
>     > are no "later commits" that can screw it up. If the
trial build
>     passes,
>     > the PR goes in and becomes the new HEAD.
>     
>     The downside of a system like this is that when changes are going
>     in
>     rapidly, the queue grows very large and it takes forever to get
>     your
>     change merged. PR builds are serialized and if a "build"
means
>     "make
>     sure it builds on all the Buildbots" then it takes a very long
>     time
>     indeed.
>     
>     There are ways to parallelize builds by speculating that PRs will
>     build
>     cleanly but it gets pretty complicated quickly.
>     
>     > But this would be a radical redesign of the LLVM bot system, and
>     would
>     > have to wait until we're done with the GitHub migration and
can
>     spend
>     > a couple of years debating the use of PRs. :-)
>     
>     Heh. Fully guaranteeing buildability of a branch is not a trivial
>     task
>     and will take a LOT of thought and discussion.
>     
>     -David
Arthur O'Dwyer via llvm-dev
2019-Feb-01  17:41 UTC
[llvm-dev] [cfe-dev] [Github] RFC: linear history vs merge commits
My two cents on git:
- A linear history is important for "git bisect", "git
blame", and just
general developer sanity. Rebase should be preferred over merge.
- Rewriting the history of "master" (by force-pushing to
"master") should
never be done. GitHub Enterprise can enforce this by server-side hooks. I
know because I've done it for a previous employer's repositories
(although
I forget the details; it might have involved a customer support ticket).
- Murphy's Law: Force-pushes *will* happen, by accident, if they are not
prevented by technological means. (Again, I know from first-hand
experience.)
- Murphy's Law: Merge commits *will* happen, by accident, if they are not
prevented by technological means. (Again, I know from first-hand
experience.)  And, because rewriting history should be prevented, once the
history contains a "merge bubble," it will always contain it;
there's no
way to recover a completely bisectable/blameable linear history after a
merge has been committed.
- Full disclosure: "Revert" commits also wreak havoc on "git
bisect" and
"git blame", and LLVM does a lot of reverts.
Conclusion: LLVM should find a technological way to prevent force-pushes to
master (and to release branches) and to prevent multi-parent merge
commits.  If you maintain your own "git" server, then you can use
git's
server-side hooks to reject offending commits very easily.  Since your
"git" server is hosted on GitHub, you'll have to ask GitHub to
install the
appropriate server-side hooks.  I know GitHub can be configured to reject
force-pushes to {any, a specific} branch.  I strongly suspect that if *the
maintainers of LLVM* asked nicely, GitHub would also be able to reject
merges to {any, a specific} branch.
My two cents on Phab:
- It's a little nicer than GitHub PRs for commenting-on, but both are cool.
- GitHub PRs have Travis integration so the reviewer can see if a
particular patch is actually compileable at all, before spending time
looking at it. I wish Phab had this feature (or maybe it exists and LLVM
doesn't use it?).
My two cents on gating on buildbot:
- If buildbot runs only in master, then you wind up with a lot of revert
commits (LLVM's status quo). This seems to be working fine for LLVM in my
opinion; it is not pathological.
- If merging-to-master is *gated* on buildbot — that is, if clicking "OK to
merge" on a patch means "rebase it on master, run buildbot, then merge
to
master only if green" — then you must solve at least two hard problems.
Number one, buildbot becomes a single point of failure. If buildbot is
heavily loaded, patches merge slowly (but this can be solved by clever
engineering as already mentioned in this thread). If buildbot is *down or
unreachable*, then *no* changes can be made to master (because no change
can get through the gate). This requires a contingency plan. Number two, if
buildbot *goes permanently red* because of a change outside the LLVM
codebase — a dependency breaks, or something in buildbot's configuration is
changed and can't easily be changed back, or a non-deterministic unit test
was committed by accident — then again good patches will be rejected by the
gate. This is almost the same as hard problem number one, and it would be
reasonable to treat it as the same problem, but it is not *100% obvious*
that it should be treated as the same problem.  Because of these two hard
problems, I believe that gating on buildbot has a high likelihood of
becoming pathological, say, once a year or more.  I think it is a bad
tradeoff.
- I strongly believe that the sweet spot is what GitHub does: run buildbot
(Travis) on every pull request and display the results conspicuously, but
do not *gate* on those results. Cultivate a human culture that we do not
merge red PRs and we do not merge PRs whose buildbot run is still in
progress, any more than we merge PRs without code review approval. (Which
is to say, *sometimes* we do, and we certainly can in an emergency. But we
know we shouldn't!)
HTH,
Arthur
On Fri, Feb 1, 2019 at 11:02 AM David Greene via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Oh, I'm completely in favor of making bad commits much less likely.  I
> simply think there is a decent solution between "let everything
in" and
> "don't let everything in unless its proven to work
everywhere" that gets
> 90% of the improvement.  The complexity of guaranteeing a buildable
> branch is high.  If someone wants to take that on, great!  I just don't
> think we should reasonably expect a group of volunteers to do it.  :)
>
>                         -David
>
> Jeremy Lakeman <Jeremy.Lakeman at gmail.com> writes:
>
> > I realise that llvm trunk can move fairly quickly.
> > So my original, but brief, suggestion was to merge the current set of
> > approved patches together rather than attempting them one at a time.
> > Build on a set of fast smoke test bots. If something breaks, it should
> > be possible to bisect it to reject a PR and make forward progress.
> > Occasionally bisecting a large set of PR's should still be less
bot
> > time than attempting to build each of them individually.
> > Blocking the PR's due to target specific and or slow bot failures
> > would be a different question.
> > You could probably do this with a linear history, so long as the final
> > tree is the same as the merge commit, it should still build.
> > I'm just fond of the idea of trying to prevent bad commits from
ever
> > being merged. Since they sometimes waste everyone's time.
> >
> > On Fri, 1 Feb 2019 at 04:02, David Greene <dag at cray.com>
wrote:
> >
> >     <paul.robinson at sony.com> writes:
> >
> >     > Systems that I've seen will funnel all submitted PRs into
a
> >     single queue
> >     > which *does* guarantee that the trial builds are against HEAD
> >     and there
> >     > are no "later commits" that can screw it up. If the
trial build
> >     passes,
> >     > the PR goes in and becomes the new HEAD.
> >
> >     The downside of a system like this is that when changes are going
> >     in
> >     rapidly, the queue grows very large and it takes forever to get
> >     your
> >     change merged. PR builds are serialized and if a "build"
means
> >     "make
> >     sure it builds on all the Buildbots" then it takes a very
long
> >     time
> >     indeed.
> >
> >     There are ways to parallelize builds by speculating that PRs will
> >     build
> >     cleanly but it gets pretty complicated quickly.
> >
> >     > But this would be a radical redesign of the LLVM bot system,
and
> >     would
> >     > have to wait until we're done with the GitHub migration
and can
> >     spend
> >     > a couple of years debating the use of PRs. :-)
> >
> >     Heh. Fully guaranteeing buildability of a branch is not a trivial
> >     task
> >     and will take a LOT of thought and discussion.
> >
> >     -David
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190201/8f1b8a54/attachment.html>
Peter Wu via llvm-dev
2019-Feb-01  22:48 UTC
[llvm-dev] [cfe-dev] [Github] RFC: linear history vs merge commits
On Fri, Feb 01, 2019 at 12:41:05PM -0500, Arthur O'Dwyer via cfe-dev wrote:> I know GitHub can be configured to reject force-pushes to {any, a > specific} branch. I strongly suspect that if *the maintainers of > LLVM* asked nicely, GitHub would also be able to reject > merges to {any, a specific} branch.Anyone with admin permissions in a repo can add "branch protection rules" that prevent force pushes, there is no need to contact Github support for that.> - GitHub PRs have Travis integration so the reviewer can see if a > particular patch is actually compileable at all, before spending time > looking at it. I wish Phab had this feature (or maybe it exists and LLVM > doesn't use it?).This kind of pre-merge testing is very valuable, it could potentially prevent some unnecessary reverts by catching issues early. On caveat is that the test coverage would be limited or else the build times would be very long. There are quite some builders for various projects (llvm, cfe, compiler-rt, etc.) on a lot of different platforms and targets (Linux, macOS, Windows, Android, etc.). With limited resources, Arthur's suggestion seems workable to me: - Enable pre-commit testing of few configurations (in parallel). - Encourage developers to wait for tests to pass before pushing changes. - Do not block hard to avoid a situation where unrelated changes (commits or build environment) cause failures. I would also be in favor of linear history for reasons mentioned before (easier bisection, more readable logs). Though whatever choice happens (cherry-pick vs merge), I think it is important to keep the link between a change and the review. I have seen various strategies: - Phabricator: manually include a "Differential revision" link in the commit message. - Github (merge via web interface): automatically includes a reference to the "Pull Request". - Github (cherry-pick / rebase and merge with no merge commit): unfortunately loses the review information. - Gerrit: developers cannot merge directly, instead they use the web interface to submit a change. This will add a "Reviewed-on" link that references the review. (Used by Wireshark, Qt, boringssl, etc.) Projects that are use mailing lists to review patches (like Linux and QEMU) commonly include a Message-Id tag in the commit message that references the original mailing list discussion. The curl project also uses Github for reviews, but encourages developers with push permissions to manually fetch the commit, edit the commit message to reference the review and then push to the master branch (with no merge commits). Again, this is a manual process and new developers who are not familiar with this process accidentally create a merge commit anyway (or forget to reference the review). Ideally changes go through a single gatekeeper, a tool that recognizes comments to a merge request and subsequently tries to add extra info to a commit message (link to a review) and enforce a linear history (by cherry-picking changes or rebase + merge commits). This tool could be triggered by posting comments like "@name-of-the-bot merge this". (If necessary this could be combined with requiring tests to pass.) Such an extra indirection with a single gatekeeper could be a quite drastic change from the current development model though. It could reduce the number of broken builds and improve the quality of commit messages though. Just my two cents, hope it helps :) -- Kind regards, Peter Wu https://lekensteyn.nl