Renato Golin via llvm-dev
2016-Jun-26 21:39 UTC
[llvm-dev] Git Move: GitHub+modules proposal
So,

It's been a while and the GitHub thread is officially dead, so I'll propose a development methodology based on the feedback from that thread. This is not *my* view, but all that was discussed in the threads.

My objective is to form an official proposal to use Git as our main repository, overcoming all the problems we currently have without creating many others. In the end, I think Tanya wanted to hold a vote, and see how strongly the community feels about it. The vote should be "Should we move to GitHub as the proposed workflow, or should we try to find another solution away from our own hosting?".

The important part is that we *must* move from the current schema. It does not scale and the administrative costs are not worth the trouble. So, if we don't go with GitHub, we have to find another way out.

The proposal
=========

Move to GitHub with N+1 projects: all current LLVM projects + the "llvm-projs" umbrella. The latter will have all other projects as "submodules" with the intent to:

1. Control the history via server hooks updating a unique and auto-increment identifier, which will apply to every commit on its submodules (ie, every other LLVM project).
2. Serve as a reference for releases, buildbots, etc., checking out only the necessary submodules for each.
3. Have additional logic to handle the additional complexity for mailing lists, tools, buildbots to deal with the umbrella project *only*.

The existing LLVM projects (llvm, clang, compiler-rt, etc) will continue on their own repositories and be built locally just like they are today. You can check them out individually inside the final directory (llvm/tools/clang) or use symbolic links, just like we do today. You can also check out "llvm-projs" and update only the required submodules, and use symbolic links.

The llvm-projs umbrella will have its own versioning, and tools can report that ID as their "version", if they're not in a release branch.
Release branches should be off of master and have a linear history, just like master, in the exact same way we do now with SVN. This will guarantee the umbrella project will be able to correctly auto-increment the ID and make sure current tools work as usual.

We don't want private branches to end up in upstream LLVM (only upstream release branches), but that's perfectly natural in GitHub, where anyone can fork and implement their features and own releases off of the upstream official repositories.

This can work as well for upstream development of "feature branches", where upstream developers contribute to both repositories, while keeping a specific feature under test separate. Merges will still have to be done as they are today, one patch at a time, or we risk having to revert the whole merge window if the buildbots start breaking, which can be impossible if the window is large or two or more windows get committed at the same time.

For "feature branches" we could use git-imerge, but that's for the future and not considered in the first stage of the move.

Git Submodules
---------------------

There were concerns about whether submodules would work with our flow, but the concerns were addressed by demonstrating that:

1. Submodules can work in an umbrella project, which controls the auto-increment ID
2. You can check out individual modules or all of them, so they work well for releases and buildbots
3. The history is shared with the root project, so git-bisect works out of the box

The Alternatives
---------------------

A few alternatives were proposed, but git submodules ended up being considered more thoroughly. Here are some of the reasons:

* Google repo:

It's an independent tool, which may suit us today, but not necessarily tomorrow. It may work well with the infrastructure that is already set for it on other projects (mostly Google projects like Android), but it does require some tooling (like git submodules).
The point is that tooling is much more likely to exist for official git features than for third-party projects, especially on Windows.

* All-in-one:

Proposals to have all projects inside one big repo were quickly dismissed due to the problems it creates for *users* of LLVM as a library, and for build systems (specific buildbots) that don't need to monitor all changes all the time. It simply won't scale.

* Multiple clones:

Allowing the projects to *exist* in different conditions (clang inside LLVM, or as a stand-alone library) will not scale. CMake will have to cope with all the different styles, it doesn't solve the unique auto-increment ID, nor does it help downstream projects migrate to a common infrastructure.

Questions
=======

In order to make this proposal final, I still need a few questions to be answered.

1. How will the umbrella project's auto-increment hook work?

Will it be one ID for every commit in every other repo? How will it know which one came first? Does it matter? If we have two commits "at the same time", do we create a priority list?

Ex. LLVM commits get a lower ID than Clang ones, because it's more likely that an LLVM commit needs to go in first.

2. How do we update the commits mailing lists?

Can we add a mail script to the auto-increment ID hook? Or should we have a cron job that picks up the new commits every 5 minutes on a server somewhere and emails them (in ID order) to the respective lists?

Approval
======

Right now, we should not discuss whether moving to Git or GitHub is a good idea or not. This is about the proposal itself. So, if you don't want Git or GitHub, wait for the voting to express that.

If you do want Git and GitHub, then please keep your comments on topic, answer the questions and let's make sure we have a solid proposal that most Git proponents are happy with.

If there is an alternative proposal (say, Google's repo), then this has to be separate, and well explained and accepted to be voted on, too.
Once we all agree in general, we should put it to a vote. If there's an overwhelming majority (not sure how to measure this), and no critical problems (for example, learning new tools is less critical than breaking all buildbots), we should go with the move.

For logistical reasons, if we do decide to move, I would like to do so before the 3.10 / 4.0 branches.

cheers,
--renato
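One way to picture the "priority list" idea from question 1 is the small Python sketch below. It is only an illustration under stated assumptions, not part of the proposal: the `REPO_PRIORITY` table, the `Commit` record and the `assign_ids` helper are all hypothetical names, and the server-side timestamp is assumed to be the ordering key. Commits are ordered by time, and ties are broken so that LLVM gets the lower ID.

```python
from dataclasses import dataclass

# Hypothetical priority table: a lower number wins the lower ID on a tie.
REPO_PRIORITY = {"llvm": 0, "clang": 1, "compiler-rt": 2}


@dataclass(frozen=True)
class Commit:
    repo: str
    sha: str
    timestamp: int  # assumed: seconds since epoch, as seen by the server


def assign_ids(commits, next_id):
    """Map (repo, sha) to a sequential ID, ordered by time, then repo priority."""
    ordered = sorted(
        commits,
        # Unknown repositories sort after every listed one.
        key=lambda c: (c.timestamp, REPO_PRIORITY.get(c.repo, len(REPO_PRIORITY))),
    )
    return {(c.repo, c.sha): next_id + i for i, c in enumerate(ordered)}


# Two commits arrive "at the same time" (t=100); LLVM gets the lower ID.
ids = assign_ids(
    [Commit("clang", "c1", 100), Commit("llvm", "l1", 100), Commit("llvm", "l0", 90)],
    next_id=1,
)
```

Because Python's sort is stable, commits from the same repository with the same timestamp keep the order in which they were received.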
Anton Korobeynikov via llvm-dev
2016-Jun-26 22:02 UTC
[llvm-dev] Git Move: GitHub+modules proposal
> 1. Control the history via server hooks updating a unique and
> auto-increment identifier, which will apply to every commit on its
> submodules (ie, every other LLVM project).

Does github allow this? IIRC their support for server-side hooks was very limited due to obvious reasons. And executing hooks e.g. on llvm.org seems very error-prone.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
Renato Golin via llvm-dev
2016-Jun-26 23:32 UTC
[llvm-dev] Git Move: GitHub+modules proposal
On 26 June 2016 at 23:02, Anton Korobeynikov <anton at korobeynikov.info> wrote:
> Does github allow this? IIRC their support for server-side hooks was
> very limited due to obvious reasons. And executing hooks e.g. on
> llvm.org seems very error-prone.

Someone suggested it was possible. I have sent them an email with a draft proposal and they said everything was fine, though they didn't confirm specific support.

I can't see why changing a local auto-increment ID on the repository itself would be a breach of security, so even if there are limitations, I think we can get this done.

I'll send them another email to confirm this specific point.

cheers,
--renato
Alexey Denisov via llvm-dev
2016-Jun-27 16:04 UTC
[llvm-dev] Git Move: GitHub+modules proposal
Hello there,

Renato, thank you for putting everything together.

Regarding the second question (the commits mailing list): GitHub provides a set of web hooks [1]. I think we are interested in 'push' events in particular.

Besides that, it has some CI-related integrations: buildbots can update a pull request's status to show whether the tests are passing. Builds can also be triggered using web hooks (an issue_comment with specific text).

IIRC Swift and Rust do this (and more) in a very similar way.

Cheers,
Alex.

[1] https://developer.github.com/webhooks <https://developer.github.com/webhooks/#payloads>
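As a rough illustration of how a 'push' web hook could feed the commits list, here is a minimal Python sketch. It assumes the payload has already been received and parsed into a dict, and it only relies on fields of the GitHub push event (`ref`, `repository.full_name`, and `id`, `message`, `author`, `added`, `modified`, `removed` on each commit). The `commit_emails` function and the list address are hypothetical, and actually sending the messages (for example via SMTP) is left out.

```python
from email.message import EmailMessage
from email.utils import formataddr


def commit_emails(payload, list_address):
    """Build one EmailMessage per commit in a GitHub 'push' event payload."""
    repo = payload["repository"]["full_name"]
    ref = payload["ref"]
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref

    messages = []
    for commit in payload["commits"]:
        lines = commit["message"].splitlines()
        summary = lines[0] if lines else "(no message)"
        author = commit["author"]

        msg = EmailMessage()
        msg["From"] = formataddr((author["name"], author["email"]))
        msg["To"] = list_address
        msg["Subject"] = f"[{repo}] {commit['id'][:7]}: {summary}"

        # List every path the commit touched, whatever the kind of change.
        touched = (
            commit.get("added", [])
            + commit.get("modified", [])
            + commit.get("removed", [])
        )
        body = [f"Branch: {branch}", f"Commit: {commit['id']}", "",
                commit["message"], "", "Files:"]
        body += [f"  {path}" for path in touched]
        msg.set_content("\n".join(body))
        messages.append(msg)
    return messages
```

One message per commit, rather than one per push, keeps the list traffic close to what the SVN commit emails look like today, but that is a design choice rather than something GitHub imposes.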
Renato Golin via llvm-dev
2016-Jun-29 14:03 UTC
[llvm-dev] Git Move: GitHub+modules proposal
Hi all,

A short summary: Takumi has done 90% of the work here:

https://github.com/llvm-project/llvm-project-submodule

and I've been talking to GitHub, and here are the answers to my questions:

> 1. How will the umbrella project's auto-increment hook work?

Since the umbrella project cannot see the sub-modules' commits without some form of update, there are two ways to do this:

P. Per push: Every push (not commit) on all other repositories will trigger a hook that will hit a URL on our server, telling it to generate an incremental ID, update some umbrella's SeqID property (or even a commit SHA) and update the sub-modules.

T. Time based: A cron job on our server would frequently pull from all repos and update ID/modules.

Option P is less confusing and more fine grained, but if it misfires, we'll lose that push, and its commits will be bundled with the next push on that repo.

Option T will invariably bundle things together, even from different repositories. The chance that this will "correctly" merge an LLVM+Clang double-patch is not worth the trouble for the noise.

For both of them, we need an external server, as there's no way to update a repository's property from another.

Multiple commits eventually getting into a single umbrella revision can be innocuous for development, but they can make controlling the version for releases a bit more complicated. Though, it would also have no effect on back-ports, since they'll be done on Git and get their own SeqID.

All in all, I'm not too worried about this...

> 2. How do we update the commits mailing lists?

This is, apparently, trivial in GitHub:

https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/#enabling-email-service-notifications-for-pushes-to-your-repository

Any more comments before we put this proposal to vote?

Is anyone going to propose an alternative Git solution?

Or maybe an external, reliable and trustworthy SVN repository (ie. *not* SourceForge)?
In the interests of brevity and peacefulness, we should aim to only have one vote, even if it has multiple choices, so if we have more proposals, please bring them up.

cheers,
--renato
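To make the difference between options P and T concrete, here is a small in-memory Python sketch. It is a toy model under loud assumptions: the `Umbrella` class, its `on_push` and `poll` methods and the `pins` mapping are all hypothetical, and no real Git or GitHub call is made. It only shows how the SeqID would advance in each scheme and how a missed notification ends up bundled with the next push.

```python
class Umbrella:
    """In-memory stand-in for the umbrella repository (hypothetical model)."""

    def __init__(self):
        self.seq_id = 0
        self.pins = {}     # repo name -> SHA the umbrella currently points at
        self.history = []  # one (seq_id, {repo: sha}) entry per umbrella revision

    def _commit(self, updates):
        # Each umbrella revision bumps the sequential ID exactly once.
        self.seq_id += 1
        self.pins.update(updates)
        self.history.append((self.seq_id, dict(updates)))
        return self.seq_id

    def on_push(self, repo, head_sha):
        """Option P: one umbrella revision per push notification."""
        if self.pins.get(repo) == head_sha:
            return None  # duplicate delivery, nothing to record
        return self._commit({repo: head_sha})

    def poll(self, heads):
        """Option T: one umbrella revision per polling interval."""
        changed = {r: s for r, s in heads.items() if self.pins.get(r) != s}
        return self._commit(changed) if changed else None


u = Umbrella()
u.on_push("llvm", "a1")                   # SeqID 1
# The notification for a push ending at "a2" is lost (a misfire) ...
u.on_push("llvm", "a3")                   # SeqID 2 covers both a2 and a3
u.poll({"llvm": "a4", "clang": "c1"})     # SeqID 3 bundles two repositories
```

In this model a misfire never corrupts the sequence; it only makes one umbrella revision coarser, which matches the "bundled with the next push" behaviour described above.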
David Chisnall via llvm-dev
2016-Jun-29 14:11 UTC
[llvm-dev] Git Move: GitHub+modules proposal
On 29 Jun 2016, at 15:03, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Any more comments before we put this proposal to vote?

Thank you very much for driving all of this. I just have one quick question: Will existing clones from the LLVM git mirror and / or llvm-mirror on GitHub continue to work by simply switching the remote in the config?

David
Tom Honermann via llvm-dev
2016-Jun-29 15:50 UTC
[llvm-dev] [cfe-dev] Git Move: GitHub+modules proposal
On 6/29/2016 10:03 AM, Renato Golin via cfe-dev wrote:
> Since the umbrella project cannot see the sub-modules' commits without
> some form of update, there are two ways to do this:
>
> P. Per push: Every push (not commit) on all other repositories will
> trigger a hook that will hit a URL on our server, telling it to
> generate an incremental ID, update some umbrella's SeqID property (or
> even a commit SHA) and update the sub-modules.

How would you coordinate dependent updates to the sub-modules? For example, in the case where someone makes a change to the LLVM sub-module that requires changes to the Clang sub-module? Would there be some way for a developer to push both sets of updates as an atomic update to the umbrella project? It probably doesn't matter often, as long as the updates to both sub-modules are pushed close together in time.

Tom.
Mehdi Amini via llvm-dev
2016-Jun-30 04:03 UTC
[llvm-dev] Git Move: GitHub+modules proposal
> On Jun 29, 2016, at 10:03 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> A short summary: Takumi has done 90% of the work here:
>
> https://github.com/llvm-project/llvm-project-submodule
>
> [...]
>
> For both of them, we need an external server, as there's no way to
> update a repository's property from another.

That makes it fragile, and that's why I disagree with your "90% done" assessment. What if the service behind the hook is down for a few days? Who will maintain it?

-- Mehdi