Mehdi Amini via llvm-dev
2016-Jul-29 04:11 UTC
[llvm-dev] [RFC] One or many git repositories?
> On Jul 28, 2016, at 7:32 PM, Lang Hames <lhames at gmail.com> wrote:
>
> Hi Mehdi,
>
> This is a narrow view IMO: the criterion #1 Chris mentioned to include projects in the monorepo was “must be tightly coupled to specific versions”.
> It means that even with the test suite (and possibly some runtime) out of the monorepo, all the software that is tightly coupled would be in the monorepo, and that alone would be enough to alleviate the needs for (most of the) tooling/infrastructure.
>
> Fair point, but coupling isn't binary: even the test-suite is coupled to the versions of clang that can compile it, it's just relatively loose compared to LLVM/clang.
>
> I find it a fairly different scale to clone 3 repos on a bot versus having to keep multiple repositories *in sync* (i.e. cross repository synchronization).
>
> I think it depends on the nature of the tools that are required. Bots are relatively simple since they're only reading from the repos, not writing. They're not the only use-case I have in mind though.
>
> Different problems, different tools… I’m against artificially creating “problems” for upstream developers only because the tooling to solve them works for downstream users.
>
> I don't think these are actually different problems: I would guess that the problem of collecting some subset of the LLVM projects into a usable source-tree is shared by many downstream users, and it's common in my workflows (e.g. just checking out llvm and lld). It will have to be solved by someone, since downstream users need it even if we adopted a mono-repo.

What I meant by “different problem” is that “downstream users”, for instance, don’t need to commit, which makes their problem/workflow quite different from that of an upstream developer (for instance, it is fairly easy to maintain a read-only view of the existing individual git repos currently on llvm.org).

Also, while we can create scripts for (almost) every scenario, one has to weigh the script that is run once at checkout time against the set of scripts required for day-to-day development. For example, what if I want to switch my tree to my work-in-progress branch, where I changed an LLVM library to use the new “Error checking” API and adapted all the other projects that use this API, and then I want to rebase this branch on master for every project so that I can get ready to push? My impression is that a single repo makes this use-case trivial with a standard set of git commands.

I believe a repo like https://github.com/llvm-project/llvm-project solves most of the workflows (both for developers and downstream users) with little to no tooling required. Providing a read-only export from this repo is also fairly easy, and can be done asynchronously in a deterministic way (contrary to the submodule umbrella update, which requires some server-side hooks).

The only two unanswered drawbacks that I got from this thread are:

1) A “major drawback of a single huge repo IMHO: In git, to push a commit you must have it at the remote HEAD. If HEAD has changed you need to rebase/rebuild/retest/retry. With a single monster repo, a commit to 'lld' means I have to go through this pain to put in my 'clang' tweak.”, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102656.html
2) Chris Bieneman: What about a *contributor* only wanting to contribute to compiler-rt? He has to pay the price of cloning the full repo.
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103052.html

I haven’t seen a good answer for 1), and for 2) it’ll come down to a balance of “how much of a burden is it in 2016 to download 500MB once to contribute to a project”, and how many people (and how many commits) does this represent?

> A shared solution (if it's possible) may be an opportunity to both share infrastructure with downstream projects and adopt a more modular approach to the LLVM project sources.

I had the impression that, in the current situation, the sources are already “modular”, and that’s painful when you work across projects (luckily I have been focused on LLVM itself lately…). As opposed to a “more modular approach to the LLVM project sources”, I’d favor a goal of “a more coherent approach to maintaining the LLVM projects sources”.

> I'm staying deliberately light on specifics here. As I said I don't have strong feelings yet -- I'm still digesting all the ideas in this thread.

The other thread on the submodules proposal, driven by Renato, also has a lot of ideas/workflow descriptions if you’re looking for inspiration.

--
Mehdi

> To the extent that I have a gut feeling though, this feels like it introduces very strong coupling between LLVM project sources (more than is required by the projects APIs) for the sake of convenience, so I'm trying to consider the alternatives.
>
> Cheers,
> Lang.
>
> On Thu, Jul 28, 2016 at 6:41 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Jul 28, 2016, at 6:23 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Aaaand I'm (mostly) caught up. Phew.
>>
>> FWIW Chris B is right: I had been put off commenting on this thread by the length, and the number of git discussions that have come before this. He convinced me to make the effort to put my 2 cents in though - thanks Chris.
>>
>> So - for my use-case I don't have strong feelings one way or the other* <https://www.youtube.com/watch?v=fpaQpyU_QiM>. That said, something about the discussion so far strikes me as dissonant: If we're going to break out some sub-projects (the test-suite for licensing reasons, the runtimes for modularity) then it's not really a mono-repo any more. It's a multi-repo where we've collapsed some (but not all) of the existing repos.
>
> This is a narrow view IMO: the criterion #1 Chris mentioned to include projects in the monorepo was “must be tightly coupled to specific versions”.
> It means that even with the test suite (and possibly some runtime) out of the monorepo, all the software that is tightly coupled would be in the monorepo, and that alone would be enough to alleviate the needs for (most of the) tooling/infrastructure.
>
>> To the extent that we have to build tooling to support multiple-repos (auto-mergers for test bots, command line utils for devs who want the main repo plus tests plus ...), could we re-use that to keep the existing modular project setup?
>
> I find it a fairly different scale to clone 3 repos on a bot versus having to keep multiple repositories *in sync* (i.e. cross repository synchronization).
>
>> This might be a fairly low-benefit proposition if the tools we develop were only usable by in-tree projects, but there are many other users of LLVM (Swift leaps to mind since I'm at Apple, but there are many others) who might appreciate the ability to use LLVM-provided tools to pick-and-mix LLVM projects into their repos. Otherwise, every downstream user will have to roll some version of these tools themselves.
>
> Different problems, different tools… I’m against artificially creating “problems” for upstream developers only because the tooling to solve them works for downstream users.
>
> --
> Mehdi
>
>> On Thu, Jul 28, 2016 at 3:19 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
>> > It is worth pointing out the Jenkins job that runs that is a playground I setup for myself. It is nowhere near production ready, and it will fail frequently as I iterate messing around with it.
>>
>> Sure, I think that's implied.
>>
>> cheers,
>> --renato
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
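
As a rough illustration of the rebase scenario Mehdi describes in the message above (a work-in-progress branch that changes a library together with its users in other projects, then gets rebased on master before pushing), the sketch below builds a throwaway single repository and performs that rebase with plain git commands. The directory layout, file names, the `wip` branch name and the commit messages are all hypothetical and not taken from the thread; this only sketches the idea and is not the actual LLVM setup.

```shell
#!/bin/sh
# Sketch only: repo layout, file names and the "wip" branch are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"

# Throwaway identity so the commits below do not depend on global config.
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# One repository holding two "projects".
mkdir llvm clang
echo "old API" > llvm/Error.h
echo "uses old API" > clang/Driver.cpp
git add .
git commit -q -m "Initial import"

# A single work-in-progress commit changes the library and its user together.
git checkout -q -b wip
echo "new API" > llvm/Error.h
echo "uses new API" > clang/Driver.cpp
git commit -q -a -m "Switch to the new error-checking API"

# Meanwhile master moves on with an unrelated change.
git checkout -q master
echo "unrelated" > llvm/README.txt
git add llvm/README.txt
git commit -q -m "Unrelated upstream change"

# Getting ready to push: one rebase covers every project the branch touches.
git checkout -q wip
git rebase -q master

base=$(git merge-base wip master)
tip=$(git rev-parse master)
echo "wip is based on master tip: $([ "$base" = "$tip" ] && echo yes || echo no)"
```

With one repository per project, the same checkout and rebase would presumably have to be repeated in each repository the branch touches, which is the day-to-day scripting cost the message refers to.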
David Chisnall via llvm-dev
2016-Jul-29 08:47 UTC
[llvm-dev] [RFC] One or many git repositories?
On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> What I meant by “different problem” is that “downstream users” for instance don’t need to commit, that makes their problem/workflow quite different from an upstream developer (for instance it is fairly easy to maintain a read-only view of the existing individual git repo currently on llvm.org).

I’m not convinced by this distinction. A lot of downstream developers need to patch LLVM and we benefit when they upstream their changes. We should not make it harder for them to do this. To give a couple of example downstream projects, both FreeBSD and Swift have patches on LLVM / Clang in their versions that they gradually filter upstream. Both projects have LLVM committers among their members. If the workflow that we recommend for them makes upstreaming easy then they benefit (maintaining a fork is effort) and LLVM benefits (having people provide bug fixes makes our code better).

The workflow that we want to recommend to these people is:

- Fork the repo that you’re interested in from the LLVM GitHub organisation
- Make your changes
- Send pull requests for anything that you think is of interest to upstream

This makes the barrier to entry for sending code back upstream *much* lower than it currently is, to the benefit of all. If the alternative is:

- Fork a read-only repo that you’re interested in from the LLVM GitHub organisation
- Make your changes
- Fork a different repo from the LLVM GitHub organisation
- Run a script to filter some of your changes into that one
- Send a pull request from that
- Deal with merging between the two yourself

I strongly suspect that we’ll get a lot fewer useful contributions from downstream. Or downstream people will just work on the monorepo and eat the cost.

If someone is working on a downstream LLVM project and becoming familiar with our codebase, then we want to be subtly nudging their workflow so that they eventually become LLVM contributors without noticing!

David
David Chisnall via llvm-dev
2016-Jul-29 09:19 UTC
[llvm-dev] [RFC] One or many git repositories?
On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> What I meant by “different problem” is that “downstream users” for instance don’t need to commit, that makes their problem/workflow quite different from an upstream developer (for instance it is fairly easy to maintain a read-only view of the existing individual git repo currently on llvm.org).

I’m not convinced by this distinction. A lot of downstream developers need to patch LLVM and we benefit when they upstream their changes. We should not make it harder for them to do this. To give a couple of example downstream projects, both FreeBSD and Swift have patches on LLVM / Clang in their versions that they gradually filter upstream. Both projects have LLVM committers among their members. If the workflow that we recommend for them makes upstreaming easy then they benefit (maintaining a fork is effort) and LLVM benefits (having people provide bug fixes makes our code better).

The workflow that we want to recommend to these people is:

- Fork the repo that you’re interested in from the LLVM GitHub organisation
- Make your changes
- Send pull requests for anything that you think is of interest to upstream

This makes the barrier to entry for sending code back upstream *much* lower than it currently is, to the benefit of all. If the alternative is:

- Fork a read-only repo that you’re interested in from the LLVM GitHub organisation
- Make your changes
- Fork a different repo from the LLVM GitHub organisation
- Run a script to filter some of your changes into that one
- Send a pull request from that
- Deal with merging between the two yourself

I strongly suspect that we’ll get a lot fewer useful contributions from downstream. Or downstream people will just work on the monorepo and eat the cost.

If someone is working on a downstream LLVM project and becoming familiar with our codebase, then we want to be subtly nudging their workflow so that they eventually become LLVM contributors without noticing!

David
Dean Michael Berris via llvm-dev
2016-Jul-29 11:35 UTC
[llvm-dev] [RFC] One or many git repositories?
> On 29 Jul 2016, at 19:19, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> What I meant by “different problem” is that “downstream users” for instance don’t need to commit, that makes their problem/workflow quite different from an upstream developer (for instance it is fairly easy to maintain a read-only view of the existing individual git repo currently on llvm.org).
>
> I’m not convinced by this distinction. A lot of downstream developers need to patch LLVM and we benefit when they upstream their changes. We should not make it harder for them to do this. To give a couple of example downstream projects, both FreeBSD and Swift have patches on LLVM / Clang in their versions that they gradually filter upstream. Both projects have LLVM committers among their members. If the workflow that we recommend for them makes upstreaming easy then they benefit (maintaining a fork is effort) and LLVM benefits (having people provide bug fixes makes our code better).
>
> The workflow that we want to recommend to these people is:
>
> - Fork the repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes
> - Send pull requests for anything that you think is of interest to upstream
>

I understand this, but why isn't "the repo you're interested in" just the megarepo (or monorepo) where every LLVM project resides?

> This makes the barrier to entry for sending code back upstream *much* lower than it currently is, to the benefit of all. If the alternative is:
>
> - Fork a read-only repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes
> - Fork a different repo from the LLVM GitHub organisation
> - Run a script to filter some of your changes into that one
> - Send a pull request from that
> - Deal with merging between the two yourself
>
> I strongly suspect that we’ll get a lot fewer useful contributions from downstream. Or downstream people will just work on the monorepo and eat the cost.
>

It isn't -- for downstream users of any of the LLVM projects, I suspect the answer will just be "instead of forking N repositories to get the benefit from these N projects, just fork the megarepo". If I were a downstream user, this sounds like a simpler proposition *even if I'm only interested in one part of the overall LLVM project*.

> If someone is working on a downstream LLVM project and becoming familiar with our codebase, then we want them to be subtly nudging their workflow so that they eventually become LLVM contributors without noticing!
>

Indeed. The best way, I think, all things considered, is to have a single megarepo with everything LLVM in it. That way, in case anybody wants to make any changes to any part of it, it's a simpler process _especially compared to the status quo_.

Cheers
Tom Honermann via llvm-dev
2016-Jul-29 15:36 UTC
[llvm-dev] [RFC] One or many git repositories?
On 7/29/2016 5:21 AM, David Chisnall via llvm-dev wrote:
> ... A lot of downstream developers need to patch LLVM and we benefit when they upstream their changes. We should not make it harder for them to do this. To give a couple of example downstream projects, both FreeBSD and Swift have patches on LLVM / Clang in their versions that they gradually filter upstream. Both projects have LLVM committers among their members. If the workflow that we recommend for them makes upstreaming easy then they benefit (maintaining a fork is effort) and LLVM benefits (having people provide bug fixes makes our code better).

While I agree with all of the above, I think the cost difference in the mechanics of how changes are upstreamed is negligible. The much larger cost of upstreaming changes is the engagement with the community itself. In our local repos, we can skimp on code quality and testing (to our own detriment of course). When we upstream, we need to ensure code style is consistent and that unit tests are in place. We need to find someone with commit access that is willing to commit on our behalf, go through code review with them and address any requests for changes, and then resolve conflicts when the upstreamed changes get pulled back into our repo (we #ifdef our local customizations for auditing and testing purposes, so there are always conflicts for us).

This is all exactly as it should be. The only reason I mention it is to say, don't dwell on the mechanics of upstreaming changes too much as the cost differences just aren't that significant *unless* those mechanics significantly reduce the code review and commit process costs.

Tom.
Mehdi Amini via llvm-dev
2016-Jul-29 17:01 UTC
[llvm-dev] [RFC] One or many git repositories?
> On Jul 29, 2016, at 2:19 AM, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
>
> On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> What I meant by “different problem” is that “downstream users” for instance don’t need to commit, that makes their problem/workflow quite different from an upstream developer (for instance it is fairly easy to maintain a read-only view of the existing individual git repo currently on llvm.org).
>
> I’m not convinced by this distinction. A lot of downstream developers need to patch LLVM and we benefit when they upstream their changes.

I was distinguishing between downstream users and developers, i.e. someone who just needs to get and build compiler-rt vs. someone who wants to *commit* to LLVM. Note that even by getting a single repo you can still send a patch to the mailing list and someone can commit it for you (including correct author attribution, contrary to SVN).

> We should not make it harder for them to do this. To give a couple of example downstream projects, both FreeBSD and Swift have patches on LLVM / Clang in their versions that they gradually filter upstream. Both projects have LLVM committers among their members. If the workflow that we recommend for them makes upstreaming easy then they benefit (maintaining a fork is effort) and LLVM benefits (having people provide bug fixes makes our code better).
>
> The workflow that we want to recommend to these people is:
>
> - Fork the repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes
> - Send pull requests for anything that you think is of interest to upstream

Note that the workflow you describe above still requires them to export their patch and import it into this clone before pushing. (Note also that we accept patches on the mailing list, so one does not even need to clone the official repo.)

> This makes the barrier to entry for sending code back upstream *much* lower than it currently is,

I don’t understand this statement. As of today you can send a diff to the mailing list; I don’t see how the bar could be any lower.

> to the benefit of all. If the alternative is:
>
> - Fork a read-only repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes

Why? If you know you want to *push* commits upstream, fork the only useful repo for that in the first place.

> - Fork a different repo from the LLVM GitHub organisation
> - Run a script to filter some of your changes into that one

I don’t know why you think there is a need for a script, or why it is different from today. Let's say I’m working on a fork of the compiler-rt read-only repo and I want to upstream a patch at some point:

Today:

- cd /path/to/compiler_rt-forked
- git format-patch …
- cd /path/to/compiler_rt-upstream
- git am /path/to/compiler_rt-forked/0001-My-awesome-changes.patch
- git svn dcommit
- done

Tomorrow with a monorepo:

- cd /path/to/compiler_rt-forked
- git format-patch …
- cd /path/to/unifiedrepo-upstream
- git am /path/to/compiler_rt-forked/0001-My-awesome-changes.patch --directory=compiler-rt
- git push
- done

Alternatively, if I upstream a patch once a year, I don’t really need to push it myself:

- cd /path/to/compiler_rt-forked
- git format-patch …
- email the patch.

> - Send a pull request from that

Note that I think we deferred any change to the workflow for future discussions (pull requests are not part of our workflow today).

> - Deal with merging between the two yourself

I don’t know what you mean by dealing with the merging. I don’t expect any difficulties; you need to elaborate.

>
> I strongly suspect that we’ll get a lot fewer useful contributions from downstream. Or downstream people will just work on the monorepo and eat the cost.
>
> If someone is working on a downstream LLVM project and becoming familiar with our codebase, then we want them to be subtly nudging their workflow so that they eventually become LLVM contributors without noticing!

Sure. The distinction between “downstream users” and “developers” was made in response to “there exist many users who just download and build a subproject”. These are not people that are *developing* on a downstream fork.

--
Mehdi
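
The "tomorrow with a monorepo" steps in the message above can be tried end to end on throwaway repositories. The sketch below is only an illustration under stated assumptions: the two repositories are created locally in a temporary directory, the file `lib/builtin.c` and its contents are made up, and the final `git push` is left out because there is no real remote. The one piece taken from the message is the use of `git format-patch` followed by `git am --directory=compiler-rt`, which prepends the sub-directory to every path in the patch.

```shell
#!/bin/sh
# Sketch only: every path, file name and commit message here is hypothetical.
set -eu

work=$(mktemp -d)

# Throwaway identity for the demo commits (environment only, no config touched).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.invalid"

# Stand-in for the standalone (read-only export) compiler-rt repository.
git init -q "$work/compiler_rt-forked"
cd "$work/compiler_rt-forked"
mkdir lib
echo "int builtin(void) { return 0; }" > lib/builtin.c
git add .
git commit -q -m "Initial import"

# Stand-in for the unified repo: the same file lives under compiler-rt/.
git init -q "$work/unifiedrepo-upstream"
cd "$work/unifiedrepo-upstream"
mkdir -p compiler-rt/lib llvm
cp "$work/compiler_rt-forked/lib/builtin.c" compiler-rt/lib/builtin.c
echo "LLVM" > llvm/README.txt
git add .
git commit -q -m "Initial import"

# Downstream change made in the standalone repo, exported as a patch file.
cd "$work/compiler_rt-forked"
echo "int builtin(void) { return 1; }" > lib/builtin.c
git commit -q -a -m "My awesome changes"
git format-patch -q -1 -o "$work/patches"

# Apply it in the unified repo, prefixing each patched path with compiler-rt/.
cd "$work/unifiedrepo-upstream"
git am -q --directory=compiler-rt "$work"/patches/*.patch

subject=$(git log -1 --format=%s)
touched=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "applied: $subject ($touched)"
```

In a real checkout the last step would be the `git push` from the list above, or mailing the generated patch file instead.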