Justin Lebar via llvm-dev
2016-Jul-20 23:39 UTC
[llvm-dev] [RFC] One or many git repositories?
Dear all,

I would like to (re-)open a discussion on the following specific question:

Assuming we are moving the llvm project to git, should we
  a) use multiple git repositories, linked together as subrepositories of an umbrella repo, or
  b) use a single git repository for most llvm subprojects.

The current proposal assembled by Renato follows option (a), but I think option (b) will be significantly simpler and more effective. Moreover, I think the issues raised with option (b) are either incorrect or can be reasonably addressed.

Specifically, my proposal is that all LLVM subprojects that are "version-locked" (and/or use the common CMake build system) live in a single git repository. That probably means all of the main llvm subprojects other than the test-suite and maybe libc++. From looking at the repository today that would be: llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

Let's first talk about the advantages of a single repository. Then we'll address the disadvantages raised.

At a high level, one repository is simpler than multiple repos that must be kept in sync using an external mechanism. The submodules solution requires nontrivial automation to maintain the history of commits in the umbrella repo (which we need if we want to bisect, or even just build an old revision of clang), but no such mechanisms are required if we have a single repo.

Similarly, it's possible to make atomic API changes across subprojects in a single repo; we simply can't do that with the submodules proposal. And working with llvm release branches becomes much simpler.

In addition, the single repository approach ties branches that contain changes to subprojects (e.g. clang) to a specific version of llvm proper. This means that when you switch between two branches that contain changes to clang, you'll automatically check out the right llvm bits.

Although we can do this with submodules too, a single repository makes it much easier.
As a concrete example, suppose you are working on some changes in clang. You want to commit the changes, then switch to a new branch based on tip of head and make some new changes. Finally you want to switch back to your original branch. And when you switch between branches, you want to get an llvm that's in sync with the clang in your working copy.

Here's how I'd do it with a monolithic git repository, option (b):

  git commit # old-branch
  git fetch
  git checkout -b new-branch origin/master
  # hack hack hack
  git commit # new-branch
  git checkout old-branch

Here's how I'd do it with option (a), submodules. I've used git -C here to make it explicit which repo we're working in, but in real life I'd probably use cd.

  # First, commit to two branches, one in your clang repo and one in your
  # master repo.
  git -C tools/clang commit # old-branch, clang submodule
  git commit # old-branch, master repo
  # Now fetch the submodule and check out head. Start a new branch in the
  # umbrella repo.
  git submodule foreach git fetch
  git checkout -b new-branch origin/master
  git submodule update
  # Start a new branch in the clang repo pointing to the current head.
  git -C tools/clang checkout -b new-branch
  # hack hack hack
  # Commit both branches.
  git -C tools/clang commit # new-branch
  git commit # new-branch
  # Check out the old branch.
  git checkout old-branch
  git submodule update

This is twice as many git commands, and almost three times as much typing, to do the same thing.

Indeed, this is so complicated I expect that many developers wouldn't bother, and will continue to develop the way we currently do. They would thus continue to be unable to create clang branches that include an llvm revision. :(

There are real simplifications and productivity advantages to be had by using a single repository. They will affect essentially every developer who makes changes to subprojects other than LLVM proper, cares about release branches, bisects our code, or builds old revisions.
So that's the first part, what we have to gain by using a monolithic repository. Let's address the downsides.

If you'll bear with a hypothetical: Imagine you could somehow make the monolithic repository behave exactly like the N separate repositories work today. If so, that would be the best of both worlds: Those of us who want a monolithic repository could have one, and those of us who don't would be unaffected. Whatever downsides you were worried about would evaporate in a mist of rainbows and puppies.

It turns out this hypothetical is very close to reality. The key is git sparse checkouts [1], which let you check out only some files or directories from a repository. Using this facility, if you don't like the switch to a monolithic repository, you can set up your git so you're (almost) entirely unaffected by it.

If you want to check out only llvm and clang, no problem. Just set up your .git/info/sparse-checkout file appropriately. Done.

If you want to be able to have two different revisions of llvm and clang checked out at once (maybe you want to update your clang bits more often than you update your llvm bits), you can do that too. Make one sparse checkout just of llvm, and make another sparse checkout just of clang. Symlink the clang checkout to llvm/tools/clang. That's it. The two checkouts can even share a common .git dir, so you don't have to fetch and store everything twice.

As far as I can tell, the only overhead of the monolithic repository is the extra storage in .git. But this is quite small in the scheme of things.

The .git dir for the existing monolithic repository [2] is 1.2GB. By way of comparison, my objdir for a release build of llvm and clang is 3.5GB, and a full checkout (workdir + .git dirs) of llvm and clang is 0.65GB.

If the 1.2GB really is a problem for you (or more likely, your automated infrastructure), a shallow clone [3] takes this down to 90MB.
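A minimal sketch of the sparse-checkout setup described above. It builds a throwaway stand-in repository rather than touching the real one; the directory names (mono, work) and the three-subproject layout are assumptions made for illustration only. It uses the classic mechanism: core.sparseCheckout, the .git/info/sparse-checkout file, and git read-tree.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Build a tiny stand-in for the monolithic repository (hypothetical layout).
git init -q mono
cd mono
mkdir llvm clang lldb
echo llvm > llvm/README
echo clang > clang/README
echo lldb > lldb/README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial import"
cd ..

# Clone it, then restrict the working tree to llvm and clang only.
git clone -q mono work
cd work
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '/llvm/' '/clang/' > .git/info/sparse-checkout

# Re-read HEAD into the index and working tree, applying the sparse patterns.
git read-tree -mu HEAD

# Only the selected subprojects remain in the working tree.
ls
```

The full history of every subproject is still present in .git; only the working tree is narrowed, so widening the checkout later is a matter of editing the pattern file and re-running git read-tree.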
The critical point to me in all this is that it's easy to set up the monolithic repository to appear like it's a bunch of separate repos. But it is impossible, insofar as I can tell, to do the opposite. That is, option (b) is strictly more powerful than option (a).

Renato has understandably pointed out that the current proposal is pretty far along, so please speak up now if you want to make this happen. I think we can.

Regards,
-Justin

[1] Git sparse checkouts were introduced in git 1.7, in 2010. For more info, see jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout. As far as I can tell, sparse checkouts work fine on Windows, but you have to use git-bash, see stackoverflow.com/q/23289006.
[2] github.com/llvm-project/llvm-project
[3] git clone --depth=1 github.com/llvm-project/llvm-project.git
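The message above mentions keeping two checkouts at different revisions that share a common .git dir. One way to get that effect is git worktree (available since git 2.5); whether this is the mechanism the author had in mind is an assumption. The sketch below uses a throwaway repository, and the names mono and mono-old are hypothetical.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Throwaway stand-in for the monolithic repository, with two revisions.
git init -q mono
cd mono
git config user.name demo
git config user.email demo@example.com
mkdir llvm clang
echo v1 > llvm/VERSION
echo v1 > clang/VERSION
git add .
git commit -q -m "rev 1"
echo v2 > llvm/VERSION
echo v2 > clang/VERSION
git commit -q -am "rev 2"

# Add a second working tree, detached at the older revision.
# It reuses the object store of the first one; nothing is fetched twice.
git worktree add --detach ../mono-old HEAD~1 >/dev/null 2>&1

# The main tree is at the newer revision, the second tree at the older one.
cat llvm/VERSION
cat ../mono-old/clang/VERSION

# Both trees resolve to the same shared repository directory.
git -C ../mono-old rev-parse --git-common-dir
```

From here, a symlink from the older tree's clang directory into llvm/tools/clang would reproduce the mixed-revision setup described in the message.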
Justin Bogner via llvm-dev
2016-Jul-21 00:02 UTC
[llvm-dev] [RFC] One or many git repositories?
Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> I would like to (re-)open a discussion on the following specific question:
>
> Assuming we are moving the llvm project to git, should we
> a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
> b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>
> Specifically, my proposal is that all LLVM subprojects that are
> "version-locked" (and/or use the common CMake build system) live in a
> single git repository. That probably means all of the main llvm
> subprojects other than the test-suite and maybe libc++. From looking
> at the repository today that would be: llvm, clang, clang-tools-extra,
> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

FWIW, I'm opposed. I'm not convinced that the problems with multiple repos are any worse than the problems with a single repo, which makes this more or less just change for the sake of change, IMO.
Chandler Carruth via llvm-dev
2016-Jul-21 00:06 UTC
[llvm-dev] [RFC] One or many git repositories?
On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

It would be useful to know what problems you see with a single repo that are more significant. In particular, either why you think the problems jlebar already mentioned are worse than he sees them, or what other problems there are that he hasn't addressed.
Sanjoy Das via llvm-dev
2016-Jul-21 00:23 UTC
[llvm-dev] [RFC] One or many git repositories?
Hi Justin,

On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

Right now we *are* in a monorepo, with sequential revision numbers across llvm and clang, so I'd say trying to move to separate repos is actually the "change" here. :)

-- Sanjoy
Dean Michael Berris via llvm-dev
2016-Jul-21 00:29 UTC
[llvm-dev] [RFC] One or many git repositories?
> On 21 Jul 2016, at 09:39, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Dear all,
>
> I would like to (re-)open a discussion on the following specific question:
>
> Assuming we are moving the llvm project to git, should we
> a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
> b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>

+1 to everything Justin points out here (and the rest of the email, which I've snipped for brevity).

Before anything else, I've been through a few of these conversions from SVN to git in other projects. In most of the ones I've seen going to submodules of multiple repos, a lot of automation is required just to keep things manageable. That's hard to do on a cross-platform basis (do you script in Python, shell script, one per OS, etc.) and is really more trouble than it's worth -- especially when adding new submodules and/or removing them. They're not impossible to do, but they're also much more work than a single repo.

Just to point out some devil's advocate positions:

- Keeping the current structure will be less churn to existing consumers that have "out of tree" builds based on the current structure. Asking them to change their workflow with SVN significantly (since moving to GitHub is mostly swayed by the SVN interface) will probably be non-trivial amounts of work. We probably need to document this well enough or show that the switch won't affect them too badly.

- Some people value keeping the history of the commits in SVN and the Git counterpart once the move happens (for a lot of valid reasons). Making sure we can merge the histories of all the subproject repositories into a single one should be addressed to preserve "provenance".

- Some people like isolation of workflows and concerns. As a git-native convert, I'm not sold on this, but there are some good reasons to be able to do this (maintainers of certain projects will probably enforce different constraints on when/who/how changes can/should/must be made). Making it possible to do so in a monorepo should be explained well (i.e. does this need any special configs on the repo on the server side, on GitHub, etc.).

All in all I think optimising for the case of the everyday developer working on multiple projects (in my case LLVM, Clang, and compiler-rt, and maybe potentially XRay as a subproject too) is a good cause. Whether this translates to every special consumer of the current set-up is less clear at least to me -- so I'd like to know what other stakeholders here think.

Cheers
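On the point about merging subproject histories into a single repository: one well-known recipe is a subtree-style merge, which grafts each project's existing commits into the combined history and places its files under a subdirectory. The sketch below is only an illustration on throwaway repositories, not the migration procedure anyone in the thread proposed; the repository names are hypothetical, and --allow-unrelated-histories assumes git 2.9 or newer.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
# Identity for the demo commits (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two tiny stand-ins for separate subproject repositories.
for p in llvm clang; do
  git init -q "$p"
  echo "$p" > "$p/README"
  git -C "$p" add README
  git -C "$p" commit -q -m "$p: initial commit"
done

# The combined repository starts from an empty root commit.
git init -q mono
cd mono
git commit -q --allow-empty -m "empty root"

for p in llvm clang; do
  # Fetch the subproject's history into FETCH_HEAD.
  git fetch -q "../$p" HEAD
  # Record a merge with that history without touching our tree yet...
  git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
  # ...then place the subproject's files under its own subdirectory.
  git read-tree --prefix="$p/" -u FETCH_HEAD
  git commit -q -m "Import $p history under $p/"
done

git --no-pager log --oneline --graph
```

Note the limitation: the imported commits keep their original root-relative paths, so older commits are reachable and attributable, but they are not rewritten to live under the new subdirectory, and commits from different projects are not interleaved by date or SVN revision.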
Sean Silva via llvm-dev
2016-Jul-21 02:41 UTC
[llvm-dev] [RFC] One or many git repositories?
On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

Just my experience, but having worked extensively with both, the single integrated repository is *much* nicer.

-- Sean Silva
Richard Smith via llvm-dev
2016-Jul-22 20:08 UTC
[llvm-dev] [RFC] One or many git repositories?
Having read through the entire thread and thought about this for a while, here are my thoughts:

* A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)

* A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.

* We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.

However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.

So here's what I would suggest:

- we move to a monolithic git repository on github

- this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang

- the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

- after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it

The goal would be to have the new layout entirely settled by the time 4.0 branches.
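The "move the code around and push it somewhere we can all look at it" step could look roughly like the following. This is a sketch on a throwaway repository; the branch name layout-proposal and the target layout are hypothetical and are not a layout anyone in the thread has proposed.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"

# Throwaway repository whose layout mimics the build-system layout.
git init -q mono
cd mono
git config user.name demo
git config user.email demo@example.com
mkdir -p tools/clang tools/lld
echo clang > tools/clang/README
echo lld > tools/lld/README
git add .
git commit -q -m "Initial layout matching the build system"

# Try out a reorganization on its own branch.
git checkout -q -b layout-proposal
git mv tools/clang clang
git mv tools/lld lld
git commit -q -m "Proposed layout: promote clang and lld to top level"

# Reviewers can see the moves as renames rather than delete-plus-add...
git --no-pager diff --name-status -M HEAD~1 HEAD

# ...and per-file history can still be traced across the move.
git --no-pager log --oneline --follow -- clang/README
```

Because every subproject lives in the same repository, a proposal like this is a single ordinary commit on a branch, which is what makes comparing several candidate layouts side by side practical.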
Hal Finkel via llvm-dev
2016-Jul-22 20:17 UTC
[llvm-dev] [RFC] One or many git repositories?
----- Original Message -----
> From: "Richard Smith via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Justin Lebar" <jlebar at google.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, July 22, 2016 3:08:18 PM
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?

> So here's what I would suggest:

> - we move to a monolithic git repository on github

> - this monolithic repository contains all the LLVM subprojects
> necessary to build a complete toolchain, including libc++ and other
> pieces that are not version-locked to llvm or clang

> - the initial structure exactly matches the current layout implied by
> the build system (clang in tools/clang, lld in tools/lld,
> compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and
> so on)

> - after we transition to git, interested parties assemble and upload
> to github patches reorganizing the project structure, and we have
> another discussion about principles for the restructuring (including
> forming solid guidance for how to organize future additions to
> LLVM), with reference to the patches so we can look at the proposed
> new layout; we pick one and commit it

I agree with all of this. I think that we should still keep the test-suite in a separate repository (both because it is very large, and should be even larger, and because it follows a very different licensing policy).

-Hal

> The goal would be to have the new layout entirely settled by the time
> 4.0 branches.
> > > [2] github.com/llvm-project/llvm-project > > > [3] git clone --depth=1 > > github.com/llvm-project/llvm-project.git > > > _______________________________________________ > > > LLVM Developers mailing list > > > llvm-dev at lists.llvm.org > > > lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20160722/d37625a4/attachment.html>
Piotr Padlewski via llvm-dev
2016-Jul-22 20:18 UTC
[llvm-dev] [RFC] One or many git repositories?
I have one reason why we should not move to a monolithic repository: if you do light work such as clang-tidy, which doesn't often require syncing with clang, but you still want to have the most recent checks, then I don't see a solution in a monolithic repository. This is a real issue if you only have a 2- or 4-core laptop to work on. And I guess the build system won't solve the problem: even a small change in some llvm file will result in recompiling many files that clang-tidy depends on.

2016-07-22 13:08 GMT-07:00 Richard Smith via llvm-dev <llvm-dev at lists.llvm.org>:

> Having read through the entire thread and thought about this for a while, here are my thoughts:
>
> * A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)
>
> * A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.
>
> * We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.
>
> However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.
>
> So here's what I would suggest:
>
> - we move to a monolithic git repository on github
>
> - this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang
>
> - the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)
>
> - after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it
>
> The goal would be to have the new layout entirely settled by the time 4.0 branches.
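For readers weighing the clang-tidy concern above against the sparse-checkout suggestion in the original post, here is a minimal sketch of keeping two independent sparse checkouts of one monolithic repository, so that one subproject can be updated more often than the other. Everything concrete in it is an assumption for illustration: the toy repository stands in for the real one, and the directory names and the `sparse_clone` helper are hypothetical. Whether this fully addresses the rebuild cost is not settled in the thread.

```shell
#!/bin/sh
# Sketch: two sparse checkouts of one monolithic repository.
# Assumptions (not from the thread): the toy repository below stands in for
# the real monorepo; directory names and the helper are illustrative only.
set -e

WORK=$(mktemp -d)

# 1. Build a toy "monorepo" with three top-level subprojects.
git init -q "$WORK/monorepo"
for d in llvm clang-tools-extra lld; do
  mkdir -p "$WORK/monorepo/$d"
  echo "$d sources" > "$WORK/monorepo/$d/README"
done
git -C "$WORK/monorepo" add .
git -C "$WORK/monorepo" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "Initial import"

# 2. Hypothetical helper: clone, then keep only one top-level directory
#    in the working tree. Usage: sparse_clone <dest> <subdir>
sparse_clone() {
  git clone -q "$WORK/monorepo" "$1"
  git -C "$1" config core.sparseCheckout true
  mkdir -p "$1/.git/info"
  echo "/$2/" > "$1/.git/info/sparse-checkout"
  # Re-read HEAD so the working tree honours the sparse patterns.
  git -C "$1" read-tree -mu HEAD
}

# 3. One checkout with only llvm, another with only clang-tools-extra.
sparse_clone "$WORK/llvm-only" llvm
sparse_clone "$WORK/tidy-only" clang-tools-extra

ls "$WORK/llvm-only" "$WORK/tidy-only"
```

Each checkout can then be fetched and updated on its own schedule. The original post additionally suggests symlinking one checkout into the other's tree and sharing a single .git directory; both are left out here to keep the sketch short.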
Chandler Carruth via llvm-dev
2016-Jul-22 20:36 UTC
[llvm-dev] [RFC] One or many git repositories?
On Fri, Jul 22, 2016 at 1:08 PM Richard Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> So here's what I would suggest:
>
> - we move to a monolithic git repository on github
>
> - this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang
>
> - the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)
>
> - after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it
>
> The goal would be to have the new layout entirely settled by the time 4.0 branches.

Strong +1 to all of this. It was what I was trying to suggest, but more explicitly written.
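The reorganization step in the proposal quoted above (move the code around in a monolithic repository, then push the result somewhere for others to inspect) could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy repository, the branch name `layout-proposal`, and the target layout with subprojects at the top level are all hypothetical, and the thread does not settle on any particular layout.

```shell
#!/bin/sh
# Sketch: propose a new layout as a branch of a monolithic repository.
# Assumptions (not from the thread): toy repo, the branch name
# "layout-proposal", and the target layout are illustrative only.
set -e

REPO=$(mktemp -d)/monorepo
git init -q "$REPO"
cd "$REPO"
git config user.name demo
git config user.email demo@example.com

# Initial structure mirrors the layout implied by the build system.
mkdir -p tools/clang tools/lld
echo "clang sources" > tools/clang/README
echo "lld sources" > tools/lld/README
git add .
git commit -q -m "Import subprojects in build-system layout"

# A proposed reorganization is an ordinary commit on a branch.
git checkout -q -b layout-proposal
git mv tools/clang clang
git mv tools/lld lld
git commit -q -m "Move clang and lld to the top level"

# Show the move as renames so reviewers can see what went where.
git log --oneline --stat -M -1
```

Pushing such a branch to a public fork would give reviewers a concrete tree to compare against other proposals, which is the kind of side-by-side evaluation the proposal describes.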
Tom Stellard via llvm-dev
2016-Jul-23 00:33 UTC
[llvm-dev] [RFC] One or many git repositories?
On Fri, Jul 22, 2016 at 01:08:18PM -0700, Richard Smith via llvm-dev wrote:

> However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.

I am also in favor of using a monolithic repo. We are currently using the monolithic llvm-project repo [1] for some of our automated testing, and it is much easier to deal with than the separate repos, especially in our case, where we always build a complete toolchain (for us this means llvm, lld, and clang).

-Tom

[1] github.com/llvm-project/llvm-project
Mehdi Amini via llvm-dev
2016-Jul-24 17:31 UTC
[llvm-dev] [RFC] One or many git repositories?
> On Jul 22, 2016, at 1:08 PM, Richard Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Having read through the entire thread and thought about this for a while, here are my thoughts:
>
> * A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)
>
> * A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.
>
> * We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.
>
> However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.
>
> So here's what I would suggest:
>
> - we move to a monolithic git repository on github
>
> - this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang
>
> - the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

It is not clear to me how this squares with your earlier claim: "That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository". The "flat" layout has the merit of keeping the independent components clearly separated, as they are today.

-- Mehdi

> - after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it
>
> The goal would be to have the new layout entirely settled by the time 4.0 branches.
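To make the "move the code around and push it somewhere" step concrete, here is a rough sketch of proposing a flat layout on a branch once everything lives in one repository. It is only an illustration on a throwaway repository: the branch name flat-layout, the tiny README-only tree, and the demo commit identity are all hypothetical, and the real project layout is of course far larger.

```shell
#!/bin/sh
set -eu

# Throwaway repository mimicking the nested layout implied by the build
# system (clang and lld under tools/). Contents are placeholders.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/lib" "$repo/tools/clang" "$repo/tools/lld"
echo "llvm core" > "$repo/lib/README"
echo "clang" > "$repo/tools/clang/README"
echo "lld" > "$repo/tools/lld/README"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "Nested layout as implied by the build system"

# Propose a flat layout on its own branch so that others can inspect it.
git -C "$repo" checkout -q -b flat-layout
mkdir "$repo/llvm"
git -C "$repo" mv lib llvm/lib
git -C "$repo" mv tools/clang clang
git -C "$repo" mv tools/lld lld
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "Proposed flat layout"

# List the tracked files in the proposed layout.
git -C "$repo" ls-files
```

Because each move is a plain rename inside one repository, the whole proposal is a single commit that can be pushed to any fork for review and compared against competing layouts.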