Chandler Carruth via llvm-dev
2016-Jul-21 00:06 UTC
[llvm-dev] [RFC] One or many git repositories?
On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > I would like to (re-)open a discussion on the following specific
> > question:
> >
> > Assuming we are moving the llvm project to git, should we
> > a) use multiple git repositories, linked together as subrepositories
> > of an umbrella repo, or
> > b) use a single git repository for most llvm subprojects.
> >
> > The current proposal assembled by Renato follows option (a), but I
> > think option (b) will be significantly simpler and more effective.
> > Moreover, I think the issues raised with option (b) are either
> > incorrect or can be reasonably addressed.
> >
> > Specifically, my proposal is that all LLVM subprojects that are
> > "version-locked" (and/or use the common CMake build system) live in a
> > single git repository. That probably means all of the main llvm
> > subprojects other than the test-suite and maybe libc++. From looking
> > at the repository today that would be: llvm, clang, clang-tools-extra,
> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

It would be useful to know what problems you see with a single repo that are more significant. In particular, either why you think the problems jlebar already mentioned are worse than he sees them, or what other problems there are that he hasn't addressed.
Justin Bogner via llvm-dev
2016-Jul-21 00:36 UTC
Chandler Carruth <chandlerc at google.com> writes:
> It would be useful to know what problems you see with a single repo that
> are more significant.
> In particular, either why you think the problems jlebar already
> mentioned are worse than he sees them, or what other problems there
> are that he hasn't addressed.

Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far - as long as you create the same branches and tags in each repo, it's easy[1] to manage the set of repos with a script that cd's to each one and runs whatever git command.

So it's a pretty minor inconvenience today to have multiple repos in the case where you want to check out all of them.

OTOH, if all of the repos are combined into one, you have to do work when you only want some of them. In my experience, that is basically always: between my various machines and projects I have several checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of checkouts of just llvm. I've only checked out the other repos when I was changing APIs and needed to update them.

I haven't tried the options jlebar has described to deal with these (sparse checkouts and whatnot), but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.

Thus, this also sounds like a minor inconvenience. I just don't see how trading one for the other is worth doing, since AFAICT they're equally inconvenient.

[1] My understanding of the "umbrella repo" thing for bisecting is that it'll be managed automatically by a cron job or check-in hooks or whatever, so the bits in jlebar's description about updating submodules seem like a red herring. I'm assuming that we end up in a place where working with git is essentially the same as working with git-svn today.
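A minimal Python sketch of the kind of helper described above: it runs the same command once per checkout, with that checkout as the working directory. The function name `foreach_repo` and the directory layout in the usage comment are assumptions for illustration (the layout mirrors the usual SVN-era arrangement, with clang under llvm/tools and compiler-rt under llvm/projects); this is not anyone's actual script from the thread.

```python
import subprocess
import sys
from pathlib import Path


def foreach_repo(repos, command):
    """Run `command` (an argv list) once per repo, using the repo as cwd.

    This is the Python equivalent of `(cd "$repo" && git ...)` in a loop.
    Returns {repo path: captured stdout}. The first failing command raises
    subprocess.CalledProcessError, so later repos are left untouched.
    """
    results = {}
    for repo in repos:
        path = Path(repo)
        print(f"==> {path}", file=sys.stderr)
        completed = subprocess.run(
            command, cwd=path, check=True, capture_output=True, text=True
        )
        results[str(path)] = completed.stdout
    return results


# Hypothetical usage, from the directory containing an SVN-style layout:
#   repos = ["llvm", "llvm/tools/clang", "llvm/projects/compiler-rt"]
#   foreach_repo(repos, ["git", "checkout", "-b", "my-feature"])
```

Note that this only repeats a command in each repo; nothing in it records which combination of llvm and clang commits belongs together.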
Chandler Carruth via llvm-dev
2016-Jul-21 00:46 UTC
On Wed, Jul 20, 2016 at 5:36 PM Justin Bogner <mail at justinbogner.com> wrote:
> Running the same 'git checkout' commands on multiple repos has always
> been sufficient to manage the multiple repos so far - as long as you
> create the same branches and tags in each repo, it's easy[1] to manage
> the set of repos with a script that cd's to each one and runs whatever
> git command.

A notable difference is the ability to do API updates across them or to bisect across them. Also, if the infrastructure that keeps the umbrella repo in sync falls over or has a serious problem, reconstructing version-locked state in order to bisect across those regions of time seems quite challenging. So IMO, it isn't a minor inconvenience, even if it is something we could overcome.

> So it's a pretty minor inconvenience today to have multiple repos in
> the case where you want to check out all of them.
>
> OTOH, if all of the repos are combined into one, you have to do work
> when you only want some of them. In my experience, that is basically
> always: between my various machines and projects I have several
> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
> checkouts of just llvm. I've only checked out the other repos when I was
> changing APIs and needed to update them.
>
> I haven't tried the options jlebar has described to deal with these
> (sparse checkouts and whatnot), but they seem like an equivalent amount
> of work/learning curve as writing a script that cd's to several
> directories and runs the same git command in each.

I would actually like to see an example of how you would check out a common subset with the sparse checkout feature. jlebar, could you give us demo commands for this? In particular, a lot of folks have come up and asked me for my script that walks all the directories and runs the appropriate git commands in them, and if it is easier to have the GettingStarted page document how to use sparse checkout, that would be nice.
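To make the bisection concern concrete: without a working umbrella repo, the obvious fallback is to pair each llvm commit with whichever clang commit was newest at the same moment. The Python sketch below does exactly that. `pair_by_time` and its (timestamp, sha) data model are hypothetical; the input lists could be produced with something like `git log --reverse --format='%ct %H'` in each repository. The docstring spells out why the result is only an approximation of true version-locked state.

```python
from bisect import bisect_right


def pair_by_time(llvm_commits, clang_commits):
    """Map each llvm sha to the newest clang sha committed no later than it.

    Both inputs are iterables of (unix_commit_time, sha). Returns
    {llvm_sha: clang_sha}, or None where no clang commit existed yet.

    This is only a heuristic: timestamps come from committers' clocks,
    several commits can share one second, and if a clang fix lands a few
    minutes after the llvm API change it depends on, every llvm commit in
    that window is paired with a clang that no longer builds against it.
    """
    clang_sorted = sorted(clang_commits)
    clang_times = [t for t, _ in clang_sorted]
    pairs = {}
    for t, llvm_sha in llvm_commits:
        # Number of clang commits whose time is <= t.
        i = bisect_right(clang_times, t)
        pairs[llvm_sha] = clang_sorted[i - 1][1] if i else None
    return pairs


llvm = [(100, "L1"), (200, "L2"), (300, "L3")]
clang = [(250, "C3"), (150, "C1"), (200, "C2")]
print(pair_by_time(llvm, clang))  # {'L1': None, 'L2': 'C2', 'L3': 'C3'}
```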
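One possible shape for the requested demo, sketched in Python under explicit assumptions: a hypothetical single repository with one top-level directory per subproject, cloned from a placeholder URL (no such URL exists at this point in the discussion). It relies on git's `core.sparseCheckout` setting, with patterns listed in `.git/info/sparse-checkout`, followed by `git read-tree -mu HEAD`; newer git releases also provide a `git sparse-checkout` command for the same purpose. This is a hedged sketch, not a command sequence anyone in the thread proposed.

```python
import subprocess
from pathlib import Path

# Placeholder: no single-repository URL exists; substitute a real one.
REPO_URL = "https://example.org/llvm-single-repo.git"


def sparse_patterns(subprojects):
    """Return sparse-checkout file contents: one anchored dir per line."""
    return "".join(f"/{name.strip('/')}/\n" for name in subprojects)


def sparse_clone(url, dest, subprojects):
    """Clone `url` into `dest`, populating only the listed subprojects."""
    # Fetch the full history but leave the working tree empty for now.
    subprocess.run(["git", "clone", "--no-checkout", url, dest], check=True)
    # Tell git to consult .git/info/sparse-checkout when filling the tree.
    subprocess.run(
        ["git", "-C", dest, "config", "core.sparseCheckout", "true"], check=True
    )
    info_dir = Path(dest, ".git", "info")
    info_dir.mkdir(parents=True, exist_ok=True)
    (info_dir / "sparse-checkout").write_text(sparse_patterns(subprojects))
    # Populate the index from HEAD and write out only the matching paths.
    subprocess.run(["git", "-C", dest, "read-tree", "-mu", "HEAD"], check=True)


# Hypothetical usage:
#   sparse_clone(REPO_URL, "llvm-src", ["llvm", "clang", "compiler-rt"])
```

Widening the checkout later would mean appending another pattern (say, `/lld/`) to the sparse-checkout file and re-running `git read-tree -mu HEAD`; day-to-day commands such as pull, commit and push are unchanged.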
Mehdi Amini via llvm-dev
2016-Jul-21 00:53 UTC
On Jul 20, 2016, at 5:36 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I haven't tried the options jlebar has described to deal with these
> (sparse checkouts and whatnot), but they seem like an equivalent amount
> of work/learning curve as writing a script that cd's to several
> directories and runs the same git command in each.
>
> Thus, this also sounds like a minor inconvenience. I just don't see how
> trading one for the other is worth doing, since AFAICT they're equally
> inconvenient.

IIUC, you're saying there are minor inconveniences on both sides, but then I'm not sure why you are opposed: it seems pretty equal.

Also, the minor inconvenience of the monolithic repository happens during the initial setup/clone/checkout, not during day-to-day development (git pull, git checkout -b, git commit, git push), while the split model induces "minor inconveniences" in day-to-day developer interaction. I.e. I prefer using a script to check out and set up the repo, and then being able to use the standard git commands to interact with it.

> [1] My understanding of the "umbrella repo" thing for bisecting is that
> it'll be managed automatically by a cron job or check-in hooks or
> whatever,

That also seems fragile to me without a deterministic way to reconstruct it identically from scratch using only the split repositories (which would be possible with "git notes" attached by a server-side hook, for instance, but unfortunately GitHub does not allow that, and the current split-repository proposal excludes even *discussing* the merits of other hosting services).

> so the bits in jlebar's description about updating
> submodules seem like a red herring. I'm assuming that we end up in a
> place where working with git is essentially the same as working with
> git-svn today.

Some people currently manage to make a single commit that updates clang and llvm at the same time. I believe doing this in the split-repository model requires write access to the umbrella repo.

-- Mehdi
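Mehdi's git-notes idea can be sketched roughly as follows, with heavy assumptions: a server-side hook (which, as he notes, GitHub does not allow) would call something like `attach_note` on each newly pushed commit, recording the current head of every subproject. The note format, the `umbrella` notes ref, and all function names are hypothetical; only the `git notes --ref <ref> add -m <message> <object>` invocation is real git syntax. Listing subprojects in sorted order means the same state always yields the same note text, which is what a deterministic rebuild needs.

```python
import subprocess

# Hypothetical notes ref; git expands a bare name to refs/notes/umbrella.
NOTES_REF = "umbrella"


def format_note(heads):
    """Serialize {subproject: sha} as sorted 'name sha' lines."""
    return "".join(f"{name} {sha}\n" for name, sha in sorted(heads.items()))


def parse_note(text):
    """Inverse of format_note: rebuild {subproject: sha} from note text."""
    heads = {}
    for line in text.splitlines():
        if line.strip():
            name, sha = line.split()
            heads[name] = sha
    return heads


def attach_note(repo, commit, heads):
    """Attach the note to `commit` in the split repo at `repo` (needs git)."""
    subprocess.run(
        ["git", "-C", repo, "notes", "--ref", NOTES_REF, "add",
         "-m", format_note(heads), commit],
        check=True,
    )


print(format_note({"llvm": "a1b2", "clang": "c3d4", "compiler-rt": "e5f6"}), end="")
```

Replaying such notes in commit order would give the sequence of (llvm, clang, ...) tuples from which an umbrella history could be regenerated.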
Justin Lebar via llvm-dev
2016-Jul-21 01:00 UTC
> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far

Huh. It definitely hasn't worked well for me. Here's the issue I face every day.

I may be working on (unrelated) changes to clang and llvm. I update my llvm tree (say I checked in a patch, or I want to pull in changes someone else has checked in). Now I want to go back to hacking on my clang stuff. Because my clang branch is not connected to a specific LLVM revision, it no longer compiles: I'm trying to build an old clang against a new llvm.

Now I have to pull the latest clang and rebase my patches. After I deal with rebase conflicts (not what I wanted to do at the moment!), I'm in a new state, which means my ccache is no help when I build. And when I run the clang tests, I don't know whether to expect test failures. So then I have to pop off my patches and run at head... (Maybe I have to update clang! In which case I also have to update llvm...)

This would all be solved with zero work on my part if llvm and clang were in one repository. Then when I switched to working on my clang patches, I would automatically check out a version of LLVM that is compatible.

I think this is the main thing that people aren't getting. Maybe that's because it's never been possible before to have a workflow like this. But having a git branch that you can check out and immediately build -- without any rebasing, re-syncing, or other messing around -- is incredibly powerful.

Please let me know if this is still not clear -- it's kind of the key point.

As I said, you can accomplish this with submodules, too, but it requires the complex hackery from my original email. To me, this is not at all a minor inconvenience.
It's at least an hour of wasted time every week.

> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.

I'll send sparse checkout instructions separately.

But my example submodule commands are not at all equivalent to a script that cd's into several directories and runs a git command in each, and I think this is the main point of confusion. (In fact you wouldn't need to write such a script; it's just "git submodule foreach".)

The submodule commands create a single branch in the umbrella repo that encompasses the checked-out state of *all the LLVM subrepos*. So you can, at a later time, check out this branch in the umbrella repo and all the clang, llvm, etc. bits will be identical to the last time you were on the branch.

If all you want is to continue using git the way you use it now, multiple git repos get you that (as does a sparse checkout of the single repo). My point is that the move to git opens up a new, much more powerful workflow with branches that encompass both llvm and clang state. We can do this with or without submodules, but using submodules for this is far more awkward than using a single repo.

-Justin L.
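For comparison with the single-repo workflow, the submodule approach described above (one umbrella branch pinning every subrepo) can be approximated with stock git commands. The Python sketch below only builds and runs those commands; it is not the command sequence from jlebar's original email, which is not quoted here. The umbrella checkout directory, branch name, and submodule paths are assumptions, and work in each subrepo must already be committed there, because the umbrella records commits rather than uncommitted changes.

```python
import subprocess


def snapshot_commands(branch, submodules, message):
    """Commands that pin the subrepos' current HEADs on an umbrella branch."""
    return [
        # Create `branch` at the umbrella's current commit (or reset it there).
        ["git", "checkout", "-B", branch],
        # Staging a submodule path records the commit its HEAD points to.
        ["git", "add", *submodules],
        # --allow-empty keeps the snapshot from failing when nothing moved.
        ["git", "commit", "--allow-empty", "-m", message],
    ]


def restore_commands(branch):
    """Commands that return every subrepo to the commits pinned on `branch`."""
    return [
        ["git", "checkout", branch],
        # Checks out each recorded commit (as a detached HEAD) in each subrepo.
        ["git", "submodule", "update", "--recursive"],
    ]


def run_all(commands, umbrella_dir):
    """Run each command inside the umbrella checkout, stopping on failure."""
    for argv in commands:
        subprocess.run(argv, cwd=umbrella_dir, check=True)


# Hypothetical usage:
#   run_all(snapshot_commands("clang-wip", ["llvm", "clang"], "pin llvm+clang"), "umbrella")
#   ... later ...
#   run_all(restore_commands("clang-wip"), "umbrella")
```

After a restore the subrepos sit on detached HEADs, so an extra `git submodule foreach` step would be needed to get back onto named branches. In a single repository, plain `git checkout clang-wip` covers both halves at once.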