Justin Lebar via llvm-dev
2016-Jul-21 01:00 UTC
[llvm-dev] [RFC] One or many git repositories?
> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far

Huh. It definitely hasn't worked well for me.

Here's the issue I face every day. I may be working on (unrelated) changes to clang and llvm. I update my llvm tree (say I checked in a patch, or I want to pull in changes someone else has checked in). Now I want to go back to hacking on my clang stuff. Because my clang branch is not connected to a specific LLVM revision, it no longer compiles. I'm trying to build an old clang against a new llvm.

Now I have to pull the latest clang and rebase my patches. After I deal with rebase conflicts (not what I wanted to do at the moment!), I'm in a new state, which means when I build my ccache is no help. And when I run the clang tests, I don't know whether to expect test failures. So then I have to pop off my patches and run at head... (Maybe I have to update clang! In which case I also have to update llvm...)

This would all be solved with zero work on my part if llvm and clang were in one repository. Then when I switched to working on my clang patches, I would automatically check out a version of LLVM that is compatible.

I think this is the main thing that people aren't getting. Maybe because it's never been possible before to have a workflow like this. But having a git branch that you can check out and immediately build -- without any rebasing, re-syncing, or other messing around -- is incredibly powerful.

Please let me know if this is still not clear -- it's kind of the key point.

As I said, you can accomplish this with submodules, too, but it requires the complex hackery from my original email.

To me, this is not at all a minor inconvenience. It's at least an hour of wasted time every week.

> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.

I'll send sparse checkout instructions separately. But my example submodule commands are not at all equivalent to a script that cd's into several directories and runs a git command in each, and I think this is the main point of confusion. (In fact you wouldn't need to write such a script; it's just "git submodule foreach".)

The submodule commands create a single branch in the umbrella repo that encompasses the checked-out state of *all the LLVM subrepos*. So you can, at a later time, check out this branch in the umbrella repo and all the clang, llvm, etc. bits will be identical to the last time you were on the branch.

If all you want is to continue using git the way you use it now, multiple git repos get you that (as does a sparse checkout on the single repo). My point is that the move to git opens up a new, much more powerful workflow with branches that encompass both llvm and clang state. We can do this with or without submodules, but using submodules for this is far more awkward than using a single repo.

-Justin L.

On Wed, Jul 20, 2016 at 5:36 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
>> On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>> > I would like to (re-)open a discussion on the following specific question:
>>> >
>>> > Assuming we are moving the llvm project to git, should we
>>> > a) use multiple git repositories, linked together as subrepositories of an umbrella repo, or
>>> > b) use a single git repository for most llvm subprojects.
>>> >
>>> > The current proposal assembled by Renato follows option (a), but I think option (b) will be significantly simpler and more effective. Moreover, I think the issues raised with option (b) are either incorrect or can be reasonably addressed.
>>> >
>>> > Specifically, my proposal is that all LLVM subprojects that are "version-locked" (and/or use the common CMake build system) live in a single git repository. That probably means all of the main llvm subprojects other than the test-suite and maybe libc++. From looking at the repository today that would be: llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>>>
>>> FWIW, I'm opposed. I'm not convinced that the problems with multiple repos are any worse than the problems with a single repo, which makes this more or less just change for the sake of change, IMO.
>>
>> It would be useful to know what problems you see with a single repo that are more significant. In particular, either why you think the problems jlebar already mentioned are worse than he sees them, or what other problems there are that he hasn't addressed.
>
> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far - as long as you create the same branches and tags in each repo, it's easy[1] to manage the set of repos with a script that cd's to each one and runs whatever git command.
>
> So it's a pretty minor inconvenience today to have the multiple repos in the case where you want to check out all of them.
>
> OTOH, if all of the repos are combined into one, you have to do work when you only want some of them. In my experience, this is basically always - between my various machines and projects I have several checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of checkouts of just llvm. I've only checked out the other repos when I was changing APIs and needed to update them.
>
> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.
>
> Thus, this also sounds like a minor inconvenience. I just don't see how trading one for the other is worth doing, since AFAICT they're equally inconvenient.
>
> [1] My understanding of the "umbrella repo" thing for bisecting is that it'll be managed automatically by a cron or checkin hooks or whatever, so the bits in jlebar's description about updating submodules seem like a red herring. I'm assuming that we end up in a place where working with git is essentially the same as we work with git-svn today.
Justin Bogner via llvm-dev
2016-Jul-21 01:26 UTC
[llvm-dev] [RFC] One or many git repositories?
Justin Lebar <jlebar at google.com> writes:
>> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far
>
> Huh. It definitely hasn't worked well for me.
>
> Here's the issue I face every day. I may be working on (unrelated) changes to clang and llvm. I update my llvm tree (say I checked in a patch, or I want to pull in changes someone else has checked in). Now I want to go back to hacking on my clang stuff. Because my clang branch is not connected to a specific LLVM revision, it no longer compiles. I'm trying to build an old clang against a new llvm.
>
> Now I have to pull the latest clang and rebase my patches. After I deal with rebase conflicts (not what I wanted to do at the moment!), I'm in a new state, which means when I build my ccache is no help. And when I run the clang tests, I don't know whether to expect test failures. So then I have to pop off my patches and run at head... (Maybe I have to update clang! In which case I also have to update llvm...)
>
> This would all be solved with zero work on my part if llvm and clang were in one repository. Then when I switched to working on my clang patches, I would automatically check out a version of LLVM that is compatible.
>
> I think this is the main thing that people aren't getting. Maybe because it's never been possible before to have a workflow like this. But having a git branch that you can check out and immediately build -- without any rebasing, re-syncing, or other messing around -- is incredibly powerful.

I don't know man, when I create a branch to save my clang work I just create a branch with the same name in all the other repos I have checked out, then it just stays in the state I left it in as I go do other stuff. This kind of problem just hasn't really come up for me.

> Please let me know if this is still not clear -- it's kind of the key point.
>
> As I said, you can accomplish this with submodules, too, but it requires the complex hackery from my original email.
>
> To me, this is not at all a minor inconvenience. It's at least an hour of wasted time every week.
>
>> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.
>
> I'll send sparse checkout instructions separately. But my example submodule commands are not at all equivalent to a script that cd's into several directories and runs a git command in each, and I think this is the main point of confusion. (In fact you wouldn't need to write such a script; it's just "git submodule foreach".)
>
> The submodule commands create a single branch in the umbrella repo that encompasses the checked-out state of *all the LLVM subrepos*. So you can, at a later time, check out this branch in the umbrella repo and all the clang, llvm, etc. bits will be identical to the last time you were on the branch.
>
> If all you want is to continue using git the way you use it now, multiple git repos get you that (as does a sparse checkout on the single repo). My point is that the move to git opens up a new, much more powerful workflow with branches that encompass both llvm and clang state. We can do this with or without submodules, but using submodules for this is far more awkward than using a single repo.

If I do `git log` in a sparse checkout that just has LLVM, will it only show me LLVM commits? That is, how easy is it to filter out the clang/lldb/subproject-X commits from a log? Negative globs are kind of awkward.
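The "same branch name in every repo" workflow described above can be sketched as a small helper. This is only an illustration, not a script anyone in the thread posted: the function names, the `src` root, and the assumption that each repo is a directory under one root are all hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple


def per_repo_commands(
    root: str, repos: Sequence[str], git_args: Sequence[str]
) -> List[Tuple[Path, List[str]]]:
    """Pair each repo directory with the git command to run inside it.

    `root` and `repos` describe a hypothetical layout in which every
    subproject is its own clone somewhere under `root`.
    """
    return [(Path(root) / repo, ["git", *git_args]) for repo in repos]


def run_in_each(root: str, repos: Sequence[str], git_args: Sequence[str]) -> None:
    """Run the same git command in every repo, stopping at the first failure."""
    for cwd, argv in per_repo_commands(root, repos, git_args):
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_in_each("src", ["llvm", "clang"], ["checkout", "-b", "my-feature"])` would create the same branch in both clones. Note that nothing here records which llvm commit goes with which clang commit; the branches only stay in sync as long as every command is run through the helper.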
Mehdi Amini via llvm-dev
2016-Jul-21 01:36 UTC
[llvm-dev] [RFC] One or many git repositories?
> On Jul 20, 2016, at 6:26 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> If I do `git log` in a sparse checkout that just has LLVM, will it only show me LLVM commits? That is, how easy is it to filter out the clang/lldb/subproject-X commits from a log? Negative globs are kind of awkward.

"git log" would show the full history with a sparse checkout, including the commits that touch a subdirectory that is not checked out. From the top of the project you'd have to type "git log llvm" to have only the llvm history.

I'm not sure if there is a config/alias for that, but a custom git-log script could read the sparse-checkout config to filter it by default.

-- Mehdi
Justin Lebar via llvm-dev
2016-Jul-21 02:02 UTC
[llvm-dev] [RFC] One or many git repositories?
> I don't know man, when I create a branch to save my clang work I just create a branch with the same name in all the other repos I have checked out, then it just stays in the state I left it in as I go do other stuff. This kind of problem just hasn't really come up for me.

Ah, I understand your workflow now. That works, I guess. It's definitely better than what I've been doing. :)

You have to write and use these scripts, of course. I think that's the main problem -- git is hard enough as it is; asking me to do most git commands completely differently when I happen to be working on llvm is asking a lot. Even asking everyone to realize that there's a better way is asking a lot. Inasmuch as we can make the commands we type every day Just Work Like Any Other Git Repository, I think that's a clear win for the community's overall productivity.

Beyond that, I guess the main workflow benefit of the single repo is that you can much more easily work with cross-cutting changes. You can stash them, bisect them, reorder them, commit a bunch with one command, whatever; there's nothing special about the fact that they're cross-cutting. And of course we don't get atomic commits across subprojects at all without a single repo. That really would be nice for certain kinds of changes.

But I think the bigger point wrt workflows is that there's a real benefit to having fewer special snowflakes in our lives.

-Justin L.
Sean Silva via llvm-dev
2016-Jul-21 06:51 UTC
[llvm-dev] [RFC] One or many git repositories?
On Wed, Jul 20, 2016 at 6:26 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> I don't know man, when I create a branch to save my clang work I just create a branch with the same name in all the other repos I have checked out, then it just stays in the state I left it in as I go do other stuff. This kind of problem just hasn't really come up for me.

It has for me, and it is a serious problem.

> If I do `git log` in a sparse checkout that just has LLVM, will it only show me LLVM commits? That is, how easy is it to filter out the clang/lldb/subproject-X commits from a log? Negative globs are kind of awkward.

It is extremely easy (even with a full checkout): `git log llvm/`

-- Sean Silva
Robinson, Paul via llvm-dev
2016-Jul-21 15:00 UTC
[llvm-dev] [RFC] One or many git repositories?
> I don't know man, when I create a branch to save my clang work I just create a branch with the same name in all the other repos I have checked out, then it just stays in the state I left it in as I go do other stuff. This kind of problem just hasn't really come up for me.

I find it too confusing to try to maintain several different patch threads in one place. For one thing I'd have to keep separate build directories anyway, so why not just have entire separate clones and 'cd' to the right one to do whatever piece of work? Much faster than doing checkouts all the time and forgetting which build directory to use. Clones are relatively cheap; I keep ten or so lying around, each with its own purpose.

On another topic, the sparse-checkout feature looks cool but it's also complicated. I don't need all the projects all the time, but sometimes a commit will break something and suddenly I'll need to get clang-tools-extra or lld or whatever. I don't want to bother keeping them all around all the time.

Finally, the major drawback of a single huge repo IMHO: In git, to push a commit you must have it on top of the remote HEAD. If HEAD has changed you need to rebase/rebuild/retest/retry. With a single monster repo, a commit to 'lld' means I have to go through this pain to put in my 'clang' tweak. Why is that good? I doubt a sparse-checkout helps here.

--paulr