thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Justin Lebar via llvm-dev

2016-Jul-21 01:00 UTC

[llvm-dev] [RFC] One or many git repositories?

> Running the same 'git checkout' commands on multiple repos has
> always been sufficient to manage the multiple repos so far

Huh.  It definitely hasn't worked well for me.

Here's the issue I face every day.  I may be working on (unrelated)
changes to clang and llvm.  I update my llvm tree (say I checked in a
patch, or I want to pull in changes someone else has checked in).  Now
I want to go back to hacking on my clang stuff.  Because my clang
branch is not connected to a specific LLVM revision, it no longer
compiles.  I'm trying to build an old clang against a new llvm.

Now I have to pull the latest clang and rebase my patches.  After I
deal with rebase conflicts (not what I wanted to do at the moment!),
I'm in a new state, which means when I build my ccache is no help.
And when I run the clang tests, I don't know whether to expect test
failures.  So then I have to pop off my patches and run at head...
(Maybe I have to update clang!  In which case I also have to update
llvm...)

This would all be solved with zero work on my part if llvm and clang
were in one repository.  Then when I switched to working on my clang
patches, I would automatically check out a version of LLVM that is
compatible.

I think this is the main thing that people aren't getting.  Maybe
because it's never been possible before to have a workflow like this.
But having a git branch that you can check out and immediately build
-- without any rebasing, re-syncing, or other messing around -- is
incredibly powerful.
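
A minimal sketch of the single-repo workflow described above, using a throwaway repository. The layout (`llvm/` and `clang/` as subdirectories of one repo), the file names, and the branch name `clang-work` are assumptions made up for illustration; this is not the real LLVM tree.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# One repository holding both (hypothetical) subprojects.
git init -q "$tmp/mono"
cd "$tmp/mono"
git symbolic-ref HEAD refs/heads/main
mkdir llvm clang
echo "llvm api v1" > llvm/api.txt
echo "clang uses api v1" > clang/user.txt
git add .
git commit -q -m "llvm and clang in sync at v1"

# Start some clang work on a topic branch.
git checkout -q -b clang-work
echo "my clang patch" >> clang/user.txt
git commit -q -am "clang: work in progress"

# Meanwhile main moves on: llvm changes and clang is updated to match.
git checkout -q main
echo "llvm api v2" > llvm/api.txt
echo "clang uses api v2" > clang/user.txt
git commit -q -am "llvm: api v2, update clang"

# Going back to the clang work also brings back the llvm it was built against.
git checkout -q clang-work
llvm_on_topic=$(cat llvm/api.txt)
echo "llvm seen from clang-work: $llvm_on_topic"   # prints "llvm seen from clang-work: llvm api v1"
```

Because the topic branch was cut from a commit where llvm and clang agreed, checking it out restores that llvm state as well, with no rebase step.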

Please let me know if this is still not clear -- it's kind of the key point.

As I said, you can accomplish this with submodules, too, but it
requires the complex hackery from my original email.

To me, this is not at all a minor inconvenience.  It's at least an
hour of wasted time every week.

> I haven't tried the options jlebar has described to deal with these -
> sparse checkouts and whatnot, but they seem like an equivalent amount of
> work/learning curve as writing a script that cd's to several directories and
> runs the same git command in each.

I'll send sparse checkout instructions separately.  But my example
submodules commands are not at all equivalent to a script that cd's
into several directories and runs a git command in each, and I think
this is the main point of confusion.  (In fact you wouldn't need to
write such a script; it's just "git submodule foreach".)

The submodule commands create a single branch in the umbrella repo
that encompasses the checked-out state of *all the LLVM subrepos*.  So
you can, at a later time, check out this branch in the umbrella repo
and all the clang, llvm, etc. bits will be identical to the last time
you were on the branch.
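
As a rough illustration of what "a branch in the umbrella repo that records the checked-out state of a subrepo" looks like, here is a sketch with one stand-in submodule. These are not the commands from the original email (which are not reproduced in this message); the repository names, paths, and the branch name `my-work` are hypothetical. The `protocol.file.allow=always` setting is only there because the sketch uses a local path as the submodule URL, which newer git versions refuse by default.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in "llvm" repository to be used as a submodule.
git init -q "$tmp/llvm-upstream"
git -C "$tmp/llvm-upstream" symbolic-ref HEAD refs/heads/main
echo "llvm v1" > "$tmp/llvm-upstream/version.txt"
git -C "$tmp/llvm-upstream" add version.txt
git -C "$tmp/llvm-upstream" commit -q -m "llvm v1"

# The umbrella repo pins a specific llvm commit.
git init -q "$tmp/umbrella"
cd "$tmp/umbrella"
git symbolic-ref HEAD refs/heads/main
git -c protocol.file.allow=always submodule --quiet add "$tmp/llvm-upstream" llvm
git commit -q -m "umbrella: pin llvm v1"

# On a topic branch, commit inside the submodule, then record the new pin.
git checkout -q -b my-work
echo "llvm v2" > llvm/version.txt
git -C llvm commit -q -am "llvm v2"
git add llvm
git commit -q -m "umbrella: pin llvm v2"

# Switching umbrella branches needs an extra step to move the submodule.
git checkout -q main
git submodule --quiet update
on_main=$(cat llvm/version.txt)

git checkout -q my-work
git submodule --quiet update
on_branch=$(cat llvm/version.txt)
echo "main: $on_main / my-work: $on_branch"
```

Every change needs a commit in the subrepo plus a pin update in the umbrella, and every branch switch needs a `git submodule update`; that bookkeeping is the awkward part compared with a single repo.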

If all you want is to continue using git the way you use it now, the
multiple git repos gets you that (as does a sparse checkout on the
single repo).  My point is that the move to git opens up a new, much
more powerful workflow with branches that encompass both llvm and
clang state.  We can do this with or without submodules, but using
submodules for this is far more awkward than using a single repo.

-Justin L.

On Wed, Jul 20, 2016 at 5:36 PM, Justin Bogner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
>> On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>> > I would like to (re-)open a discussion on the following specific
>>> > question:
>>> >
>>> >   Assuming we are moving the llvm project to git, should we
>>> >   a) use multiple git repositories, linked together as subrepositories
>>> >      of an umbrella repo, or
>>> >   b) use a single git repository for most llvm subprojects.
>>> >
>>> > The current proposal assembled by Renato follows option (a), but I
>>> > think option (b) will be significantly simpler and more effective.
>>> > Moreover, I think the issues raised with option (b) are either
>>> > incorrect or can be reasonably addressed.
>>> >
>>> > Specifically, my proposal is that all LLVM subprojects that are
>>> > "version-locked" (and/or use the common CMake build system) live in a
>>> > single git repository.  That probably means all of the main llvm
>>> > subprojects other than the test-suite and maybe libc++.  From looking
>>> > at the repository today that would be: llvm, clang, clang-tools-extra,
>>> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>>>
>>> FWIW, I'm opposed. I'm not convinced that the problems with multiple
>>> repos are any worse than the problems with a single repo, which makes
>>> this more or less just change for the sake of change, IMO.
>>>
>>
>> It would be useful to know what problems you see with a single repo that
>> are more significant. In particular, either why you think the problems
>> jlebar already mentioned are worse than he sees them, or what other
>> problems there are that he hasn't addressed.
>
> Running the same 'git checkout' commands on multiple repos has always
> been sufficient to manage the multiple repos so far - as long as you
> create the same branches and tags in each repo, it's easy[1] to manage
> the set of repos with a script that cd's to each one and runs whatever
> git command.
>
> So it's a pretty minor inconvenience today to have the multiple repos in
> the case where you want to check out all of them.
>
> OTOH, if all of the repos are combined into one, you have to do work
> when you only want some of them. In my experience, this is basically
> always - between my various machines and projects I have several
> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
> checkouts of just llvm. I've only checked out the other repos when I was
> changing APIs and needed to update them.
>
> I haven't tried the options jlebar has described to deal with these -
> sparse checkouts and whatnot, but they seem like an equivalent amount of
> work/learning curve as writing a script that cd's to several directories
> and runs the same git command in each.
>
> Thus, this also sounds like a minor inconvenience. I just don't see how
> trading one for the other is worth doing, since AFAICT they're equally
> inconvenient.
>
> [1] My understanding of the "umbrella repo" thing for bisecting is that
>     it'll be managed automatically by a cron or checkin hooks or
>     whatever, so the bits in jlebar's description about updating
>     submodules seem like a red herring. I'm assuming that we end up in a
>     place where working with git is essentially the same as we work with
>     git-svn today.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Justin Bogner via llvm-dev

2016-Jul-21 01:26 UTC

[llvm-dev] [RFC] One or many git repositories?

Justin Lebar <jlebar at google.com> writes:
>> Running the same 'git checkout' commands on multiple repos has
>> always been sufficient to manage the multiple repos so far
>
> Huh.  It definitely hasn't worked well for me.
>
> Here's the issue I face every day.  I may be working on (unrelated)
> changes to clang and llvm.  I update my llvm tree (say I checked in a
> patch, or I want to pull in changes someone else has checked in).  Now
> I want to go back to hacking on my clang stuff.  Because my clang
> branch is not connected to a specific LLVM revision, it no longer
> compiles.  I'm trying to build an old clang against a new llvm.
>
> Now I have to pull the latest clang and rebase my patches.  After I
> deal with rebase conflicts (not what I wanted to do at the moment!),
> I'm in a new state, which means when I build my ccache is no help.
> And when I run the clang tests, I don't know whether to expect test
> failures.  So then I have to pop off my patches and run at head...
> (Maybe I have to update clang!  In which case I also have to update
> llvm...)
>
> This would all be solved with zero work on my part if llvm and clang
> were in one repository.  Then when I switched to working on my clang
> patches, I would automatically check out a version of LLVM that is
> compatible.
>
> I think this is the main thing that people aren't getting.  Maybe
> because it's never been possible before to have a workflow like this.
> But having a git branch that you can check out and immediately build
> -- without any rebasing, re-syncing, or other messing around -- is
> incredibly powerful.

I don't know man, when I create a branch to save my clang work I just
create a branch with the same name in all the other repos I have checked
out, then it just stays in the state I left it in as I go do other
stuff. This kind of problem just hasn't really come up for me.
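
A minimal sketch of that same-branch-name-everywhere workflow, assuming two throwaway repositories standing in for separate `llvm` and `clang` checkouts. The helper name `each_repo`, the paths, and the branch name `clang-work` are made up for illustration.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Two independent repositories, one per (hypothetical) subproject.
repos="llvm clang"
for r in $repos; do
  git init -q "$tmp/$r"
  git -C "$tmp/$r" symbolic-ref HEAD refs/heads/main
  echo "$r v1" > "$tmp/$r/file.txt"
  git -C "$tmp/$r" add file.txt
  git -C "$tmp/$r" commit -q -m "$r v1"
done

# Run the same git command in every repository.
each_repo() {
  for r in $repos; do
    git -C "$tmp/$r" "$@"
  done
}

# Create a branch with the same name in every repo, then work in clang only.
each_repo checkout -q -b clang-work
echo "clang patch" >> "$tmp/clang/file.txt"
git -C "$tmp/clang" commit -q -am "clang: wip"

# Go do other stuff: llvm's main branch moves ahead.
each_repo checkout -q main
echo "llvm v2" > "$tmp/llvm/file.txt"
git -C "$tmp/llvm" commit -q -am "llvm v2"

# Switching every repo back restores the llvm state the clang work started from.
each_repo checkout -q clang-work
llvm_state=$(cat "$tmp/llvm/file.txt")
echo "llvm on clang-work: $llvm_state"   # prints "llvm on clang-work: llvm v1"
```

This only stays consistent as long as every branch operation goes through the helper in all repos at once.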

> Please let me know if this is still not clear -- it's kind of the key
> point.
>
> As I said, you can accomplish this with submodules, too, but it
> requires the complex hackery from my original email.
>
> To me, this is not at all a minor inconvenience.  It's at least an
> hour of wasted time every week.
>
>> I haven't tried the options jlebar has described to deal with these
>> - sparse checkouts and whatnot, but they seem like an equivalent
>> amount of work/learning curve as writing a script that cd's to
>> several directories and runs the same git command in each.
>
> I'll send sparse checkout instructions separately.  But my example
> submodules commands are not at all equivalent to a script that cd's
> into several directories and runs a git command in each, and I think
> this is the main point of confusion.  (In fact you wouldn't need to
> write such a script; it's just "git submodule foreach".)
>
> The submodule commands create a single branch in the umbrella repo
> that encompasses the checked-out state of *all the LLVM subrepos*.  So
> you can, at a later time, check out this branch in the umbrella repo
> and all the clang, llvm, etc. bits will be identical to the last time
> you were on the branch.
>
> If all you want is to continue using git the way you use it now, the
> multiple git repos gets you that (as does a sparse checkout on the
> single repo).  My point is that the move to git opens up a new, much
> more powerful workflow with branches that encompass both llvm and
> clang state.  We can do this with or without submodules, but using
> submodules for this is far more awkward than using a single repo.

If I do `git log` in a sparse checkout that just has LLVM, will it only
show me LLVM commits? That is, how easy is it to filter out the
clang/lldb/subproject-X commits from a log? Negative globs are kind of
awkward.

Mehdi Amini via llvm-dev

2016-Jul-21 01:36 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 20, 2016, at 6:26 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Justin Lebar <jlebar at google.com> writes:
>>> Running the same 'git checkout' commands on multiple repos has
>>> always been sufficient to manage the multiple repos so far
>> 
>> Huh.  It definitely hasn't worked well for me.
>> 
>> Here's the issue I face every day.  I may be working on (unrelated)
>> changes to clang and llvm.  I update my llvm tree (say I checked in a
>> patch, or I want to pull in changes someone else has checked in).  Now
>> I want to go back to hacking on my clang stuff.  Because my clang
>> branch is not connected to a specific LLVM revision, it no longer
>> compiles.  I'm trying to build an old clang against a new llvm.
>> 
>> Now I have to pull the latest clang and rebase my patches.  After I
>> deal with rebase conflicts (not what I wanted to do at the moment!),
>> I'm in a new state, which means when I build my ccache is no help.
>> And when I run the clang tests, I don't know whether to expect test
>> failures.  So then I have to pop off my patches and run at head...
>> (Maybe I have to update clang!  In which case I also have to update
>> llvm...)
>> 
>> This would all be solved with zero work on my part if llvm and clang
>> were in one repository.  Then when I switched to working on my clang
>> patches, I would automatically check out a version of LLVM that is
>> compatible.
>> 
>> I think this is the main thing that people aren't getting.  Maybe
>> because it's never been possible before to have a workflow like this.
>> But having a git branch that you can check out and immediately build
>> -- without any rebasing, re-syncing, or other messing around -- is
>> incredibly powerful.
> 
> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.
> 
>> Please let me know if this is still not clear -- it's kind of the
>> key point.
>> 
>> As I said, you can accomplish this with submodules, too, but it
>> requires the complex hackery from my original email.
>> 
>> To me, this is not at all a minor inconvenience.  It's at least an
>> hour of wasted time every week.
>> 
>>> I haven't tried the options jlebar has described to deal with these
>>> - sparse checkouts and whatnot, but they seem like an equivalent
>>> amount of work/learning curve as writing a script that cd's to
>>> several directories and runs the same git command in each.
>> 
>> I'll send sparse checkout instructions separately.  But my example
>> submodules commands are not at all equivalent to a script that cd's
>> into several directories and runs a git command in each, and I think
>> this is the main point of confusion.  (In fact you wouldn't need to
>> write such a script; it's just "git submodule foreach".)
>> 
>> The submodule commands create a single branch in the umbrella repo
>> that encompasses the checked-out state of *all the LLVM subrepos*.  So
>> you can, at a later time, check out this branch in the umbrella repo
>> and all the clang, llvm, etc. bits will be identical to the last time
>> you were on the branch.
>> 
>> If all you want is to continue using git the way you use it now, the
>> multiple git repos gets you that (as does a sparse checkout on the
>> single repo).  My point is that the move to git opens up a new, much
>> more powerful workflow with branches that encompass both llvm and
>> clang state.  We can do this with or without submodules, but using
>> submodules for this is far more awkward than using a single repo.
> 
> If I do `git log` in a sparse checkout that just has LLVM, will it only
> show me LLVM commits? That is, how easy is it to filter out the
> clang/lldb/subproject-X commits from a log? Negative globs are kind of
> awkward.

“git log” would show the full history with a sparse checkout, including the
commits that are touching a subdirectory that is not checked out.
From the top of the project you’d have to type “git log llvm” to have only the
llvm history. I’m not sure if there is a config/alias for that, but a custom
git-log script could read the sparse-checkout config to filter it by default.
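
A small sketch of path-limited history in a combined repository. The toy repo, its directory names, and the helper `commit_file` are assumptions for illustration. It uses `git rev-list --count` for the numbers and the standard `:(exclude)` pathspec magic as one way to express "everything except clang".

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$tmp/mono"
cd "$tmp/mono"
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  mkdir -p "$(dirname "$1")"
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file llvm/a.txt  "llvm: add a"
commit_file clang/b.txt "clang: add b"
commit_file llvm/a.txt  "llvm: update a"
commit_file lldb/c.txt  "lldb: add c"

# Full history versus history limited to one subdirectory.
total=$(git rev-list --count HEAD)
llvm_only=$(git rev-list --count HEAD -- llvm/)
# Everything except clang, via an exclude pathspec.
without_clang=$(git rev-list --count HEAD -- . ':(exclude)clang/')

git log --oneline -- llvm/
echo "total=$total llvm_only=$llvm_only without_clang=$without_clang"
# prints "total=4 llvm_only=2 without_clang=3"
```

The same pathspecs work with `git log` itself, with or without a sparse checkout, since path limiting operates on history rather than on the working tree.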


— 
Mehdi

Justin Lebar via llvm-dev

2016-Jul-21 02:02 UTC

[llvm-dev] [RFC] One or many git repositories?

> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.

Ah, I understand your workflow now.  That works, I guess.  It's
definitely better than what I've been doing.  :)

You have to write and use these scripts, of course.  I think that's
the main problem -- git is hard enough as it is; asking me to do most
git commands completely differently when I happen to be working on
llvm is asking a lot.  Even asking everyone to realize that there's a
better way is asking a lot.  Inasmuch as we can make the commands we
type every day Just Work Like Any Other Git Repository, I think that's
a clear win for the community's overall productivity.

Beyond that, I guess the main benefits wrt workflow of the single repo
are that you can much more easily work with cross-cutting changes.
You can stash them, bisect them, reorder them, commit a bunch with one
command, whatever, there's nothing special about the fact that they're
cross-cutting.

And of course we don't get atomic commits across subprojects at all
without a single repo.  That really would be nice for certain kinds of
changes.

But I think the bigger point wrt workflows is that there's a real
benefit to having fewer special snowflakes in our lives.

-Justin L.

Sean Silva via llvm-dev

2016-Jul-21 06:51 UTC

[llvm-dev] [RFC] One or many git repositories?

On Wed, Jul 20, 2016 at 6:26 PM, Justin Bogner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Justin Lebar <jlebar at google.com> writes:
> >> Running the same 'git checkout' commands on multiple repos has
> >> always been sufficient to manage the multiple repos so far
> >
> > Huh.  It definitely hasn't worked well for me.
> >
> > Here's the issue I face every day.  I may be working on (unrelated)
> > changes to clang and llvm.  I update my llvm tree (say I checked in a
> > patch, or I want to pull in changes someone else has checked in).  Now
> > I want to go back to hacking on my clang stuff.  Because my clang
> > branch is not connected to a specific LLVM revision, it no longer
> > compiles.  I'm trying to build an old clang against a new llvm.
> >
> > Now I have to pull the latest clang and rebase my patches.  After I
> > deal with rebase conflicts (not what I wanted to do at the moment!),
> > I'm in a new state, which means when I build my ccache is no help.
> > And when I run the clang tests, I don't know whether to expect test
> > failures.  So then I have to pop off my patches and run at head...
> > (Maybe I have to update clang!  In which case I also have to update
> > llvm...)
> >
> > This would all be solved with zero work on my part if llvm and clang
> > were in one repository.  Then when I switched to working on my clang
> > patches, I would automatically check out a version of LLVM that is
> > compatible.
> >
> > I think this is the main thing that people aren't getting.  Maybe
> > because it's never been possible before to have a workflow like this.
> > But having a git branch that you can check out and immediately build
> > -- without any rebasing, re-syncing, or other messing around -- is
> > incredibly powerful.
>
> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.
>

It has for me, and it is a serious problem.

>
> > Please let me know if this is still not clear -- it's kind of the key
> > point.
> >
> > As I said, you can accomplish this with submodules, too, but it
> > requires the complex hackery from my original email.
> >
> > To me, this is not at all a minor inconvenience.  It's at least an
> > hour of wasted time every week.
> >
> >> I haven't tried the options jlebar has described to deal with these
> >> - sparse checkouts and whatnot, but they seem like an equivalent
> >> amount of work/learning curve as writing a script that cd's to
> >> several directories and runs the same git command in each.
> >
> > I'll send sparse checkout instructions separately.  But my example
> > submodules commands are not at all equivalent to a script that cd's
> > into several directories and runs a git command in each, and I think
> > this is the main point of confusion.  (In fact you wouldn't need to
> > write such a script; it's just "git submodule foreach".)
> >
> > The submodule commands create a single branch in the umbrella repo
> > that encompasses the checked-out state of *all the LLVM subrepos*.  So
> > you can, at a later time, check out this branch in the umbrella repo
> > and all the clang, llvm, etc. bits will be identical to the last time
> > you were on the branch.
> >
> > If all you want is to continue using git the way you use it now, the
> > multiple git repos gets you that (as does a sparse checkout on the
> > single repo).  My point is that the move to git opens up a new, much
> > more powerful workflow with branches that encompass both llvm and
> > clang state.  We can do this with or without submodules, but using
> > submodules for this is far more awkward than using a single repo.
>
> If I do `git log` in a sparse checkout that just has LLVM, will it only
> show me LLVM commits? That is, how easy is it to filter out the
> clang/lldb/subproject-X commits from a log? Negative globs are kind of
> awkward.
>

It is extremely easy (even with a full checkout): `git log llvm/`

-- Sean Silva

Robinson, Paul via llvm-dev

2016-Jul-21 15:00 UTC

[llvm-dev] [RFC] One or many git repositories?

> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.
> 
I find it too confusing to try to maintain several different patch
threads in one place.  For one thing, I'd have to keep separate build
directories anyway, so why not just have entirely separate clones and
'cd' to the right one for each piece of work?  That is much faster than
doing checkouts all the time and forgetting which build directory to
use.  Clones are relatively cheap; I keep ten or so lying around, each
with its own purpose.

On another topic, the sparse-checkout feature looks cool but it's
also complicated.  I don't need all the projects all the time but
sometimes a commit will break something and suddenly I'll need to get
clang-tools-extra or lld or whatever. I don't want to bother keeping
them all around all the time.

Finally, the major drawback of a single huge repo IMHO:
In git, to push a commit it must sit on top of the remote branch's
current HEAD.  If HEAD has changed, you need to rebase/rebuild/retest/retry.
With a single monster repo, a commit to 'lld' means I have to
go through this pain to put in my 'clang' tweak.  Why is that good?
I doubt a sparse-checkout helps here.
--paulr
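
The push race described above can be reproduced in a throwaway setup. This is only a sketch: the two clones `alice` and `bob`, the bare `remote.git`, and the file names are hypothetical, and it shows just the reject/rebase/push mechanics, not whether a rebuild or retest is needed afterwards.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Seed a combined repo, then publish it as a bare "remote".
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/main
mkdir "$tmp/seed/lld" "$tmp/seed/clang"
echo 1 > "$tmp/seed/lld/f.txt"
echo 1 > "$tmp/seed/clang/f.txt"
git -C "$tmp/seed" add .
git -C "$tmp/seed" commit -q -m "initial"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"

# Two developers clone the same starting point.
git clone -q "$tmp/remote.git" "$tmp/alice"
git clone -q "$tmp/remote.git" "$tmp/bob"

# Alice lands an lld change first.
echo 2 > "$tmp/alice/lld/f.txt"
git -C "$tmp/alice" commit -q -am "lld: change"
git -C "$tmp/alice" push -q origin main

# Bob's unrelated clang tweak is based on the old tip, so the push is refused.
echo 2 > "$tmp/bob/clang/f.txt"
git -C "$tmp/bob" commit -q -am "clang: tweak"
if git -C "$tmp/bob" push -q origin main 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Bob has to rebase onto the new tip before the push goes through.
git -C "$tmp/bob" pull -q --rebase origin main
git -C "$tmp/bob" push -q origin main
remote_tip=$(git -C "$tmp/remote.git" log -1 --format=%s main)
echo "first push: $first_push, remote tip: $remote_tip"
# prints "first push: rejected, remote tip: clang: tweak"
```

Here the rebase is conflict-free because the two commits touch different directories, but the extra fetch-and-rebase round trip happens regardless of which subproject the other commit touched.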
