thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Justin Lebar via llvm-dev

2016-Jul-20 23:39 UTC

[llvm-dev] [RFC] One or many git repositories?

Dear all,

I would like to (re-)open a discussion on the following specific question:

  Assuming we are moving the llvm project to git, should we
  a) use multiple git repositories, linked together as subrepositories
of an umbrella repo, or
  b) use a single git repository for most llvm subprojects.

The current proposal assembled by Renato follows option (a), but I
think option (b) will be significantly simpler and more effective.
Moreover, I think the issues raised with option (b) are either
incorrect or can be reasonably addressed.

Specifically, my proposal is that all LLVM subprojects that are
"version-locked" (and/or use the common CMake build system) live in a
single git repository.  That probably means all of the main llvm
subprojects other than the test-suite and maybe libc++.  From looking
at the repository today that would be: llvm, clang, clang-tools-extra,
lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

Let's first talk about the advantages of a single repository.  Then
we'll address the disadvantages raised.

At a high level, one repository is simpler than multiple repos that
must be kept in sync using an external mechanism.  The submodules
solution requires nontrivial automation to maintain the history of
commits in the umbrella repo (which we need if we want to bisect, or
even just build an old revision of clang), but no such mechanisms are
required if we have a single repo.

Similarly, it's possible to make atomic API changes across subprojects
in a single repo; we simply can't do that with the submodules proposal.
And working with llvm release branches becomes much simpler.

In addition, the single repository approach ties branches that contain
changes to subprojects (e.g. clang) to a specific version of llvm
proper.  This means that when you switch between two branches that
contain changes to clang, you'll automatically check out the right
llvm bits.

Although we can do this with submodules too, a single repository makes
it much easier.

As a concrete example, suppose you are working on some changes in
clang.  You want to commit the changes, then switch to a new branch
based on tip of head and make some new changes.  Finally you want to
switch back to your original branch.  And when you switch between
branches, you want to get an llvm that's in sync with the clang in
your working copy.

Here's how I'd do it with a monolithic git repository, option (b):

  git commit # old-branch
  git fetch
  git checkout -b new-branch origin/master
  # hack hack hack
  git commit # new-branch
  git checkout old-branch

Here's how I'd do it with option (a), submodules.  I've used git -C
here to make it explicit which repo we're working in, but in real life
I'd probably use cd.

  # First, commit to two branches, one in your clang repo and one in your
  # master repo.
  git -C tools/clang commit # old-branch, clang submodule
  git commit # old-branch, master repo
  # Now fetch the submodules and check out head.  Start a new branch in the
  # umbrella repo.
  git submodule foreach git fetch
  git checkout -b new-branch origin/master
  git submodule update
  # Start a new branch in the clang repo pointing to the current head.
  git -C tools/clang checkout -b new-branch
  # hack hack hack
  # Commit both branches.
  git -C tools/clang commit # new-branch
  git commit # new-branch
  # Check out the old branch.
  git checkout old-branch
  git submodule update

This is twice as many git commands, and almost three times as much
typing, to do the same thing.

Indeed, this is so complicated I expect that many developers wouldn't
bother, and would continue to develop the way we currently do.  They
would thus continue to be unable to create clang branches that include
an llvm revision.  :(

There are real simplifications and productivity advantages to be had
by using a single repository.  They will affect essentially every
developer who makes changes to subprojects other than LLVM proper,
cares about release branches, bisects our code, or builds old
revisions.


So that's the first part, what we have to gain by using a monolithic
repository.  Let's address the downsides.

If you'll bear with a hypothetical: Imagine you could somehow make the
monolithic repository behave exactly like the N separate repositories
work today.  If so, that would be the best of both worlds: Those of us
who want a monolithic repository could have one, and those of us who
don't would be unaffected.  Whatever downsides you were worried about
would evaporate in a mist of rainbows and puppies.

It turns out this hypothetical is very close to reality.  The key is
git sparse checkouts [1], which let you check out only some files or
directories from a repository.  Using this facility, if you don't like
the switch to a monolithic repository, you can set up your git so
you're (almost) entirely unaffected by it.

If you want to check out only llvm and clang, no problem. Just set up
your .git/info/sparse-checkout file appropriately.  Done.
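
As a minimal sketch of that setup, assuming a toy local repository
standing in for the monolithic one (the temp directory, file contents,
and committer identity below are hypothetical), the usual recipe is to
turn on core.sparseCheckout, list the wanted directories in
.git/info/sparse-checkout, and populate the working tree with
git read-tree:

```shell
set -eu

# Build a tiny stand-in for the monolithic repo (hypothetical layout).
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
for d in llvm clang lld; do
  mkdir "$d"
  echo "$d sources" > "$d/README"
done
git add .
git -c user.name=Dev -c user.email=dev@example.com commit -q -m "Initial import"

# Clone without populating the working tree, then restrict it to llvm and clang.
git clone -q --no-checkout "$work/mono" "$work/sparse"
cd "$work/sparse"
git config core.sparseCheckout true
printf '%s\n' '/llvm/' '/clang/' > .git/info/sparse-checkout
git read-tree -mu HEAD   # check out only the paths matched above

ls "$work/sparse"
```

Against the real repository, the three commands after the clone would be
the whole setup; everything before them only exists to make the sketch
self-contained.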

If you want to be able to have two different revisions of llvm and
clang checked out at once (maybe you want to update your clang bits
more often than you update your llvm bits), you can do that too.  Make
one sparse checkout just of llvm, and make another sparse checkout
just of clang.  Symlink the clang checkout to llvm/tools/clang.
That's it.  The two checkouts can even share a common .git dir, so you
don't have to fetch and store everything twice.
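
One possible way to get two checkouts sharing a .git dir is
git worktree; that choice is an assumption here, since no mechanism is
named above. The sketch below uses a toy repository with hypothetical
paths and a made-up committer identity, and assumes a git recent enough
to support "worktree add --no-checkout" (2.9 or later):

```shell
set -eu

# Toy stand-in for the monolithic repo (hypothetical layout and contents).
work=$(mktemp -d)
git init -q "$work/llvm-tree"
cd "$work/llvm-tree"
mkdir -p llvm/tools clang
echo "llvm sources" > llvm/tools/README
echo "clang sources" > clang/README
git add .
git -c user.name=Dev -c user.email=dev@example.com commit -q -m "Initial import"

# Main working tree: keep only llvm.
git config core.sparseCheckout true
printf '%s\n' '/llvm/' > .git/info/sparse-checkout
git read-tree -mu HEAD

# Second working tree backed by the same .git dir: keep only clang.
git worktree add --detach --no-checkout "$work/clang-tree" HEAD >/dev/null 2>&1
cd "$work/clang-tree"
sparse_file=$(git rev-parse --git-path info/sparse-checkout)  # per-worktree file
mkdir -p "$(dirname "$sparse_file")"
printf '%s\n' '/clang/' > "$sparse_file"
git read-tree -mu HEAD

# Make the clang checkout visible at llvm/tools/clang.
ln -s "$work/clang-tree/clang" "$work/llvm-tree/llvm/tools/clang"
ls "$work/llvm-tree/llvm/tools/clang"
```

Each working tree can then be moved to a different revision
independently, while objects are fetched and stored only once.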

As far as I can tell, the only overhead of the monolithic repository
is the extra storage in .git.  But this is quite small in the scheme
of things.

The .git dir for the existing monolithic repository [2] is 1.2GB.  By
way of comparison, my objdir for a release build of llvm and clang is
3.5GB, and a full checkout (workdir + .git dirs) of llvm and clang is
0.65GB.

If the 1.2GB really is a problem for you (or more likely, your
automated infrastructure), a shallow clone [3] takes this down to 90MB.
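
To illustrate what the shallow clone in [3] does, here is a small local
sketch, assuming a toy three-commit repository with hypothetical
contents in place of the real one. It shows that --depth=1 keeps only
the most recent commit:

```shell
set -eu

# Toy repository with three commits (hypothetical contents).
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git -c user.name=Dev -c user.email=dev@example.com commit -q -m "Commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q --depth=1 "file://$work/mono" "$work/shallow"

full=$(git -C "$work/mono" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in full history: $full, in shallow clone: $shallow"
```

The trade-off is that history-based operations such as bisecting need
the history to be deepened again first.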

The critical point to me in all this is that it's easy to set up the
monolithic repository to appear like it's a bunch of separate repos.
But it is impossible, insofar as I can tell, to do the opposite.  That
is, option (b) is strictly more powerful than option (a).


Renato has understandably pointed out that the current proposal is
pretty far along, so please speak up now if you want to make this
happen.  I think we can.

Regards,
-Justin

[1] Git sparse checkouts were introduced in git 1.7, in 2010. For more
info, see
http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/.
As far as I can tell, sparse checkouts work fine on Windows, but you
have to use git-bash, see http://stackoverflow.com/q/23289006.
[2] https://github.com/llvm-project/llvm-project
[3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git

Justin Bogner via llvm-dev

2016-Jul-21 00:02 UTC

Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> I would like to (re-)open a discussion on the following specific question:
>
>   Assuming we are moving the llvm project to git, should we
>   a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
>   b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>
> Specifically, my proposal is that all LLVM subprojects that are
> "version-locked" (and/or use the common CMake build system) live in a
> single git repository.  That probably means all of the main llvm
> subprojects other than the test-suite and maybe libc++.  From looking
> at the repository today that would be: llvm, clang, clang-tools-extra,
> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

FWIW, I'm opposed. I'm not convinced that the problems with multiple
repos are any worse than the problems with a single repo, which makes
this more or less just change for the sake of change, IMO.

Chandler Carruth via llvm-dev

2016-Jul-21 00:06 UTC

On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > I would like to (re-)open a discussion on the following specific
> > question:
> >
> >   Assuming we are moving the llvm project to git, should we
> >   a) use multiple git repositories, linked together as subrepositories
> > of an umbrella repo, or
> >   b) use a single git repository for most llvm subprojects.
> >
> > The current proposal assembled by Renato follows option (a), but I
> > think option (b) will be significantly simpler and more effective.
> > Moreover, I think the issues raised with option (b) are either
> > incorrect or can be reasonably addressed.
> >
> > Specifically, my proposal is that all LLVM subprojects that are
> > "version-locked" (and/or use the common CMake build system) live in a
> > single git repository.  That probably means all of the main llvm
> > subprojects other than the test-suite and maybe libc++.  From looking
> > at the repository today that would be: llvm, clang, clang-tools-extra,
> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.
>
It would be useful to know what problems you see with a single repo that
are more significant. In particular, either why you think the problems
jlebar already mentioned are worse than he sees them, or what other
problems are that he hasn't addressed.

Sanjoy Das via llvm-dev

2016-Jul-21 00:23 UTC

Hi Justin,

On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

Right now we *are* in a monorepo, with sequential revision numbers
across llvm and clang, so I'd say trying to move to separate repos is
actually the "change" here.  :)

-- Sanjoy

Dean Michael Berris via llvm-dev

2016-Jul-21 00:29 UTC

> On 21 Jul 2016, at 09:39, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Dear all,
> 
> I would like to (re-)open a discussion on the following specific question:
> 
>  Assuming we are moving the llvm project to git, should we
>  a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
>  b) use a single git repository for most llvm subprojects.
> 
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
> 
+1 to everything Justin points out here (and the rest of the email, which
I've snipped for brevity).

Before anything else, I've been through a few of these conversions from SVN
to git in other projects. In most of the ones I've seen going to submodules
of multiple repos, a lot of automation is required just to keep things
manageable. That's hard to do on a cross-platform basis (do you script in
Python, shell script, one per OS, etc.) and is really more trouble than it's
worth -- especially when adding new submodules and/or removing them. They're
not impossible to do, but they're also much more work than a single repo.

Just to point out some devil's advocate positions:

- Keeping the current structure will be less churn to existing consumers that
have "out of tree" builds based on the current structure. Asking them
to change their workflow with SVN significantly (since moving to GitHub is
mostly swayed by the SVN interface) will probably be non-trivial amounts of
work. We probably need to document this well enough or show that the switch
won't affect them too badly.

- Some people value keeping the history of the commits in SVN and the Git
counterpart once the move happens (for a lot of valid reasons). Making sure we
can merge the histories of all the subproject repositories into a single one
should be addressed to preserve "provenance".

- Some people like isolation of workflows and concerns. As a git-native convert,
I'm not sold on this, but there are some good reasons to be able to do
this (maintainers of certain projects will probably enforce different
constraints on when/who/how changes can/should/must be made). Making it possible
to do so in a monorepo should be explained well (i.e. does this need any special
configs on the repo on the server side, on GitHub, etc.).

All in all I think optimising for the case of the everyday developer working on
multiple projects (in my case LLVM, Clang, and compiler-rt, and maybe
potentially XRay as a subproject too) is a good cause. Whether this translates
to every special consumer of the current set-up is less clear at least to me --
so I'd like to know what other stakeholders here think.

Cheers

Sean Silva via llvm-dev

2016-Jul-21 02:41 UTC

On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > I would like to (re-)open a discussion on the following specific
> > question:
> >
> >   Assuming we are moving the llvm project to git, should we
> >   a) use multiple git repositories, linked together as subrepositories
> > of an umbrella repo, or
> >   b) use a single git repository for most llvm subprojects.
> >
> > The current proposal assembled by Renato follows option (a), but I
> > think option (b) will be significantly simpler and more effective.
> > Moreover, I think the issues raised with option (b) are either
> > incorrect or can be reasonably addressed.
> >
> > Specifically, my proposal is that all LLVM subprojects that are
> > "version-locked" (and/or use the common CMake build system) live in a
> > single git repository.  That probably means all of the main llvm
> > subprojects other than the test-suite and maybe libc++.  From looking
> > at the repository today that would be: llvm, clang, clang-tools-extra,
> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.
>
Just my experience, but having worked extensively with both, the single
integrated repository is *much* nicer.

-- Sean Silva


Richard Smith via llvm-dev

2016-Jul-22 20:08 UTC

Having read through the entire thread and thought about this for a while,
here are my thoughts:

 * A single monolithic repository has quite a lot of advantages, some
because of what it is (for instance, you can make atomic cross-project
commits), and some because of what it isn't (keeping the repositories
separate creates synchronization problems for version-locked components,
and it's not clear to me that we have a good answer for these problems).

 * A single repository from which we can build a complete LLVM toolchain,
without requiring checking out a dozen components in seemingly-random
locations, would be valuable. The default behavior for someone checking out
and building the LLVM project should be that they get a complete,
fully-functional toolchain.

 * We need to preserve and maintain the easy ability to mix and match LLVM
components with other components (other C runtime libraries, C++ ABI
libraries, C++ standard libraries, linkers, debuggers, ...). That means
that it needs to be obvious what the boundaries of the optional components
are, which means that the current project layout (the one implied by the
build system) is not good enough for a monolithic repository (LLVM tests
will fail if you don't check out llvm/tools/opt, but we presumably want to
explicitly support not checking out llvm/tools/clang) -- unless we have
extensive documentation covering this, and even then there are likely to be
discoverability issues.

However, the move to git and the reorganization need not be done at the
same time, and it seems vastly easier to reorganize *after* we move to a
monolithic git repository -- it would then be essentially trivial for each
person with organizational ideas to move the code around in their
monolithic git repository, push it somewhere where we can all look at it,
and for us to then make an informed choice about the layout, with a
concrete example in front of us. Then we push the selected new layout; git
supports this really nicely if all the parts are already in a single
repository.

So here's what I would suggest:

- we move to a monolithic git repository on github

- this monolithic repository contains all the LLVM subprojects necessary to
build a complete toolchain, including libc++ and other pieces that are not
version-locked to llvm or clang

- the initial structure exactly matches the current layout implied by the
build system (clang in tools/clang, lld in tools/lld, compiler-rt in
runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

- after we transition to git, interested parties assemble and upload to
github patches reorganizing the project structure, and we have another
discussion about principles for the restructuring (including forming solid
guidance for how to organize future additions to LLVM), with reference to
the patches so we can look at the proposed new layout; we pick one and
commit it

The goal would be to have the new layout entirely settled by the time 4.0
branches.

Hal Finkel via llvm-dev

2016-Jul-22 20:17 UTC

----- Original Message -----
> From: "Richard Smith via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Justin Lebar" <jlebar at google.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, July 22, 2016 3:08:18 PM
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
> Having read through the entire thread and thought about this for a
> while, here are my thoughts:
> * A single monolithic repository has quite a lot of advantages, some
> because of what it is (for instance, you can make atomic
> cross-project commits), and some because of what it isn't (keeping
> the repositories separate creates synchronization problems for
> version-locked components, and it's not clear to me that we have a
> good answer for these problems)
> * A single repository from which we can build a complete LLVM
> toolchain, without requiring checking out a dozen components in
> seemingly-random locations, would be valuable. The default behavior
> for someone checking out and building the LLVM project should be
> that they get a complete, fully-functional toolchain.
> * We need to preserve and maintain the easy ability to mix and match
> LLVM components with other components (other C runtime libraries,
> C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...).
> That means that it needs to be obvious what the boundaries of the
> optional components are, which means that the current project layout
> (the one implied by the build system) is not good enough for a
> monolithic repository (LLVM tests will fail if you don't check out
> llvm/tools/opt, but we presumably want to explicitly support not
> checking out llvm/tools/clang) -- unless we have extensive
> documentation covering this, and even then there are likely to be
> discoverability issues.
> However, the move to git and the reorganization need not be done at
> the same time, and it seems vastly easier to reorganize *after* we
> move to a monolithic git repository -- it would then be essentially
> trivial for each person with organizational ideas to move the code
> around in their monolithic git repository, push it somewhere where
> we can all look at it, and for us to then make an informed choice
> about the layout, with a concrete example in front of us. Then we
> push the selected new layout; git supports this really nicely if all
> the parts are already in a single repository.
> So here's what I would suggest:
> - we move to a monolithic git repository on github
> - this monolithic repository contains all the LLVM subprojects
> necessary to build a complete toolchain, including libc++ and other
> pieces that are not version-locked to llvm or clang
> - the initial structure exactly matches the current layout implied by
> the build system (clang in tools/clang, lld in tools/lld,
> compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and
> so on)
> - after we transition to git, interested parties assemble and upload
> to github patches reorganizing the project structure, and we have
> another discussion about principles for the restructuring (including
> forming solid guidance for how to organize future additions to
> LLVM), with reference to the patches so we can look at the proposed
> new layout; we pick one and commit it

I agree with all of this.

I think that we should still keep the test-suite in a separate repository (both
because it is very large, and should be even larger, and because it follows a
very different licensing policy).

-Hal 
> The goal would be to have the new layout entirely settled by the time
> 4.0 branches.
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 

Piotr Padlewski via llvm-dev

2016-Jul-22 20:18 UTC

[llvm-dev] [RFC] One or many git repositories?

I have one reason why we should not move to a monolithic repository: if you
work on something light like clang-tidy, which doesn't often require syncing
with clang, but you still want the most recent checks, then I don't see
a solution in a monolithic repository.
This is a real issue if you only have a 2- or 4-core laptop to work on.
And I guess the build system won't solve the problem: even a small
change in some llvm file will result in recompiling many files that
clang-tidy depends on.

2016-07-22 13:08 GMT-07:00 Richard Smith via llvm-dev <
llvm-dev at lists.llvm.org>:
> Having read through the entire thread and thought about this for a while,
> here are my thoughts:
>
>  * A single monolithic repository has quite a lot of advantages, some
> because of what it is (for instance, you can make atomic cross-project
> commits), and some because of what it isn't (keeping the repositories
> separate creates synchronization problems for version-locked components,
> and it's not clear to me that we have a good answer for these problems)
>
>  * A single repository from which we can build a complete LLVM toolchain,
> without requiring checking out a dozen components in seemingly-random
> locations, would be valuable. The default behavior for someone checking out
> and building the LLVM project should be that they get a complete,
> fully-functional toolchain.
>
>  * We need to preserve and maintain the easy ability to mix and match LLVM
> components with other components (other C runtime libraries, C++ ABI
> libraries, C++ standard libraries, linkers, debuggers, ...). That means
> that it needs to be obvious what the boundaries of the optional components
> are, which means that the current project layout (the one implied by the
> build system) is not good enough for a monolithic repository (LLVM tests
> will fail if you don't check out llvm/tools/opt, but we presumably want to
> explicitly support not checking out llvm/tools/clang) -- unless we have
> extensive documentation covering this, and even then there are likely to be
> discoverability issues.
>
> However, the move to git and the reorganization need not be done at the
> same time, and it seems vastly easier to reorganize *after* we move to a
> monolithic git repository -- it would then be essentially trivial for each
> person with organizational ideas to move the code around in their
> monolithic git repository, push it somewhere where we can all look at it,
> and for us to then make an informed choice about the layout, with a
> concrete example in front of us. Then we push the selected new layout; git
> supports this really nicely if all the parts are already in a single
> repository.
>
> So here's what I would suggest:
>
> - we move to a monolithic git repository on github
>
> - this monolithic repository contains all the LLVM subprojects necessary
> to build a complete toolchain, including libc++ and other pieces that are
> not version-locked to llvm or clang
>
> - the initial structure exactly matches the current layout implied by the
> build system (clang in tools/clang, lld in tools/lld, compiler-rt in
> runtimes/compiler-rt, libc++ in projects/libcxx, and so on)
>
> - after we transition to git, interested parties assemble and upload to
> github patches reorganizing the project structure, and we have another
> discussion about principles for the restructuring (including forming solid
> guidance for how to organize future additions to LLVM), with reference to
> the patches so we can look at the proposed new layout; we pick one and
> commit it
>
> The goal would be to have the new layout entirely settled by the time 4.0
> branches.

Chandler Carruth via llvm-dev

2016-Jul-22 20:36 UTC

[llvm-dev] [RFC] One or many git repositories?

On Fri, Jul 22, 2016 at 1:08 PM Richard Smith via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Having read through the entire thread and thought about this for a while,
> here are my thoughts:
>
>  * A single monolithic repository has quite a lot of advantages, some
> because of what it is (for instance, you can make atomic cross-project
> commits), and some because of what it isn't (keeping the repositories
> separate creates synchronization problems for version-locked components,
> and it's not clear to me that we have a good answer for these problems)
>
>  * A single repository from which we can build a complete LLVM toolchain,
> without requiring checking out a dozen components in seemingly-random
> locations, would be valuable. The default behavior for someone checking out
> and building the LLVM project should be that they get a complete,
> fully-functional toolchain.
>
>  * We need to preserve and maintain the easy ability to mix and match LLVM
> components with other components (other C runtime libraries, C++ ABI
> libraries, C++ standard libraries, linkers, debuggers, ...). That means
> that it needs to be obvious what the boundaries of the optional components
> are, which means that the current project layout (the one implied by the
> build system) is not good enough for a monolithic repository (LLVM tests
> will fail if you don't check out llvm/tools/opt, but we presumably want to
> explicitly support not checking out llvm/tools/clang) -- unless we have
> extensive documentation covering this, and even then there are likely to be
> discoverability issues.
>
> However, the move to git and the reorganization need not be done at the
> same time, and it seems vastly easier to reorganize *after* we move to a
> monolithic git repository -- it would then be essentially trivial for each
> person with organizational ideas to move the code around in their
> monolithic git repository, push it somewhere where we can all look at it,
> and for us to then make an informed choice about the layout, with a
> concrete example in front of us. Then we push the selected new layout; git
> supports this really nicely if all the parts are already in a single
> repository.
>
> So here's what I would suggest:
>
> - we move to a monolithic git repository on github
>
> - this monolithic repository contains all the LLVM subprojects necessary
> to build a complete toolchain, including libc++ and other pieces that are
> not version-locked to llvm or clang
>
> - the initial structure exactly matches the current layout implied by the
> build system (clang in tools/clang, lld in tools/lld, compiler-rt in
> runtimes/compiler-rt, libc++ in projects/libcxx, and so on)
>
> - after we transition to git, interested parties assemble and upload to
> github patches reorganizing the project structure, and we have another
> discussion about principles for the restructuring (including forming solid
> guidance for how to organize future additions to LLVM), with reference to
> the patches so we can look at the proposed new layout; we pick one and
> commit it
>
> The goal would be to have the new layout entirely settled by the time 4.0
> branches.
>
Strong +1 to all of this. It was what I was trying to suggest, but more
explicitly written.
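
The "move the code around and push it somewhere" step in the quoted proposal can be tried on a throwaway repository. What follows is only a sketch under stated assumptions: the repository, the file README.txt, and the branch name layout-proposal are made up for illustration, and moving clang to the top level is just an example, not a layout the thread has settled on.

```shell
#!/bin/sh
set -e

# Throwaway stand-in for a monolithic repository (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p tools/clang
echo "placeholder for clang sources" > tools/clang/README.txt
git add tools/clang
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "Initial layout"

# Propose a new layout on its own branch: a directory move is a single commit.
git checkout -q -b layout-proposal
git mv tools/clang clang
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "Move clang to the top level"

# History can still be followed across the move.
git log --follow --oneline -- clang/README.txt

# To share the proposal, push the branch to any remote others can read, e.g.:
#   git push <your-remote> layout-proposal
```

Because the move is an ordinary commit on a branch, several competing layouts can sit side by side as branches and be compared before one is chosen.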

Tom Stellard via llvm-dev

2016-Jul-23 00:33 UTC

[llvm-dev] [RFC] One or many git repositories?

On Fri, Jul 22, 2016 at 01:08:18PM -0700, Richard Smith via llvm-dev wrote:
> Having read through the entire thread and thought about this for a while,
> here are my thoughts:
> 
>  * A single monolithic repository has quite a lot of advantages, some
> because of what it is (for instance, you can make atomic cross-project
> commits), and some because of what it isn't (keeping the repositories
> separate creates synchronization problems for version-locked components,
> and it's not clear to me that we have a good answer for these problems)
> 
>  * A single repository from which we can build a complete LLVM toolchain,
> without requiring checking out a dozen components in seemingly-random
> locations, would be valuable. The default behavior for someone checking out
> and building the LLVM project should be that they get a complete,
> fully-functional toolchain.
> 
>  * We need to preserve and maintain the easy ability to mix and match LLVM
> components with other components (other C runtime libraries, C++ ABI
> libraries, C++ standard libraries, linkers, debuggers, ...). That means
> that it needs to be obvious what the boundaries of the optional components
> are, which means that the current project layout (the one implied by the
> build system) is not good enough for a monolithic repository (LLVM tests
> will fail if you don't check out llvm/tools/opt, but we presumably want to
> explicitly support not checking out llvm/tools/clang) -- unless we have
> extensive documentation covering this, and even then there are likely to be
> discoverability issues.
> 
> However, the move to git and the reorganization need not be done at the
> same time, and it seems vastly easier to reorganize *after* we move to a
> monolithic git repository -- it would then be essentially trivial for each
> person with organizational ideas to move the code around in their
> monolithic git repository, push it somewhere where we can all look at it,
> and for us to then make an informed choice about the layout, with a
> concrete example in front of us. Then we push the selected new layout; git
> supports this really nicely if all the parts are already in a single
> repository.
> 
I am also in favor of using a monolithic repo.  We are currently
using the monolithic llvm-project repo[1] for some of our automated
testing, and it is much easier to deal with than the separate repos,
especially in our case, where we always build a complete toolchain
(for us this means llvm, lld, and clang).

-Tom

[1] https://github.com/llvm-project/llvm-project
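
For readers who want to see what working with only a subset (here llvm, lld, and clang) of a monolithic repository might look like, here is a minimal sketch using the sparse-checkout mechanism mentioned in the original proposal. It is not a description of the setup above: it runs against a toy local repository whose directory names merely stand in for subprojects, and the README.txt files are placeholders.

```shell
#!/bin/sh
set -e

# Toy "monolithic" repository; directory names stand in for subprojects.
src=$(mktemp -d)
git init -q "$src"
for project in llvm clang lld lldb polly; do
  mkdir "$src/$project"
  echo "$project" > "$src/$project/README.txt"
done
git -C "$src" add .
git -C "$src" -c user.name=demo -c user.email=demo@example.invalid \
  commit -q -m "Import subprojects"

# Clone without populating the working tree, then restrict it to three directories.
work=$(mktemp -d)
git clone -q --no-checkout "$src" "$work"
git -C "$work" config core.sparseCheckout true
mkdir -p "$work/.git/info"
printf '/llvm/\n/clang/\n/lld/\n' > "$work/.git/info/sparse-checkout"
git -C "$work" read-tree -mu HEAD

# Show which top-level directories ended up in the working tree.
ls "$work"
```

The full history still lives in .git, so switching branches or bisecting works as usual; only the working tree is limited to the listed directories.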

> So here's what I would suggest:
> 
> - we move to a monolithic git repository on github
> 
> - this monolithic repository contains all the LLVM subprojects necessary to
> build a complete toolchain, including libc++ and other pieces that are not
> version-locked to llvm or clang
> 
> - the initial structure exactly matches the current layout implied by the
> build system (clang in tools/clang, lld in tools/lld, compiler-rt in
> runtimes/compiler-rt, libc++ in projects/libcxx, and so on)
> 
> - after we transition to git, interested parties assemble and upload to
> github patches reorganizing the project structure, and we have another
> discussion about principles for the restructuring (including forming solid
> guidance for how to organize future additions to LLVM), with reference to
> the patches so we can look at the proposed new layout; we pick one and
> commit it
> 
> The goal would be to have the new layout entirely settled by the time 4.0
> branches.
> 
> On Wed, Jul 20, 2016 at 4:39 PM, Justin Lebar via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> 
> > Dear all,
> >
> > I would like to (re-)open a discussion on the following specific
question:
> >
> >   Assuming we are moving the llvm project to git, should we
> >   a) use multiple git repositories, linked together as subrepositories
> > of an umbrella repo, or
> >   b) use a single git repository for most llvm subprojects.
> >
> > The current proposal assembled by Renato follows option (a), but I
> > think option (b) will be significantly simpler and more effective.
> > Moreover, I think the issues raised with option (b) are either
> > incorrect or can be reasonably addressed.
> >
> > Specifically, my proposal is that all LLVM subprojects that are
> > "version-locked" (and/or use the common CMake build system)
live in a
> > single git repository.  That probably means all of the main llvm
> > subprojects other than the test-suite and maybe libc++.  From looking
> > at the repository today that would be: llvm, clang, clang-tools-extra,
> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
> >
> > Let's first talk about the advantages of a single repository. 
Then
> > we'll address the disadvantages raised.
> >
> > At a high level, one repository is simpler than multiple repos that
> > must be kept in sync using an external mechanism.  The submodules
> > solution requires nontrivial automation to maintain the history of
> > commits in the umbrella repo (which we need if we want to bisect, or
> > even just build an old revision of clang), but no such mechanisms are
> > required if we have a single repo.
> >
> > Similarly, it's possible to make atomic API changes across
subprojects
> > in a single repo; we simply can't do with the submodules proposal.
> > And working with llvm release branches becomes much simpler.
> >
> > In addition, the single repository approach ties branches that contain
> > changes to subprojects (e.g. clang) to a specific version of llvm
> > proper.  This means that when you switch between two branches that
> > contain changes to clang, you'll automatically check out the right
> > llvm bits.
> >
> > Although we can do this with submodules too, a single repository makes
> > it much easier.
> >
> > As a concrete example, suppose you are working on some changes in
> > clang.  You want to commit the changes, then switch to a new branch
> > based on tip of head and make some new changes.  Finally you want to
> > switch back to your original branch.  And when you switch between
> > branches, you want to get an llvm that's in sync with the clang in
> > your working copy.
> >
> > Here's how I'd do it with a monolithic git repository, option
(b):
> >
> >   git commit # old-branch
> >   git fetch
> >   git checkout -b new-branch origin/master
> >   # hack hack hack
> >   git commit # new-branch
> >   git checkout old-branch
> >
> > Here's how I'd do it with option (a), submodules.  I've
used git -C
> > here to make it explicit which repo we're working in, but in real
life
> > I'd probably use cd.
> >
> >   # First, commit to two branches, one in your clang repo and one in
your
> >   # master repo.
> >   git -C tools/clang commit # old-branch, clang submodule
> >   git commit # old-branch, master repo
> >   # Now fetch the submodule and check out head.  Start a new branch in
the
> >   # umbrella repo.
> >   git submodule foreach fetch
> >   git checkout -b origin/master new-branch
> >   git submodule update
> >   # Start a new branch in the clang repo pointing to the current head.
> >   git checkout -b -C tools/clang new-branch
> >   # hack hack hack
> >   # Commit both branches.
> >   git commit -C tools/clang # new-branch
> >   git commit # new-branch
> >   # Check out the old branch.
> >   git checkout old-branch
> >   git submodule update
> >
> > This is twice as many git commands, and almost three times as much
> > typing, to do the same thing.
> >
> > Indeed, this is so complicated I expect that many developers
wouldn't
> > bother, and will continue to develop the way we currently do.  They
> > would thus continue to be unable to create clang branches that include
> > an llvm revision.  :(
> >
> > There are real simplifications and productivity advantages to be had
> > by using a single repository.  They will affect essentially every
> > developer who makes changes to subprojects other than LLVM proper,
> > cares about release branches, bisects our code, or builds old
> > revisions.
> >
> >
> > So that's the first part, what we have to gain by using a
monolithic
> > repository.  Let's address the downsides.
> >
> > If you'll bear with a hypothetical: Imagine you could somehow make
the
> > monolithic repository behave exactly like the N separate repositories
> > work today.  If so, that would be the best of both worlds: Those of us
> > who want a monolithic repository could have one, and those of us who
> > don't would be unaffected.  Whatever downsides you were worried
about
> > would evaporate in a mist of rainbows and puppies.
> >
> > It turns out this hypothetical is very close to reality.  The key is
> > git sparse checkouts [1], which let you check out only some files or
> > directories from a repository.  Using this facility, if you don't
like
> > the switch to a monolithic repository, you can set up your git so
> > you're (almost) entirely unaffected by it.
> >
> > If you want to check out only llvm and clang, no problem. Just set up
> > your .git/info/sparse-checkout file appropriately.  Done.
> >
> > If you want to be able to have two different revisions of llvm and
> > clang checked out at once (maybe you want to update your clang bits
> > more often than you update your llvm bits), you can do that too.  Make
> > one sparse checkout just of llvm, and make another sparse checkout
> > just of clang.  Symlink the clang checkout to llvm/tools/clang.
> > That's it.  The two checkouts can even share a common .git dir, so
you
> > don't have to fetch and store everything twice.
> >
> > As far as I can tell, the only overhead of the monolithic repository
> > is the extra storage in .git.  But this is quite small in the scheme
> > of things.
> >
> > The .git dir for the existing monolithic repository [2] is 1.2GB.  By
> > way of comparison, my objdir for a release build of llvm and clang is
> > 3.5G, and a full checkout (workdir + .git dirs) of llvm and clang is
> > 0.65G.
> >
> > If the 1.2G really is a problem for you (or more likely, your
> > automated infrastructure), a shallow clone [3] takes this down to 90M.
> >
> > The critical point to me in all this is that it's easy to set up the
> > monolithic repository to appear like it's a bunch of separate repos.
> > But it is impossible, insofar as I can tell, to do the opposite.  That
> > is, option (b) is strictly more powerful than option (a).
> >
> >
> > Renato has understandably pointed out that the current proposal is
> > pretty far along, so please speak up now if you want to make this
> > happen.  I think we can.
> >
> > Regards,
> > -Justin
> >
> > [1] Git sparse checkouts were introduced in git 1.7, in 2010. For more
> > info, see
> > http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/.
> > As far as I can tell, sparse checkouts work fine on Windows, but you
> > have to use git-bash, see http://stackoverflow.com/q/23289006.
> > [2] https://github.com/llvm-project/llvm-project
> > [3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >

Mehdi Amini via llvm-dev

2016-Jul-24 17:31 UTC


[llvm-dev] [RFC] One or many git repositories?

> On Jul 22, 2016, at 1:08 PM, Richard Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Having read through the entire thread and thought about this for a while,
> here are my thoughts:
> 
>  * A single monolithic repository has quite a lot of advantages, some
>    because of what it is (for instance, you can make atomic cross-project
>    commits), and some because of what it isn't (keeping the repositories
>    separate creates synchronization problems for version-locked
>    components, and it's not clear to me that we have a good answer for
>    these problems)
> 
>  * A single repository from which we can build a complete LLVM toolchain,
>    without requiring checking out a dozen components in seemingly-random
>    locations, would be valuable. The default behavior for someone checking
>    out and building the LLVM project should be that they get a complete,
>    fully-functional toolchain.
> 
>  * We need to preserve and maintain the easy ability to mix and match LLVM
>    components with other components (other C runtime libraries, C++ ABI
>    libraries, C++ standard libraries, linkers, debuggers, ...). That means
>    that it needs to be obvious what the boundaries of the optional
>    components are, which means that the current project layout (the one
>    implied by the build system) is not good enough for a monolithic
>    repository (LLVM tests will fail if you don't check out llvm/tools/opt,
>    but we presumably want to explicitly support not checking out
>    llvm/tools/clang) -- unless we have extensive documentation covering
>    this, and even then there are likely to be discoverability issues.
> 
> However, the move to git and the reorganization need not be done at the
> same time, and it seems vastly easier to reorganize *after* we move to a
> monolithic git repository -- it would then be essentially trivial for
> each person with organizational ideas to move the code around in their
> monolithic git repository, push it somewhere where we can all look at it,
> and for us to then make an informed choice about the layout, with a
> concrete example in front of us. Then we push the selected new layout; git
> supports this really nicely if all the parts are already in a single
> repository.
> 
> So here's what I would suggest:
> 
> - we move to a monolithic git repository on github
> 
> - this monolithic repository contains all the LLVM subprojects necessary
>   to build a complete toolchain, including libc++ and other pieces that
>   are not version-locked to llvm or clang
> 
> - the initial structure exactly matches the current layout implied by the
>   build system (clang in tools/clang, lld in tools/lld, compiler-rt in
>   runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

It is not clear to me how this plays with your earlier claim: "That means
that it needs to be obvious what the boundaries of the optional components
are, which means that the current project layout (the one implied by the
build system) is not good enough for a monolithic repository".

The "flat" layout has the merit of keeping the independent components
clearly separated, as they are today.

-- 
Mehdi


> 
> - after we transition to git, interested parties assemble and upload to
>   github patches reorganizing the project structure, and we have another
>   discussion about principles for the restructuring (including forming
>   solid guidance for how to organize future additions to LLVM), with
>   reference to the patches so we can look at the proposed new layout; we
>   pick one and commit it
> 
> The goal would be to have the new layout entirely settled by the time 4.0
> branches.
> 
> On Wed, Jul 20, 2016 at 4:39 PM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Dear all,
> 
> I would like to (re-)open a discussion on the following specific question:
> 
>   Assuming we are moving the llvm project to git, should we
>   a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
>   b) use a single git repository for most llvm subprojects.
> 
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
> 
> Specifically, my proposal is that all LLVM subprojects that are
> "version-locked" (and/or use the common CMake build system) live in a
> single git repository.  That probably means all of the main llvm
> subprojects other than the test-suite and maybe libc++.  From looking
> at the repository today that would be: llvm, clang, clang-tools-extra,
> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
> 
> Let's first talk about the advantages of a single repository.  Then
> we'll address the disadvantages raised.
> 
> At a high level, one repository is simpler than multiple repos that
> must be kept in sync using an external mechanism.  The submodules
> solution requires nontrivial automation to maintain the history of
> commits in the umbrella repo (which we need if we want to bisect, or
> even just build an old revision of clang), but no such mechanisms are
> required if we have a single repo.
> 
> Similarly, it's possible to make atomic API changes across subprojects
> in a single repo; we simply can't do that with the submodules proposal.
> And working with llvm release branches becomes much simpler.
> 
> In addition, the single repository approach ties branches that contain
> changes to subprojects (e.g. clang) to a specific version of llvm
> proper.  This means that when you switch between two branches that
> contain changes to clang, you'll automatically check out the right
> llvm bits.
> 
> Although we can do this with submodules too, a single repository makes
> it much easier.
> 
> As a concrete example, suppose you are working on some changes in
> clang.  You want to commit the changes, then switch to a new branch
> based on tip of head and make some new changes.  Finally you want to
> switch back to your original branch.  And when you switch between
> branches, you want to get an llvm that's in sync with the clang in
> your working copy.
> 
> Here's how I'd do it with a monolithic git repository, option (b):
> 
>   git commit # old-branch
>   git fetch
>   git checkout -b new-branch origin/master
>   # hack hack hack
>   git commit # new-branch
>   git checkout old-branch
> 
> Here's how I'd do it with option (a), submodules.  I've used git -C
> here to make it explicit which repo we're working in, but in real life
> I'd probably use cd.
> 
>   # First, commit to two branches, one in your clang repo and one in your
>   # master repo.
>   git -C tools/clang commit # old-branch, clang submodule
>   git commit # old-branch, master repo
>   # Now fetch the submodule and check out head.  Start a new branch in the
>   # umbrella repo.
>   git submodule foreach git fetch
>   git checkout -b new-branch origin/master
>   git submodule update
>   # Start a new branch in the clang repo pointing to the current head.
>   git -C tools/clang checkout -b new-branch
>   # hack hack hack
>   # Commit both branches.
>   git -C tools/clang commit # new-branch
>   git commit # new-branch
>   # Check out the old branch.
>   git checkout old-branch
>   git submodule update
> 
> This is twice as many git commands, and almost three times as much
> typing, to do the same thing.
> 
> Indeed, this is so complicated I expect that many developers wouldn't
> bother, and will continue to develop the way we currently do.  They
> would thus continue to be unable to create clang branches that include
> an llvm revision.  :(
> 
> There are real simplifications and productivity advantages to be had
> by using a single repository.  They will affect essentially every
> developer who makes changes to subprojects other than LLVM proper,
> cares about release branches, bisects our code, or builds old
> revisions.
> 
> 
> So that's the first part, what we have to gain by using a monolithic
> repository.  Let's address the downsides.
> 
> If you'll bear with a hypothetical: Imagine you could somehow make the
> monolithic repository behave exactly like the N separate repositories
> work today.  If so, that would be the best of both worlds: Those of us
> who want a monolithic repository could have one, and those of us who
> don't would be unaffected.  Whatever downsides you were worried about
> would evaporate in a mist of rainbows and puppies.
> 
> It turns out this hypothetical is very close to reality.  The key is
> git sparse checkouts [1], which let you check out only some files or
> directories from a repository.  Using this facility, if you don't like
> the switch to a monolithic repository, you can set up your git so
> you're (almost) entirely unaffected by it.
> 
> If you want to check out only llvm and clang, no problem. Just set up
> your .git/info/sparse-checkout file appropriately.  Done.
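
For anyone who wants to try the sparse-checkout setup described above, here is a minimal sketch. It builds a throwaway toy repository so it runs offline; the top-level directory names (llvm, clang, lldb) are assumptions standing in for the real layout, and the temp paths are purely illustrative.

```shell
set -eu

# Toy stand-in for the monolithic repo; the directory names are assumptions.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
for d in llvm clang lldb; do
  mkdir "$d"
  echo "$d sources" > "$d/README"
done
git add .
git commit -q -m "toy monorepo"

# Clone it, then restrict the working tree to llvm and clang only.
git clone -q "$work/mono" "$work/sparse"
cd "$work/sparse"
git config core.sparseCheckout true
mkdir -p .git/info
printf '/llvm/\n/clang/\n' > .git/info/sparse-checkout
git read-tree -mu HEAD   # re-apply HEAD so the sparse patterns take effect

ls   # lldb is no longer present in the working tree
```

The history for lldb is still in .git, so widening the checkout later only means editing the pattern file and re-running the read-tree step.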
> 
> If you want to be able to have two different revisions of llvm and
> clang checked out at once (maybe you want to update your clang bits
> more often than you update your llvm bits), you can do that too.  Make
> one sparse checkout just of llvm, and make another sparse checkout
> just of clang.  Symlink the clang checkout to llvm/tools/clang.
> That's it.  The two checkouts can even share a common .git dir, so you
> don't have to fetch and store everything twice.
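
One possible way to get two sparse checkouts that share a single .git dir is `git worktree` (git 2.5 and later). The mail above does not say which mechanism was intended, so treat this as a sketch under that assumption. The toy repository, the top-level llvm/ and clang/ directory names, and the `sparse_only` helper are all hypothetical.

```shell
set -eu

# Toy stand-in for the monolithic repo; the directory names are assumptions.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
for d in llvm clang; do
  mkdir "$d"
  echo "$d sources" > "$d/README"
done
git add .
git commit -q -m "toy monorepo"
git config core.sparseCheckout true

# Hypothetical helper: limit the worktree in the current directory to one
# top-level directory.  The pattern file is per-worktree, so ask git for it.
sparse_only() {
  pattern_file=$(git rev-parse --git-path info/sparse-checkout)
  mkdir -p "$(dirname "$pattern_file")"
  printf '/%s/\n' "$1" > "$pattern_file"
  git read-tree -mu HEAD
}

# Main checkout: llvm only.
sparse_only llvm

# Second worktree backed by the same object store: clang only.
git worktree add --detach "$work/clang-wt" HEAD
cd "$work/clang-wt"
sparse_only clang

# Make the clang checkout visible at llvm/tools/clang, as the mail suggests.
mkdir -p "$work/mono/llvm/tools"
ln -s "$work/clang-wt/clang" "$work/mono/llvm/tools/clang"
```

Each worktree has its own HEAD and index, so the clang checkout can be moved to a different revision without touching the llvm one.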
> 
> As far as I can tell, the only overhead of the monolithic repository
> is the extra storage in .git.  But this is quite small in the scheme
> of things.
> 
> The .git dir for the existing monolithic repository [2] is 1.2GB.  By
> way of comparison, my objdir for a release build of llvm and clang is
> 3.5G, and a full checkout (workdir + .git dirs) of llvm and clang is
> 0.65G.
> 
> If the 1.2G really is a problem for you (or more likely, your
> automated infrastructure), a shallow clone [3] takes this down to 90M.
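
To illustrate what the shallow clone in [3] does, here is a small offline sketch against a toy repository (the repository and its three commits are made up for the example; the real command is the one in footnote [3]). It only shows that `--depth=1` keeps a single commit of history, not the actual size savings.

```shell
set -eu

# Toy repository with three commits of history.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.email "dev@example.com"
git config user.name "Example Dev"
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# A plain local-path clone ignores --depth, so go through a file:// URL.
git clone -q --depth=1 "file://$work/mono" "$work/shallow"

full=$(git -C "$work/mono" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full commits, shallow clone: $shallow commit(s)"
```

The trade-off is that history-dependent operations such as bisecting are limited in a shallow clone until more history is fetched.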
> 
> The critical point to me in all this is that it's easy to set up the
> monolithic repository to appear like it's a bunch of separate repos.
> But it is impossible, insofar as I can tell, to do the opposite.  That
> is, option (b) is strictly more powerful than option (a).
> 
> 
> Renato has understandably pointed out that the current proposal is
> pretty far along, so please speak up now if you want to make this
> happen.  I think we can.
> 
> Regards,
> -Justin
> 
> [1] Git sparse checkouts were introduced in git 1.7, in 2010. For more
> info, see
> http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/.
> As far as I can tell, sparse checkouts work fine on Windows, but you
> have to use git-bash, see http://stackoverflow.com/q/23289006.
> [2] https://github.com/llvm-project/llvm-project
> [3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git