thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Justin Lebar via llvm-dev

2016-Jul-25 16:54 UTC

[llvm-dev] [RFC] One or many git repositories?

Hi, all.

I feel like we've strayed pretty far from the question originally
posed in this thread.

One of the pieces of feedback I got before I started this thread was
that many people felt that, the last time the question of multiple
repos vs. monorepo was discussed, it was interspersed with other
topics, making it difficult for some people to weigh in appropriately
(or even to be aware that the discussion was occurring).  I'm afraid
that the discussion of github workflows we're having here may cause
the same problem.

Maybe we can move the discussion about github workflows into a
different thread?  Again, I don't mean to stop it, just move it.

To re-focus this thread on its original topic: It sounds to me like,
broadly speaking, we have consensus on using a single repository.  But
there are still some outstanding related questions.  Among these are:

1) Should the repository have "unified history"?  (Meaning, should I
be able to check out a single git revision from before the migration
and have it contain all of the llvm subprojects?)

2) Should the monorepo have a "nested" repository layout (e.g. clang
goes in /tools/clang) or a "flat" layout (clang goes in /clang)?

3) Assuming we want unified history, should the new canonical
repository's hashes be based on
https://github.com/llvm-project/llvm-project, or should it start
afresh?

FWIW my answers to these are:

1) Yes to unified history.  The main advantage of non-unified history
is that it's easier for people to import old branches -- it's a matter
of "git merge" instead of running the git filter-branch script I
wrote.  But this is a relatively small (~20 minute) one-time cost to
some of us, whereas our repository history is borne by all of us
forever.  Moreover unified history also helps people with long-running
branches, as it lets them check out old versions of their branch and
get a compatible version of all of the other llvm subprojects.

2) Yes to nested layout.  I find Chandler and Richard Smith's
arguments compelling.

3) No to basing the new canonical repo on
https://github.com/llvm-project/llvm-project.  That repo's history is
missing svn revision numbers, and there are enough emails floating
around that reference svn revision numbers that I think we need them
in our canonical repo.  Also llvm-project/llvm-project has a flat
structure, and if we end up going with a nested layout, it would be
better to have that layout starting with the first commit.

-Justin

On Mon, Jul 25, 2016 at 8:10 AM, Bruce Hoult via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> git-imerge can run an arbitrary script to decide whether a commit is good or
> bad. Lack of textual merge conflicts is only the most basic test -- you can
> check that it compiles, run tests .. whatever you want and have time to
> execute.
>
> On Tue, Jul 26, 2016 at 2:12 AM, Robinson, Paul via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> > -----Original Message-----
>> > From: Renato Golin [mailto:renato.golin at linaro.org]
>> > Sent: Monday, July 25, 2016 7:11 AM
>> > To: Daniel Sanders
>> > Cc: Robinson, Paul; llvm-dev at lists.llvm.org
>> > Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>> >
>> > On 25 July 2016 at 14:55, Daniel Sanders <Daniel.Sanders at imgtec.com>
>> > wrote:
>> > > I know of a way but it's not very nice. The gist of it is to checkout
>> > > the downstream branch just before the bad merge and then merge the first
>> > > 100 commits from upstream. If the result is good then merge the next
>> > > 100, but if it's bad then 'git reset --hard' and merge 10 instead.
>> > > You'll eventually find the commit that made it bad. Essentially, the idea is
>> > > to make a throwaway branch that merges more frequently. I do something
>> > > similar to rebase my work to master since gradually rebasing often
>> > > causes all the conflicts to go away.
>> >
>> > This is essentially what git-imerge does, you only need to define
>> > "good merge" in the form of a script or CI job.
>> >
>> > cheers,
>> > -renato
>>
>> Except I understood git-imerge to be looking for physical conflicts,
>> not "when did this test start failing."  If it does the latter also,
>> that would be awesome.
>> --paulr
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
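
The batch-merge search Daniel Sanders describes in the quoted text above can be sketched as follows. This is only a minimal model under stated assumptions, not git-imerge's actual algorithm: `find_first_bad` and `is_good` are hypothetical names, `is_good(k)` stands in for "merge the first k upstream commits, then build and test", and the sketch assumes that once a merge goes bad every larger merge is bad too.

```python
from typing import Callable, Optional, Sequence


def find_first_bad(n_commits: int,
                   is_good: Callable[[int], bool],
                   batch_sizes: Sequence[int] = (100, 10, 1)) -> Optional[int]:
    """Return the 1-based index of the first upstream commit whose merge is bad.

    is_good(k) is a hypothetical callback: it reports whether merging the
    first k upstream commits into the downstream branch gives a good result.
    Returns None if all n_commits merge cleanly.
    """
    if not batch_sizes or batch_sizes[-1] != 1:
        raise ValueError("batch_sizes must end with 1 to isolate one commit")

    merged = 0  # upstream commits merged so far with a good result
    level = 0   # index into batch_sizes; grows as we narrow down
    while merged < n_commits:
        step = min(batch_sizes[level], n_commits - merged)
        if is_good(merged + step):
            merged += step          # keep this merge and carry on
        elif step == 1:
            return merged + 1       # a single commit made the merge bad
        else:
            # Equivalent of 'git reset --hard': drop the bad merge and
            # retry from the same point with a smaller batch.
            level = min(level + 1, len(batch_sizes) - 1)
    return None


# Pretend upstream commit 237 is the one that breaks the downstream branch.
first_bad = find_first_bad(1000, lambda k: k < 237)
print(first_bad)
```

The "good merge" check is deliberately a plain callback, so it can be anything from "no textual conflicts" to a full build and test run.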

NAKAMURA Takumi via llvm-dev

2016-Jul-25 17:34 UTC

On Tue, Jul 26, 2016 at 1:54 AM Justin Lebar via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> To re-focus this thread on its original topic: It sounds to me like,
> broadly speaking, we have consensus on using a single repository.  But
> there are still some outstanding related questions.  Among these are:
>
> 1) Should the repository have "unified history"?  (Meaning, should I
> be able to check out a single git revision from before the migration
> and have it contain all of the llvm subprojects?)
>
Yes, I suggest we provide the unified tree as an option.
I don't agree that we should move to the unified tree.

> 2) Should the monorepo have a "nested" repository layout (e.g. clang
> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
>
No, I prefer a flat tree.
That said, I don't object if anyone releases a nested tree.

As for the technical reason: the flat tree contains the root tree of each
sub-repository.
For example, both llvm.git and llvm-project(tree).git have the
tree 9697e220d778081eef8d0c507dea35b53042ea9e .
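
The shared tree hash follows from how git names tree objects: a tree's id depends only on its contents, not on where the tree sits in a larger tree. Below is a minimal sketch of that hashing in Python, assuming only regular files (mode 100644) and directories; the file names and contents are made up for illustration and have nothing to do with the real hash quoted above.

```python
import hashlib


def blob_id(data: bytes) -> bytes:
    """Raw 20-byte SHA-1 id of a git blob with the given contents."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).digest()


def tree_body(entries: dict) -> bytes:
    """Serialized git tree; entries maps name -> bytes (file) or dict (dir)."""
    items = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # git sorts directory names as if they ended with "/"
            items.append(((name + "/").encode(), b"40000", tree_id(value)))
        else:
            items.append((name.encode(), b"100644", blob_id(value)))
    items.sort(key=lambda item: item[0])
    return b"".join(mode + b" " + key.rstrip(b"/") + b"\0" + oid
                    for key, mode, oid in items)


def tree_id(entries: dict) -> bytes:
    """Raw 20-byte SHA-1 id of a git tree object."""
    body = tree_body(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).digest()


# Hypothetical stand-ins for two sub-repositories and a flat monorepo.
llvm_root = {"CMakeLists.txt": b"project(LLVM)\n", "lib": {"IR.cpp": b"// ir\n"}}
clang_root = {"CMakeLists.txt": b"project(Clang)\n"}
monorepo = {"llvm": llvm_root, "clang": clang_root}

# The flat monorepo's top-level tree embeds each sub-repository's root tree id.
print(tree_id(llvm_root).hex())
print(tree_id(llvm_root) in tree_body(monorepo))
```

In a nested layout the llvm root tree would instead hold an extra tools/clang entry, so its id would no longer match the root tree of the standalone llvm repository.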

I suppose the nested layout exists for historical reasons.
(IMO, clang and other subprojects might be moved to projects from tools)

> 3) Assuming we want unified history, should the new canonical
> repository's hashes be based on
> https://github.com/llvm-project/llvm-project, or should it start
> afresh?
>
Yes and no.
In the past, I rebased the tree when I added new projects.

I still wonder what I could do when I add/remove projects.
I'd like to add some projects in the near future.

I could remove dragonegg and klee. But I think the tree may contain all
existing subproject.git repositories.

> FWIW my answers to these are:
>
> 1) Yes to unified history.  The main advantage of non-unified history
> is that it's easier for people to import old branches -- it's a matter
> of "git merge" instead of running the git filter-branch script I
> wrote.  But this is a relatively small (~20 minute) one-time cost to
> some of us, whereas our repository history is borne by all of us
> forever.  Moreover unified history also helps people with long-running
> branches, as it lets them check out old versions of their branch and
> get a compatible version of all of the other llvm subprojects.
>
> 2) Yes to nested layout.  I find Chandler and Richard Smith's
> arguments compelling.
>
> 3) No to basing the new canonical repo on
> https://github.com/llvm-project/llvm-project.  That repo's history is
> missing svn revision numbers, and there are enough emails floating
> around that reference svn revision numbers that I think we need them
> in our canonical repo.  Also llvm-project/llvm-project has a flat
> structure, and if we end up going with a nested layout, it would be
> better to have that layout starting with the first commit.
>
It has refs/notes/commits.

  fetch = +refs/notes/commits:refs/notes/commits

Chandler Carruth via llvm-dev

2016-Jul-25 18:34 UTC

On Mon, Jul 25, 2016 at 9:54 AM Justin Lebar via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> FWIW my answers to these are:
>
> 1) Yes to unified history.  The main advantage of non-unified history
> is that it's easier for people to import old branches -- it's a matter
> of "git merge" instead of running the git filter-branch script I
> wrote.  But this is a relatively small (~20 minute) one-time cost to
> some of us, whereas our repository history is borne by all of us
> forever.  Moreover unified history also helps people with long-running
> branches, as it lets them check out old versions of their branch and
> get a compatible version of all of the other llvm subprojects.
>
I strongly agree about this.

>
> 2) Yes to nested layout.  I find Chandler and Richard Smith's
> arguments compelling.
>
I think it is important to note what Richard pointed out: *we will almost
certainly restructure the tree to make more sense in a monorepo*.

I think the result is actually very likely to look much more flat than the
current layout, and to also be significantly superior to any of the current
layouts.

I just don't want people to think this locks us into a particular nested
layout for all time.

Mehdi Amini via llvm-dev

2016-Jul-25 20:21 UTC

> On Jul 25, 2016, at 9:54 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi, all.
> 
> I feel like we've strayed pretty far from the question originally
> posed in this thread.
> 
> One of the pieces of feedback I got before I started this thread was
> that many people felt that, the last time the question of multiple
> repos vs. monorepo was discussed, it was interspersed with other
> topics, making it difficult for some people to weigh in appropriately
> (or even to be aware that the discussion was occurring).  I'm afraid
> that the discussion of github workflows we're having here may cause
> the same problem.
> 
> Maybe we can move the discussion about github workflows into a
> different thread?  Again, I don't mean to stop it, just move it.
> 
> To re-focus this thread on its original topic: It sounds to me like,
> broadly speaking, we have consensus on using a single repository.  But
> there are still some outstanding related questions.  Among these are:
> 
> 1) Should the repository have "unified history"?  (Meaning, should I
> be able to check out a single git revision from before the migration
> and have it contain all of the llvm subprojects?)
>
> 2) Should the monorepo have a "nested" repository layout (e.g. clang
> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
> 
> 3) Assuming we want unified history, should the new canonical
> repository's hashes be based on
> https://github.com/llvm-project/llvm-project, or should it start
> afresh?
> 
> FWIW my answers to these are:
> 
> 1) Yes to unified history.  The main advantage of non-unified history
> is that it's easier for people to import old branches -- it's a matter
> of "git merge" instead of running the git filter-branch script I
> wrote.  But this is a relatively small (~20 minute) one-time cost to
> some of us, whereas our repository history is borne by all of us
> forever.  Moreover unified history also helps people with long-running
> branches, as it lets them check out old versions of their branch and
> get a compatible version of all of the other llvm subprojects.
I think this is a nice property to have (unified rewritten history).
The fact that the existing hashes in the official git repo won’t exist in this
new repository can be efficiently counter-balanced by continuing to update
the existing official read-only repositories “forever”. This way clients that
are already based off these won’t have to change anything in their workflow
(unless they need the git-svn id).
> 
> 2) Yes to nested layout.  I find Chandler and Richard Smith's
> arguments compelling.
The flat layout is less disruptive, and I haven't read an argument against
adopting it that I find compelling.
As they mentioned: "we will almost certainly restructure the tree to make
more sense in a monorepo" and "the result is actually very likely to look
much more flat than the current layout, and to also be significantly superior to
any of the current layouts."
This makes me think that on the contrary, starting with a flat repo and then
moving pieces where they make sense while adjusting the build system is more
likely to converge with a more “ideal” layout. This is also a less disruptive
process.
Finally, starting with a nested layout requires immediate changes to the build
system. And I haven’t seen a proposal for a non trivial interface to not disrupt
the existing flow (i.e. cmake only builds LLVM by default and nothing else).

> 3) No to basing the new canonical repo on
> https://github.com/llvm-project/llvm-project.  That repo's history is
> missing svn revision numbers, and there are enough emails floating
> around that reference svn revision numbers that I think we need them
> in our canonical repo.  Also llvm-project/llvm-project has a flat
> structure, and if we end up going with a nested layout, it would be
> better to have that layout starting with the first commit.
I don’t see a reason to preserve the history of this repo (but someone may?).
Also this repo has Klee and dragonegg, which I don’t think we want to keep (or at
least I haven’t seen anyone asking for support for these).
At the same time I don’t really care about the git-svn id in the commit message
either. It seems this mapping could be archived outside the repo itself: the ids
won’t make much sense as time goes by, but will stay there forever.
Maybe an argument to keep them is that the commit message will match the
existing official git repos?

— 
Mehdi

> 
> -Justin
> 
> On Mon, Jul 25, 2016 at 8:10 AM, Bruce Hoult via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> git-imerge can run an arbitrary script to decide whether a commit is good or
>> bad. Lack of textual merge conflicts is only the most basic test -- you can
>> check that it compiles, run tests .. whatever you want and have time to
>> execute.
>> 
>> On Tue, Jul 26, 2016 at 2:12 AM, Robinson, Paul via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Renato Golin [mailto:renato.golin at linaro.org]
>>>> Sent: Monday, July 25, 2016 7:11 AM
>>>> To: Daniel Sanders
>>>> Cc: Robinson, Paul; llvm-dev at lists.llvm.org
>>>> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>>>> 
>>>> On 25 July 2016 at 14:55, Daniel Sanders <Daniel.Sanders at imgtec.com>
>>>> wrote:
>>>>> I know of a way but it's not very nice. The gist of it is to checkout
>>>>> the downstream branch just before the bad merge and then merge the first
>>>>> 100 commits from upstream. If the result is good then merge the next
>>>>> 100, but if it's bad then 'git reset --hard' and merge 10 instead.
>>>>> You'll eventually find the commit that made it bad. Essentially, the idea is
>>>>> to make a throwaway branch that merges more frequently. I do something
>>>>> similar to rebase my work to master since gradually rebasing often
>>>>> causes all the conflicts to go away.
>>>>
>>>> This is essentially what git-imerge does, you only need to define
>>>> "good merge" in the form of a script or CI job.
>>>> 
>>>> cheers,
>>>> -renato
>>> 
>>> Except I understood git-imerge to be looking for physical conflicts,
>>> not "when did this test start failing."  If it does the latter also,
>>> that would be awesome.
>>> --paulr
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
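
On the svn revision numbers discussed under question 3: repositories produced by git-svn record the revision in a "git-svn-id:" trailer at the end of each commit message, in the form URL@REVISION followed by the repository UUID. Here is a minimal sketch of recovering the revision from a message, which is roughly what an external archive of the mapping would need. The function name, sample URL, revision number and UUID below are all made up for illustration.

```python
import re
from typing import Optional

# Trailer format written by git-svn: "git-svn-id: <url>@<revision> <uuid>"
GIT_SVN_ID = re.compile(
    r"^git-svn-id:\s+(?P<url>\S+)@(?P<rev>\d+)\s+(?P<uuid>[0-9a-f-]+)\s*$",
    re.MULTILINE,
)


def svn_revision(message: str) -> Optional[int]:
    """Return the svn revision recorded in a commit message, or None."""
    match = GIT_SVN_ID.search(message)
    return int(match.group("rev")) if match else None


# Illustrative commit message; none of these values come from a real repo.
sample = (
    "Fix a typo in a comment.\n"
    "\n"
    "git-svn-id: https://example.org/svn/project/trunk@1234 "
    "00000000-0000-0000-0000-000000000000\n"
)

print(svn_revision(sample))
print(svn_revision("A commit with no trailer.\n"))
```

Running something like this over every commit message would give the revision-to-hash table that could be kept either in the messages themselves or outside the repository.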

Renato Golin via llvm-dev

2016-Jul-25 20:40 UTC

On 25 July 2016 at 17:54, Justin Lebar via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> To re-focus this thread on its original topic: It sounds to me like,
> broadly speaking, we have consensus on using a single repository.
In this thread, yes. :)

Can you write up a document on docs/Proposals and add a review?

I think the survey should be about both of them, and allow people to
digress. I'm willing to take the time to read all of them and try to
collate a "big-picture", but I'd appreciate if others could help me.
:)

I don't think we'll ever get *consensus* on any email thread, as
people do digress too much.

> 1) Should the repository have "unified history"?  (Meaning, should I
> be able to check out a single git revision from before the migration
> and have it contain all of the llvm subprojects?)
Yes.

Every VCS migration I've been involved in so far has involved doing this, as
it makes it much easier to deal with comparisons between new and old.
Applying patches, trying benchmarks, etc. all become much easier if
you use a single tool / source to checkout from.

> 2) Should the monorepo have a "nested" repository layout (e.g. clang
> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
Slight preference to nested, but I don't mind either way.

Internally, we check out the repositories separately and create the
worktrees as nested, so nested is slightly better for us.

> 3) Assuming we want unified history, should the new canonical
> repository's hashes be based on
> https://github.com/llvm-project/llvm-project, or should it start
> afresh?
Fresh, please.

https://github.com/joker-eph/llvm-unified

That one seems good, nested or not, but I'm not sure about polly.
Maybe OpenMP (like parallel-libs?).

cheers,
--renato

Mehdi Amini via llvm-dev

2016-Jul-25 20:55 UTC

> On Jul 25, 2016, at 1:40 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 25 July 2016 at 17:54, Justin Lebar via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> To re-focus this thread on its original topic: It sounds to me like,
>> broadly speaking, we have consensus on using a single repository.
> 
> In this thread, yes. :)
> 
> Can you write up a document on docs/Proposals and add a review?
> 
> I think the survey should be about both of them, and allow people to
> digress. I'm willing to take the time to read all of them and try to
> collate a "big-picture", but I'd appreciate if others could help me.
> :)
> 
> I don't think we'll ever get *consensus* on any email thread, as
> people do digress too much.
> 
> 
>> 1) Should the repository have "unified history"?  (Meaning, should I
>> be able to check out a single git revision from before the migration
>> and have it contain all of the llvm subprojects?)
> 
> Yes.
> 
> Every VCS migration I've been involved in so far has involved doing this, as
> it makes it much easier to deal with comparisons between new and old.
> Applying patches, trying benchmarks, etc. all become much easier if
> you use a single tool / source to checkout from.
> 
> 
>> 2) Should the monorepo have a "nested" repository layout (e.g. clang
>> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
> 
> Slight preference to nested, but I don't mind either way.
> 
> Internally, we check out the repositories separately and create the
> worktrees as nested, so nested is slightly better for us.
> 
> 
>> 3) Assuming we want unified history, should the new canonical
>> repository's hashes be based on
>> https://github.com/llvm-project/llvm-project, or should it start
>> afresh?
> 
> Fresh, please.
> 
> https://github.com/joker-eph/llvm-unified
> 
> That one seems good, nested or not
To clarify: this repo in particular is what happens when you answer “No” to question
1) (you answered yes above), and it is not “Fresh”, as it preserves hashes from
the previous repositories.
(Leaving aside the nested part)

— 
Mehdi

> , but I'm not sure about polly.
> Maybe OpenMP (like parallel-libs?).
> 
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

James Y Knight via llvm-dev

2016-Jul-25 21:06 UTC

On Mon, Jul 25, 2016 at 2:34 PM, Chandler Carruth via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>> 2) Yes to nested layout.  I find Chandler and Richard Smith's
>> arguments compelling.
>>
>
> I think it is important to note what Richard pointed out: *we will almost
> certainly restructure the tree to make more sense in a monorepo*.
>
> I think the result is actually very likely to look much more flat than the
> current layout, and to also be significantly superior to any of the current
> layouts.
>
> I just don't want people to think this locks us into a particular nested
> layout for all time.
>
I do not find that argument compelling. In particular, while it is
certainly *true* that we can restructure everything later, there is a
significant advantage to making it look like <root>/{llvm,clang} now, if we
think it will end up looking like that in the near future.

Namely: commands like "git log clang" will actually work to give you the
history of files in the clang subdir. While this can be worked around, it
must be worked around by everyone who ever invokes the command in the
future. There's no reason to cause that pain, if we know up front that
we're going to want to immediately move to another layout.

We'd be better off figuring out the layout now, and switching to it as
part of the migration.

Basically: Yes, we're not locked in "for all time" -- but at least let's
try to figure out the layout we'd like to have in 6 months, and just do
that up front.
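
To make the "git log clang" point concrete, here is a deliberately simplified model of path-limited history. It is not git's implementation: it only captures that a path-limited log selects commits touching files under the given path, and that, as far as I know, git does not follow a directory across a rename when limiting by path. The commit ids and file names are hypothetical.

```python
from typing import List, Tuple

Commit = Tuple[str, List[str]]  # (commit id, paths touched by that commit)


def log_for_path(history: List[Commit], path: str) -> List[str]:
    """Ids of commits touching `path` or anything beneath it, newest first."""
    prefix = path.rstrip("/") + "/"
    hits = [cid for cid, paths in history
            if any(p == path or p.startswith(prefix) for p in paths)]
    return list(reversed(hits))


# Hypothetical history, oldest first: clang starts out nested under
# tools/clang and is moved to a top-level clang directory in commit c4.
history = [
    ("c1", ["tools/clang/lib/Sema.cpp"]),
    ("c2", ["lib/IR/Value.cpp"]),
    ("c3", ["tools/clang/lib/Parse.cpp"]),
    ("c4", ["tools/clang/lib/Sema.cpp", "clang/lib/Sema.cpp"]),  # the move
    ("c5", ["clang/lib/Sema.cpp"]),
]

print(log_for_path(history, "clang"))        # ['c5', 'c4']
print(log_for_path(history, "tools/clang"))  # ['c4', 'c3', 'c1']
```

In this model, everything before the move is invisible to a log limited to the new path, which is the workaround cost described above; migrating straight into the final layout avoids it.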