thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Mehdi Amini via llvm-dev

2016-Jul-29 01:41 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 6:23 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Aaaand I'm (mostly) caught up. Phew.
> 
> FWIW Chris B is right: I had been put off commenting on this thread by the
> length, and the number of git discussions that have come before this. He
> convinced me to make the effort to put my 2 cents in though - thanks Chris.
>
> So - for my use-case I don't have strong feelings one way or the other*
> <https://www.youtube.com/watch?v=fpaQpyU_QiM>. That said, something about
> the discussion so far strikes me as dissonant: If we're going to break out
> some sub-projects (the test-suite for licensing reasons, the runtimes for
> modularity) then it's not really a mono-repo any more. It's a multi-repo
> where we've collapsed some (but not all) of the existing repos.

This is a narrow view IMO: criterion #1 that Chris mentioned for including
projects in the monorepo was “must be tightly coupled to specific versions”.
It means that even with the test suite (and possibly some runtime) out of the
monorepo, all the software that is tightly coupled would be in the monorepo, and
that alone would be enough to alleviate the need for (most of the)
tooling/infrastructure.

> To the extent that we have to build tooling to support multiple-repos
> (auto-mergers for test bots, command line utils for devs who want the main
> repo plus tests plus ...), could we re-use that to keep the existing modular
> project setup?

I find it a fairly different scale to clone 3 repos on a bot versus having to
keep multiple repositories *in sync* (i.e. cross repository synchronization).

> This might be a fairly low-benefit proposition if the tools we develop were
> only usable by in-tree projects, but there are many other users of LLVM (Swift
> leaps to mind since I'm at Apple, but there are many others) who might
> appreciate the ability to use LLVM-provided tools to pick-and-mix LLVM projects
> into their repos. Otherwise, every downstream user will have to roll some
> version of these tools themselves.

Different problems, different tools… I’m against artificially creating
“problems” for upstream developers only because the tooling to solve them
works for downstream users.

— 
Mehdi

> 
> On Thu, Jul 28, 2016 at 3:19 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
> > It is worth pointing out the Jenkins job that runs that is a playground I
> > setup for myself. It is nowhere near production ready, and it will fail
> > frequently as I iterate messing around with it.
> 
> Sure, I think that's implied.
> 
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

Lang Hames via llvm-dev

2016-Jul-29 02:32 UTC

[llvm-dev] [RFC] One or many git repositories?

Hi Mehdi,

> This is a narrow view IMO: criterion #1 that Chris mentioned for including
> projects in the monorepo was “must be tightly coupled to specific versions”.
> It means that even with the test suite (and possibly some runtime) out of
> the monorepo, all the software that is tightly coupled would be in the
> monorepo, and that alone would be enough to alleviate the need for (most
> of the) tooling/infrastructure.

Fair point, but coupling isn't binary: even the test-suite is coupled to
the versions of clang that can compile it, it's just relatively loose
compared to LLVM/clang.

> I find it a fairly different scale to clone 3 repos on a bot versus having
> to keep multiple repositories *in sync* (i.e. cross repository
> synchronization).

I think it depends on the nature of the tools that are required. Bots are
relatively simple since they're only reading from the repos, not writing.
They're not the only use-case I have in mind though.

> Different problems, different tools… I’m against artificially creating
> “problems” for upstream developers only because the tooling to solve them
> works for downstream users.

I don't think these are actually different problems: I would guess that the
problem of collecting some subset of the LLVM projects into a usable
source-tree is shared by many downstream users, and it's common in my
workflows (e.g. just checking out llvm and lld). It will have to be solved
by someone, since downstream users need it even if we adopted a mono-repo.
A shared solution (if it's possible) may be an opportunity to both share
infrastructure with downstream projects and adopt a more modular approach
to the LLVM project sources.

I'm staying deliberately light on specifics here. As I said I don't have
strong feelings yet -- I'm still digesting all the ideas in this thread. To
the extent that I have a gut feeling though, this feels like it introduces
very strong coupling between LLVM project sources (more than is required by
the projects' APIs) for the sake of convenience, so I'm trying to consider
the alternatives.

Cheers,
Lang.



Mehdi Amini via llvm-dev

2016-Jul-29 04:11 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 7:32 PM, Lang Hames <lhames at gmail.com> wrote:
>
> Hi Mehdi,
>
>> This is a narrow view IMO: criterion #1 that Chris mentioned for including
>> projects in the monorepo was “must be tightly coupled to specific versions”.
>> It means that even with the test suite (and possibly some runtime) out of
>> the monorepo, all the software that is tightly coupled would be in the
>> monorepo, and that alone would be enough to alleviate the need for (most
>> of the) tooling/infrastructure.
>
> Fair point, but coupling isn't binary: even the test-suite is coupled to
> the versions of clang that can compile it, it's just relatively loose
> compared to LLVM/clang.
>
>> I find it a fairly different scale to clone 3 repos on a bot versus having
>> to keep multiple repositories *in sync* (i.e. cross repository
>> synchronization).
>
> I think it depends on the nature of the tools that are required. Bots are
> relatively simple since they're only reading from the repos, not writing.
> They're not the only use-case I have in mind though.
>
>> Different problems, different tools… I’m against artificially creating
>> “problems” for upstream developers only because the tooling to solve them
>> works for downstream users.
>
> I don't think these are actually different problems: I would guess that the
> problem of collecting some subset of the LLVM projects into a usable
> source-tree is shared by many downstream users, and it's common in my
> workflows (e.g. just checking out llvm and lld). It will have to be solved
> by someone, since downstream users need it even if we adopted a mono-repo.

What I meant by “different problem” is that “downstream users”, for instance,
don’t need to commit, which makes their problem/workflow quite different from
that of an upstream developer (for instance, it is fairly easy to maintain a
read-only view of the existing individual git repos currently on llvm.org).

Also, while we can create scripts for (almost) every scenario, one has to weigh
a script that is run once at checkout time against the set of scripts required
for day-to-day development. For example, what if I want to switch my tree to my
work-in-progress branch, where I changed an LLVM library to use the new
“Error checking” API and adapted all the other projects that use this API, and
then I want to rebase this branch on master for every project so that I can get
ready to push? My impression is that a single repo makes this use case trivial
with a standard set of git commands.
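
To make the comparison concrete, here is a minimal sketch (not part of the original message) that lists the commands needed to switch a work-in-progress branch and rebase it on master, once for a single repo and once for several separate repos. The directory names and the branch name `error-api-wip` are hypothetical; the git subcommands themselves (`fetch`, `checkout`, `rebase`, and the `-C <path>` option) are standard.

```python
from typing import List


def rebase_commands(repos: List[str], branch: str,
                    upstream: str = "origin/master") -> List[str]:
    """List the git commands that switch each checkout in `repos` to
    `branch` and rebase it onto `upstream`."""
    commands = []
    for repo in repos:
        prefix = f"git -C {repo}"  # -C runs git as if started in <repo>
        commands.append(f"{prefix} fetch origin")
        commands.append(f"{prefix} checkout {branch}")
        commands.append(f"{prefix} rebase {upstream}")
    return commands


# Hypothetical layouts: one monorepo checkout vs. four per-project checkouts.
monorepo = rebase_commands(["llvm-project"], "error-api-wip")
multirepo = rebase_commands(["llvm", "clang", "lld", "lldb"], "error-api-wip")

if __name__ == "__main__":
    print("\n".join(monorepo))
    print(f"single repo: {len(monorepo)} commands, "
          f"four repos: {len(multirepo)} commands")
```

The command count is only part of the difference: with separate repos, a rebase that stops on a conflict in one project leaves the other checkouts at a different point in history until it is resolved, which is the kind of cross-repository synchronization a helper script would have to manage.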

I believe a repo like https://github.com/llvm-project/llvm-project solves most
of the workflows (both for developers and downstream users) with little to no
tooling required. Providing a read-only export from this repo is also fairly
easy, and can be done asynchronously in a deterministic way (unlike the
submodule umbrella update, which requires some server-side hooks).
The only two unanswered drawbacks that I got from this thread are:

1) A “major drawback of a single huge repo IMHO: In git, to push a commit
you must have it at the remote HEAD. If HEAD has changed you need to
rebase/rebuild/retest/retry. With a single monster repo, a commit to
'lld' means I have to go through this pain to put in my 'clang'
tweak.”, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102656.html
2) Chris Bieneman: What about a *contributor* who only wants to contribute to
compiler-rt? He has to pay the price of cloning the full repo.
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103052.html

I haven’t seen a good answer for 1), and for 2) it’ll come down to a balance
between “how much of a burden is it in 2016 to download 500MB once to contribute
to a project” and how many people (and how many commits) this represents.
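
Drawback 1) can be illustrated with a toy model (an editorial sketch, not from the thread). It assumes only that the remote accepts a push when the pushed commit is based on the current remote head, and it stands in a single assignment for the whole fetch/rebase/rebuild/retest cycle. The `Remote` class and `attempts_to_push` helper are hypothetical names, not a git API.

```python
class Remote:
    """Toy remote branch that accepts only fast-forward pushes."""

    def __init__(self) -> None:
        self.head = 0  # id of the latest commit on master

    def push(self, based_on: int) -> bool:
        if based_on != self.head:
            return False  # rejected: the pusher is behind the remote head
        self.head += 1
        return True


def attempts_to_push(remote: Remote, based_on: int) -> int:
    """Count push attempts until the commit lands."""
    attempts = 1
    while not remote.push(based_on):
        based_on = remote.head  # stands in for fetch + rebase (+ retest)
        attempts += 1
    return attempts


# Single repo: an unrelated 'lld' commit lands before the 'clang' push.
mono = Remote()
clang_base = mono.head
mono.push(mono.head)  # someone else's lld commit
mono_attempts = attempts_to_push(mono, clang_base)

# Separate repos: the lld commit goes to a different remote.
clang_remote, lld_remote = Remote(), Remote()
clang_base = clang_remote.head
lld_remote.push(lld_remote.head)
multi_attempts = attempts_to_push(clang_remote, clang_base)

if __name__ == "__main__":
    print(mono_attempts, multi_attempts)
```

In this model the clang contributor needs two attempts against the single repo and one against separate repos. How costly the extra attempt is in practice depends on whether a rebase over unrelated changes really calls for a rebuild and retest, which is a project policy question the model does not answer.
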

> A shared solution (if it's possible) may be an opportunity to both share
> infrastructure with downstream projects and adopt a more modular approach
> to the LLVM project sources.

I had the impression that the current situation is that sources are “modular”,
and that’s painful when you work cross-project (luckily I have been focused on
LLVM itself lately…).
As opposed to a “more modular approach to the LLVM project sources”, I’d
favor a goal of “a more coherent approach to maintaining the LLVM
project sources”.

> I'm staying deliberately light on specifics here. As I said I don't have
> strong feelings yet -- I'm still digesting all the ideas in this thread.

The other thread on the submodules proposal driven by Renato also has a lot of
ideas/workflow descriptions if you’re looking for inspiration.

— 
Mehdi


