thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]


Mehdi Amini via llvm-dev

2016-Jul-29 04:11 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 7:32 PM, Lang Hames <lhames at gmail.com> wrote:
>
> Hi Mehdi,
>
>> This is a narrow view IMO: criterion #1 that Chris mentioned for including
>> projects in the monorepo was "must be tightly coupled to specific versions".
>> It means that even with the test suite (and possibly some runtimes) out of
>> the monorepo, all the software that is tightly coupled would be in the
>> monorepo, and that alone would be enough to alleviate the need for (most of
>> the) tooling/infrastructure.
>
> Fair point, but coupling isn't binary: even the test-suite is coupled to the
> versions of clang that can compile it, it's just relatively loose compared to
> LLVM/clang.
>
>> I find it a fairly different scale to clone 3 repos on a bot versus having
>> to keep multiple repositories *in sync* (i.e. cross-repository
>> synchronization).
>
> I think it depends on the nature of the tools that are required. Bots are
> relatively simple since they're only reading from the repos, not writing.
> They're not the only use-case I have in mind though.
>
>> Different problems, different tools… I'm against artificially creating
>> "problems" for upstream developers only because the tooling to solve them
>> works for downstream users.
>
> I don't think these are actually different problems: I would guess that the
> problem of collecting some subset of the LLVM projects into a usable
> source-tree is shared by many downstream users, and it's common in my
> workflows (e.g. just checking out llvm and lld). It will have to be solved by
> someone, since downstream users need it even if we adopted a mono-repo.

What I meant by "different problem" is that "downstream users", for instance,
don't need to commit, which makes their problem/workflow quite different from
that of an upstream developer (for instance, it is fairly easy to maintain a
read-only view of the existing individual git repos currently on llvm.org).

Also, while we can create scripts for (almost) every scenario, one has to weigh
a script that is run once at checkout time against the set of scripts required
for day-to-day development. For example: what if I want to switch my tree to my
work-in-progress branch, where I changed an LLVM library to use the new "Error
checking" API and adapted all the other projects that use this API, and then I
want to rebase this branch on master for every project so that I can get ready
to push? My impression is that a single repo makes this use-case trivial with
a standard set of git commands.
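
As a rough illustration of that single-repo use-case (not part of the original
message), the sketch below builds a throwaway repository in a temporary
directory and shows that one checkout plus one rebase moves a cross-project
change onto master. The llvm/ and clang/ file names, their contents, and the
wip-error-api branch name are all hypothetical.

```shell
#!/bin/sh
# Sketch: rebase one work-in-progress branch that touches several projects.
# Every repository, directory, file and branch name here is hypothetical.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"
git init -q monorepo
cd monorepo
git symbolic-ref HEAD refs/heads/master   # make the branch name predictable

# Two "projects" living side by side in one repository.
mkdir llvm clang
echo "old error api" > llvm/Error.h
echo "uses old api" > clang/Driver.cpp
git add .
git commit -q -m "Initial import of llvm and clang"

# Work-in-progress branch: change the library and adapt its user in one commit.
git checkout -q -b wip-error-api
echo "new error api" > llvm/Error.h
echo "uses new api" > clang/Driver.cpp
git commit -q -a -m "Switch llvm and clang to the new Error API"

# Meanwhile master moves on in an unrelated file.
git checkout -q master
echo "docs" > llvm/README
git add llvm/README
git commit -q -m "Unrelated change on master"

# One checkout and one rebase bring every project up to date together.
git checkout -q wip-error-api
git rebase -q master
git log --oneline
```

With separate repositories, the checkout and rebase steps would have to be
repeated (and kept consistent) once per project.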

I believe a repo like https://github.com/llvm-project/llvm-project solves most
of the workflows (both for developers and downstream users) with little to no
tooling required. Providing a read-only export from this repo is also fairly
easy, and can be done asynchronously in a deterministic way (contrary to the
submodule umbrella update, which requires some server-side hooks).
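
The message does not say which tool would produce such an export. As one
possible sketch, assuming `git filter-branch --subdirectory-filter` is used
(my choice, not something stated in the thread), the script below exports a
hypothetical compiler-rt/ directory from a throwaway monorepo twice and
compares the resulting commit hashes, which is what "deterministic" would mean
in practice. All paths and branch names are made up.

```shell
#!/bin/sh
# Sketch only: a read-only, per-project export produced with
# `git filter-branch --subdirectory-filter`. The thread does not name a tool;
# this choice, and every path and branch name below, is an assumption.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer git

work=$(mktemp -d)
cd "$work"
git init -q monorepo
cd monorepo
git symbolic-ref HEAD refs/heads/master

mkdir llvm compiler-rt
echo a > llvm/a.txt
echo b > compiler-rt/b.txt
git add .
git commit -q -m "Initial import"
echo c >> compiler-rt/b.txt
git commit -q -a -m "compiler-rt change"
echo d >> llvm/a.txt
git commit -q -a -m "llvm-only change"

# Run the same export twice, on two independent copies of master.
git branch export-1 master
git branch export-2 master
git filter-branch -f --subdirectory-filter compiler-rt -- export-1 >/dev/null 2>&1
git filter-branch -f --subdirectory-filter compiler-rt -- export-2 >/dev/null 2>&1

first=$(git rev-parse export-1)
second=$(git rev-parse export-2)
exported_files=$(git ls-tree --name-only export-1)
echo "export 1: $first"
echo "export 2: $second"
echo "files at the root of the export: $exported_files"
```

Because the rewritten commits reuse the original author and committer data,
both runs should print the same hash, so the export can be regenerated at any
time without coordinating with the push that triggered it.
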

The only two unanswered drawbacks that I got from this thread are:

1) A "major drawback of a single huge repo IMHO: In git, to push a commit
you must have it at the remote HEAD. If HEAD has changed you need to
rebase/rebuild/retest/retry. With a single monster repo, a commit to
'lld' means I have to go through this pain to put in my 'clang'
tweak.", http://lists.llvm.org/pipermail/llvm-dev/2016-July/102656.html
2) Chris Bieneman: What about a *contributor* only wanting to contribute to
compiler-rt? He has to pay the price of cloning the full repo.
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103052.html
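
Drawback 1) can be reproduced in miniature. The sketch below is an
illustration only: it uses a throwaway bare repository standing in for the
shared monorepo, and two hypothetical clones named alice and bob. Bob's change
to clang/ is rejected because Alice's unrelated change to lld/ landed first,
and it only goes through after a rebase.

```shell
#!/bin/sh
# Sketch: a push to a shared repository is rejected when the remote branch
# has moved, even if the other commit touched an unrelated directory.
# Repository names, clone names and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"

# Seed a shared "monorepo" with two projects, then publish it as a bare repo.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
mkdir lld clang
echo a > lld/a.txt
echo b > clang/b.txt
git add .
git commit -q -m "Initial import"
cd ..
git clone -q --bare seed upstream.git
git clone -q upstream.git alice
git clone -q upstream.git bob

# Alice lands an lld change first.
cd alice
echo x >> lld/a.txt
git commit -q -a -m "lld change"
git push -q origin master
cd ..

# Bob's clang tweak is based on the old tip, so the push is refused.
cd bob
echo y >> clang/b.txt
git commit -q -a -m "clang tweak"
if git push -q origin master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

# After rebasing onto the new tip, the same commit can be pushed.
git pull -q --rebase origin master
if git push -q origin master 2>/dev/null; then
    second_push=accepted
else
    second_push=rejected
fi
echo "first push: $first_push, second push: $second_push"
```

The rebase itself is conflict-free here; the cost raised in the thread is the
rebuild and retest that a careful committer would do before the second push.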

I haven't seen a good answer for 1), and for 2) it'll come down to a balance
between "how much of a burden is it in 2016 to download 500MB once to
contribute to a project?" and how many people (and how many commits) this
represents.

> A shared solution (if it's possible) may be an opportunity to both share
> infrastructure with downstream projects and adopt a more modular approach to
> the LLVM project sources.

I had the impression that the current situation is that sources are "modular",
and that's painful when you work cross-project (luckily I have been focused on
LLVM itself lately…).
As opposed to a "more modular approach to the LLVM project sources", I'd
favor the goal of "a more coherent approach to maintaining the LLVM project
sources".

> I'm staying deliberately light on specifics here. As I said I don't have
> strong feelings yet -- I'm still digesting all the ideas in this thread.

The other thread on the submodules proposal driven by Renato also has a lot of
ideas/workflow descriptions if you're looking for inspiration.

— 
Mehdi


> To the extent that I have a gut feeling though, this feels like it introduces
> very strong coupling between LLVM project sources (more than is required by
> the projects APIs) for the sake of convenience, so I'm trying to consider the
> alternatives.
> 
> Cheers,
> Lang.
> 
> 
> On Thu, Jul 28, 2016 at 6:41 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Jul 28, 2016, at 6:23 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> Aaaand I'm (mostly) caught up. Phew.
>> 
>> FWIW Chris B is right: I had been put off commenting on this thread by the
>> length, and the number of git discussions that have come before this. He
>> convinced me to make the effort to put my 2 cents in though - thanks Chris.
>>
>> So - for my use-case I don't have strong feelings one way or the other*
>> <https://www.youtube.com/watch?v=fpaQpyU_QiM>. That said, something about
>> the discussion so far strikes me as dissonant: If we're going to break out
>> some sub-projects (the test-suite for licensing reasons, the runtimes for
>> modularity) then it's not really a mono-repo any more. It's a multi-repo
>> where we've collapsed some (but not all) of the existing repos.
>
> This is a narrow view IMO: criterion #1 that Chris mentioned for including
> projects in the monorepo was "must be tightly coupled to specific versions".
> It means that even with the test suite (and possibly some runtimes) out of
> the monorepo, all the software that is tightly coupled would be in the
> monorepo, and that alone would be enough to alleviate the need for (most of
> the) tooling/infrastructure.
>
>
>> To the extent that we have to build tooling to support multiple repos
>> (auto-mergers for test bots, command line utils for devs who want the main
>> repo plus tests plus ...), could we re-use that to keep the existing modular
>> project setup?
>
> I find it a fairly different scale to clone 3 repos on a bot versus having
> to keep multiple repositories *in sync* (i.e. cross-repository
> synchronization).
>
>
>> This might be a fairly low-benefit proposition if the tools we develop were
>> only usable by in-tree projects, but there are many other users of LLVM
>> (Swift leaps to mind since I'm at Apple, but there are many others) who might
>> appreciate the ability to use LLVM-provided tools to pick-and-mix LLVM
>> projects into their repos. Otherwise, every downstream user will have to roll
>> some version of these tools themselves.
>
> Different problems, different tools… I'm against artificially creating
> "problems" for upstream developers only because the tooling to solve them
> works for downstream users.
> 
> — 
> Mehdi
> 
> 
>> 
>> On Thu, Jul 28, 2016 at 3:19 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
>> > It is worth pointing out that the Jenkins job that runs that is a
>> > playground I set up for myself. It is nowhere near production ready, and it
>> > will fail frequently as I iterate and mess around with it.
>> 
>> Sure, I think that's implied.
>> 
>> cheers,
>> --renato
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> 

David Chisnall via llvm-dev

2016-Jul-29 09:19 UTC

[llvm-dev] [RFC] One or many git repositories?

On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> What I meant by “different problem" is that “downstream users” for instance
> don’t need to commit, that makes their problem/workflow quite different from
> an upstream developer (for instance it is fairly easy to maintain a read-only
> view of the existing individual git repo currently on llvm.org).

I’m not convinced by this distinction.  A lot of downstream developers need to
patch LLVM and we benefit when they upstream their changes.  We should not make
it harder for them to do this.  To give a couple of example downstream projects,
both FreeBSD and Swift have patches on LLVM / Clang in their versions that they
gradually filter upstream.  Both projects have LLVM committers among their
members.  If the workflow that we recommend for them makes upstreaming easy then
they benefit (maintaining a fork is effort) and LLVM benefits (having people
provide bug fixes makes our code better).

The workflow that we want to recommend to these people is:

- Fork the repo that you’re interested in from the LLVM GitHub organisation
- Make your changes
- Send pull requests for anything that you think is of interest to upstream

This makes the barrier to entry for sending code back upstream *much* lower than
it currently is, to the benefit of all.  If the alternative is:

- Fork a read-only repo that you’re interested in from the LLVM GitHub
organisation
- Make your changes
- Fork a different repo from the LLVM GitHub organisation
- Run a script to filter some of your changes into that one
- Send a pull request from that
- Deal with merging between the two yourself

I strongly suspect that we’ll get a lot fewer useful contributions from
downstream.  Or downstream people will just work on the monorepo and eat the
cost.

If someone is working on a downstream LLVM project and becoming familiar with
our codebase, then we want to be subtly nudging their workflow so that they
eventually become LLVM contributors without noticing!

David


Dean Michael Berris via llvm-dev

2016-Jul-29 11:35 UTC

[llvm-dev] [RFC] One or many git repositories?

> On 29 Jul 2016, at 19:19, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> What I meant by “different problem" is that “downstream users” for instance
>> don’t need to commit, that makes their problem/workflow quite different
>> from an upstream developer (for instance it is fairly easy to maintain a
>> read-only view of the existing individual git repo currently on llvm.org).
>
> I’m not convinced by this distinction.  A lot of downstream developers need
> to patch LLVM and we benefit when they upstream their changes.  We should not
> make it harder for them to do this.  To give a couple of example downstream
> projects, both FreeBSD and Swift have patches on LLVM / Clang in their
> versions that they gradually filter upstream.  Both projects have LLVM
> committers among their members.  If the workflow that we recommend for them
> makes upstreaming easy then they benefit (maintaining a fork is effort) and
> LLVM benefits (having people provide bug fixes makes our code better).
> 
> The workflow that we want to recommend to these people is:
> 
> - Fork the repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes
> - Send pull requests for anything that you think is of interest to upstream
> 
I understand this, but why isn't "the repo you're interested in" just the
megarepo (or monorepo) where every LLVM project resides?

> This makes the barrier to entry for sending code back upstream *much* lower
> than it currently is, to the benefit of all.  If the alternative is:
>
> - Fork a read-only repo that you’re interested in from the LLVM GitHub
>   organisation
> - Make your changes
> - Fork a different repo from the LLVM GitHub organisation
> - Run a script to filter some of your changes into that one
> - Send a pull request from that
> - Deal with merging between the two yourself
> 
> I strongly suspect that we’ll get a lot fewer useful contributions from
> downstream.  Or downstream people will just work on the monorepo and eat the
> cost.

It isn't -- for downstream users of any of the LLVM projects, I suspect the
answer will just be "instead of forking N repositories to get the benefit from
these N projects, just fork the megarepo".

If I were a downstream user, this would sound like a simpler proposition *even
if I'm only interested in one part of the overall LLVM project*.

> If someone is working on a downstream LLVM project and becoming familiar with
> our codebase, then we want them to be subtly nudging their workflow so that
> they eventually become LLVM contributors without noticing!

Indeed. The best way, I think, all things considered, is to have a single
megarepo containing everything LLVM. That way, if anybody wants to make
changes to any part of it, it's a simpler process _especially compared to the
status quo_.

Cheers

Tom Honermann via llvm-dev

2016-Jul-29 15:36 UTC

[llvm-dev] [RFC] One or many git repositories?

On 7/29/2016 5:21 AM, David Chisnall via llvm-dev wrote:
> ...  A lot of downstream developers
> need to patch LLVM and we benefit when they upstream their changes.  We
> should not make it harder for them to do this.  To give a couple of
> example downstream projects, both FreeBSD and Swift have patches on LLVM
> / Clang in their versions that they gradually filter upstream.  Both
> projects have LLVM committers among their members.  If the workflow that
> we recommend for them makes upstreaming easy then they benefit
> (maintaining a fork is effort) and LLVM benefits (having people provide
> bug fixes makes our code better).

While I agree with all of the above, I think the cost difference in the 
mechanics of how changes are upstreamed is negligible.  The much larger 
cost of upstreaming changes is the engagement with the community itself. 
  In our local repos, we can skimp on code quality and testing (to our 
own detriment of course).  When we upstream, we need to ensure code 
style is consistent and that unit tests are in place.  We need to find 
someone with commit access that is willing to commit on our behalf, go 
through code review with them and address any requests for changes, and 
then resolve conflicts when the upstreamed changes get pulled back into 
our repo (we #ifdef our local customizations for auditing and testing 
purposes, so there are always conflicts for us).  This is all exactly as 
it should be.  The only reason I mention it is to say, don't dwell on 
the mechanics of upstreaming changes too much as the cost differences 
just aren't that significant *unless* those mechanics significantly 
reduce the code review and commit process costs.

Tom.

Mehdi Amini via llvm-dev

2016-Jul-29 17:01 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 29, 2016, at 2:19 AM, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
>
> On 29 Jul 2016, at 05:11, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> What I meant by “different problem" is that “downstream users” for instance
>> don’t need to commit, that makes their problem/workflow quite different
>> from an upstream developer (for instance it is fairly easy to maintain a
>> read-only view of the existing individual git repo currently on llvm.org).
>
> I’m not convinced by this distinction.  A lot of downstream developers need
> to patch LLVM and we benefit when they upstream their changes.

I made a distinction between downstream users and developers, i.e. someone who
just needs to get and build compiler-rt vs. someone who wants to *commit* to
LLVM. Note that even if you only get a single repo you can still send a patch
to the mailing list and someone can commit it for you (including correct author
attribution, contrary to SVN).

> We should not make it harder for them to do this.  To give a couple of
> example downstream projects, both FreeBSD and Swift have patches on LLVM /
> Clang in their versions that they gradually filter upstream.  Both projects
> have LLVM committers among their members.  If the workflow that we recommend
> for them makes upstreaming easy then they benefit (maintaining a fork is
> effort) and LLVM benefits (having people provide bug fixes makes our code
> better).
> 
> The workflow that we want to recommend to these people is:
> 
> - Fork the repo that you’re interested in from the LLVM GitHub organisation
> - Make your changes
> - Send pull requests for anything that you think is of interest to upstream

Note that the workflow you describe above still requires them to export their
patch and import it into this clone before pushing.
(Note also that we accept patches on the mailing list, so one does not even need
to clone the official repo).

> This makes the barrier to entry for sending code back upstream *much* lower
> than it currently is,

I don’t understand this statement. As of today you can send a diff to the
mailing list; I don’t see how much lower the bar can be.

> to the benefit of all.  If the alternative is:
>
> - Fork a read-only repo that you’re interested in from the LLVM GitHub
>   organisation
> - Make your changes

Why? If you know you want to *push* commits upstream, fork the only useful repo
for that in the first place.

> - Fork a different repo from the LLVM GitHub organisation
> - Run a script to filter some of your changes into that one

I don’t know why you think there is a need for a script, or why it is different
from today.
Let's say I’m working on a fork of the compiler-rt read-only repo and I want to
upstream a patch at some point:

Today:

- cd /path/to/compiler_rt-forked
- git format-patch …
- cd /path/to/compiler_rt-upstream
- git am  /path/to/compiler_rt-forked/0001-My-awesome-changes.patch
- git svn dcommit
- done

Tomorrow with a monorepo:

- cd /path/to/compiler_rt-forked
- git format-patch …
- cd /path/to/unifiedrepo-upstream
- git am  /path/to/compiler_rt-forked/0001-My-awesome-changes.patch --directory=compiler-rt
- git push
- done
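
The monorepo variant of those steps can be tried end to end on throwaway
repositories. The sketch below is only an illustration: the two repositories
are created locally in a temporary directory, README.txt and its contents are
made up, and the final push is left out because there is no real upstream.

```shell
#!/bin/sh
# Sketch: export a patch from a standalone compiler-rt fork and apply it
# under compiler-rt/ in a unified repository with `git am --directory`.
# Both repositories and the README.txt file are hypothetical stand-ins.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
cd "$work"

# Standalone fork: project files live at the repository root.
git init -q compiler_rt-forked
cd compiler_rt-forked
git symbolic-ref HEAD refs/heads/master
echo "v1" > README.txt
git add README.txt
git commit -q -m "Import compiler-rt"
cd ..

# Unified repository: the same file lives under compiler-rt/.
git init -q unifiedrepo-upstream
cd unifiedrepo-upstream
git symbolic-ref HEAD refs/heads/master
mkdir compiler-rt
echo "v1" > compiler-rt/README.txt
git add .
git commit -q -m "Import monorepo"
cd ..

# Make a change in the fork and export it as a patch file.
cd compiler_rt-forked
echo "v2" > README.txt
git commit -q -a -m "My awesome changes"
git format-patch -q -1 -o "$work/patches"
cd ..

# Apply the patch in the unified repo, prefixing every path with compiler-rt/.
cd unifiedrepo-upstream
git am -q --directory=compiler-rt "$work/patches/0001-My-awesome-changes.patch"
git log --oneline
```

The author and commit message recorded in the patch are preserved by
`git am`, so the commit in the unified repository still credits the original
author.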

Alternatively, if I only upstream a patch once a year, I don’t really need to
push it myself.

- cd /path/to/compiler_rt-forked
- git format-patch …
- email the patch.

> - Send a pull request from that

Note that I think we deferred any change to the workflow to future discussions
(pull requests are not part of our workflow today).

> - Deal with merging between the two yourself

I don’t know what you mean by dealing with the merging; I don’t expect any
difficulties, so you need to elaborate.

> I strongly suspect that we’ll get a lot fewer useful contributions from
> downstream.  Or downstream people will just work on the monorepo and eat the
> cost.
>
> If someone is working on a downstream LLVM project and becoming familiar with
> our codebase, then we want them to be subtly nudging their workflow so that
> they eventually become LLVM contributors without noticing!

Sure. The distinction between “downstream users” and “developers” was made in
response to “there exist many users that just download and build a subproject”.
These are not people who are *developing* on a downstream fork.

— 
Mehdi
