thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Simon Taylor via llvm-dev

2016-Jul-26 09:09 UTC

[llvm-dev] [RFC] One or many git repositories?

Hi Duncan,

> […]
> 2. Those working on projects *outside* the monolithic repo will get the
> downsides of both: a monolithic repo that they are only using parts of, and
> multiple repos that are somehow version-locked.
>
> 3. For many (most?) developers, changing to a monolithic git repo is a
> *bigger* workflow change than switching to separate git repos. Many people (and
> at least some downstream infrastructure) use the git mirrors exclusively, aside
> from git-svn for committing.

I believe the idea is to continue to maintain the read-only independent git
repos for each project. The only change is that instead of sourcing those commits
from the official upstream (independent) svn repos, they will be sourced from the
official upstream monorepo.

Thus downstream developers can continue to use the read-only view of the
independent projects if that is easier for them; but people hacking on
llvm/clang itself get the benefits of easier checkout, patching, bisection,
atomic commits between projects, etc that come from using a monorepo as the
official repository.

Simon

Renato Golin via llvm-dev

2016-Jul-26 09:15 UTC

[llvm-dev] [RFC] One or many git repositories?

On 26 July 2016 at 10:09, Simon Taylor via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Thus downstream developers can continue to use the read-only view of the
> independent projects if that is easier for them; but people hacking on
> llvm/clang itself get the benefits of easier checkout, patching, bisection,
> atomic commits between projects, etc that come from using a monorepo as the
> official repository.

Would these read-only repositories remain with the synchronous version
stream? I think this was one of the points against pure git without
sub-modules and without a monolithic repository.

cheers,
--renato

Simon Taylor via llvm-dev

2016-Jul-26 11:08 UTC

[llvm-dev] [RFC] One or many git repositories?

> On 26 Jul 2016, at 10:15, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 26 July 2016 at 10:09, Simon Taylor via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Thus downstream developers can continue to use the read-only view of
>> the independent projects if that is easier for them; but people hacking on
>> llvm/clang itself get the benefits of easier checkout, patching, bisection,
>> atomic commits between projects, etc that come from using a monorepo as the
>> official repository.
>
> Would this read-only repositories remain with the synchronous version
> stream? I think this was one of the points against pure-git without
> sub-modules and without monolithic repository.

Which “synchronous version stream” are you referring to?

My understanding is that currently the official repos are in SVN but are
separate for each project.

That situation could of course be recreated exactly with git.

The downside is that some of the projects have cross-dependencies (clang rev x
will only work with llvm rev y) and I don’t believe these cross-repository
dependencies are currently stored anywhere.

If my understanding is mistaken, then apologies, I don’t do any day-to-day work
with LLVM.

git submodules would let you add an umbrella repository that would ensure the
submodules were at mutually compatible versions.

An alternative is to use a monorepo as the ultimate source of truth and the
“official upstream”, which would ensure all projects are mutually compatible,
and make cross-project patches / bisection etc easier.

With a monorepo upstream it would still be possible to maintain read-only views
of parts of the repository (i.e. individual projects) by projecting commits from
the monorepo.

Say a patch is committed to the monorepo that touches libc++, clang, and llvm.
Those 3 individual read-only repos would then each be updated with the changes in
the commit that affect their files. The commit message would be the same as
in the monorepo, but would have a line added referencing the monorepo commit
(in the same way the git repos currently list the svn rev in their commit
messages). It would also be possible to maintain a read-only umbrella repo that
references the individual ones as submodules; that would also receive a commit
updating the versions of the individual git repos.

These read-only projections of the monorepo would be entirely deterministic -
every commit in the monorepo would generate a matching commit in any project
that it touches (and a commit in the umbrella submodule-based repo too if
desired). The read-only views could be regenerated from scratch from the
monorepo [and optionally a starting state, so you could keep the existing hashes
in the current individual git repo views].
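
The projection described above can be sketched roughly as follows. This is only an illustration, assuming each project lives in its own top-level directory of the monorepo; the `MonoCommit` type, the `project_commit` function, the `monorepo-commit:` trailer name, the commit id and the file paths are all hypothetical and do not come from any proposal in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonoCommit:
    """A monorepo commit: an id, a message, and the paths it changed."""
    commit_id: str
    message: str
    changed_paths: tuple


def project_commit(commit):
    """Split one monorepo commit into per-project commits.

    Returns a dict mapping project name -> (message, project-relative paths).
    Assumes (hypothetically) that each project lives in a top-level
    directory of the monorepo, e.g. "clang/lib/Sema/Sema.cpp".
    """
    per_project = {}
    for path in commit.changed_paths:
        project, _, relative = path.partition("/")
        per_project.setdefault(project, []).append(relative)

    # Same message as the monorepo commit, plus one added line referencing
    # it, much as the git mirrors record the svn revision today.
    message = f"{commit.message}\n\nmonorepo-commit: {commit.commit_id}"
    return {
        project: (message, tuple(sorted(paths)))
        for project, paths in sorted(per_project.items())
    }


example = MonoCommit(
    commit_id="abc123",  # made-up id for illustration
    message="Rename a shared helper",
    changed_paths=(
        "llvm/include/llvm/ADT/Helper.h",  # made-up file name
        "clang/lib/Sema/Sema.cpp",
        "libcxx/include/vector",
    ),
)
projected = project_commit(example)
```

Because the output depends only on the input commit, running the function again on the same commit yields the same per-project commits, which is the determinism property the message relies on.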

Whether or not to go with a monorepo really depends for me on how intertwined
the modules are, and how often cross-repo commits happen. That’s not something I
know personally, so I won’t make any recommendations either way.

Simon

Mehdi Amini via llvm-dev

2016-Jul-26 16:34 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 26, 2016, at 2:09 AM, Simon Taylor via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Duncan,
>
>> […]
>> 2. Those working on projects *outside* the monolithic repo will get the
>> downsides of both: a monolithic repo that they are only using parts of, and
>> multiple repos that are somehow version-locked.
>>
>> 3. For many (most?) developers, changing to a monolithic git repo is a
>> *bigger* workflow change than switching to separate git repos. Many people (and
>> at least some downstream infrastructure) use the git mirrors exclusively, aside
>> from git-svn for committing.
>
> I believe the idea is to continue to maintain the read-only independent git
> repos for each project. The only change is instead of sourcing those commits
> from the official upstream(independent) svn repos, they will be sourced from the
> official upstream monorepo.
>
> Thus downstream developers can continue to use the read-only view of the
> independent projects if that is easier for them; but people hacking on
> llvm/clang itself get the benefits of easier checkout, patching, bisection,
> atomic commits between projects, etc that come from using a monorepo as the
> official repository.

It is true that downstream should be able to continue to work easily based on
these official independent *git* repos (i.e. http://llvm.org/git/llvm.git will
continue to exist and be updated without changing its history).
However, for individual developers who are continuously upstreaming their work
it does not seem sustainable: how do you pull from the individual repos, do
your work, commit as today, and push upstream? Today it is possible with git-svn,
while with the monorepo it won’t be possible.

— 
Mehdi

Mehdi Amini via llvm-dev

2016-Jul-26 16:36 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 26, 2016, at 2:15 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 10:09, Simon Taylor via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Thus downstream developers can continue to use the read-only view of
>> the independent projects if that is easier for them; but people hacking on
>> llvm/clang itself get the benefits of easier checkout, patching, bisection,
>> atomic commits between projects, etc that come from using a monorepo as the
>> official repository.
>
> Would this read-only repositories remain with the synchronous version
> stream?

It is possible to continue adding the equivalent of git-svn-id in the commit
message, if that is what you’re referring to.

— 
Mehdi
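
For readers unfamiliar with it, git-svn appends a `git-svn-id:` line to each imported commit message, recording the svn URL, revision and repository UUID. The sketch below shows how tooling might read the revision back out of such a message. The thread does not specify what an equivalent line for a monorepo would look like, so this only covers the existing format; the function name is illustrative, and the revision number and UUID in the sample are placeholders, not real values.

```python
import re

# git-svn appends a trailer of the form
#   git-svn-id: <svn-url>@<revision> <repository-uuid>
# to each commit message it imports.
GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>[0-9a-f-]+)$",
    re.MULTILINE,
)


def svn_revision(commit_message):
    """Return the svn revision recorded in a commit message, or None."""
    match = GIT_SVN_ID.search(commit_message)
    return int(match.group("rev")) if match else None


# The revision number and UUID here are placeholders, not real values.
sample_message = (
    "Fix a typo in a comment\n"
    "\n"
    "git-svn-id: http://llvm.org/svn/llvm-project/llvm/trunk@123456 "
    "00000000-0000-0000-0000-000000000000\n"
)
revision = svn_revision(sample_message)
```

A message without the trailer simply yields `None`, so the same helper can be run over mixed histories.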

Robinson, Paul via llvm-dev

2016-Jul-26 17:42 UTC

[llvm-dev] [RFC] One or many git repositories?

>> 3. For many (most?) developers, changing to a monolithic git repo is a
>> *bigger* workflow change than switching to separate git repos. Many
>> people (and at least some downstream infrastructure) use the git
>> mirrors exclusively, aside from git-svn for committing.
>>
>> I believe the idea is to continue to maintain the read-only independent
>> git repos for each project. The only change is instead of sourcing those
>> commits from the official upstream(independent) svn repos, they will be
>> sourced from the official upstream monorepo.
>>
>> Thus downstream developers can continue to use the read-only view of
>> the independent projects if that is easier for them; but people hacking
>> on llvm/clang itself get the benefits of easier checkout, patching,
>> bisection, atomic commits between projects, etc that come from using a
>> monorepo as the official repository.
>
> It is true that downstream should be able continue to work easily based
> of these official independent *git* repos ( i.e.:
> http://llvm.org/git/llvm.git will continue to exist and be updated
> without changing its history). 
> However for individual developers that are continuously upstreaming their
> work it does not seem sustainable, how do you pull from the individual
> repos, do you work, commit as today, and push upstream? Today it is
> possible with git-svn, while with the monorepo it won’t be possible.

Speaking only for myself, the problem you suggest simply does not exist.
Or another way of saying it:  I already can't work the way you imagine.
Our downstream repo is fed by the upstream projects, but it's simply
impossible to directly commit work from there into upstream.  I create
a patch based on local work, apply it to a separate "vanilla" upstream
checkout and commit from there.  The upstream version of the patch 
cycles back into our repo and we use it to supersede the local version.
All done.  Git or svn, makes no difference.  Monorepo or submodules,
makes no difference.  Those are all implementation details.

Even if it were possible, I would still keep my upstream checkout
separate just as a safety measure, to keep from sending private stuff
upstream by accident.
--paulr

Chris Bieneman via llvm-dev

2016-Jul-27 16:47 UTC

[llvm-dev] [RFC] One or many git repositories?

I’m just now catching up on this massive thread after being on vacation last
week, and I have a few thoughts I’d like to share.

First and foremost please don’t consider lack of dissent on the thread as
presence of consensus. The various git-related threads on LLVM-dev lately have
been so active and contentious that I think a lot of people are zoning out on
the conversations. As supporting evidence of this, I was discussing this thread
around the office yesterday and had quite a few people respond with
something along the lines of “they’re proposing what?”.

I think it would be great for us to have several different proposals for how the
git-transition could work, and have a survey to get people’s opinions. I know
this has been discussed repeatedly, and I want to put in my vote in favor of
having a survey that takes into account multiple different approaches.

WRT the actual proposal in this thread, I’m strongly opposed to a
mono-repository. While I understand the argument that the full clone’s cost in
disk space is minimal compared to an LLVM object directory, what about
contributors who contribute to the smaller runtime projects but *not* to LLVM
or Clang? A contributor who only contributes to libcxx or compiler-rt being
forced to do a full clone of all the LLVM projects in order to push a patch
kinda sucks.

I want to point out a few workflows people may not be considering.

Clang can be built against an installed LLVM. I know this workflow is used by
some people because I’ve broken it in the past and had to fix it. With a
mono-repo this workflow gets a bit more complicated because you’d need to do
sparse checkouts, and it probably means we should just nuke the workflow
entirely because there is no real value added by having it.

Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
common use case maintaining sparse repository mirrors would limit impact of this
on users, should any GCC user want to contribute to Compiler-RT, you’re forcing
them to clone a much larger repository than necessary.

The same problem with Compiler-RT’s sanitizers also applies to libcxx,
libcxxabi, libunwind, and potentially any other runtime library projects that we
may create in the future.

Beyond all that I want to point out that the git multi-repository story is
basically the same thing we have today with SVN except for the absence of a
monotonically increasing number that corresponds across repositories. While
admittedly you do get a linear history with using the mono-repository, that
isn’t the only way to solve the problem, and I don’t really think that the
benefit (not needing to write some tooling) justifies the increased burden
applied to contributors that don’t use the full LLVM family of projects.
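
The "tooling" referred to here is not spelled out in the thread. One hypothetical form it could take is a helper that, given a commit in one repository, picks the newest commit in another repository that is not later than it. The sketch below assumes each repository's history is available as a list of (timestamp, commit id) pairs sorted by time; the histories and the function name are invented for illustration.

```python
from bisect import bisect_right


def latest_at_or_before(history, timestamp):
    """Return the id of the newest commit with time <= timestamp.

    `history` is a list of (unix_time, commit_id) pairs sorted by time.
    Returns None if every commit is newer than `timestamp`.
    """
    times = [t for t, _ in history]
    index = bisect_right(times, timestamp)
    return history[index - 1][1] if index else None


# Invented histories for two separate repositories.
llvm_history = [(100, "llvm-a"), (200, "llvm-b"), (300, "llvm-c")]
clang_history = [(150, "clang-a"), (250, "clang-b")]

# For the clang commit made at time 250, find the newest llvm commit
# that already existed at that moment.
matching_llvm = latest_at_or_before(llvm_history, 250)
```

Matching by timestamp is only a heuristic. The shared svn revision number gives an exact cross-repository order today, which is the property that separate git repositories would lose.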

I think we have some pretty strong evidence in the form of the github fork
counts (https://github.com/llvm-mirror/) that most people aren’t using all of
the LLVM projects. In fact, by that evidence Clang (the second most popular
project) is forked less than 2/3 as many times as LLVM.

-Chris

> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>> Even if it were possible, I would still keep my upstream checkout
>>> separate just as a safety measure, to keep from sending private stuff
>>> upstream by accident.
>> 
>> Just FYI, this is our (Azul's) workflow as well, and for similar
>> reasons.
> 
> Same here.
> 
> cheers,
> --renato

Krzysztof Parzyszek via llvm-dev

2016-Jul-27 17:03 UTC

[llvm-dev] [RFC] One or many git repositories?

On 7/27/2016 11:47 AM, Chris Bieneman via llvm-dev wrote:
> First and foremost please don’t consider lack of dissent on the thread
> as presence of consensus. The various git-related threads on LLVM-dev
> lately have been so active and contentious that I think a lot of people
> are zoning out on the conversations. As supporting evidence of this, I
> was discussing this thread yesterday around the office yesterday and had
> quite a few people responding something along the lines of “they’re
> proposing what?”.

If there is a time and place for building a consensus, this is it.

If people want to have their voices heard, they should participate in 
the discussions.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Chris Bieneman via llvm-dev

2016-Jul-27 17:17 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 10:03 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 7/27/2016 11:47 AM, Chris Bieneman via llvm-dev wrote:
>> First and foremost please don’t consider lack of dissent on the thread
>> as presence of consensus. The various git-related threads on LLVM-dev
>> lately have been so active and contentious that I think a lot of people
>> are zoning out on the conversations. As supporting evidence of this, I
>> was discussing this thread yesterday around the office yesterday and had
>> quite a few people responding something along the lines of “they’re
>> proposing what?”.
>
> If there is a time and place for building a consensus, this is it.
>
> If people want to have their voices heard, they should participate in the
> discussions.

This is a really bad argument for large influential changes like this.
Governance by the loudest voices isn’t generally desirable.

I suspect this is why the idea of having a survey or vote has received
significant support.

-Chris

Justin Lebar via llvm-dev

2016-Jul-27 17:21 UTC

[llvm-dev] [RFC] One or many git repositories?

Thanks for your thoughts, Chris.

> As supporting evidence of this, I was discussing this thread yesterday
> around the office yesterday and had quite a few people responding something
> along the lines of “they’re proposing what?”.

I hope they'll join us in this thread.

Ultimately a survey is going to be strongly biased in favor of "don't
change anything".  There is a strong psychological bias to weight
losses more than gains, so if one doesn't engage with the issue, it's
only natural to conclude "keep it as similar as possible to what it is
today -- that is safe."  But that line of thinking does not
necessarily lead us to the best outcome.

We've heard in thread from a lot of developers about how a monorepo
would improve their workflow.  I would love to hear from some
developers who are actually affected in the way you describe, rather
than just considering the hypothetical.

My expectation is that the effect of the monorepo on said developers
would be relatively small -- we're talking about 1 GB of disk space.  I
understand that there's a "yuck" factor to this, but inasmuch as there
aren't other concrete effects, this is just change aversion.  And
essentially all of the other effects of the monorepo can be hidden via
sparse checkouts, as we've discussed.

Maybe I am wrong.  But I don't think we're going to get to the bottom
of it without actually engaging with people who are actually affected
in the way you posit.

> While admittedly you do get a linear history with using the
> mono-repository, that isn’t the only way to solve the problem, and I don’t
> really think that the benefit (not needing to write some tooling) justifies the
> increased burden applied to contributors that don’t use the full LLVM family of
> projects.

I think the trade-off you're considering here (cost to developers who
use llvm plus a version-locked subrepo vs. cost to developers who
don't want an llvm clone) is the right one.  But as someone who has
extensively used git submodules and repo (a wrapper script), I
strongly disagree with the judgement that a monorepo would not be a
significant improvement.

Our primary disagreement, I think, is over how much cost there is to
"writing some tooling".  To me, this is a significant barrier standing
in the way of developer productivity.  Here at Google I did a quick
survey, and more than half of us don't have scripts of the sort that
Justin Bogner described.  We are all just floundering around rebasing
clang and llvm until it compiles.  It *sucks*.

I suggest that saying that all of these developers are "doing it
wrong" is not helpful.  Not everyone has the git and python/bash chops
to write the necessary scripts.  Not everyone has the personality to
obsessively script around stuff, or the desire to maintain said
scripts.  Not everyone works on llvm/clang so much that it's worth
adopting a special-snowflake workflow.  And some of us -- myself
included -- have extensive git scripts which work with the standard
git workflow but would be completely broken by adding a custom level
of indirection around git.

When put this way, maybe it's clear that it's actually a niche set of
people for whom "script around the brokenness" is a good solution.

As I've said a bunch of times above, we have to weigh a cost paid by
all of us every time we type a command that starts with "git" --
something we do tens or hundreds of times a day -- versus the one-time
cost of asking people to download 1gb of data.

Krzysztof Parzyszek via llvm-dev

2016-Jul-27 17:37 UTC

[llvm-dev] [RFC] One or many git repositories?

On 7/27/2016 12:17 PM, Chris Bieneman wrote:
> This is a really bad argument for large influential changes like this.

Quite the contrary---anybody can participate and anybody can express
their concerns, explain their goals, their workflow, etc.  For large
influential changes like this, "zoning out" is a poor choice of action.

> I suspect this is why the idea of having a survey or vote has received
> significant support.

I haven't seen any support for voting or for a survey.  Both are
strictly worse, as neither provides an interactive forum where the final
decision is built, instead of selected.

-Krzysztof

Renato Golin via llvm-dev

2016-Jul-27 18:00 UTC

[llvm-dev] [RFC] One or many git repositories?

On 27 July 2016 at 17:47, Chris Bieneman via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> First and foremost please don’t consider lack of dissent on the thread as
> presence of consensus.

Hi Chris,

First things first: I give you my word that I will be yelling louder
than others if this ever happens. (I can be *very* loud! :)

People can push and yell all they want, changes like this are not done
over mailing list discussions.

I have volunteered to "take on" the discussion and try to make it fair
and sound, and I'll do my best to include all opinions in the end.

Also, I will not decide, nor push towards any one decision. I hope you
trust me that there is no bias from my part. For example, what was
considered "my" proposal was actually not what I would have wanted or
benefited me.

But we do have limited time to discuss (and work on the compiler at
the same time), and I don't want to drag this for years (I don't have
the stamina).

So, the current "plan" is to formalise all proposals in around a
month's time by uploading them as documents to docs/Proposals/*.rst,
then put the survey up and let people take their time to answer
(another month), then take some time to analyse the results and share
them with the community. If all goes well, we can do a session
at US LLVM, where we take all the survey feedback into account and,
with a large group of people, take some decision.

Of course, any decision will leave people supporting the N-1 other
workflows wanting, and there's no way to avoid this. But the current
solution is *already* letting  a lot of people down, so I don't see a
way out where everyone will be happy.

> The various git-related threads on LLVM-dev lately
> have been so active and contentious that I think a lot of people are zoning
> out on the conversations.

I know... :(

> I think it would be great for us to have several different proposals for how
> the git-transition could work, and have a survey to get people’s opinions.

Yup.

> I know this has been discussed repeatedly, and I want to put in my vote in
> favor of having a survey that takes into account multiple different
> approaches.

Yup.

Barring time and survey size limitations, we can have as many as we want.

I personally feel two is minimum, three is good, four is too much.

I also think we should include "stay as it is" as an option, even if I
don't think there will be that many votes towards it.

If you want to discuss specifically about the survey, please get
involved in the llvm-foundation's thread "Voting".

cheers,
--renato

Bruce Hoult via llvm-dev

2016-Jul-27 18:03 UTC

[llvm-dev] [RFC] One or many git repositories?

On Thu, Jul 28, 2016 at 4:47 AM, Chris Bieneman via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Beyond all that I want to point out that the git multi-repository story is
> basically the same thing we have today with SVN except for the absence of a
> monotonically increasing number that corresponds across repositories. While
> admittedly you do get a linear history with using the mono-repository, that
> isn’t the only way to solve the problem, and I don’t really think that the
> benefit (not needing to write some tooling) justifies the increased burden
> applied to contributors that don’t use the full LLVM family of projects.

What do you believe this increased burden is?

The entire commit history of all llvm projects in a mono-repository is a
449 MB .git directory. It can be downloaded in about two minutes on a
typical domestic internet connection (50 Mbps).

If you download only a snapshot of the current HEAD commit then the .git
repository is 88 MB and takes under a minute. Any other individual commit
should be similar.

This doesn't seem like a big burden to me.

The checked out llvm source directory -- which you say is all that many
people want -- is 202 MB. That's without even building it.

Why is this burden unacceptable? It seems rather small to me.

For comparison, using svn to checkout llvm using ...

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm

... took me 1 minute 28 seconds and gives a 222 MB .svn directory, 428 MB
total (so 206 MB for the source files checked out).
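
As a sanity check on the download figures above, the ideal transfer time follows directly from size and link speed. The sketch below uses only the numbers quoted in this message and assumes 1 MB = 8 megabits with no protocol overhead or server-side limits, so the results are lower bounds rather than measurements; the function name is illustrative.

```python
def download_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time, ignoring protocol overhead and server limits."""
    return size_megabytes * 8 / link_megabits_per_second


full_history = download_seconds(449, 50)   # full monorepo .git directory
head_snapshot = download_seconds(88, 50)   # snapshot of the HEAD commit only
```

This gives roughly 72 seconds for the full history and roughly 14 seconds for the snapshot, which is consistent with the "about two minutes" and "under a minute" estimates once real-world overhead is allowed for.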

Chris Bieneman via llvm-dev

2016-Jul-27 19:50 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>
> Thanks for your thoughts, Chris.
>
>> As supporting evidence of this, I was discussing this thread yesterday
>> around the office yesterday and had quite a few people responding something
>> along the lines of “they’re proposing what?”.
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't
> change anything".  There is a strong psychological bias to weight
> losses more than gains, so if one doesn't engage with the issue, it's
> only natural to conclude "keep it as similar as possible to what it is
> today -- that is safe."  But that line of thinking does not
> necessarily lead us to the best outcome.

I don’t agree with this assertion. I believe that if you put forth multiple
proposals, and have an articulate discussion of the merits and costs of each
solution you can create a survey that can help inform decision making. I suppose
we can agree to disagree.

> We've heard in thread from a lot of developers about how a monorepo
> would improve their workflow.  I would love to hear from some
> developers who are actually affected in the way you describe, rather
> than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers
> would be relatively small -- we're talking about 1gb of disk space.  I
> understand that there's a "yuck" factor to this, but inasmuch as there
> aren't other concrete effects, this is just change aversion.  And
> essentially all of the other effects of the monorepo can be hidden via
> sparse checkouts, as we've discussed.
>
> Maybe I am wrong.  But I don't think we're going to get to the bottom
> of it without actually engaging with people who are actually affected
> in the way you posit.

Ok, let me describe a few workflows I’ve used in the last year that are (in my
mind) adversely impacted by a mono-repo.

Case Study 1 - Simple development on a sub-project

I build LLVM + Clang + Compiler-RT using the just-built Clang to build
Compiler-RT. I iterate on some complicated Compiler-RT changes over a period of
a day. Once my Compiler-RT changes are done I rebase the compiler-rt repo,
rebuild compiler-rt then commit.

With a mono-repo, rebasing the checkout means rebasing the whole tree. So either
I have to wrangle some crazy git or CMake foo, or when I run “ninja compiler-rt”
after the rebase it will rebuild LLVM and Clang too. That kinda sucks.

What this example illustrates to me is that today we have loosely coupled
projects with an occasional rev lock. Moving to a mono-repo enforces a tight
coupling that isn’t strictly required today.

Case Study 2 - Working on a sub-project in isolation across many platforms

I did a lot of work on Compiler-RT last year that had no direct dependency on
any other LLVM project. During the development I was working with a Compiler-RT
checkout and a build directory of just Compiler-RT. Every once in a while (or
every other day as it were) I would make a change that would break a
configuration that I wasn’t directly developing on. My workflow for handling
those cases was:

(1) Spin up a VM on a VPS that closely matched the configuration I broke
(2) Check out Compiler-RT
(3) Reproduce, debug, fix the failure
(4) Commit the patch from the VM

In a mono-repository, doing this would require checking out *all* sub-projects,
not just Compiler-RT. I imagine this probably isn’t a common workflow, but it is
one I use that would be adversely impacted by needing to check out a full LLVM.
Now, you might say I could check out the sub-project mirror, but then I can’t
commit from the VM, which kinda sucks.

> 
>> While admittedly you do get a linear history with using the
>> mono-repository, that isn’t the only way to solve the problem, and I don’t
>> really think that the benefit (not needing to write some tooling) justifies the
>> increased burden applied to contributors that don’t use the full LLVM family of
>> projects.
> 
> I think the trade-off you're considering here (cost to developers who
> use llvm plus a version-locked subrepo vs. cost to developers who
> don't want an llvm clone) is the right one.

I actually think there are *a lot* more considerations we need to be making for
an infrastructure change like this. While it is true that our SCM hosting
strategy primarily impacts developers, it also impacts our users. We should be
conscious of the impact to downstream users in making infrastructure changes
like this. That is part of why the idea of a survey holds appeal to me; it would
give us the opportunity to get feedback from a much wider audience than the
current “people on llvm-dev who haven’t been scared away”.

> But as someone who has
> extensively used git submodules and repo (a wrapper script), I
> strongly disagree with the judgement that a monorepo would not be a
> significant improvement.
> 
> Our primary disagreement, I think, is over how much cost there is to
> "writing some tooling".  To me, this is a significant barrier standing
> in the way of developer productivity.  Here at Google I did a quick
> survey, and more than half of us don't have scripts of the sort that
> Justin Bogner described.  We are all just floundering around rebasing
> clang and llvm until it compiles.  It *sucks*.

I actually think we’re both talking about solutions that require tooling, and
while we *could* be disagreeing over how much effort each tooling initiative
would require (I think they’re pretty close, so I don’t care to have that
argument), my actual disagreement with your proposal is that it is a change that
impacts developers and users universally and I don’t think that it is justified.
Simply put, I don’t feel that the benefits are substantial enough to warrant the
kind of disruptive change you’re proposing.
> 
> I suggest that saying that all of these developers are "doing it
> wrong" is not helpful.

Maybe I’m missing something, but I don’t think I said anyone was “doing it
wrong”. Bisecting across multiple git repositories isn’t a great experience. But
neither is bisecting across a half dozen separate folders in an SVN repository.
Both the submodule solution and the mono-repo solution solve this problem
equally well.

> Not everyone has the git and python/bash chops
> to write the necessary scripts.  Not everyone has the personality to
> obsessively script around stuff, or the desire to maintain said
> scripts.  Not everyone works on llvm/clang so much that it's worth
> adopting a special-snowflake workflow.  And some of us -- myself
> included -- have extensive git scripts which work with the standard
> git workflow but would be completely broken by adding a custom level
> of indirection around git.
> 
> When put this way, maybe it's clear that it's actually a niche set of
> people for whom "script around the brokenness" is a good solution.

I’m not sure what “brokenness” you’re referring to. We have a collection of
loosely connected projects by design. As a result of that intentional design
certain workflows will be impacted. I don’t think that is brokenness. I think
our loose coupling is a feature even if it makes some workflows harder.

-Chris
> 
> As I've said a bunch of times above, we have to weigh a cost paid by
> all of us every time we type a command that starts with "git" --
> something we do tens or hundreds of times a day -- versus the one-time
> cost of asking people to download 1gb of data.
> 
> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I’m just now catching up on this massive thread after being on vacation last
>> week, and I have a few thoughts I’d like to share.
>>
>> First and foremost please don’t consider lack of dissent on the thread as
>> presence of consensus. The various git-related threads on LLVM-dev lately
>> have been so active and contentious that I think a lot of people are zoning
>> out on the conversations. As supporting evidence of this, I was discussing
>> this thread around the office yesterday and had quite a few people
>> responding something along the lines of “they’re proposing what?”.
>>
>> I think it would be great for us to have several different proposals for how
>> the git-transition could work, and have a survey to get people’s opinions. I
>> know this has been discussed repeatedly, and I want to put in my vote in
>> favor of having a survey that takes into account multiple different
>> approaches.
>>
>> WRT the actual proposal in this thread, I’m strongly opposed to a
>> mono-repository. While I understand the argument that the full clone’s cost
>> on disk space is minimal compared to an LLVM object directory, what about
>> contributors that contribute to the smaller runtimes projects but *not*
>> to LLVM or Clang? A contributor that only contributes to libcxx or
>> compiler-rt being forced to do a full clone of all the LLVM projects in
>> order to push a patch kinda sucks.
>>
>> I want to point out a few workflows people may not be considering.
>>
>> Clang can be built against an installed LLVM. I know this workflow is used
>> by some people because I’ve broken it in the past and had to fix it. With a
>> mono-repo this workflow gets a bit more complicated because you’d need to do
>> sparse checkouts, and it probably means we should just nuke the workflow
>> entirely because there is no real value added by having it.
>>
>> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
>> common use case maintaining sparse repository mirrors would limit impact of
>> this on users, should any GCC user want to contribute to Compiler-RT, you’re
>> forcing them to clone a much larger repository than necessary.
>>
>> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>> libcxxabi, libunwind, and potentially any other runtime library projects
>> that we may create in the future.
>>
>> Beyond all that I want to point out that the git multi-repository story is
>> basically the same thing we have today with SVN except for the absence of a
>> monotonically increasing number that corresponds across repositories. While
>> admittedly you do get a linear history with using the mono-repository, that
>> isn’t the only way to solve the problem, and I don’t really think that the
>> benefit (not needing to write some tooling) justifies the increased burden
>> applied to contributors that don’t use the full LLVM family of projects.
>>
>> I think we have some pretty strong evidence in the form of the github fork
>> counts (https://github.com/llvm-mirror/) that most people aren’t using all
>> of the LLVM projects. In fact, by that evidence Clang (the second most
>> popular project) is forked less than 2/3 as many times as LLVM.
>>
>> -Chris
>> 
>> 
>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> Even if it were possible, I would still keep my upstream checkout
>> separate just as a safety measure, to keep from sending private stuff
>> upstream by accident.
>> 
>> 
>> Just FYI, this is our (Azul's) workflow as well, and for similar
>> reasons.
>> 
>> 
>> Same here.
>> 
>> cheers,
>> --renato

Chris Bieneman via llvm-dev

2016-Jul-27 20:02 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 11:03 AM, Bruce Hoult <bruce at hoult.org> wrote:
> 
> On Thu, Jul 28, 2016 at 4:47 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Beyond all that I want to point out that the git multi-repository story is
>> basically the same thing we have today with SVN except for the absence of a
>> monotonically increasing number that corresponds across repositories. While
>> admittedly you do get a linear history with using the mono-repository, that
>> isn’t the only way to solve the problem, and I don’t really think that the
>> benefit (not needing to write some tooling) justifies the increased burden
>> applied to contributors that don’t use the full LLVM family of projects.
> 
> What do you believe is this increased burden?
> 
> The entire commit history of all llvm projects in a mono-repository is a
> 449 MB .git directory. It can be downloaded in about two minutes on a typical
> domestic internet connection (50 Mbps).
> 
> If you download only a snapshot of the current HEAD commit then the .git
> repository is 88 MB and takes under a minute. Any other individual commit should
> be similar.
> 
> This doesn't seem like a big burden to me.
> 
> The checked out llvm source directory -- which you say is all that many
> people want -- is 202 MB. That's without even building it.
> 
> Why is this burden unacceptable? It seems rather small to me.

It is a small burden to LLVM contributors to include everything because LLVM is
large. Compiler-RT’s entire git repository is under 18MB, LibCXX is around 20MB,
LibCXXABI is under 3MB. Those projects are frequently used without LLVM and do
not have tight coupling. Forcing developers on those projects to be bound to
LLVM is, IMO, a huge burden.
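
As a rough check on the sizes traded back and forth above, the sketch below converts each quoted repository size into a best-case download time at the quoted 50 Mbps. It assumes 1 MB is 8 megabits and ignores protocol overhead and server-side work, so real clones will be slower than these figures. The function name and the dictionary are illustrative only and do not come from the thread.

```python
def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Best-case transfer time: megabytes -> megabits, divided by link speed."""
    return size_mb * 8 / link_mbps


# Sizes in MB as quoted in this thread ("under"/"around" figures used as-is).
SIZES_MB = {
    "monorepo, full history": 449,
    "monorepo, HEAD snapshot only": 88,
    "compiler-rt": 18,
    "libcxx": 20,
    "libcxxabi": 3,
}

LINK_MBPS = 50  # the "typical domestic" connection speed quoted above

if __name__ == "__main__":
    for name, size in SIZES_MB.items():
        print(f"{name}: {download_seconds(size, LINK_MBPS):.1f} s")
```

On these assumptions the absolute times are small in every case; the disagreement in the thread is about the relative jump for someone who only needs one of the small runtime repositories.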
> 
> For comparison, using svn to checkout llvm using ...
> 
> svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
> 
> ... took me 1 minute 28 seconds and gives a 222 MB .svn directory, 428 MB
> total (so 206 MB for the source files checked out).

I am not advocating that we stay on SVN. I use Git-SVN today. My entire reason
for commenting on this thread is to point out problems I see with this proposal
as compared to the submodule proposal that Renato graciously assembled from lots
of community feedback.

-Chris

Bruce Hoult via llvm-dev

2016-Jul-27 20:15 UTC

[llvm-dev] [RFC] One or many git repositories?

Sorry, I thought you were talking about people who worked on optimisers in
llvm and nothing else.

Sure, I agree with libc++, compiler-rt, libunwind being separate (and
included in the "mono-repo" as submodules).

Logically they are completely different projects that llvm or clang just
happen to use on some platforms, much like zlib. And they are used by
projects that don't use llvm or clang at all. It's just a historical
accident that llvm people needed them and they didn't exist in suitable
form (functionality/license), so they created them.



Mehdi Amini via llvm-dev

2016-Jul-27 20:30 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I actually think there are *a lot* more considerations we need to be making
> for an infrastructure change like this. While it is true that our SCM hosting
> strategy primarily impacts developers, it also impacts our users. We should be
> conscious of the impact to downstream users in making infrastructure changes
> like this. That is part of why the idea of a survey holds appeal to me; it would
> give us the opportunity to get feedback from a much wider audience than the
> current “people on llvm-dev who haven’t been scared away”.

Since you distinguish between “downstream users” and “developers” here,
can you elaborate on why the existing read-only views of the sub-projects are not
enough for “downstream users” who aren’t “developers”? Otherwise I fail to
see how they would be impacted at all.

— 
Mehdi


Justin Lebar via llvm-dev

2016-Jul-27 20:32 UTC

[llvm-dev] [RFC] One or many git repositories?

Thanks for elaborating, Chris.

> Case Study 1 - Simple development on a sub-project

I explicitly addressed this workflow in my original e-mail.  I know it
was a while ago, but it sounds like it may be worth a read if you
haven't checked it out.

In the mail I described how to use sparse checkouts to create a
repository structure that functions virtually identically to what you
have today.  It takes a few copy-pastable commands to set up.  If
these few commands are a pain, we can write a script and check it in
to llvm.

> Case Study 2 - Working on a sub-project in isolation across many platforms

I am less clear on what exactly this is about, but it seems to me that
a sparse checkout would mitigate most or all of the issues you raise
here, as well.  Again, a sparse checkout is three copy-pasteable
commands.
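
For readers who have not set one up, here is a minimal sketch of the three steps a sparse checkout usually involves in stock git: enable `core.sparseCheckout`, list the directories to keep in `.git/info/sparse-checkout`, and re-read `HEAD`. These are not necessarily the exact commands from the original e-mail. The helper names are hypothetical, and the sketch assumes an ordinary existing clone in which each sub-project is a top-level directory such as `compiler-rt/`.

```python
import subprocess
from pathlib import Path


def sparse_patterns(subprojects):
    """Return one root-anchored directory pattern per sub-project."""
    return [f"/{name.strip('/')}/" for name in subprojects]


def enable_sparse_checkout(repo_dir, subprojects):
    """Restrict the working tree of an existing clone to the given directories."""
    repo = Path(repo_dir)
    # 1. Turn the sparse-checkout feature on for this clone only.
    subprocess.run(["git", "config", "core.sparseCheckout", "true"],
                   cwd=repo, check=True)
    # 2. List the directories that should stay in the working tree.
    info_dir = repo / ".git" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    patterns = "\n".join(sparse_patterns(subprojects)) + "\n"
    (info_dir / "sparse-checkout").write_text(patterns)
    # 3. Re-read HEAD so files outside the patterns leave the working tree.
    subprocess.run(["git", "read-tree", "-mu", "HEAD"], cwd=repo, check=True)
```

Calling `enable_sparse_checkout("path/to/clone", ["compiler-rt"])` would leave only `compiler-rt/` checked out. A sparse checkout narrows the working tree only; the clone still contains the full repository history.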

> We should be conscious of the impact to downstream users in making
> infrastructure changes like this.

I agree.  The proposal to continue the read-only llvm-mirror
repositories will help minimize the effect on read-only downstream
consumers.

> I think our loose coupling is a feature even if it makes some workflows
> harder.

If this is something that you want in your checkout of the monorepo,
it is something you can have using sparse checkouts.  It takes a small
amount of one-time work on your part when you clone the repo.  If it's
a problem, we can reduce it to running a single command.

I understand that running a single command still isn't zero cost to
you.  I also understand that you may not see the benefit that others
see in the monorepo.  That's cool.  But those of us who do want a
monorepo have no way to get it today, whereas those who want a
multirepo can get something that behaves very similarly by configuring
their monorepo.

On Wed, Jul 27, 2016 at 12:50 PM, Chris Bieneman <beanz at apple.com> wrote:
>
>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Thanks for your thoughts, Chris.
>>
>>> As supporting evidence of this, I was discussing this thread
yesterday around the office yesterday and had quite a few people responding
something along the lines of “they’re proposing what?”.
>>
>> I hope they'll join us in this thread.
>>
>> Ultimately a survey is going to be strongly biased in favor of
"don't
>> change anything".  There is a strong psychological bias to weight
>> losses more than gains, so if one doesn't engage with the issue,
it's
>> only natural to conclude "keep it as similar as possible to what
it is
>> today -- that is safe."  But that line of thinking does not
>> necessarily lead us to the best outcome.
>
> I don’t agree with this assertion. I believe that if you put forth multiple
proposals, and have an articulate discussion of the merits and costs of each
solution you can create a survey that can help inform decision making. I suppose
we can agree to disagree.
>
>>
>> We've heard in thread from a lot of developers about how a monorepo
>> would improve their workflow.  I would love to hear from some
>> developers who are actually affected in the way you describe, rather
>> than just considering the hypothetical.
>>
>> My expectation is that the effect of the monorepo on said developers
>> would be relatively small -- we're talking about 1gb of disk space.  I
>> understand that there's a "yuck" factor to this, but inasmuch as there
>> aren't other concrete effects, this is just change aversion.  And
>> essentially all of the other effects of the monorepo can be hidden via
>> sparse checkouts, as we've discussed.
>>
>> Maybe I am wrong.  But I don't think we're going to get to the bottom
>> of it without actually engaging with people who are actually affected
>> in the way you posit.
>
> Ok, let me describe a few workflows I’ve used in the last year that are (in
> my mind) adversely impacted by a mono-repo.
>
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period of
> a day. Once my Compiler-RT changes are done I rebase the compiler-rt repo,
> rebuild compiler-rt then commit.
>
> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
> sucks.
>
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
> coupling that isn’t strictly required today.
>
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once in a
> while (or every other day as it were) I would make a change that would break a
> configuration that I wasn’t directly developing on. My workflow for handling
> those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Check out Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing to
> check out a full LLVM. Now, you might say I could check out the sub-project
> mirror, but then I can’t commit from the VM, which kinda sucks.
>
>
>>
>>> While admittedly you do get a linear history with using the
>>> mono-repository, that isn’t the only way to solve the problem, and I don’t
>>> really think that the benefit (not needing to write some tooling) justifies the
>>> increased burden applied to contributors that don’t use the full LLVM family of
>>> projects.
>>
>> I think the trade-off you're considering here (cost to developers who
>> use llvm plus a version-locked subrepo vs. cost to developers who
>> don't want an llvm clone) is the right one.
>
> I actually think there are *a lot* more considerations we need to be making
> for an infrastructure change like this. While it is true that our SCM hosting
> strategy primarily impacts developers, it also impacts our users. We should be
> conscious of the impact to downstream users in making infrastructure changes
> like this. That is part of why the idea of a survey holds appeal to me; it would
> give us the opportunity to get feedback from a much wider audience than the
> current “people on llvm-dev who haven’t been scared away”.
>
>> But as someone who has
>> extensively used git submodules and repo (a wrapper script), I
>> strongly disagree with the judgement that a monorepo would not be a
>> significant improvement.
>>
>> Our primary disagreement, I think, is over how much cost there is to
>> "writing some tooling".  To me, this is a significant barrier standing
>> in the way of developer productivity.  Here at Google I did a quick
>> survey, and more than half of us don't have scripts of the sort that
>> Justin Bogner described.  We are all just floundering around rebasing
>> clang and llvm until it compiles.  It *sucks*.
>
> I actually think we’re both talking about solutions that require tooling,
> and while we *could* be disagreeing over how much effort each tooling initiative
> would require (I think they’re pretty close, so I don’t care to have that
> argument), my actual disagreement with your proposal is that it is a change that
> impacts developers and users universally and I don’t think that it is justified.
> Simply put, I don’t feel that the benefits are substantial enough to warrant the
> kind of disruptive change you’re proposing.
>
>>
>> I suggest that saying that all of these developers are "doing it
>> wrong" is not helpful.
>
> Maybe I’m missing something, but I don’t think I said anyone was “doing it
> wrong”. Bisecting across multiple git repositories isn’t a great experience. But
> neither is bisecting across a half dozen separate folders in an SVN repository.
> Both the submodule solution and the mono-repo solution solve this problem
> equivalently well.
>
>>  Not everyone has the git and python/bash chops
>> to write the necessary scripts.  Not everyone has the personality to
>> obsessively script around stuff, or the desire to maintain said
>> scripts.  Not everyone works on llvm/clang so much that it's worth
>> adopting a special-snowflake workflow.  And some of us -- myself
>> included -- have extensive git scripts which work with the standard
>> git workflow but would be completely broken by adding a custom level
>> of indirection around git.
>>
>> When put this way, maybe it's clear that it's actually a niche set of
>> people for whom "script around the brokenness" is a good solution.
>
> I’m not sure what “brokenness” you’re referring to. We have a collection of
> loosely connected projects by design. As a result of that intentional design
> certain workflows will be impacted. I don’t think that is brokenness. I think
> our loose coupling is a feature even if it makes some workflows harder.
>
> -Chris
>
>>
>> As I've said a bunch of times above, we have to weigh a cost paid by
>> all of us every time we type a command that starts with "git" --
>> something we do tens or hundreds of times a day -- versus the one-time
>> cost of asking people to download 1gb of data.
>>
>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I’m just now catching up on this massive thread after being on vacation last
>>> week, and I have a few thoughts I’d like to share.
>>>
>>> First and foremost please don’t consider lack of dissent on the thread as
>>> presence of consensus. The various git-related threads on LLVM-dev lately
>>> have been so active and contentious that I think a lot of people are zoning
>>> out on the conversations. As supporting evidence of this, I was discussing
>>> this thread around the office yesterday and had quite a few people
>>> responding something along the lines of “they’re proposing what?”.
>>>
>>> I think it would be great for us to have several different proposals for how
>>> the git-transition could work, and have a survey to get people’s opinions. I
>>> know this has been discussed repeatedly, and I want to put in my vote in
>>> favor of having a survey that takes into account multiple different
>>> approaches.
>>>
>>> WRT the actual proposal in this thread, I’m strongly opposed to a
>>> mono-repository. While I understand the argument that the full clone’s cost
>>> on disk space is minimal compared to an LLVM object directory, what about
>>> for contributors that contribute to the smaller runtimes projects but *not*
>>> to LLVM or Clang? A contributor that only contributes to libcxx or
>>> compiler-rt being forced to do a full clone of all the LLVM projects in
>>> order to push a patch kinda sucks.
>>>
>>> I want to point out a few workflows people may not be considering.
>>>
>>> Clang can be built against an installed LLVM. I know this workflow is used
>>> by some people because I’ve broken it in the past and had to fix it. With a
>>> mono-repo this workflow gets a bit more complicated because you’d need to do
>>> sparse checkouts, and it probably means we should just nuke the workflow
>>> entirely because there is no real value added by having it.
>>>
>>> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
>>> common use case maintaining sparse repository mirrors would limit impact of
>>> this on users, should any GCC user want to contribute to Compiler-RT, you’re
>>> forcing them to clone a much larger repository than necessary.
>>>
>>> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>>> libcxxabi, libunwind, and potentially any other runtime library projects
>>> that we may create in the future.
>>>
>>> Beyond all that I want to point out that the git multi-repository story is
>>> basically the same thing we have today with SVN except for the absence of a
>>> monotonically increasing number that corresponds across repositories. While
>>> admittedly you do get a linear history with using the mono-repository, that
>>> isn’t the only way to solve the problem, and I don’t really think that the
>>> benefit (not needing to write some tooling) justifies the increased burden
>>> applied to contributors that don’t use the full LLVM family of projects.
>>>
>>> I think we have some pretty strong evidence in the form of the github fork
>>> counts (https://github.com/llvm-mirror/) that most people aren’t using all
>>> of the LLVM projects. In fact, by that evidence Clang (the second most
>>> popular project) is forked less than 2/3 as many times as LLVM.
>>>
>>> -Chris
>>>
>>>
>>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Even if it were possible, I would still keep my upstream checkout
>>> separate just as a safety measure, to keep from sending private stuff
>>> upstream by accident.
>>>
>>>
>>> Just FYI, this is our (Azul's) workflow as well, and for similar
>>> reasons.
>>>
>>>
>>> Same here.
>>>
>>> cheers,
>>> --renato
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>

Mehdi Amini via llvm-dev

2016-Jul-27 20:33 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 1:02 PM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Jul 27, 2016, at 11:03 AM, Bruce Hoult <bruce at hoult.org> wrote:
>> 
>> On Thu, Jul 28, 2016 at 4:47 AM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Beyond all that I want to point out that the git multi-repository story
>> is basically the same thing we have today with SVN except for the absence of a
>> monotonically increasing number that corresponds across repositories. While
>> admittedly you do get a linear history with using the mono-repository, that
>> isn’t the only way to solve the problem, and I don’t really think that the
>> benefit (not needing to write some tooling) justifies the increased burden
>> applied to contributors that don’t use the full LLVM family of projects.
>> 
>> What do you believe is this increased burden?
>> 
>> The entire commit history of all llvm projects in a mono-repository is
>> a 449 MB .git directory. It can be downloaded in about two minutes on a typical
>> domestic internet connection (50 Mbps).
>> 
>> If you download only a snapshot of the current HEAD commit then the
>> .git repository is 88 MB and takes under a minute. Any other individual commit
>> should be similar.
>> 
>> This doesn't seem like a big burden to me.
>> 
>> The checked out llvm source directory -- which you say is all that many
>> people want -- is 202 MB. That's without even building it.
>> 
>> Why is this burden unacceptable? It seems rather small to me.
> 
> It is a small burden to LLVM contributors to include everything because
> LLVM is large. Compiler-RT’s entire git repository is under 18MB, LibCXX is
> around 20MB, LibCXXABI is under 3MB. Those projects are frequently used without
> LLVM and do not have tight coupling. Forcing developers on those projects to be
> bound to LLVM is, IMO, a huge burden.

I’m pretty sure there are a lot of use-cases to check out only libCXX, but I’d
expect most to not require commit access.
What data do you have on how “frequently used” these projects are in a
configuration that requires commit access and where cloning the full repo would
be a burden?

— 
Mehdi
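
As a rough cross-check of the figures traded above, here is a minimal sketch that turns the quoted repository sizes into idealized transfer times. The sizes and the 50 Mbps link speed are the ones stated in this thread; the function names are made up for illustration, and the model ignores protocol overhead, latency, and server-side pack generation, so real clones will take longer.

```python
# Repository sizes in MB as quoted in this thread ("under" and "around"
# figures are treated as the stated number, which is an assumption).
REPO_SIZES_MB = {
    "monorepo (full history)": 449,
    "monorepo (HEAD snapshot only)": 88,
    "compiler-rt (upper bound)": 18,
    "libcxx (approximate)": 20,
    "libcxxabi (upper bound)": 3,
}

# Link speed Bruce Hoult used for a "typical domestic" connection.
LINK_MBPS = 50


def transfer_seconds(size_mb, link_mbps=LINK_MBPS):
    """Idealized transfer time: megabytes to megabits, divided by link rate."""
    return size_mb * 8 / link_mbps


def size_ratio(larger_mb, smaller_mb):
    """How many times bigger one repository is than another."""
    return larger_mb / smaller_mb


if __name__ == "__main__":
    for name, size in REPO_SIZES_MB.items():
        print(f"{name:32s} {size:4d} MB  ~{transfer_seconds(size):6.1f} s")
    ratio = size_ratio(
        REPO_SIZES_MB["monorepo (full history)"],
        REPO_SIZES_MB["compiler-rt (upper bound)"],
    )
    print(f"full monorepo vs compiler-rt: about {ratio:.0f}x the data")
```

Under these assumptions both positions are visible in the numbers: the full clone is small in absolute terms (on the order of a minute or two), yet it is roughly 25 times the data a compiler-rt-only contributor fetches today.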

> 
>> 
>> For comparison, using svn to checkout llvm using ...
>> 
>> svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
>>  
>> ... took me 1 minute 28 seconds and gives a 222 MB .svn directory, 428
>> MB total (so 206 MB for the source files checked out).
>> 
> 
> I am not advocating that we stay on SVN. I use Git-SVN today. My entire
> reason for commenting on this thread is to point out problems I see with this
> proposal as compared to the submodule proposal that Renato graciously assembled
> from lots of community feedback.
> 
> -Chris

Chris Bieneman via llvm-dev

2016-Jul-28 17:21 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev" <llvm-dev at lists.llvm.org> wrote:
> > This does not apply to libc++.  We support building the entire LLVM
> > suite with other C++ standard library implementations (at least libstdc++, and I
> > think also with Visual Studio’s implementation), so there is no dependency of
> > anything on libc++.  Similarly, we support building libc++ with other compilers
> > (in FreeBSD, we currently build it with gcc 6.1 for RISC-V, for example, where
> > the LLVM toolchain is not quite useable).
> >
> > The same applies to libunwind, to an even greater degree (where libc++
> > implements a standard API, libunwind implements a standard ABI).
> 
> I think the dependencies of lib* in LLVM are more conceptual than version
> lock, but they're still there.
> 
> I agree with you in all other points, mind you, but RT needs an unwind
> library as much as it needs clang. Without them, RT "can" (and indeed
> does) work, but we're not providing a complete solution.
> 
> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx)
> on those merits alone, but my opinion is that we should. I can't see much
> use in RT without them. That's why we're still defaulting to libgcc on
> Linux.

Renato, I just want to point out that the Compiler-RT story is *WAY* more
complicated than it might seem from your comments here. Compiler-RT is really
two or three conceptually different things that happen to be in the same
project, and parts of it are very useful without libunwind, libcxxabi, and
libcxx.

For example, the Compiler-RT sanitizers are used with GCC and libgcc. They can
be built to be used with libstdc++ as well as libc++ (although I do think that
loses some features).

I would not object to a mono-repo that included LLVM, Clang, LLD, and
Clang-Tools-Extra. I strongly object to any mono-repo that includes any of the
runtime library projects. I also think that once you move away from the
“mono-repo including all” you need to identify criteria for how you determine
which projects get included, and potentially how you evaluate adding projects to
the mono-repo.

As a straw man I would suggest the following criteria for inclusion into the
mono-repo:

(1) Projects in the mono-repo must be tightly coupled to specific versions or
commits of other projects in the mono-repo
(2) The projects in the mono-repo must provide wide benefit to the community
such that the overall community benefit outweighs the impacts of the project
being in the repo
(3) Projects in the mono-repo must conform to some defined set of standards.
LLVM’s coding standards might be a bit much, but something along those lines.

Thoughts?

-Chris
> My tuppence.
> 
> Cheers, 
> Renato
> 

Renato Golin via llvm-dev

2016-Jul-28 17:42 UTC

[llvm-dev] [RFC] One or many git repositories?

On 28 July 2016 at 18:21, Chris Bieneman <beanz at apple.com> wrote:
> Renato, I just want to point out that the Compiler-RT story is *WAY* more
> complicated than it might seem from your comments here. Compiler-RT is
> really two or three conceptually different things that happen to be in the
> same project, and parts of it are very useful without libunwind, libcxxabi,
> and libcxx.
Indeed! I didn't mean to imply that was the *only* use. Just *my* main use. :)

> I would not object to a mono-repo that included LLVM, Clang, LLD, and
> Clang-Tools-Extra. I strongly object to any mono-repo that includes any of
> the runtime library projects. I also think that once you move away from the
> “mono-repo including all” you need to identify criteria for how you
> determine which projects get included, and potentially how you evaluate
> adding projects to the mono-repo.
I agree.

I'd much prefer a mono repo that *doesn't* have RT than one that does
have it, but not libunwind/c++abi.

Some proposals said RT would be in, unwind/c++ would be out, and
that's what I found confusing.

> (1) Projects in the mono-repo must be tightly coupled to specific versions
> or commits of other projects in the mono-repo
Yup. That's the hard line we cannot cross. LLVM and Clang are
obviously in that group. Extra, RT and LLD, are fuzzy. Others are a
lot less fuzzy (in relation to LLVM only).

Parts of RT (usage) are heavily associated with libunwind and
libc++abi, and their alternatives don't all have the same cut, so
mixing and matching them is complicated. But that's orthogonal to the
monorepo decisions, and it can very well be bundled again or unbundled
even more. This one is for a future discussion.

> (2) The projects in the mono-repo must provide wide benefit to the community
> such that the overall community benefit outweighs the impacts of the project
> being in the repo
Yup. I (personally) think RT's builtins should fulfill that role, as
LLVM back-ends depend on the run-time library to work, but not in its
current (disorganised) form.

> (3) Projects in the mono-repo must conform to some defined set of standards.
> LLVM’s coding standards might be a bit much, but something along those
> lines.
I'd say all projects in the LLVM official project should conform to
the coding standards and developer policies, inside or outside of the
monorepo.

I think the difference between being in the monorepo or not is more
practical and logical than social.

cheers,
--renato

Justin Lebar via llvm-dev

2016-Jul-28 17:53 UTC

[llvm-dev] [RFC] One or many git repositories?

Thanks again for your thoughts, Chris.
> As a straw man I would suggest the following criteria for inclusion into
> the mono-repo:
>
> (1) Projects in the mono-repo must be tightly coupled to specific versions
> or commits of other projects in the mono-repo
I'm fine with that, fwiw.  That was in fact the original proposal.
I'm also fine if we decide to put everything inside the monorepo.  I
think Richard Smith had some good arguments for why they belong
together.

But I am really surprised that you think this is such a big deal that
you would object to the whole monorepo if this decision doesn't go
your way.  The decision of whether or not to include these projects
affects only read-write consumers of these projects -- of which there
are relatively few people.  Read-only consumers *are entirely
unaffected by the decision*, as they can continue to use the read-only
subproject mirrors exactly as today.
> (2) The projects in the mono-repo must provide wide benefit to the
> community such that the overall community benefit outweighs the impacts of the
> project being in the repo
> (3) Projects in the mono-repo must conform to some defined set of
> standards. LLVM’s coding standards might be a bit much, but something along
> those lines.
Would you mind explaining why you think the criteria for inclusion in
the monorepo should be different than the criteria for inclusion as an
LLVM subproject?

I think these are fine criteria -- for inclusion of code as an LLVM
subproject.  But it seems to me -- and maybe I'm wrong -- that the
reason you're proposing them is that there exist today LLVM
subprojects that are version-locked to other projects but you think do
not meet these criteria, and therefore you want to exclude them from
the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
your list above.

I understand that lldb is persona non grata in some circles.  But.
It's not right to use the source code migration as a tool to revisit
an old decision like this.  That is procedurally unjust.  The relevant
decision should be, "is LLDB an LLVM subproject that is version-locked
to other subprojects, or not?"

If you feel strongly that we should reevaluate every project on the
basis of these last two criteria before including them in the
monorepo, would you mind elaborating on what exactly are the harms of
including a project that isn't up to snuff?  If you are aesthetically
displeased by a project, you can hide it using sparse checkouts.  And
nobody is going to make you build it.  At that point, the only cost I
can think of from including a project is the bytes on disk.  But since
the full history of all LLVM subprojects (excluding test-suite) is
500mb (*), surely you're not going to argue for the exclusion of (say)
lldb on the grounds of saving 25mb (or whatever)?

-Justin

(*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
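
For readers unfamiliar with the sparse checkouts mentioned above, the following is a minimal sketch of the mechanism, not a description of any tooling proposed in this thread. It builds a throwaway toy repository (the directory names merely mimic LLVM subprojects, and the helper names are hypothetical) and then uses git's `core.sparseCheckout` setting, the `.git/info/sparse-checkout` pattern file, and `git read-tree -mu HEAD` to hide everything except one subproject from the working tree. It assumes a `git` binary is on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo`, raising if it fails."""
    subprocess.run(
        ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.invalid", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )


def make_toy_monorepo(root):
    """Create a tiny stand-in monorepo with one file per pretend subproject."""
    git(root, "init", "-q")
    for project in ("llvm", "clang", "compiler-rt"):
        (root / project).mkdir()
        (root / project / "README.txt").write_text(project + "\n")
    git(root, "add", ".")
    git(root, "commit", "-q", "-m", "toy monorepo")


def restrict_to(root, directories):
    """Limit the working tree to the given top-level directories."""
    git(root, "config", "core.sparseCheckout", "true")
    sparse_file = root / ".git" / "info" / "sparse-checkout"
    sparse_file.parent.mkdir(parents=True, exist_ok=True)
    # One anchored directory pattern per line, e.g. "/compiler-rt/".
    sparse_file.write_text("".join(f"/{d}/\n" for d in directories))
    # Re-read HEAD into the index and update the working tree to match.
    git(root, "read-tree", "-mu", "HEAD")


def visible_projects(root):
    """Top-level directories currently present in the working tree."""
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.name != ".git")


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git not found; skipping demo")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            make_toy_monorepo(root)
            print("before:", visible_projects(root))
            restrict_to(root, ["compiler-rt"])
            print("after: ", visible_projects(root))
```

Note what this does and does not address: the other subprojects disappear from the working tree, but their full history is still fetched and stored under `.git`, which is the cost the runtime-library contributors in this thread are objecting to.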

Chris Bieneman via llvm-dev

2016-Jul-28 18:28 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
> 
> Thanks again for your thoughts, Chris.
> 
>> As a straw man I would suggest the following criteria for inclusion
>> into the mono-repo:
>> 
>> (1) Projects in the mono-repo must be tightly coupled to specific
>> versions or commits of other projects in the mono-repo
> 
> I'm fine with that, fwiw.  That was in fact the original proposal.
That is the wording of the original proposal, but I disagree that it is the
content of the original proposal. I don’t believe that Compiler-RT is tightly
coupled to LLVM at all, which is a big source of my disagreement here.
> I'm also fine if we decide to put everything inside the monorepo.  I
> think Richard Smith had some good arguments for why they belong
> together.
> 
> But I am really surprised that you think this is such a big deal that
> you would object to the whole monorepo if this decision doesn't go
> your way.  
I really hate your phrasing on this. I’m not objecting to this proposal just
because some minor decision doesn’t go my way. I think this is a very crucial
point of whether or not the monorepo solution’s benefit outweighs its cost.
> The decision of whether or not to include these projects
> affects only read-write consumers of these projects -- of which there
> are relatively few people.
Maybe there are few, but the impact is not insignificant. Also I think the
opinions of the read-write consumers of the sub-projects being included should
count for a lot, and as a read-write consumer I don’t like this proposal if it
includes the runtime libraries.
>  Read-only consumers *are entirely
> unaffected by the decision*, as they can continue to use the read-only
> subproject mirrors exactly as today.
The existence of subproject mirrors requires someone to write and maintain the
tooling to keep those mirrors updated, and those mirrors will have all the
technical hurdles and drawbacks that a submodule repository would have.

The question here is: Do you make downstream single project users work off
potentially unreliable mirrors, or do you make the people who need a mono-repo
experience work off a potentially unreliable submodule repo?

I think the only answer anyone can reasonably give to this is that we don’t have
enough information to make a reasonable decision that maximizes the benefits to
most users while minimizing the adverse impacts. Hence why I keep saying we need
a survey to understand how *people* interact with the project and what kinds of
workflows are important. I emphasize the word “people” in that last sentence
because this decision impacts the contributors to the community, and downstream
users. We need to take all perspectives into account when making this kind of
infrastructure decision.
> 
>> (2) The projects in the mono-repo must provide wide benefit to the
>> community such that the overall community benefit outweighs the impacts of the
>> project being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of
>> standards. LLVM’s coding standards might be a bit much, but something along
>> those lines.
> 
> Would you mind explaining why you think the criteria for inclusion in
> the monorepo should be different than the criteria for inclusion as an
> LLVM subproject?
For starters, including things as an LLVM subproject doesn’t require that they meet
criterion #1 in my proposal. Simply put, they don’t need to be tightly coupled to
LLVM. We have many examples of that.
> 
> I think these are fine criteria -- for inclusion of code as an LLVM
> subproject.  But it seems to me -- and maybe I'm wrong -- that the
> reason you're proposing them is that there exist today LLVM
> subprojects that are version-locked to other projects but you think do
> not meet these criteria, and therefore you want to exclude them from
> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
> your list above.
> 
> I understand that lldb is persona non grata in some circles.  But.
> It's not right to use the source code migration as a tool to revisit
> an old decision like this.  That is procedurally unjust.  The relevant
> decision should be, "is LLDB an LLVM subproject that is version-locked
> to other subprojects, or not?"
I really don’t want to debate LLDB. It is a hot issue for a lot of people, and
I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.

Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I can
tell, abandoned, but it is still an LLVM project that is tightly coupled to LLVM
versions. So it meets criterion #1. I think it fails to meet criterion #2 because
DragonEgg is basically abandoned and provides no real value to the community.
Even though the burden of a dead project on the mono-repo is minuscule, I think
there is no good reason to include DragonEgg.

Do you disagree?
> 
> If you feel strongly that we should reevaluate every project on the
> basis of these last two criteria before including them in the
> monorepo, would you mind elaborating on what exactly are the harms of
> including a project that isn't up to snuff?
Every project that is added to the mono-repo will incur a small cost to
developers in terms of the size it adds to the repository, and the tooling or
workflow adjustments to handle the change. In most cases this will be minimal,
even negligible. However I think the burden on runtime developers is
significant.
>  If you are aesthetically
> displeased by a project, you can hide it using sparse checkouts.  And
> nobody is going to make you build it.  At that point, the only cost I
> can think of from including a project is the bytes on disk.  But since
> the full history of all LLVM subprojects (excluding test-suite) is
> 500mb (*), surely you're not going to argue for the exclusion of (say)
> lldb on the grounds of saving 25mb (or whatever)?
I won’t argue over lldb at all. My arguments are from the perspective of someone
working on the runtime library projects, for whom the burden of being included
in the llvm mono-repo is significant. While the full history of LLVM is around 500MB,
the full history of *all* the runtime projects is less than 100MB. Developers
working on libcxx or compiler-rt should not need to clone LLVM, and run commands
to do sparse checkouts. That is more burden than we should incur. Further the
setup cost of doing multiple sparse checkouts in order to approximate the
workflows we have today with decoupled projects is, IMO, unnecessary and
unreasonable.

Those arguments go away if you follow criteria that exclude runtime projects
from the mono-repo.

-Chris
> 
> -Justin
> 
> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
> 
> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>> <llvm-dev at lists.llvm.org> wrote:
>>> This does not apply to libc++.  We support building the entire LLVM
suite
>>> with other C++ standard library implementations (at least
libstdc++, and I
>>> think also with Visual Studio’s implementation), so there is no
dependency
>>> of anything on libc++.  Similarly, we support building libc++ with
other
>>> compilers (in FreeBSD, we currently build it with gcc 6.1 for
RISC-V, for
>>> example, where the LLVM toolchain is not quite useable).
>>> 
>>> The same applies to libunwind, to an even greater degree (where
libc++
>>> implements a standard API, libunwind implements a standard ABI).
>> 
>> I think the dependencies of lib* in LLVM are more conceptual than
version
>> lock, but they're still there.
>> 
>> I agree with you in all other points, mind you, but RT needs an unwind
>> library as much as it needs clang. Without them, RT "can"
(and indeed does)
>> work, but we're not providing a complete solution.
>> 
>> I won't *push* to bundle libunwind, libcxxabi (and ultimately
libcxx) on
>> those merits alone, but my opinion is that we should. I can't see
much use
>> in RT without them. That's why we're still defaulting to libgcc
on Linux.
>> 
>> Renato, I just want to point out that the Compiler-RT story is *WAY*
more
>> complicated than it might seem from your comments here. Compiler-RT is
>> really two or three conceptually different things that happen to be in
the
>> same project, and parts of it are very useful without libunwind,
libcxxabi,
>> and libcxx.
>> 
>> For example, the Compiler-RT sanitizers are used with GCC and libgcc.
They
>> can be built to be used with libstdc++ as well as libc++ (although I do
>> think that loses some features).
>> 
>> I would not object to a mono-repo that included LLVM, Clang, LLD, and
>> Clang-Tools-Extra. I strongly object to any mono-repo that includes any
of
>> the runtime library projects. I also think that once you move away from
the
>> “mono-repo including all” you need to identify criteria for how you
>> determine which projects get included, and potentially how you evaluate
>> adding projects to the mono-repo.
>> 
>> As a straw man I would suggest the following criteria for inclusion
into the
>> mono-repo:
>> 
>> (1) Projects in the mono-repo must be tightly coupled to specific
versions
>> or commits of other projects in the mono-repo
>> (2) The projects in the mono-repo must provide wide benefit to the
community
>> such that the overall community benefit outweighs the impacts of the
project
>> being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of
standards.
>> LLVM’s coding standards might be a bit much, but something along those
>> lines.
>> 
>> Thoughts?
>> 
>> -Chris
>> 
>> My tuppence.
>> 
>> Cheers,
>> Renato
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mehdi Amini via llvm-dev

2016-Jul-28 18:42 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 11:28 AM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
>> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>> 
>> Thanks again for your thoughts, Chris.
>> 
>>> As a straw man I would suggest the following criteria for inclusion
into the mono-repo:
>>> 
>>> (1) Projects in the mono-repo must be tightly coupled to specific
versions or commits of other projects in the mono-repo
>> 
>> I'm fine with that, fwiw.  That was in fact the original proposal.
> 
> That is the wording of the original proposal, but I disagree that it is the
content of the original proposal. I don’t believe that Compiler-RT is tightly
coupled to LLVM at all, which is a big source of my disagreement here.
> 
>> I'm also fine if we decide to put everything inside the monorepo. 
I
>> think Richard Smith had some good arguments for why they belong
>> together.
>> 
>> But I am really surprised that you think this is such a big deal that
>> you would object to the whole monorepo if this decision doesn't go
>> your way.  
> 
> I really hate your phrasing on this. I’m not objecting to this proposal
just because some minor decision doesn’t go my way. I think this is a very
crucial point of whether or not the monorepo solution’s benefit outweighs its
cost.
> 
>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
> 
> Maybe there are few, but the impact is non-insignificant. Also I think the
opinions of the read-write consumers of the sub-projects being included should
count for a lot, and as a read-write consumer I don’t like this proposal if it
includes the runtime libraries.
> 
>> Read-only consumers *are entirely
>> unaffected by the decision*, as they can continue to use the read-only
>> subproject mirrors exactly as today.
> 
> The existence of subproject mirrors requires someone to write and maintain
the tooling to keep those mirrors updated, and those mirrors will have all the
technical hurdles and drawbacks that a submodule repository would have.
> 
> The question here is: Do you make downstream single project users work off
potentially unreliable mirrors, or do you make the people who need a mono-repo
experience work off a potentially unreliable submodule repo?
> 
> I think the only answer anyone can reasonably give to this is that we don’t
have enough information to make a reasonable decision that maximizes the
benefits to most users while minimizing the adverse impacts. Hence why I keep
saying we need a survey to understand how *people* interact with the project and
what kinds of workflows are important. I emphasize the word “people” in that
last sentence because this decision impacts the contributors to the community,
and downstream users. We need to take all perspectives into account when making
this kind of infrastructure decision.
You keep saying that we need a survey, but it has always been part of the plan
(I posted a reference to this yesterday in this thread), so I don’t understand
why you keep saying it.
Having a survey does not contradict this thread. In fact, I don’t believe in a
survey without some proposals that consider how to accommodate the multiple
workflows under consideration.

— 
Mehdi

> 
>> 
>>> (2) The projects in the mono-repo must provide wide benefit to the
community such that the overall community benefit outweighs the impacts of the
project being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of
standards. LLVM’s coding standards might be a bit much, but something along
those lines.
>> 
>> Would you mind explaining why you think the criteria for inclusion in
>> the monorepo should be different than the criteria for inclusion as an
>> LLVM subproject?
> 
> For starters, including things as LLVM subproject doesn’t require that they
meet criteria #1 in my proposal. Simply put, they don’t need to be tightly
coupled to LLVM. We have many examples of that.
> 
>> 
>> I think these are fine criteria -- for inclusion of code as an LLVM
>> subproject.  But it seems to me -- and maybe I'm wrong -- that the
>> reason you're proposing them is that there exist today LLVM
>> subprojects that are version-locked to other projects but you think do
>> not meet these criteria, and therefore you want to exclude them from
>> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
>> your list above.
>> 
>> I understand that lldb is persona non grata in some circles.  But.
>> It's not right to use the source code migration as a tool to
revisit
>> an old decision like this.  That is procedurally unjust.  The relevant
>> decision should be, "is LLDB an LLVM subproject that is
version-locked
>> to other subprojects, or not?”
> 
> I really don’t want to debate LLDB. It is a hot issue for a lot of people,
and I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.
> 
> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I
can tell, abandoned, but it is still an LLVM project that is tightly coupled to
LLVM versions. So it meets criteria #1. I think it fails to meet criteria #2
because DragonEgg is basically abandoned and provides no real value to the
community. Even though the burden of a dead project on the mono-repo is
minuscule, I think there is no good reason to include DragonEgg.
> 
> Do you disagree?
> 
>> 
>> If you feel strongly that we should reevaluate every project on the
>> basis of these last two criteria before including them in the
>> monorepo, would you mind elaborating on what exactly are the harms of
>> including a project that isn't up to snuff?
> 
> Every project that is added to the mono-repo will incur a small cost to
developers in terms of the size it adds to the repository, and the tooling or
workflow adjustments to handle the change. In most cases this will be minimal,
even negligible. However I think the burden on runtime developers is
significant.
> 
>> If you are aesthetically
>> displeased by a project, you can hide it using sparse checkouts.  And
>> nobody is going to make you build it.  At that point, the only cost I
>> can think of from including a project is the bytes on disk.  But since
>> the full history of all LLVM subprojects (excluding test-suite) is
>> 500mb (*), surely you're not going to argue for the exclusion of
(say)
>> lldb on the grounds of saving 25mb (or whatever)?
> 
> I won’t argue over lldb at all. My arguments are from the perspective of
someone working on the runtime library projects, the burden is significant to be
included in the llvm mono-repo. While the full history of LLVM is around 500MB,
the full history of *all* the runtime projects is less than 100MB. Developers
working on libcxx or compiler-rt should not need to clone LLVM, and run commands
to do sparse checkouts. That is more burden than we should incur. Further the
setup cost of doing multiple sparse checkouts in order to approximate the
workflows we have today with decoupled projects is, IMO, unnecessary and
unreasonable.
> 
> Those arguments go away if you follow criteria that exclude runtime
projects from the mono-repo.
> 
> -Chris
> 
>> 
>> -Justin
>> 
>> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
>> 
>> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> This does not apply to libc++.  We support building the entire
LLVM suite
>>>> with other C++ standard library implementations (at least
libstdc++, and I
>>>> think also with Visual Studio’s implementation), so there is no
dependency
>>>> of anything on libc++.  Similarly, we support building libc++
with other
>>>> compilers (in FreeBSD, we currently build it with gcc 6.1 for
RISC-V, for
>>>> example, where the LLVM toolchain is not quite useable).
>>>> 
>>>> The same applies to libunwind, to an even greater degree (where
libc++
>>>> implements a standard API, libunwind implements a standard
ABI).
>>> 
>>> I think the dependencies of lib* in LLVM are more conceptual than
version
>>> lock, but they're still there.
>>> 
>>> I agree with you in all other points, mind you, but RT needs an
unwind
>>> library as much as it needs clang. Without them, RT "can"
(and indeed does)
>>> work, but we're not providing a complete solution.
>>> 
>>> I won't *push* to bundle libunwind, libcxxabi (and ultimately
libcxx) on
>>> those merits alone, but my opinion is that we should. I can't
see much use
>>> in RT without them. That's why we're still defaulting to
libgcc on Linux.
>>> 
>>> Renato, I just want to point out that the Compiler-RT story is
*WAY* more
>>> complicated than it might seem from your comments here. Compiler-RT
is
>>> really two or three conceptually different things that happen to be
in the
>>> same project, and parts of it are very useful without libunwind,
libcxxabi,
>>> and libcxx.
>>> 
>>> For example, the Compiler-RT sanitizers are used with GCC and
libgcc. They
>>> can be built to be used with libstdc++ as well as libc++ (although
I do
>>> think that loses some features).
>>> 
>>> I would not object to a mono-repo that included LLVM, Clang, LLD,
and
>>> Clang-Tools-Extra. I strongly object to any mono-repo that includes
any of
>>> the runtime library projects. I also think that once you move away
from the
>>> “mono-repo including all” you need to identify criteria for how you
>>> determine which projects get included, and potentially how you
evaluate
>>> adding projects to the mono-repo.
>>> 
>>> As a straw man I would suggest the following criteria for inclusion
into the
>>> mono-repo:
>>> 
>>> (1) Projects in the mono-repo must be tightly coupled to specific
versions
>>> or commits of other projects in the mono-repo
>>> (2) The projects in the mono-repo must provide wide benefit to the
community
>>> such that the overall community benefit outweighs the impacts of
the project
>>> being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of
standards.
>>> LLVM’s coding standards might be a bit much, but something along
those
>>> lines.
>>> 
>>> Thoughts?
>>> 
>>> -Chris
>>> 
>>> My tuppence.
>>> 
>>> Cheers,
>>> Renato

Chris Bieneman via llvm-dev

2016-Jul-28 19:01 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 11:42 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> 
>> On Jul 28, 2016, at 11:28 AM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>>> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>>> 
>>> Thanks again for your thoughts, Chris.
>>> 
>>>> As a straw man I would suggest the following criteria for
inclusion into the mono-repo:
>>>> 
>>>> (1) Projects in the mono-repo must be tightly coupled to
specific versions or commits of other projects in the mono-repo
>>> 
>>> I'm fine with that, fwiw.  That was in fact the original
proposal.
>> 
>> That is the wording of the original proposal, but I disagree that it is
the content of the original proposal. I don’t believe that Compiler-RT is
tightly coupled to LLVM at all, which is a big source of my disagreement here.
>> 
>>> I'm also fine if we decide to put everything inside the
monorepo.  I
>>> think Richard Smith had some good arguments for why they belong
>>> together.
>>> 
>>> But I am really surprised that you think this is such a big deal
that
>>> you would object to the whole monorepo if this decision doesn't
go
>>> your way.  
>> 
>> I really hate your phrasing on this. I’m not objecting to this proposal
just because some minor decision doesn’t go my way. I think this is a very
crucial point of whether or not the monorepo solution’s benefit outweighs its
cost.
>> 
>>> The decision of whether or not to include these projects
>>> affects only read-write consumers of these projects -- of which
there
>>> are relatively few people.
>> 
>> Maybe there are few, but the impact is non-insignificant. Also I think
the opinions of the read-write consumers of the sub-projects being included
should count for a lot, and as a read-write consumer I don’t like this proposal
if it includes the runtime libraries.
>> 
>>> Read-only consumers *are entirely
>>> unaffected by the decision*, as they can continue to use the
read-only
>>> subproject mirrors exactly as today.
>> 
>> The existence of subproject mirrors requires someone to write and
maintain the tooling to keep those mirrors updated, and those mirrors will have
all the technical hurdles and drawbacks that a submodule repository would have.
>> 
>> The question here is: Do you make downstream single project users work
off potentially unreliable mirrors, or do you make the people who need a
mono-repo experience work off a potentially unreliable submodule repo?
>> 
>> I think the only answer anyone can reasonably give to this is that we
don’t have enough information to make a reasonable decision that maximizes the
benefits to most users while minimizing the adverse impacts. Hence why I keep
saying we need a survey to understand how *people* interact with the project and
what kinds of workflows are important. I emphasize the word “people” in that
last sentence because this decision impacts the contributors to the community,
and downstream users. We need to take all perspectives into account when making
this kind of infrastructure decision.
> 
> You keep saying that we need a survey, but it has always been part of the
plan (I posted some reference to this yesterday in this thread) so I don’t
understand why you keep saying this indeed…
> Having a survey is not contradictory with this thread. in fact I don’t
believe in a survey without having some proposals that considers how to
accommodate multiple considered workflow.
I keep saying this because some of the people on this thread have advocated
*against* a survey, saying it wouldn’t be useful, and because people keep making
blanket assertions that aren’t backed by data.

-Chris
> 
> — 
> Mehdi
> 
> 
>> 
>>> 
>>>> (2) The projects in the mono-repo must provide wide benefit to
the community such that the overall community benefit outweighs the impacts of
the project being in the repo
>>>> (3) Projects in the mono-repo must conform to some defined set
of standards. LLVM’s coding standards might be a bit much, but something along
those lines.
>>> 
>>> Would you mind explaining why you think the criteria for inclusion
in
>>> the monorepo should be different than the criteria for inclusion as
an
>>> LLVM subproject?
>> 
>> For starters, including things as LLVM subproject doesn’t require that
they meet criteria #1 in my proposal. Simply put, they don’t need to be tightly
coupled to LLVM. We have many examples of that.
>> 
>>> 
>>> I think these are fine criteria -- for inclusion of code as an LLVM
>>> subproject.  But it seems to me -- and maybe I'm wrong -- that
the
>>> reason you're proposing them is that there exist today LLVM
>>> subprojects that are version-locked to other projects but you think
do
>>> not meet these criteria, and therefore you want to exclude them
from
>>> the monorepo.  Is that right?  lldb comes to mind, as it wasn't
in
>>> your list above.
>>> 
>>> I understand that lldb is persona non grata in some circles.  But.
>>> It's not right to use the source code migration as a tool to
revisit
>>> an old decision like this.  That is procedurally unjust.  The
relevant
>>> decision should be, "is LLDB an LLVM subproject that is
version-locked
>>> to other subprojects, or not?”
>> 
>> I really don’t want to debate LLDB. It is a hot issue for a lot of
people, and I’d really prefer if we didn’t start a “let’s all rag on lldb”
thread.
>> 
>> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far
as I can tell, abandoned, but it is still an LLVM project that is tightly
coupled to LLVM versions. So it meets criteria #1. I think it fails to meet
criteria #2 because DragonEgg is basically abandoned and provides no real value
to the community. Even though the burden of a dead project on the mono-repo is
minuscule, I think there is no good reason to include DragonEgg.
>> 
>> Do you disagree?
>> 
>>> 
>>> If you feel strongly that we should reevaluate every project on the
>>> basis of these last two criteria before including them in the
>>> monorepo, would you mind elaborating on what exactly are the harms
of
>>> including a project that isn't up to snuff?
>> 
>> Every project that is added to the mono-repo will incur a small cost to
developers in terms of the size it adds to the repository, and the tooling or
workflow adjustments to handle the change. In most cases this will be minimal,
even negligible. However I think the burden on runtime developers is
significant.
>> 
>>> If you are aesthetically
>>> displeased by a project, you can hide it using sparse checkouts. 
And
>>> nobody is going to make you build it.  At that point, the only cost
I
>>> can think of from including a project is the bytes on disk.  But
since
>>> the full history of all LLVM subprojects (excluding test-suite) is
>>> 500mb (*), surely you're not going to argue for the exclusion
of (say)
>>> lldb on the grounds of saving 25mb (or whatever)?
>> 
>> I won’t argue over lldb at all. My arguments are from the perspective
of someone working on the runtime library projects, the burden is significant to
be included in the llvm mono-repo. While the full history of LLVM is around
500MB, the full history of *all* the runtime projects is less than 100MB.
Developers working on libcxx or compiler-rt should not need to clone LLVM, and
run commands to do sparse checkouts. That is more burden than we should incur.
Further the setup cost of doing multiple sparse checkouts in order to
approximate the workflows we have today with decoupled projects is, IMO,
unnecessary and unreasonable.
>> 
>> Those arguments go away if you follow criteria that exclude runtime
projects from the mono-repo.
>> 
>> -Chris
>> 
>>> 
>>> -Justin
>>> 
>>> (*) I'd called it 1.2gb before, but Bruce Hoult set me
straight.
>>> 
>>> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> On 28 Jul 2016 8:36 a.m., "David Chisnall via
llvm-dev"
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> This does not apply to libc++.  We support building the
entire LLVM suite
>>>>> with other C++ standard library implementations (at least
libstdc++, and I
>>>>> think also with Visual Studio’s implementation), so there
is no dependency
>>>>> of anything on libc++.  Similarly, we support building
libc++ with other
>>>>> compilers (in FreeBSD, we currently build it with gcc 6.1
for RISC-V, for
>>>>> example, where the LLVM toolchain is not quite useable).
>>>>> 
>>>>> The same applies to libunwind, to an even greater degree
(where libc++
>>>>> implements a standard API, libunwind implements a standard
ABI).
>>>> 
>>>> I think the dependencies of lib* in LLVM are more conceptual
than version
>>>> lock, but they're still there.
>>>> 
>>>> I agree with you in all other points, mind you, but RT needs an
unwind
>>>> library as much as it needs clang. Without them, RT
"can" (and indeed does)
>>>> work, but we're not providing a complete solution.
>>>> 
>>>> I won't *push* to bundle libunwind, libcxxabi (and
ultimately libcxx) on
>>>> those merits alone, but my opinion is that we should. I
can't see much use
>>>> in RT without them. That's why we're still defaulting
to libgcc on Linux.
>>>> 
>>>> Renato, I just want to point out that the Compiler-RT story is
*WAY* more
>>>> complicated than it might seem from your comments here.
Compiler-RT is
>>>> really two or three conceptually different things that happen
to be in the
>>>> same project, and parts of it are very useful without
libunwind, libcxxabi,
>>>> and libcxx.
>>>> 
>>>> For example, the Compiler-RT sanitizers are used with GCC and
libgcc. They
>>>> can be built to be used with libstdc++ as well as libc++
(although I do
>>>> think that loses some features).
>>>> 
>>>> I would not object to a mono-repo that included LLVM, Clang,
LLD, and
>>>> Clang-Tools-Extra. I strongly object to any mono-repo that
includes any of
>>>> the runtime library projects. I also think that once you move
away from the
>>>> “mono-repo including all” you need to identify criteria for how
you
>>>> determine which projects get included, and potentially how you
evaluate
>>>> adding projects to the mono-repo.
>>>> 
>>>> As a straw man I would suggest the following criteria for
inclusion into the
>>>> mono-repo:
>>>> 
>>>> (1) Projects in the mono-repo must be tightly coupled to
specific versions
>>>> or commits of other projects in the mono-repo
>>>> (2) The projects in the mono-repo must provide wide benefit to
the community
>>>> such that the overall community benefit outweighs the impacts
of the project
>>>> being in the repo
>>>> (3) Projects in the mono-repo must conform to some defined set
of standards.
>>>> LLVM’s coding standards might be a bit much, but something
along those
>>>> lines.
>>>> 
>>>> Thoughts?
>>>> 
>>>> -Chris
>>>> 
>>>> My tuppence.
>>>> 
>>>> Cheers,
>>>> Renato

Justin Lebar via llvm-dev

2016-Jul-28 19:05 UTC

[llvm-dev] [RFC] One or many git repositories?

>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>
> Maybe there are few, but the impact is non-insignificant. Also I think the
> opinions of the read-write consumers of the sub-projects being included should
> count for a lot

I agree.

> as a read-write consumer I don’t like this proposal if it includes the
> runtime libraries.

Point well-taken.

> The existence of subproject mirrors requires someone to write and maintain
> the tooling to keep those mirrors updated,

I think you will find on this thread no shortage of people willing to
maintain said mirrors in exchange for getting a monorepo as the
canonical source of truth.

> and those mirrors will have all the technical hurdles and drawbacks that a
> submodule repository would have.

I don't understand this.  The point of the mirrors is to allow people
to use a read-only multirepo workflow.  I agree that if one chose to
do so, one would bite all of the drawbacks of a multirepo workflow,
but...that's the point?  Maybe I'm missing something.

> The question here is: Do you make downstream single project users work off
> potentially unreliable mirrors, or do you make the people who need a mono-repo
> experience work off a potentially unreliable submodule repo?

I agree with the gist of this question, but I want to refine the
trade-off a bit.

With a monorepo, downstream single-project users actually have two
options.  They can work off the mirrors, or they can just download the
whole thing.  So with the monorepo, downstream single-project users
are not forced to work off noncanonical mirrors.  They are only
"forced" to do so if they are unable or unwilling to download a 500mb
repo and throw away most of it.  Which I think may actually be
relatively few people.  But what do I know?

Anyway, my answer to this question has been, and still is, that a
monorepo is strictly more powerful than a multirepo.

For one thing, we can atomically commit across subprojects using a
monorepo.  On IRC I've had a bunch of people just begging me for this.

Putative scripts that allow monorepo users to commit to the multirepo
would not be able to translate cross-cutting commits into a single
commit in the umbrella repository without cooperation from the script
that translates commits to the multirepos into commits in the umbrella
repository (that's the one that contains all the multirepos as git
subrepositories).  It's possible (it's Turing complete), but it
would be very complicated.

Still more complicated would be writing a script that would allow
monorepo users to push to putative try bots that are based off the
multirepo.  Again anything is possible, but I have written and
maintained similar software in the past (for a significantly simpler
setup) and it was fragile as heck, and again this is going to require
extensive cooperation between us and the multirepo --> umbrella repo
script.

In contrast, as discussed earlier, if people want a multirepo-like
setup based on the monorepo, we can reduce this to a single command
run once when the repository is cloned.  It ends up being far less
fragile, and requiring far fewer (actually, zero) tricks on the server
side.
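
To make the "single command" idea concrete, here is a rough sketch of what
such a one-time setup helper might look like. It is not a script anyone on
this thread has written; the function names are hypothetical, and the
top-level directory names (libcxx, compiler-rt) are an assumption about how
a monorepo would be laid out. It relies only on git's documented
sparse-checkout mechanism: the core.sparseCheckout setting, the
.git/info/sparse-checkout pattern file, and git read-tree -mu HEAD.

```python
from pathlib import Path
from typing import Iterable, List


def sparse_patterns(subprojects: Iterable[str]) -> List[str]:
    """Build one root-anchored directory pattern per subproject."""
    patterns = []
    for name in subprojects:
        name = name.strip("/")
        if not name:
            raise ValueError("empty subproject name")
        # The leading slash anchors the pattern at the repository root;
        # the trailing slash restricts it to directories.
        patterns.append(f"/{name}/")
    return patterns


def write_sparse_file(clone_dir: str, subprojects: Iterable[str]) -> Path:
    """Write the patterns to .git/info/sparse-checkout in an existing clone."""
    target = Path(clone_dir) / ".git" / "info" / "sparse-checkout"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(sparse_patterns(subprojects)) + "\n")
    return target


# Git commands to run once inside the clone after writing the file.
# They are listed here for reference and are not executed by this sketch.
SETUP_COMMANDS = [
    ["git", "config", "core.sparseCheckout", "true"],
    ["git", "read-tree", "-mu", "HEAD"],
]
```

A libcxx developer would call write_sparse_file(clone, ["libcxx"]) once and
then run the two commands; the working tree would then contain only that
directory, although the clone itself still carries the full history.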
> Instead, let’s talk about DragonEgg.

+1.

> The DragonEgg project is, as far as I can tell, abandoned, but it is still
> an LLVM project that is tightly coupled to LLVM versions. So it meets criteria
> #1. I think it fails to meet criteria #2 because DragonEgg is basically
> abandoned and provides no real value to the community. Even though the burden of
> a dead project on the mono-repo is minuscule, I think there is no good reason to
> include DragonEgg.

If DragonEgg is abandoned, I think we should keep the history in our
repository and just delete it from head.

My argument for keeping it in our history is: Suppose we go with a
monorepo, and suppose at some point in the future, some other LLVM
project -- say, lld -- became abandoned.  Would we rewrite our
monorepo history to erase all trace of lld, because it no longer
provides value to us?

No, right?  lld's history is part of our history.  We'd just delete it
from head and move on with our lives.
> My arguments are from the perspective of someone working on the runtime
> library projects, the burden is significant to be included in the llvm
> mono-repo. While the full history of LLVM is around 500MB, the full history of
> *all* the runtime projects is less than 100MB.  Developers working on libcxx or
> compiler-rt should not need to clone LLVM, and run commands to do sparse
> checkouts. That is more burden than we should incur. Further the setup cost of
> doing multiple sparse checkouts in order to approximate the workflows we have
> today with decoupled projects is, IMO, unnecessary and unreasonable.

OK, just to make sure I understand your point here, because this is
important, you are saying that you object to including libcxx and
compiler-rt in the llvm monorepo because:

* It would consume an additional ~400mb of disk space, and
* It's unnecessary and unreasonable to ask libcxx etc. developers to
run a script when they check out the monorepo if they want a sparse
checkout and/or a setup that mirrors the multirepo.

I'm not trying to put words in your mouth or subtly change what you're
saying, so please let me know if I didn't get that right.

Thanks again for all your time here.

-Justin

On Thu, Jul 28, 2016 at 11:28 AM, Chris Bieneman <beanz at apple.com> wrote:
>
>> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Thanks again for your thoughts, Chris.
>>
>>> As a straw man I would suggest the following criteria for inclusion
into the mono-repo:
>>>
>>> (1) Projects in the mono-repo must be tightly coupled to specific
versions or commits of other projects in the mono-repo
>>
>> I'm fine with that, fwiw.  That was in fact the original proposal.
>
> That is the wording of the original proposal, but I disagree that it is the
content of the original proposal. I don’t believe that Compiler-RT is tightly
coupled to LLVM at all, which is a big source of my disagreement here.
>
>> I'm also fine if we decide to put everything inside the monorepo.  I
>> think Richard Smith had some good arguments for why they belong
>> together.
>>
>> But I am really surprised that you think this is such a big deal that
>> you would object to the whole monorepo if this decision doesn't go
>> your way.
>
> I really hate your phrasing on this. I’m not objecting to this proposal
just because some minor decision doesn’t go my way. I think this is a very
crucial point of whether or not the monorepo solution’s benefit outweighs its
cost.
>
>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>
> Maybe there are few, but the impact is non-insignificant. Also I think the
opinions of the read-write consumers of the sub-projects being included should
count for a lot, and as a read-write consumer I don’t like this proposal if it
includes the runtime libraries.
>
>>  Read-only consumers *are entirely
>> unaffected by the decision*, as they can continue to use the read-only
>> subproject mirrors exactly as today.
>
> The existence of subproject mirrors requires someone to write and maintain
the tooling to keep those mirrors updated, and those mirrors will have all the
technical hurdles and drawbacks that a submodule repository would have.
>
> The question here is: Do you make downstream single project users work off
potentially unreliable mirrors, or do you make the people who need a mono-repo
experience work off a potentially unreliable submodule repo?
>
> I think the only answer anyone can reasonably give to this is that we don’t
have enough information to make a reasonable decision that maximizes the
benefits to most users while minimizing the adverse impacts. Hence why I keep
saying we need a survey to understand how *people* interact with the project and
what kinds of workflows are important. I emphasize the word “people” in that
last sentence because this decision impacts the contributors to the community,
and downstream users. We need to take all perspectives into account when making
this kind of infrastructure decision.
>
>>
>>> (2) The projects in the mono-repo must provide wide benefit to the
community such that the overall community benefit outweighs the impacts of the
project being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of
standards. LLVM’s coding standards might be a bit much, but something along
those lines.
>>
>> Would you mind explaining why you think the criteria for inclusion in
>> the monorepo should be different than the criteria for inclusion as an
>> LLVM subproject?
>
> For starters, including things as LLVM subproject doesn’t require that they
meet criteria #1 in my proposal. Simply put, they don’t need to be tightly
coupled to LLVM. We have many examples of that.
>
>>
>> I think these are fine criteria -- for inclusion of code as an LLVM
>> subproject.  But it seems to me -- and maybe I'm wrong -- that the
>> reason you're proposing them is that there exist today LLVM
>> subprojects that are version-locked to other projects but you think do
>> not meet these criteria, and therefore you want to exclude them from
>> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
>> your list above.
>>
>> I understand that lldb is persona non grata in some circles.  But.
>> It's not right to use the source code migration as a tool to revisit
>> an old decision like this.  That is procedurally unjust.  The relevant
>> decision should be, "is LLDB an LLVM subproject that is version-locked
>> to other subprojects, or not?"
>
> I really don’t want to debate LLDB. It is a hot issue for a lot of people,
and I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.
>
> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I
can tell, abandoned, but it is still an LLVM project that is tightly coupled to
LLVM versions. So it meets criteria #1. I think it fails to meet criteria #2
because DragonEgg is basically abandoned and provides no real value to the
community. Even though the burden of a dead project on the mono-repo is
minuscule, I think there is no good reason to include DragonEgg.
>
> Do you disagree?
>
>>
>> If you feel strongly that we should reevaluate every project on the
>> basis of these last two criteria before including them in the
>> monorepo, would you mind elaborating on what exactly are the harms of
>> including a project that isn't up to snuff?
>
> Every project that is added to the mono-repo will incur a small cost to
developers in terms of the size it adds to the repository, and the tooling or
workflow adjustments to handle the change. In most cases this will be minimal,
even negligible. However I think the burden on runtime developers is
significant.
>
>>  If you are aesthetically
>> displeased by a project, you can hide it using sparse checkouts.  And
>> nobody is going to make you build it.  At that point, the only cost I
>> can think of from including a project is the bytes on disk.  But since
>> the full history of all LLVM subprojects (excluding test-suite) is
>> 500mb (*), surely you're not going to argue for the exclusion of (say)
>> lldb on the grounds of saving 25mb (or whatever)?
>
> I won’t argue over lldb at all. My arguments are from the perspective of
someone working on the runtime library projects, the burden is significant to be
included in the llvm mono-repo. While the full history of LLVM is around 500MB,
the full history of *all* the runtime projects is less than 100MB. Developers
working on libcxx or compiler-rt should not need to clone LLVM, and run commands
to do sparse checkouts. That is more burden than we should incur. Further the
setup cost of doing multiple sparse checkouts in order to approximate the
workflows we have today with decoupled projects is, IMO, unnecessary and
unreasonable.
>
> Those arguments go away if you follow criteria that exclude runtime
projects from the mono-repo.
>
> -Chris
>
>>
>> -Justin
>>
>> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
>>
>> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> This does not apply to libc++.  We support building the entire LLVM suite
>>>> with other C++ standard library implementations (at least libstdc++, and I
>>>> think also with Visual Studio’s implementation), so there is no dependency
>>>> of anything on libc++.  Similarly, we support building libc++ with other
>>>> compilers (in FreeBSD, we currently build it with gcc 6.1 for RISC-V, for
>>>> example, where the LLVM toolchain is not quite useable).
>>>>
>>>> The same applies to libunwind, to an even greater degree (where libc++
>>>> implements a standard API, libunwind implements a standard ABI).
>>>
>>> I think the dependencies of lib* in LLVM are more conceptual than version
>>> lock, but they're still there.
>>>
>>> I agree with you in all other points, mind you, but RT needs an unwind
>>> library as much as it needs clang. Without them, RT "can" (and indeed does)
>>> work, but we're not providing a complete solution.
>>>
>>> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx) on
>>> those merits alone, but my opinion is that we should. I can't see much use
>>> in RT without them. That's why we're still defaulting to libgcc on Linux.
>>>
>>> Renato, I just want to point out that the Compiler-RT story is *WAY* more
>>> complicated than it might seem from your comments here. Compiler-RT is
>>> really two or three conceptually different things that happen to be in the
>>> same project, and parts of it are very useful without libunwind, libcxxabi,
>>> and libcxx.
>>>
>>> For example, the Compiler-RT sanitizers are used with GCC and libgcc. They
>>> can be built to be used with libstdc++ as well as libc++ (although I do
>>> think that loses some features).
>>>
>>> I would not object to a mono-repo that included LLVM, Clang, LLD, and
>>> Clang-Tools-Extra. I strongly object to any mono-repo that includes any of
>>> the runtime library projects. I also think that once you move away from the
>>> “mono-repo including all” you need to identify criteria for how you
>>> determine which projects get included, and potentially how you evaluate
>>> adding projects to the mono-repo.
>>>
>>> As a straw man I would suggest the following criteria for inclusion into the
>>> mono-repo:
>>>
>>> (1) Projects in the mono-repo must be tightly coupled to specific versions
>>> or commits of other projects in the mono-repo
>>> (2) The projects in the mono-repo must provide wide benefit to the community
>>> such that the overall community benefit outweighs the impacts of the project
>>> being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of standards.
>>> LLVM’s coding standards might be a bit much, but something along those
>>> lines.
>>>
>>> Thoughts?
>>>
>>> -Chris
>>>
>>> My tuppence.
>>>
>>> Cheers,
>>> Renato
>>>
>

Chris Bieneman via llvm-dev

2016-Jul-28 20:41 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 12:05 PM, Justin Lebar <jlebar at google.com> wrote:
> 
>>> The decision of whether or not to include these projects
>>> affects only read-write consumers of these projects -- of which there
>>> are relatively few people.
>> 
>> Maybe there are few, but the impact is non-insignificant. Also I think
the opinions of the read-write consumers of the sub-projects being included
should count for a lot
> 
> I agree.
> 
>> as a read-write consumer I don’t like this proposal if it includes the
runtime libraries.
> 
> Point well-taken.
> 
>> The existence of subproject mirrors requires someone to write and
maintain the tooling to keep those mirrors updated,
> 
> I think you will find on this thread no shortage of people willing to
> maintain said mirrors in exchange for getting a monorepo as the
> canonical source of truth.
Ok. Money where my mouth is time.

Submodule repo: 

https://github.com/llvm-beanz/llvm-submodules

Bot auto-updating it:

http://beanz-bot.com:8180/jenkins/job/submodule-update/

If we go down this path improvements can be made to the bot so that each
submodule update commit only includes one submodule update. That would be fairly
simple to add.
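
A rough sketch of that improvement, assuming the bot can read the commit each submodule is currently pinned to and the latest commit of each upstream repository. The function names and the data shapes are made up for illustration; this is not the actual bot's code.

```python
def one_update_per_commit(pinned, latest):
    """Return one (submodule, old_sha, new_sha) step per changed submodule.

    `pinned` maps submodule name -> commit recorded in the umbrella repo.
    `latest` maps submodule name -> newest commit in the upstream repo.
    """
    steps = []
    for name in sorted(latest):
        old_sha = pinned.get(name)  # None if the submodule is new
        new_sha = latest[name]
        if old_sha != new_sha:
            steps.append((name, old_sha, new_sha))
    return steps


def commit_message(step):
    """Build a commit message for a single submodule bump."""
    name, old_sha, new_sha = step
    return f"Update {name}: {old_sha} -> {new_sha}"


if __name__ == "__main__":
    pinned = {"llvm": "a1", "clang": "b1", "libcxx": "c1"}
    latest = {"llvm": "a2", "clang": "b1", "libcxx": "c2"}
    for step in one_update_per_commit(pinned, latest):
        print(commit_message(step))
```

The sketch orders steps by submodule name for simplicity. A real bot would more likely order them by upstream commit time, so that the umbrella history follows the order in which changes actually landed.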
> 
>> and those mirrors will have all the technical hurdles and drawbacks
that a submodule repository would have.
> 
> I don't understand this.  The point of the mirrors is to allow people
> to use a read-only multirepo workflow.  I agree that if one chose to
> do so, one would bite all of the drawbacks of a multirepo workflow,
> but...that's the point?  Maybe I'm missing something.
What I’m referring to is that, since we don’t have the ability to run server-side
hooks on GitHub, the submodule repositories will have some complications: they
can’t be updated automatically, and the infrastructure to do so would have
multiple points of failure.

This limitation in GitHub hosting was discussed in at least one of the
GitHub-related threads.
> 
>> The question here is: Do you make downstream single project users work
off potentially unreliable mirrors, or do you make the people who need a
mono-repo experience work off a potentially unreliable submodule repo?
> 
> I agree with the gist of this question, but I want to refine the
> trade-off a bit.
> 
> With a monorepo, downstream single-project users actually have two
> options.  They can work off the mirrors, or they can just download the
> whole thing.  So with the monorepo, downstream single-project users
> are not forced to work off noncanonical mirrors.  They are only
> "forced" to do so if they are unable or unwilling to download a 500mb
> repo and throw away most of it.  Which I think may actually be
> relatively few people.  But what do I know?
I think we have evidence that many of our projects are used in isolation by
relatively large numbers of users. Whether or not those users would be
sufficiently inconvenienced to do something about a mono-repo is a harder thing
to know.

In the submodule approach this isn’t really an issue because users will continue
to work as they always have with the per-project repositories, and the
developers who need bisecting capabilities can clone the submodule repo, which
can also be used as read-write for making changes to the subprojects.
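
To make the bisecting part concrete, here is a minimal sketch of the search itself, assuming the umbrella commits form a linear list in which each commit pins every subproject. The `is_bad` callback is a hypothetical stand-in for checking out an umbrella commit, running `git submodule update`, then building and testing.

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest. The first entry must be
    known good and the last entry known bad.
    """
    if len(commits) < 2:
        raise ValueError("need a known-good and a known-bad commit")
    good, bad = 0, len(commits) - 1  # commits[good] is good, commits[bad] is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    umbrella = [f"u{i}" for i in range(8)]
    # Pretend the regression landed in umbrella commit u5.
    print(first_bad_commit(umbrella, lambda c: int(c[1:]) >= 5))
```

This is the same search that `git bisect` performs; the only submodule-specific step is syncing the submodules at each candidate commit before testing it.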
> 
> Anyway my answer to this question has been and still is, that a
> monorepo is strictly more powerful than a multirepo.
> 
> For one thing, we can atomically commit across subprojects using a
> monorepo.  On IRC I've had a bunch of people just begging me for this.
> 
> Putative scripts that allow monorepo users to commit to the multirepo
> would not be able to translate cross-cutting commits into a single
> commit in the umbrella repository without cooperation from the script
> that translates commits to the multirepos into commits in the umbrella
> repository (that's the one that contains all the multirepos as git
> subrepositories).  It's possible -- it's turing complete --, but it
> would be very complicated.
> 
> Still more complicated would be writing a script that would allow
> monorepo users to push to putative try bots that are based off the
> multirepo.  Again anything is possible, but I have written and
> maintained similar software in the past (for a significantly simpler
> setup) and it was fragile as heck, and again this is going to require
> extensive cooperation between us and the multirepo --> umbrella repo
> script.
For cross-repository changes I am fairly certain you could construct something
that can be pushed to a try bot based on the submodule repository. There is no
technical reason that shouldn’t work, and I don’t even think the scripting
around that would be terribly complicated. Admittedly that is more complicated
than just writing a pull request to a single repository, but I suspect not much.
I may look into that.
> 
> In contrast, as discussed earlier, if people want a multirepo-like
> setup based on the monorepo, we can reduce this to a single command
> run once when the repository is cloned.  It ends up being far less
> fragile, and requiring far fewer (actually, zero) tricks on the server
> side.
The only thing a monorepo gets you that strictly isn’t possible without it is
the ability to commit to multiple projects in a single commit. Personally I
don’t think that is a big enough justification, but that is my opinion, not a
fact.
> 
>> Instead, let’s talk about DragonEgg.
> 
> +1.
> 
>> The DragonEgg project is, as far as I can tell, abandoned, but it is
still an LLVM project that is tightly coupled to LLVM versions. So it meets
criteria #1. I think it fails to meet criteria #2 because DragonEgg is basically
abandoned and provides no real value to the community. Even though the burden of
a dead project on the mono-repo is minuscule, I think there is no good reason to
include DragonEgg.
> 
> If DragonEgg is abandoned, I think we should keep the history in our
> repository and just delete it from head.
> 
> My argument for keeping it in our history is: Suppose we go with a
> monorepo, and suppose at some point in the future, some other LLVM
> project -- say, lld -- became abandoned.  Would we rewrite our
> monorepo history to erase all trace of lld, because it no longer
> provides value to us?
> 
> No, right?  lld's history is part of our history.  We'd just delete it
> from head and move on with our lives.
> 
>> My arguments are from the perspective of someone working on the runtime
library projects, the burden is significant to be included in the llvm
mono-repo. While the full history of LLVM is around 500MB, the full history of
*all* the runtime projects is less than 100MB.  Developers working on libcxx or
compiler-rt should not need to clone LLVM, and run commands to do sparse
checkouts. That is more burden than we should incur. Further the setup cost of
doing multiple sparse checkouts in order to approximate the workflows we have
today with decoupled projects is, IMO, unnecessary and unreasonable.
> 
> OK, just to make sure I understand your point here, because this is
> important, you are saying that you object to including libcxx and
> compiler-rt in the llvm monorepo because:
> 
> * It would consume an additional ~400mb of disk space, and
> * It's unnecessary and unreasonable to ask libcxx etc. developers to
> run a script when they check out the monorepo if they want a sparse
> checkout and/or a setup that mirrors the multirepo.
> 
> I'm not trying to put words in your mouth or subtly change what you're
> saying, so please let me know if I didn't get that right.
I have a lot of arguments against the runtime libraries being included. First
and foremost, they don’t meet the “tightly coupled” criteria. Also, yes, an
extra 400 MB of disk space when the repository for libcxx is only ~20 MB is a
big deal to me. You’re not talking about a 10% or 20% increase in repository
size, you’re talking about a 20x increase in repository size. That is a burden.
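
The arithmetic behind that figure, as a small sketch. The ~20 MB and ~400 MB inputs are the rough estimates quoted in this message, not measurements, and the function name is made up for illustration.

```python
def size_increase(project_mb, extra_mb):
    """Return (extra relative to project, total relative to project)."""
    total_mb = project_mb + extra_mb
    return extra_mb / project_mb, total_mb / project_mb


if __name__ == "__main__":
    extra_ratio, total_ratio = size_increase(20, 400)
    print(f"extra data is {extra_ratio:.0f}x the libcxx repository")
    print(f"the clone ends up {total_ratio:.0f}x the libcxx repository")
```

With these inputs the added data is 20 times the size of the libcxx repository, so the resulting clone is 21 times its size.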

To me, needing to run a script to do sparse checkouts is also a burden.
Similarly I think that running a script to bisect a submodule repository (which
is my proposal) is also a burden. I can’t judge which burden is more significant
because I don’t know how many people bisect. What I can say is that it is my
belief that I’m not the only person who works on runtime projects in isolation.
As a potential example (because I don’t want to put words into anyone’s mouth),
Marshall Clow and Eric Fiselier have made *a ton* of contributions to libcxx over
the last year, but neither of them are frequent contributors to LLVM or Clang.

While it is true that Clang developers may want or need the runtime libraries,
the runtime library developers frequently don’t need clang. I really don’t want
a solution that makes the lives of Clang developers easier at the expense of
other subprojects unless it is strictly necessary and for a common “greater
good”.

-Chris

Renato Golin via llvm-dev

2016-Jul-28 20:47 UTC

[llvm-dev] [RFC] One or many git repositories?

On 28 July 2016 at 21:41, Chris Bieneman <beanz at apple.com> wrote:
> Ok. Money where my mouth is time.
> Submodule repo:
> https://github.com/llvm-beanz/llvm-submodules
>
> Bot auto-updating it:
> http://beanz-bot.com:8180/jenkins/job/submodule-update/
>
> If we go down this path improvements can be made to the bot so that each
> submodule update commit only includes one submodule update. That would be
> fairly simple to add.
Nice! Thanks Chris!

I'll update the GitHubSubModules proposal with those links.

cheers,
--renato

Justin Lebar via llvm-dev

2016-Jul-28 21:07 UTC

[llvm-dev] [RFC] One or many git repositories?

Chris,

What I notice in your latest e-mail -- and I don't know if this is
intentional, so sorry if I'm reading too much into it -- is that the
language has switched from "an unwarranted and unacceptable burden" to
"a burden":
> Also, yes, an extra 400 MB of disk space when the repository for libcxx is
only ~20 MB is a big deal to me. You’re not talking about a 10% or 20% increase
in repository size, you’re talking about a 20x increase in repository size. That
is a burden.
>
> To me, needing to run a script to do sparse checkouts is also a burden.
Similarly I think that running a script to bisect a submodule repository (which
is my proposal) is also a burden.
I have to admit that I'm pleased to see this softening of language,
because it's much easier for me to agree with this.  Yes, 400mb has a
nonzero cost (although, to nitpick, I don't think that the
multiplicative increase in space is germane).  Yes, running a script
has a nonzero cost.  Totally agree.
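
To make "running a script" concrete, here is a rough sketch of what such a one-time setup script might look like. It is written in Python around git's classic sparse-checkout mechanism (the `core.sparseCheckout` setting plus the `.git/info/sparse-checkout` pattern file). It assumes a monorepo whose top-level directories are named after the subprojects, which nobody in this thread has actually specified, and the function names are made up for illustration.

```python
from pathlib import Path
import subprocess


def sparse_patterns(subprojects):
    """Return one sparse-checkout pattern per top-level subproject directory."""
    # A leading slash anchors the pattern at the repository root; the
    # trailing slash restricts it to directories.
    return [f"/{name.strip('/')}/" for name in subprojects]


def enable_sparse_checkout(repo_dir, subprojects):
    """Limit the working tree of an existing clone to the given subprojects.

    Assumption: repo_dir is a normal (non-bare) clone of a hypothetical
    monorepo with one top-level directory per subproject.
    """
    repo = Path(repo_dir)
    subprocess.run(["git", "config", "core.sparseCheckout", "true"],
                   cwd=repo, check=True)
    pattern_file = repo / ".git" / "info" / "sparse-checkout"
    pattern_file.parent.mkdir(parents=True, exist_ok=True)
    pattern_file.write_text("\n".join(sparse_patterns(subprojects)) + "\n")
    # Re-read the index so the working tree matches the new patterns.
    subprocess.run(["git", "read-tree", "-mu", "HEAD"], cwd=repo, check=True)


print(sparse_patterns(["libcxx", "libcxxabi"]))  # ['/libcxx/', '/libcxxabi/']
```

Note that this only trims the working tree. The full history still lives under `.git`, so a sketch like this does nothing about the 400mb.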

It sounds like we also agree that these costs should be weighed
against the benefits accrued to others, and furthermore that
we'll ultimately want to get wider input from the community about how
much to weigh (to use synecdoche) your 400mb versus my workflow
convenience.
> The only thing a monorepo gets you that strictly isn’t possible without it
> is the ability to commit to multiple projects in a single commit. Personally I
> don’t think that is a big enough justification, but that is my opinion, not a
> fact.
When pushing a set of multiple patches to one subproject, we very
explicitly want every patch in the sequence to build.  So to me, it
seems like we should want this same property for changes that affect
multiple subprojects.  Choosing a repository structure such that it's
impossible to achieve this in general seems Bad.

But that's just me.
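
A toy model of the property in question, purely as a sketch: the "build check" and the names `oldName`/`newName` are invented for illustration and stand in for any change that touches an interface in one subproject and its caller in another.

```python
def builds(state):
    """Toy build check: clang builds only if it calls the name llvm exports."""
    return state["clang_calls"] == state["llvm_exports"]


def history_states(start, commits):
    """Return the repository state after each commit, applied in order."""
    states, current = [], dict(start)
    for change in commits:
        current = {**current, **change}
        states.append(current)
    return states


start = {"llvm_exports": "oldName", "clang_calls": "oldName"}

# Separate repositories: the rename lands as two commits, in some order.
split = history_states(start, [{"llvm_exports": "newName"},
                               {"clang_calls": "newName"}])
# One repository: both halves land in a single commit.
atomic = history_states(start, [{"llvm_exports": "newName",
                                 "clang_calls": "newName"}])

print([builds(s) for s in split])   # [False, True]
print([builds(s) for s in atomic])  # [True]
```

Some changes can be staged so that every intermediate state builds (add the new name, migrate callers, then drop the old one), which is why the claim above is about what is achievable "in general" rather than in every case.
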
> While it is true that Clang developers may want or need the runtime
> libraries, the runtime library developers frequently don’t need clang. I really
> don’t want a solution that makes the lives of Clang developers easier at the
> expense of other subprojects unless it is strictly necessary and for a common
> “greater good”.
I too want to work towards a common greater good.  We may disagree
about "strictly necessary", but maybe we can set that aside for now.

It does seem that, although you may not be crazy about the monorepo,
you wouldn't come out swinging against it if it didn't include the
runtime libraries.  I'd call that major progress.

On Thu, Jul 28, 2016 at 1:41 PM, Chris Bieneman <beanz at apple.com> wrote:
>
> On Jul 28, 2016, at 12:05 PM, Justin Lebar <jlebar at google.com> wrote:
>
> The decision of whether or not to include these projects
> affects only read-write consumers of these projects -- of which there
> are relatively few people.
>
>
> Maybe there are few, but the impact is non-insignificant. Also I think the
> opinions of the read-write consumers of the sub-projects being included
> should count for a lot
>
>
> I agree.
>
> as a read-write consumer I don’t like this proposal if it includes the
> runtime libraries.
>
>
> Point well-taken.
>
> The existence of subproject mirrors requires someone to write and maintain
> the tooling to keep those mirrors updated,
>
>
> I think you will find on this thread no shortage of people willing to
> maintain said mirrors in exchange for getting a monorepo as the
> canonical source of truth.
>
>
> Ok. Money where my mouth is time.
>
> Submodule repo:
>
> https://github.com/llvm-beanz/llvm-submodules
>
> Bot auto-updating it:
>
> http://beanz-bot.com:8180/jenkins/job/submodule-update/
>
> If we go down this path improvements can be made to the bot so that each
> submodule update commit only includes one submodule update. That would be
> fairly simple to add.
>
>
> and those mirrors will have all the technical hurdles and drawbacks that a
> submodule repository would have.
>
>
> I don't understand this.  The point of the mirrors is to allow people
> to use a read-only multirepo workflow.  I agree that if one chose to
> do so, one would bite all of the drawbacks of a multirepo workflow,
> but...that's the point?  Maybe I'm missing something.
>
>
> What I’m referring to is that since we don’t have the ability to run
> server-side hooks on github the submodule repositories will have some
> complications because they can’t automatically be updated, and the
> infrastructure to do so would have multiple points of failure.
>
> This limitation in github hosting was discussed in at least one of the
> github related threads.
>
>
> The question here is: Do you make downstream single project users work off
> potentially unreliable mirrors, or do you make the people who need a
> mono-repo experience work off a potentially unreliable submodule repo?
>
>
> I agree with the gist of this question, but I want to refine the
> trade-off a bit.
>
> With a monorepo, downstream single-project users actually have two
> options.  They can work off the mirrors, or they can just download the
> whole thing.  So with the monorepo, downstream single-project users
> are not forced to work off noncanonical mirrors.  They are only
> "forced" to do so if they are unable or unwilling to download a 500mb
> repo and throw away most of it.  Which I think may actually be
> relatively few people.  But what do I know?
>
>
> I think we have evidence that many of our projects are used in isolation by
> relatively large numbers of users. Whether or not those users would be
> sufficiently inconvenienced to do something about a mono-repo is a harder
> thing to know.
>
> In the submodule approach this isn’t really an issue because users will
> continue to work as they always have with the per-project repositories, and
> the developers who need bisecting capabilities can clone the submodule repo,
> which can also be used as read-write for making changes to the subprojects.
>
>
> Anyway my answer to this question has been and still is, that a
> monorepo is strictly more powerful than a multirepo.
>
>
> For one thing, we can atomically commit across subprojects using a
> monorepo.  On IRC I've had a bunch of people just begging me for this.
>
> Putative scripts that allow monorepo users to commit to the multirepo
> would not be able to translate cross-cutting commits into a single
> commit in the umbrella repository without cooperation from the script
> that translates commits to the multirepos into commits in the umbrella
> repository (that's the one that contains all the multirepos as git
> subrepositories).  It's possible -- it's Turing complete --, but it
> would be very complicated.
>
> Still more complicated would be writing a script that would allow
> monorepo users to push to putative try bots that are based off the
> multirepo.  Again anything is possible, but I have written and
> maintained similar software in the past (for a significantly simpler
> setup) and it was fragile as heck, and again this is going to require
> extensive cooperation between us and the multirepo --> umbrella repo
> script.
>
>
> For cross-repository changes I am fairly certain you could construct
> something that can be pushed to a try bot based on the submodule repository.
> There is no technical reason that shouldn’t work, and I don’t even think the
> scripting around that would be terribly complicated. Admittedly that is more
> complicated than just writing a pull request to a single repository, but I
> suspect not much. I may look into that.
>
>
> In contrast, as discussed earlier, if people want a multirepo-like
> setup based on the monorepo, we can reduce this to a single command
> run once when the repository is cloned.  It ends up being far less
> fragile, and requiring far fewer (actually, zero) tricks on the server
> side.
>
>
> The only thing a monorepo gets you that strictly isn’t possible without it
> is the ability to commit to multiple projects in a single commit. Personally
> I don’t think that is a big enough justification, but that is my opinion,
> not a fact.
>
>
> Instead, let’s talk about DragonEgg.
>
>
> +1.
>
> The DragonEgg project is, as far as I can tell, abandoned, but it is still
> an LLVM project that is tightly coupled to LLVM versions. So it meets
> criteria #1. I think it fails to meet criteria #2 because DragonEgg is
> basically abandoned and provides no real value to the community. Even though
> the burden of a dead project on the mono-repo is minuscule, I think there is
> no good reason to include DragonEgg.
>
>
> If DragonEgg is abandoned, I think we should keep the history in our
> repository and just delete it from head.
>
> My argument for keeping it in our history is: Suppose we go with a
> monorepo, and suppose at some point in the future, some other LLVM
> project -- say, lld -- became abandoned.  Would we rewrite our
> monorepo history to erase all trace of lld, because it no longer
> provides value to us?
>
> No, right?  lld's history is part of our history.  We'd just delete it
> from head and move on with our lives.
>
> My arguments are from the perspective of someone working on the runtime
> library projects, the burden is significant to be included in the llvm
> mono-repo. While the full history of LLVM is around 500MB, the full history
> of *all* the runtime projects is less than 100MB.  Developers working on
> libcxx or compiler-rt should not need to clone LLVM, and run commands to do
> sparse checkouts. That is more burden than we should incur. Further the
> setup cost of doing multiple sparse checkouts in order to approximate the
> workflows we have today with decoupled projects is, IMO, unnecessary and
> unreasonable.
>
>
> OK, just to make sure I understand your point here, because this is
> important, you are saying that you object to including libcxx and
> compiler-rt in the llvm monorepo because:
>
> * It would consume an additional ~400mb of disk space, and
> * It's unnecessary and unreasonable to ask libcxx etc. developers to
> run a script when they check out the monorepo if they want a sparse
> checkout and/or a setup that mirrors the multirepo.
>
>
> I'm not trying to put words in your mouth or subtly change what you're
> saying, so please let me know if I didn't get that right.
>
>
> I have a lot of arguments against the runtime libraries being included.
> First and foremost, they don’t meet the “tightly coupled” criteria. Also,
> yes, an extra 400 MB of disk space when the repository for libcxx is only
> ~20 MB is a big deal to me. You’re not talking about a 10% or 20% increase
> in repository size, you’re talking about a 20x increase in repository size.
> That is a burden.
>
> To me, needing to run a script to do sparse checkouts is also a burden.
> Similarly I think that running a script to bisect a submodule repository
> (which is my proposal) is also a burden. I can’t judge which burden is more
> significant because I don’t know how many people bisect. What I can say is
> that it is my belief that I’m not the only person who works on runtime
> projects in isolation. As a potential example (because I don’t want to put
> words into anyone’s mouth), Marshall Clow and Eric Fiselier have made *a ton*
> of contributions to libcxx over the last year, but neither of them are
> frequent contributors to LLVM or Clang.
>
> While it is true that Clang developers may want or need the runtime
> libraries, the runtime library developers frequently don’t need clang. I
> really don’t want a solution that makes the lives of Clang developers easier
> at the expense of other subprojects unless it is strictly necessary and for
> a common “greater good”.
>
> -Chris
>
>
> Thanks again for all your time here.
>
> -Justin
>
> On Thu, Jul 28, 2016 at 11:28 AM, Chris Bieneman <beanz at apple.com> wrote:
>
>
> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>
> Thanks again for your thoughts, Chris.
>
> As a straw man I would suggest the following criteria for inclusion into the
> mono-repo:
>
> (1) Projects in the mono-repo must be tightly coupled to specific versions
> or commits of other projects in the mono-repo
>
>
> I'm fine with that, fwiw.  That was in fact the original proposal.
>
>
> That is the wording of the original proposal, but I disagree that it is the
> content of the original proposal. I don’t believe that Compiler-RT is
> tightly coupled to LLVM at all, which is a big source of my disagreement
> here.
>
> I'm also fine if we decide to put everything inside the monorepo.  I
> think Richard Smith had some good arguments for why they belong
> together.
>
> But I am really surprised that you think this is such a big deal that
> you would object to the whole monorepo if this decision doesn't go
> your way.
>
>
> I really hate your phrasing on this. I’m not objecting to this proposal just
> because some minor decision doesn’t go my way. I think this is a very
> crucial point of whether or not the monorepo solution’s benefit outweighs
> its cost.
>
> The decision of whether or not to include these projects
> affects only read-write consumers of these projects -- of which there
> are relatively few people.
>
>
> Maybe there are few, but the impact is non-insignificant. Also I think the
> opinions of the read-write consumers of the sub-projects being included
> should count for a lot, and as a read-write consumer I don’t like this
> proposal if it includes the runtime libraries.
>
> Read-only consumers *are entirely
> unaffected by the decision*, as they can continue to use the read-only
> subproject mirrors exactly as today.
>
>
> The existence of subproject mirrors requires someone to write and maintain
> the tooling to keep those mirrors updated, and those mirrors will have all
> the technical hurdles and drawbacks that a submodule repository would have.
>
> The question here is: Do you make downstream single project users work off
> potentially unreliable mirrors, or do you make the people who need a
> mono-repo experience work off a potentially unreliable submodule repo?
>
> I think the only answer anyone can reasonably give to this is that we don’t
> have enough information to make a reasonable decision that maximizes the
> benefits to most users while minimizing the adverse impacts. Hence why I
> keep saying we need a survey to understand how *people* interact with the
> project and what kinds of workflows are important. I emphasize the word
> “people” in that last sentence because this decision impacts the
> contributors to the community, and downstream users. We need to take all
> perspectives into account when making this kind of infrastructure decision.
>
>
> (2) The projects in the mono-repo must provide wide benefit to the community
> such that the overall community benefit outweighs the impacts of the project
> being in the repo
> (3) Projects in the mono-repo must conform to some defined set of standards.
> LLVM’s coding standards might be a bit much, but something along those
> lines.
>
>
> Would you mind explaining why you think the criteria for inclusion in
> the monorepo should be different than the criteria for inclusion as an
> LLVM subproject?
>
>
> For starters, including things as LLVM subproject doesn’t require that they
> meet criteria #1 in my proposal. Simply put, they don’t need to be tightly
> coupled to LLVM. We have many examples of that.
>
>
> I think these are fine criteria -- for inclusion of code as an LLVM
> subproject.  But it seems to me -- and maybe I'm wrong -- that the
> reason you're proposing them is that there exist today LLVM
> subprojects that are version-locked to other projects but you think do
> not meet these criteria, and therefore you want to exclude them from
> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
> your list above.
>
> I understand that lldb is persona non grata in some circles.  But.
> It's not right to use the source code migration as a tool to revisit
> an old decision like this.  That is procedurally unjust.  The relevant
> decision should be, "is LLDB an LLVM subproject that is version-locked
> to other subprojects, or not?”
>
>
> I really don’t want to debate LLDB. It is a hot issue for a lot of people,
> and I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.
>
> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I
> can tell, abandoned, but it is still an LLVM project that is tightly coupled
> to LLVM versions. So it meets criteria #1. I think it fails to meet criteria
> #2 because DragonEgg is basically abandoned and provides no real value to
> the community. Even though the burden of a dead project on the mono-repo is
> minuscule, I think there is no good reason to include DragonEgg.
>
> Do you disagree?
>
>
> If you feel strongly that we should reevaluate every project on the
> basis of these last two criteria before including them in the
> monorepo, would you mind elaborating on what exactly are the harms of
> including a project that isn't up to snuff?
>
>
> Every project that is added to the mono-repo will incur a small cost to
> developers in terms of the size it adds to the repository, and the tooling
> or workflow adjustments to handle the change. In most cases this will be
> minimal, even negligible. However I think the burden on runtime developers
> is significant.
>
> If you are aesthetically
> displeased by a project, you can hide it using sparse checkouts.  And
> nobody is going to make you build it.  At that point, the only cost I
> can think of from including a project is the bytes on disk.  But since
> the full history of all LLVM subprojects (excluding test-suite) is
> 500mb (*), surely you're not going to argue for the exclusion of (say)
> lldb on the grounds of saving 25mb (or whatever)?
>
>
> I won’t argue over lldb at all. My arguments are from the perspective of
> someone working on the runtime library projects, the burden is significant
> to be included in the llvm mono-repo. While the full history of LLVM is
> around 500MB, the full history of *all* the runtime projects is less than
> 100MB. Developers working on libcxx or compiler-rt should not need to clone
> LLVM, and run commands to do sparse checkouts. That is more burden than we
> should incur. Further the setup cost of doing multiple sparse checkouts in
> order to approximate the workflows we have today with decoupled projects is,
> IMO, unnecessary and unreasonable.
>
> Those arguments go away if you follow criteria that exclude runtime projects
> from the mono-repo.
>
> -Chris
>
>
> -Justin
>
> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
>
> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
> <llvm-dev at lists.llvm.org> wrote:
>
> This does not apply to libc++.  We support building the entire LLVM suite
> with other C++ standard library implementations (at least libstdc++, and I
> think also with Visual Studio’s implementation), so there is no dependency
> of anything on libc++.  Similarly, we support building libc++ with other
> compilers (in FreeBSD, we currently build it with gcc 6.1 for RISC-V, for
> example, where the LLVM toolchain is not quite useable).
>
> The same applies to libunwind, to an even greater degree (where libc++
> implements a standard API, libunwind implements a standard ABI).
>
>
> I think the dependencies of lib* in LLVM are more conceptual than version
> lock, but they're still there.
>
> I agree with you in all other points, mind you, but RT needs an unwind
> library as much as it needs clang. Without them, RT "can" (and indeed does)
> work, but we're not providing a complete solution.
>
> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx) on
> those merits alone, but my opinion is that we should. I can't see much use
> in RT without them. That's why we're still defaulting to libgcc on Linux.
>
> Renato, I just want to point out that the Compiler-RT story is *WAY* more
> complicated than it might seem from your comments here. Compiler-RT is
> really two or three conceptually different things that happen to be in the
> same project, and parts of it are very useful without libunwind, libcxxabi,
> and libcxx.
>
> For example, the Compiler-RT sanitizers are used with GCC and libgcc. They
> can be built to be used with libstdc++ as well as libc++ (although I do
> think that loses some features).
>
> I would not object to a mono-repo that included LLVM, Clang, LLD, and
> Clang-Tools-Extra. I strongly object to any mono-repo that includes any of
> the runtime library projects. I also think that once you move away from the
> “mono-repo including all” you need to identify criteria for how you
> determine which projects get included, and potentially how you evaluate
> adding projects to the mono-repo.
>
> As a straw man I would suggest the following criteria for inclusion into the
> mono-repo:
>
> (1) Projects in the mono-repo must be tightly coupled to specific versions
> or commits of other projects in the mono-repo
> (2) The projects in the mono-repo must provide wide benefit to the community
> such that the overall community benefit outweighs the impacts of the project
> being in the repo
> (3) Projects in the mono-repo must conform to some defined set of standards.
> LLVM’s coding standards might be a bit much, but something along those
> lines.
>
> Thoughts?
>
> -Chris
>
> My tuppence.
>
> Cheers,
> Renato
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>

Chris Bieneman via llvm-dev

2016-Jul-28 21:12 UTC


[llvm-dev] [RFC] One or many git repositories?

It is worth pointing out that the Jenkins job that runs that is a playground I
set up for myself. It is nowhere near production ready, and it will fail
frequently as I iterate and mess around with it.
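
For anyone curious about the improvement mentioned for the bot (each submodule update commit including only one submodule update), the core of it could look something like the sketch below. This is not the code the Jenkins job runs; the function name, the dictionary layout, and the `r100`-style revision labels are all hypothetical, and a real bot would probably order the updates by upstream commit time rather than by name.

```python
def single_submodule_updates(old_pins, new_pins):
    """Split one batch of submodule bumps into one umbrella commit per submodule.

    old_pins / new_pins map a submodule name to the revision it is pinned to.
    Returns a list of planned commits, each moving exactly one submodule.
    """
    commits = []
    current = dict(old_pins)
    for name in sorted(new_pins):  # ordering by name is an arbitrary choice
        if current.get(name) == new_pins[name]:
            continue  # this submodule did not move
        current[name] = new_pins[name]
        commits.append({"message": f"Update {name} to {new_pins[name]}",
                        "pins": dict(current)})
    return commits


# Hypothetical pins before and after one polling interval.
old = {"clang": "r100", "libcxx": "r98", "llvm": "r100"}
new = {"clang": "r102", "libcxx": "r98", "llvm": "r103"}
for commit in single_submodule_updates(old, new):
    print(commit["message"])
# Update clang to r102
# Update llvm to r103
```

This still jumps each submodule straight to its newest revision; getting one umbrella commit per upstream commit would need the bot to walk each project's history as well.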

-Chris
> On Jul 28, 2016, at 1:47 PM, Renato Golin <renato.golin at linaro.org> wrote:
> 
> On 28 July 2016 at 21:41, Chris Bieneman <beanz at apple.com> wrote:
>> Ok. Money where my mouth is time.
>> Submodule repo:
>> https://github.com/llvm-beanz/llvm-submodules
>> 
>> Bot auto-updating it:
>> http://beanz-bot.com:8180/jenkins/job/submodule-update/
>> 
>> If we go down this path improvements can be made to the bot so that each
>> submodule update commit only includes one submodule update. That would be
>> fairly simple to add.
> 
> Nice! Thanks Chris!
> 
> I'll update the GitHubSubModules proposal with those links.
> 
> cheers,
> --renato

Chris Bieneman via llvm-dev

2016-Jul-28 21:20 UTC


[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 2:07 PM, Justin Lebar <jlebar at google.com> wrote:
> 
> Chris,
> 
> What I notice in your latest e-mail -- and I don't know if this is
> intentional, so sorry if I'm reading too much into it -- is that the
> language has switched from "an unwarranted and unacceptable burden" to
> "a burden”:
I consider it unwarranted and unacceptable to me; however, that is based on the
context of my workflows. Without a wider context I cannot know if the burden to
me is justifiable by benefits to the wider community.

I’m trying to be explicit here. I’m stating my opinions and perspectives. There
are virtually no facts to throw around in these conversations, and I’m trying
not to put words in other people’s mouths (I hope I’m not failing at this).
> 
>> Also, yes, an extra 400 MB of disk space when the repository for libcxx
>> is only ~20 MB is a big deal to me. You’re not talking about a 10% or 20%
>> increase in repository size, you’re talking about a 20x increase in repository
>> size. That is a burden.
>> 
>> To me, needing to run a script to do sparse checkouts is also a burden.
>> Similarly I think that running a script to bisect a submodule repository (which
>> is my proposal) is also a burden.
> 
> I have to admit that I'm pleased to see this softening of language,
> because it's much easier for me to agree with this.  Yes, 400mb has a
> nonzero cost (although, to nitpick, I don't think that the
> multiplicative increase in space is germane).  Yes, running a script
> has a nonzero cost.  Totally agree.
Why is it not relevant that runtime developers who don’t need LLVM would see a
multiplicative increase in repository size? This would significantly increase
fetch and clone times at the very least.
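
As a back-of-the-envelope illustration of the absolute versus multiplicative views, using only the rough figures already quoted in this thread (~20 MB for libcxx, an extra 400 MB for the rest). The 2 MB/s link speed is an arbitrary assumption, and the transfer estimate ignores compression, latency, and shallow clones.

```python
LIBCXX_MB = 20   # "~20 MB" figure quoted in this thread
EXTRA_MB = 400   # "an extra 400 MB" figure quoted in this thread


def growth(standalone_mb, extra_mb):
    """Return (absolute increase in MB, combined size / standalone size)."""
    combined = standalone_mb + extra_mb
    return extra_mb, combined / standalone_mb


def clone_seconds(size_mb, megabytes_per_second):
    """Naive transfer-time estimate: size divided by link speed."""
    return size_mb / megabytes_per_second


extra, ratio = growth(LIBCXX_MB, EXTRA_MB)
print(extra, ratio)  # 400 21.0

# Assumed 2 MB/s link, chosen only to make the comparison tangible.
print(clone_seconds(LIBCXX_MB, 2))             # 10.0
print(clone_seconds(LIBCXX_MB + EXTRA_MB, 2))  # 210.0
```

The absolute number is the same for everyone, but the ratio is what a developer who only ever clones libcxx would notice on each fresh clone.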
> 
> It sounds like we also agree that these costs should be weighed
> against the benefits accrued to others, and furthermore that
> we'll ultimately want to get wider input from the community about how
> much to weigh (to use synecdoche) your 400mb versus my workflow
> convenience.
Yes, and there is a balancing to be had across multiple proposals.
> 
>> The only thing a monorepo gets you that strictly isn’t possible without
>> it is the ability to commit to multiple projects in a single commit. Personally
>> I don’t think that is a big enough justification, but that is my opinion, not a
>> fact.
> 
> When pushing a set of multiple patches to one subproject, we very
> explicitly want every patch in the sequence to build.  So to me, it
> seems like we should want this same property for changes that affect
> multiple subprojects.  Choosing a repository structure such that it's
> impossible to achieve this in general seems Bad.
> 
> But that's just me.
Maybe, but the Clang project has lived with this limitation for its entire
history, and many projects built on LLVM and Clang have the same problems. I’m
not convinced that this is a problem that we need to solve.
> 
>> While it is true that Clang developers may want or need the runtime
>> libraries, the runtime library developers frequently don’t need clang. I really
>> don’t want a solution that makes the lives of Clang developers easier at the
>> expense of other subprojects unless it is strictly necessary and for a common
>> “greater good”.
> 
> I too want to work towards a common greater good.  We may disagree
> about "strictly necessary", but maybe we can set that aside for now.
> 
> It does seem that, although you may not be crazy about the monorepo,
> you wouldn't come out swinging against it if it didn't include the
> runtime libraries.  I'd call that major progress.
This is a fair characterization of my feelings. If the mono-repo excluded
libcxx, libcxxabi, compiler-rt, libunwind, and parallel-libs I still wouldn’t
support it because I prefer the submodule approach, but I wouldn’t be as
strongly opposed.

-Chris
> 
> On Thu, Jul 28, 2016 at 1:41 PM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> On Jul 28, 2016, at 12:05 PM, Justin Lebar <jlebar at google.com> wrote:
>> 
>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>> 
>> 
>> Maybe there are few, but the impact is non-insignificant. Also I think the
>> opinions of the read-write consumers of the sub-projects being included
>> should count for a lot
>> 
>> 
>> I agree.
>> 
>> as a read-write consumer I don’t like this proposal if it includes the
>> runtime libraries.
>> 
>> 
>> Point well-taken.
>> 
>> The existence of subproject mirrors requires someone to write and maintain
>> the tooling to keep those mirrors updated,
>> 
>> 
>> I think you will find on this thread no shortage of people willing to
>> maintain said mirrors in exchange for getting a monorepo as the
>> canonical source of truth.
>> 
>> 
>> Ok. Money where my mouth is time.
>> 
>> Submodule repo:
>> 
>> https://github.com/llvm-beanz/llvm-submodules
>> 
>> Bot auto-updating it:
>> 
>> http://beanz-bot.com:8180/jenkins/job/submodule-update/
>> 
>> If we go down this path improvements can be made to the bot so that each
>> submodule update commit only includes one submodule update. That would be
>> fairly simple to add.
>> 
>> 
>> and those mirrors will have all the technical hurdles and drawbacks that a
>> submodule repository would have.
>> 
>> 
>> I don't understand this.  The point of the mirrors is to allow people
>> to use a read-only multirepo workflow.  I agree that if one chose to
>> do so, one would bite all of the drawbacks of a multirepo workflow,
>> but...that's the point?  Maybe I'm missing something.
>> 
>> 
>> What I’m referring to is that since we don’t have the ability to run
>> server-side hooks on github the submodule repositories will have some
>> complications because they can’t automatically be updated, and the
>> infrastructure to do so would have multiple points of failure.
>> 
>> This limitation in github hosting was discussed in at least one of the
>> github related threads.
>> 
>> 
>> The question here is: Do you make downstream single project users work off
>> potentially unreliable mirrors, or do you make the people who need a
>> mono-repo experience work off a potentially unreliable submodule repo?
>> 
>> 
>> I agree with the gist of this question, but I want to refine the
>> trade-off a bit.
>> 
>> With a monorepo, downstream single-project users actually have two
>> options.  They can work off the mirrors, or they can just download the
>> whole thing.  So with the monorepo, downstream single-project users
>> are not forced to work off noncanonical mirrors.  They are only
>> "forced" to do so if they are unable or unwilling to download a 500mb
>> repo and throw away most of it.  Which I think may actually be
>> relatively few people.  But what do I know?
>> 
>> 
>> I think we have evidence that many of our projects are used in isolation by
>> relatively large numbers of users. Whether or not those users would be
>> sufficiently inconvenienced to do something about a mono-repo is a harder
>> thing to know.
>> 
>> In the submodule approach this isn’t really an issue because users will
>> continue to work as they always have with the per-project repositories, and
>> the developers who need bisecting capabilities can clone the submodule repo,
>> which can also be used as read-write for making changes to the subprojects.
>> 
>> 
>> Anyway my answer to this question has been and still is, that a
>> monorepo is strictly more powerful than a multirepo.
>> 
>> 
>> For one thing, we can atomically commit across subprojects using a
>> monorepo.  On IRC I've had a bunch of people just begging me for this.
>> 
>> Putative scripts that allow monorepo users to commit to the multirepo
>> would not be able to translate cross-cutting commits into a single
>> commit in the umbrella repository without cooperation from the script
>> that translates commits to the multirepos into commits in the umbrella
>> repository (that's the one that contains all the multirepos as git
>> subrepositories).  It's possible -- it's Turing complete --, but it
>> would be very complicated.
>> 
>> Still more complicated would be writing a script that would allow
>> monorepo users to push to putative try bots that are based off the
>> multirepo.  Again anything is possible, but I have written and
>> maintained similar software in the past (for a significantly simpler
>> setup) and it was fragile as heck, and again this is going to require
>> extensive cooperation between us and the multirepo --> umbrella repo
>> script.
>> 
>> 
>> For cross-repository changes I am fairly certain you could construct
>> something that can be pushed to a try bot based on the submodule repository.
>> There is no technical reason that shouldn’t work, and I don’t even think the
>> scripting around that would be terribly complicated. Admittedly that is more
>> complicated than just writing a pull request to a single repository, but I
>> suspect not much. I may look into that.
>> 
>> 
>> In contrast, as discussed earlier, if people want a multirepo-like
>> setup based on the monorepo, we can reduce this to a single command
>> run once when the repository is cloned.  It ends up being far less
>> fragile, and requiring far fewer (actually, zero) tricks on the server
>> side.
>> 
>> 
>> The only thing a monorepo gets you that strictly isn’t possible without it
>> is the ability to commit to multiple projects in a single commit. Personally
>> I don’t think that is a big enough justification, but that is my opinion,
>> not a fact.
>> 
>> 
>> Instead, let’s talk about DragonEgg.
>> 
>> 
>> +1.
>> 
>> The DragonEgg project is, as far as I can tell, abandoned, but it is still
>> an LLVM project that is tightly coupled to LLVM versions. So it meets
>> criteria #1. I think it fails to meet criteria #2 because DragonEgg is
>> basically abandoned and provides no real value to the community. Even though
>> the burden of a dead project on the mono-repo is minuscule, I think there is
>> no good reason to include DragonEgg.
>> 
>> 
>> If DragonEgg is abandoned, I think we should keep the history in our
>> repository and just delete it from head.
>> 
>> My argument for keeping it in our history is: Suppose we go with a
>> monorepo, and suppose at some point in the future, some other LLVM
>> project -- say, lld -- became abandoned.  Would we rewrite our
>> monorepo history to erase all trace of lld, because it no longer
>> provides value to us?
>> 
>> No, right?  lld's history is part of our history.  We'd just delete it
>> from head and move on with our lives.
>> 
>> My arguments are from the perspective of someone working on the runtime
>> library projects: for us, the burden of being included in the llvm
>> mono-repo is significant. While the full history of LLVM is around 500MB,
>> the full history of *all* the runtime projects is less than 100MB.
>> Developers working on libcxx or compiler-rt should not need to clone LLVM
>> and run commands to do sparse checkouts. That is more burden than we
>> should incur. Further, the setup cost of doing multiple sparse checkouts
>> in order to approximate the workflows we have today with decoupled
>> projects is, IMO, unnecessary and unreasonable.
>> 
>> 
>> OK, just to make sure I understand your point here, because this is
>> important, you are saying that you object to including libcxx and
>> compiler-rt in the llvm monorepo because:
>> 
>> * It would consume an additional ~400mb of disk space, and
>> * It's unnecessary and unreasonable to ask libcxx etc. developers to
>>   run a script when they check out the monorepo if they want a sparse
>>   checkout and/or a setup that mirrors the multirepo.
>> 
>> 
>> I'm not trying to put words in your mouth or subtly change what you're
>> saying, so please let me know if I didn't get that right.
>> 
>> 
>> I have a lot of arguments against the runtime libraries being included.
>> First and foremost, they don’t meet the “tightly coupled” criterion.
>> Also, yes, an extra 400 MB of disk space when the repository for libcxx
>> is only ~20 MB is a big deal to me. You’re not talking about a 10% or 20%
>> increase in repository size, you’re talking about a 20x increase in
>> repository size. That is a burden.
>> 
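As a rough check of the arithmetic in the paragraph above, here is a minimal Python sketch. The two sizes are the approximate figures quoted in the thread (~20 MB for libcxx alone, ~400 MB of additional history), not measurements, and the function name is made up for illustration.

```python
def growth_factor(current_mb, extra_mb):
    """Return how many times larger a clone becomes after adding extra_mb."""
    return (current_mb + extra_mb) / current_mb


# Approximate figures quoted in the thread, not measurements.
libcxx_mb = 20
extra_mb = 400

factor = growth_factor(libcxx_mb, extra_mb)
percent_increase = 100 * extra_mb / libcxx_mb
print(f"clone grows about {factor:.0f}x (+{percent_increase:.0f}%)")
```

With these inputs the clone ends up roughly 21 times its current size, which is in line with the "20x" figure in the message and far from a 10% or 20% increase.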
>> To me, needing to run a script to do sparse checkouts is also a burden.
>> Similarly I think that running a script to bisect a submodule repository
>> (which is my proposal) is also a burden. I can’t judge which burden is
>> more significant because I don’t know how many people bisect. What I can
>> say is that it is my belief that I’m not the only person who works on
>> runtime projects in isolation. As a potential example (because I don’t
>> want to put words into anyone’s mouth), Marshall Clow and Eric Fiselier
>> have made *a ton* of contributions to libcxx over the last year, but
>> neither of them is a frequent contributor to LLVM or Clang.
>> 
>> While it is true that Clang developers may want or need the runtime
>> libraries, the runtime library developers frequently don’t need clang. I
>> really don’t want a solution that makes the lives of Clang developers
>> easier at the expense of other subprojects unless it is strictly
>> necessary and for a common “greater good”.
>> 
>> -Chris
>> 
>> 
>> Thanks again for all your time here.
>> 
>> -Justin
>> 
>> On Thu, Jul 28, 2016 at 11:28 AM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> 
>> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>> 
>> Thanks again for your thoughts, Chris.
>> 
>> As a straw man I would suggest the following criteria for inclusion into
>> the mono-repo:
>> 
>> (1) Projects in the mono-repo must be tightly coupled to specific
>> versions or commits of other projects in the mono-repo
>> 
>> 
>> I'm fine with that, fwiw.  That was in fact the original proposal.
>> 
>> 
>> That is the wording of the original proposal, but I disagree that it is
>> the content of the original proposal. I don’t believe that Compiler-RT is
>> tightly coupled to LLVM at all, which is a big source of my disagreement
>> here.
>> 
>> I'm also fine if we decide to put everything inside the monorepo.  I
>> think Richard Smith had some good arguments for why they belong
>> together.
>> 
>> But I am really surprised that you think this is such a big deal that
>> you would object to the whole monorepo if this decision doesn't go
>> your way.
>> 
>> 
>> I really hate your phrasing on this. I’m not objecting to this proposal
>> just because some minor decision doesn’t go my way. I think this is a very
>> crucial point of whether or not the monorepo solution’s benefit outweighs
>> its cost.
>> 
>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>> 
>> 
>> Maybe there are few, but the impact is not insignificant. Also I think
>> the opinions of the read-write consumers of the sub-projects being
>> included should count for a lot, and as a read-write consumer I don’t
>> like this proposal if it includes the runtime libraries.
>> 
>> Read-only consumers *are entirely
>> unaffected by the decision*, as they can continue to use the read-only
>> subproject mirrors exactly as today.
>> 
>> 
>> The existence of subproject mirrors requires someone to write and
>> maintain the tooling to keep those mirrors updated, and those mirrors
>> will have all the technical hurdles and drawbacks that a submodule
>> repository would have.
>> 
>> The question here is: Do you make downstream single project users work
>> off potentially unreliable mirrors, or do you make the people who need a
>> mono-repo experience work off a potentially unreliable submodule repo?
>> 
>> I think the only answer anyone can reasonably give to this is that we
>> don’t have enough information to make a reasonable decision that
>> maximizes the benefits to most users while minimizing the adverse
>> impacts. Hence why I keep saying we need a survey to understand how
>> *people* interact with the project and what kinds of workflows are
>> important. I emphasize the word “people” in that last sentence because
>> this decision impacts the contributors to the community, and downstream
>> users. We need to take all perspectives into account when making this
>> kind of infrastructure decision.
>> 
>> 
>> (2) The projects in the mono-repo must provide wide benefit to the
>> community such that the overall community benefit outweighs the impacts
>> of the project being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of
>> standards. LLVM’s coding standards might be a bit much, but something
>> along those lines.
>> 
>> 
>> Would you mind explaining why you think the criteria for inclusion in
>> the monorepo should be different than the criteria for inclusion as an
>> LLVM subproject?
>> 
>> 
>> For starters, including things as an LLVM subproject doesn’t require that
>> they meet criteria #1 in my proposal. Simply put, they don’t need to be
>> tightly coupled to LLVM. We have many examples of that.
>> 
>> 
>> I think these are fine criteria -- for inclusion of code as an LLVM
>> subproject.  But it seems to me -- and maybe I'm wrong -- that the
>> reason you're proposing them is that there exist today LLVM
>> subprojects that are version-locked to other projects but you think do
>> not meet these criteria, and therefore you want to exclude them from
>> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
>> your list above.
>> 
>> I understand that lldb is persona non grata in some circles.  But.
>> It's not right to use the source code migration as a tool to revisit
>> an old decision like this.  That is procedurally unjust.  The relevant
>> decision should be, "is LLDB an LLVM subproject that is version-locked
>> to other subprojects, or not?"
>> 
>> 
>> I really don’t want to debate LLDB. It is a hot issue for a lot of
>> people, and I’d really prefer if we didn’t start a “let’s all rag on
>> lldb” thread.
>> 
>> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as
>> I can tell, abandoned, but it is still an LLVM project that is tightly
>> coupled to LLVM versions. So it meets criteria #1. I think it fails to
>> meet criteria #2 because DragonEgg is basically abandoned and provides no
>> real value to the community. Even though the burden of a dead project on
>> the mono-repo is minuscule, I think there is no good reason to include
>> DragonEgg.
>> 
>> Do you disagree?
>> 
>> 
>> If you feel strongly that we should reevaluate every project on the
>> basis of these last two criteria before including them in the
>> monorepo, would you mind elaborating on what exactly are the harms of
>> including a project that isn't up to snuff?
>> 
>> 
>> Every project that is added to the mono-repo will incur a small cost to
>> developers in terms of the size it adds to the repository, and the
>> tooling or workflow adjustments to handle the change. In most cases this
>> will be minimal, even negligible. However I think the burden on runtime
>> developers is significant.
>> 
>> If you are aesthetically
>> displeased by a project, you can hide it using sparse checkouts.  And
>> nobody is going to make you build it.  At that point, the only cost I
>> can think of from including a project is the bytes on disk.  But since
>> the full history of all LLVM subprojects (excluding test-suite) is
>> 500mb (*), surely you're not going to argue for the exclusion of (say)
>> lldb on the grounds of saving 25mb (or whatever)?
>> 
>> 
>> I won’t argue over lldb at all. My arguments are from the perspective of
>> someone working on the runtime library projects: for us, the burden of
>> being included in the llvm mono-repo is significant. While the full
>> history of LLVM is around 500MB, the full history of *all* the runtime
>> projects is less than 100MB. Developers working on libcxx or compiler-rt
>> should not need to clone LLVM and run commands to do sparse checkouts.
>> That is more burden than we should incur. Further, the setup cost of
>> doing multiple sparse checkouts in order to approximate the workflows we
>> have today with decoupled projects is, IMO, unnecessary and unreasonable.
>> 
>> Those arguments go away if you follow criteria that exclude runtime
>> projects from the mono-repo.
>> 
>> -Chris
>> 
>> 
>> -Justin
>> 
>> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
>> 
>> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> This does not apply to libc++.  We support building the entire LLVM
>> suite with other C++ standard library implementations (at least
>> libstdc++, and I think also with Visual Studio’s implementation), so
>> there is no dependency of anything on libc++.  Similarly, we support
>> building libc++ with other compilers (in FreeBSD, we currently build it
>> with gcc 6.1 for RISC-V, for example, where the LLVM toolchain is not
>> quite useable).
>> 
>> The same applies to libunwind, to an even greater degree (where libc++
>> implements a standard API, libunwind implements a standard ABI).
>> 
>> 
>> I think the dependencies of lib* in LLVM are more conceptual than version
>> lock, but they're still there.
>> 
>> I agree with you in all other points, mind you, but RT needs an unwind
>> library as much as it needs clang. Without them, RT "can" (and indeed
>> does) work, but we're not providing a complete solution.
>> 
>> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx) on
>> those merits alone, but my opinion is that we should. I can't see much
>> use in RT without them. That's why we're still defaulting to libgcc on
>> Linux.
>> 
>> Renato, I just want to point out that the Compiler-RT story is *WAY* more
>> complicated than it might seem from your comments here. Compiler-RT is
>> really two or three conceptually different things that happen to be in
>> the same project, and parts of it are very useful without libunwind,
>> libcxxabi, and libcxx.
>> 
>> For example, the Compiler-RT sanitizers are used with GCC and libgcc.
>> They can be built to be used with libstdc++ as well as libc++ (although I
>> do think that loses some features).
>> 
>> I would not object to a mono-repo that included LLVM, Clang, LLD, and
>> Clang-Tools-Extra. I strongly object to any mono-repo that includes any
>> of the runtime library projects. I also think that once you move away
>> from the “mono-repo including all” you need to identify criteria for how
>> you determine which projects get included, and potentially how you
>> evaluate adding projects to the mono-repo.
>> 
>> As a straw man I would suggest the following criteria for inclusion into
>> the mono-repo:
>> 
>> (1) Projects in the mono-repo must be tightly coupled to specific
>> versions or commits of other projects in the mono-repo
>> (2) The projects in the mono-repo must provide wide benefit to the
>> community such that the overall community benefit outweighs the impacts
>> of the project being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of
>> standards. LLVM’s coding standards might be a bit much, but something
>> along those lines.
>> 
>> Thoughts?
>> 
>> -Chris
>> 
>> My tuppence.
>> 
>> Cheers,
>> Renato
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Renato Golin via llvm-dev

2016-Jul-28 22:19 UTC

[llvm-dev] [RFC] One or many git repositories?

On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
> It is worth pointing out the Jenkins job that runs that is a playground I
> setup for myself. It is nowhere near production ready, and it will fail
> frequently as I iterate messing around with it.

Sure, I think that's implied.

cheers,
--renato

Michael Gottesman via llvm-dev

2016-Jul-29 02:50 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 28, 2016, at 6:23 PM, Lang Hames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Aaaand I'm (mostly) caught up. Phew.
> 
> FWIW Chris B is right: I had been put off commenting on this thread by the
> length, and the number of git discussions that have come before this. He
> convinced me to make the effort to put my 2 cents in though - thanks Chris.

I am also having issues attempting to follow this thread (it is huge and filled
with... distractions).

I would appreciate it if someone would put together a succinct proposal that
strips out the rest of the thread (unless I have missed it). This would help to
center the discussion and involve more people in the discussion.

From a quick skim, it definitely seems like this whole proposal is about
optimizing for a specific use-case, producing a clang based toolchain. But I
really can not say without spending an hour or two reading the whole thread.

Thanks,
Michael
> 
> So - for my use-case I don't have strong feelings one way or the other*
> <https://www.youtube.com/watch?v=fpaQpyU_QiM>. That said, something about
> the discussion so far strikes me as dissonant: If we're going to break out
> some sub-projects (the test-suite for licensing reasons, the runtimes for
> modularity) then it's not really a mono-repo any more. It's a multi-repo
> where we've collapsed some (but not all) of the existing repos. To the
> extent that we have to build tooling to support multiple-repos (auto-mergers
> for test bots, command line utils for devs who want the main repo plus tests
> plus ...), could we re-use that to keep the existing modular project setup?
> This might be a fairly low-benefit proposition if the tools we develop were
> only usable by in-tree projects, but there are many other users of LLVM
> (Swift leaps to mind since I'm at Apple, but there are many others) who
> might appreciate the ability to use LLVM-provided tools to pick-and-mix LLVM
> projects into their repos. Otherwise, every downstream user will have to
> roll some version of these tools themselves.
> 
> - Lang.
> 
> On Thu, Jul 28, 2016 at 3:19 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
> > It is worth pointing out the Jenkins job that runs that is a playground
> > I setup for myself. It is nowhere near production ready, and it will
> > fail frequently as I iterate messing around with it.
> 
> Sure, I think that's implied.
> 
> cheers,
> --renato

Justin Lebar via llvm-dev

2016-Jul-29 03:28 UTC

[llvm-dev] [RFC] One or many git repositories?

> I would appreciate it if someone would put together a succinct proposal
> that strips out the rest of the thread (unless I have missed it). This
> would help to center the discussion and involve more people in the
> discussion.

Majnemer is on it.  Hope to get this for you soon.

-Justin

> From a quick skim, it definitely seems like this whole proposal is about
> optimizing for a specific use-case, producing a clang based toolchain. But
> I really can not say without spending an hour or two reading the whole
> thread.
>
> Thanks,
> Michael
>
>>
>> So - for my use-case I don't have strong feelings one way or the
>> other*. That said, something about the discussion so far strikes me as
>> dissonant: If we're going to break out some sub-projects (the test-suite
>> for licensing reasons, the runtimes for modularity) then it's not really
>> a mono-repo any more. It's a multi-repo where we've collapsed some (but
>> not all) of the existing repos. To the extent that we have to build
>> tooling to support multiple-repos (auto-mergers for test bots, command
>> line utils for devs who want the main repo plus tests plus ...), could
>> we re-use that to keep the existing modular project setup? This might be
>> a fairly low-benefit proposition if the tools we develop were only usable
>> by in-tree projects, but there are many other users of LLVM (Swift leaps
>> to mind since I'm at Apple, but there are many others) who might
>> appreciate the ability to use LLVM-provided tools to pick-and-mix LLVM
>> projects into their repos. Otherwise, every downstream user will have to
>> roll some version of these tools themselves.
>>
>> - Lang.
>>
>> On Thu, Jul 28, 2016 at 3:19 PM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 28 July 2016 at 22:12, Chris Bieneman <beanz at apple.com> wrote:
>>> > It is worth pointing out the Jenkins job that runs that is a
>>> > playground I setup for myself. It is nowhere near production ready,
>>> > and it will fail frequently as I iterate messing around with it.
>>>
>>> Sure, I think that's implied.
>>>
>>> cheers,
>>> --renato

Dean Michael Berris via llvm-dev

2016-Jul-29 08:00 UTC

[llvm-dev] [RFC] One or many git repositories?

> On 29 Jul 2016, at 06:41, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Jul 28, 2016, at 12:05 PM, Justin Lebar <jlebar at google.com> wrote:
>> 
> 
> For cross-repository changes I am fairly certain you could construct
> something that can be pushed to a try bot based on the submodule repository.
> There is no technical reason that shouldn’t work, and I don’t even think the
> scripting around that would be terribly complicated. Admittedly that is more
> complicated than just writing a pull request to a single repository, but I
> suspect not much. I may look into that.
> 
>> 
>> In contrast, as discussed earlier, if people want a multirepo-like
>> setup based on the monorepo, we can reduce this to a single command
>> run once when the repository is cloned.  It ends up being far less
>> fragile, and requiring far fewer (actually, zero) tricks on the server
>> side.
> 
> The only thing a monorepo gets you that strictly isn’t possible without it
> is the ability to commit to multiple projects in a single commit. Personally I
> don’t think that is a big enough justification, but that is my opinion, not a
> fact.
> 
As someone who's recently had to change LLVM, Clang, and compiler-rt with
interlocking interdependent changes (and I suspect a lot more people do this
than just me), I would offer a dissenting opinion. It's *too hard* and *too
much work* to land cross-cutting changes like these for vertical projects like
XRay, which spans clang, llvm, and compiler-rt.

Similar efforts (say, the coroutines work) would certainly benefit if there
were a single repo where changes to the runtime, front-end, and back-end just
happen normally -- and _no acrobatics are required to accomplish them_.

Consider things like the builtins library -- you add a new built-in intrinsic in
LLVM, then in the same commit have that builtin implemented in compiler-rt along
with the tests on both sides.

A mono-repo will be a strict improvement over the status quo or any other
formulation involving submodules.
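
For readers wondering what the "single command run once when the repository is cloned" quoted above might amount to, here is a minimal Python sketch of a sparse-checkout helper. It is an illustration only, not a script anyone in this thread posted. It assumes a hypothetical monorepo layout with one top-level directory per project (e.g. libcxx/, compiler-rt/), and the function names are made up. The git pieces it relies on (the core.sparseCheckout setting, the .git/info/sparse-checkout pattern file, and `git read-tree -mu HEAD`) are standard git.

```python
import subprocess
from pathlib import Path


def sparse_patterns(projects):
    """Return one sparse-checkout pattern per top-level project directory."""
    return [f"/{name.strip('/')}/" for name in projects]


def write_sparse_file(git_dir, projects):
    """Write the pattern file git consults when core.sparseCheckout is on."""
    info_dir = Path(git_dir) / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    target = info_dir / "sparse-checkout"
    target.write_text("\n".join(sparse_patterns(projects)) + "\n")
    return target


def enable_sparse_checkout(repo, projects):
    """Restrict the working tree of an existing clone to the given projects."""
    repo = Path(repo)
    subprocess.run(
        ["git", "-C", str(repo), "config", "core.sparseCheckout", "true"],
        check=True,
    )
    write_sparse_file(repo / ".git", projects)
    # Re-read HEAD so the working tree is updated to match the new patterns.
    subprocess.run(
        ["git", "-C", str(repo), "read-tree", "-mu", "HEAD"],
        check=True,
    )
```

A libcxx-only developer would call something like `enable_sparse_checkout("path/to/clone", ["libcxx"])` once after cloning. Note that this only trims the working tree; the full history is still downloaded, which is the disk-space objection raised elsewhere in the thread.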

Michael Gottesman via llvm-dev

2016-Jul-29 15:22 UTC

[llvm-dev] [RFC] One or many git repositories?

I agree with Renato here. From someone who is just beginning to participate in
this thread, the sheer amount of ad hominem argument thrown about is
disappointing and unhelpful. What we need is a specific proposal to center the
discussion and then line by line review that breaks out into (potentially) more
specific discussion on individual points if they are contentious.

Sent from my iPhone
> On Jul 29, 2016, at 7:52 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 29 July 2016 at 15:26, Robinson, Paul via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> I believe David Chisnall up-thread cited a difference in checkout times
>> on the order of a handful of seconds versus a couple of minutes.  While
>> naively it might seem not a big deal, over time and depending on what you
>> are trying to do yes it can be a big burden.
> 
> TL;DR: This thread is dead. Let's move on.
> 
> I think the biggest fallacy in this thread is that changing process is cheap.
> 
> It is certainly cheap for me to do "git foo" instead of "git bar" from
> now on. It's moderately expensive to change my buildbot
> configurations, Zorg's builders and re-test everything for public CI.
> It's a lot more expensive to change how distributions build their
> hundreds of thousands of packages over multiple LTS releases, or how
> downstream users like Sony, Apple or ARM re-factor their entire build
> systems (which very likely link to a lot of non-LLVM stuff), and then
> some.
> 
> None of that is impossible, most of that is a "one off". Most of the
> companies and big projects "could" afford to do that.
> 
> But there are two big points that people like me, Paul and David have
> been unsuccessfully trying to make obvious:
> 
> 1. Not every LLVM user is as big as FreeBSD, Sony or Apple. There are
> a lot of very interesting projects (hobbyists, academia, professional)
> using Clang, LLVM, libc++, etc. that don't have the staff to do that
> move. Being a hobbyist myself, I know too well that, when a library
> radically changes the way they behave (like boost did every new
> release about 10 years ago), I will stop using it.
> 
> 2. Changes in complex systems have unwanted larger consequences. Build
> systems are some of the most complex systems in existence because
> they're mostly irrational and patched together with duct tape and
> paper clips. What may be very simple for some build systems, could be
> impossible for others, and that's not the other's team's fault.
> 
> So, if you have a complex build system yourself, and you spent some
> time and have figured out that it would be easy, you *cannot* assume
> it should be easy for everyone with a less or equally complex build
> system.
> 
> If you find it simple to change your own workflow towards this or that
> solution, you *cannot* assume everyone else should feel the same. This
> also doesn't diminish their intelligence or competence. Intelligent
> and competent people work in very different ways, and it's actually
> because of that fact that we can do such complex software works in a
> multitude of systems. If we were all equal, we wouldn't need to
> discuss anything. :)
> 
> Mehdi said very early, and repeated many times, on some of the threads,
> something to the effect of: "Saying how hard or easy it is for you is
> an invalid argument, we need more concrete facts".
> 
> I absolutely agree with that statement, but interpreting how easy or
> hard concrete facts would be falls into the same fallacy, so it doesn't
> bring us closer to consensus, it brings us closer to dissent.
> 
> That is why I think this thread has long outlived its usefulness, and
> we need a concrete write-up of the proposal. (I hear it's in progress,
> let's wait for it).
> 
> From now on, I'd propose the discussion to be *just* about this
> specific proposal, preferably over a Phabricator review on the
> document. People that have strong opinions about it should wait for
> the survey.
> 
> Just to reiterate, the survey is to collect opinions in a formal and
> non-passionate manner. It will not be a "majority vote", and we're not
> locked between these two solutions as they're absolutely drawn out in
> the documents, nor are we forced to take any decision if the community
> is clearly split. The last thing I want is to destroy part of the
> community while trying to make it better.
> 
> But this long thread is not doing any good either.
> 
> cheers,
> --renato

Michael Gottesman via llvm-dev

2016-Jul-29 15:51 UTC

[llvm-dev] [RFC] One or many git repositories?

Additionally we should reach out to individual stakeholders and get real data
about:

1. Given the current workflow, what would it take to change to this different
workflow? Whether or not it is easy or hard should be left out. Just specific
details.

2. Once the workflow has been changed, how does this workflow change day-to-day
life for their users? Again this should be specific and a judgement of ease or
difficulty should be left out.

Only with impartial data gathering, followed by compilation of the data, can an
effective discussion happen.

In fact I hope the proposal has an alternatives considered section that lists
all alternatives and a section that lists all impacts on other people, rather
than just the specific proposal.

Sent from my iPhone
> On Jul 29, 2016, at 8:22 AM, Michael Gottesman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I agree with Renato here. From someone who is just beginning to participate
> in this thread, the sheer amount of ad hominem argument thrown about is
> disappointing and unhelpful. What we need is a specific proposal to center the
> discussion and then line by line review that breaks out into (potentially) more
> specific discussion on individual points if they are contentious.
> 
> Sent from my iPhone
> 
>> On Jul 29, 2016, at 7:52 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 29 July 2016 at 15:26, Robinson, Paul via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I believe David Chisnall up-thread cited a difference in checkout times
>>> on the order of a handful of seconds versus a couple of minutes.  While
>>> naively it might seem not a big deal, over time and depending on what you
>>> are trying to do yes it can be a big burden.
>> 
>> TL;DR: This thread is dead. Let's move on.
>> 
>> I think the biggest fallacy in this thread is that changing process is cheap.
>> 
>> It is certainly cheap for me to do "git foo" instead of "git bar" from
>> now on. It's moderately expensive to change my buildbot
>> configurations, Zorg's builders and re-test everything for public CI.
>> It's a lot more expensive to change how distributions build their
>> hundreds of thousands of packages over multiple LTS releases, or how
>> downstream users like Sony, Apple or ARM re-factor their entire build
>> systems (which very likely link to a lot of non-LLVM stuff), and then
>> some.
>> 
>> None of that is impossible, most of that is a "one off". Most of the
>> companies and big projects "could" afford to do that.
>> 
>> But there are two big points that people like me, Paul and David have
>> been unsuccessfully trying to make obvious:
>> 
>> 1. Not every LLVM user is as big as FreeBSD, Sony or Apple. There are
>> a lot of very interesting projects (hobbyists, academia, professional)
>> using Clang, LLVM, libc++, etc. that don't have the staff to do that
>> move. Being a hobbyist myself, I know too well that, when a library
>> radically changes the way they behave (like boost did every new
>> release about 10 years ago), I will stop using it.
>> 
>> 2. Changes in complex systems have unwanted larger consequences. Build
>> systems are some of the most complex systems in existence because
>> they're mostly irrational and patched together with duct tape and
>> paper clips. What may be very simple for some build systems, could be
>> impossible for others, and that's not the other's team's fault.
>> 
>> So, if you have a complex build system yourself, and you spent some
>> time and have figured out that it would be easy, you *cannot* assume
>> it should be easy for everyone with a less or equally complex build
>> system.
>> 
>> If you find it simple to change your own workflow towards this or that
>> solution, you *cannot* assume everyone else should feel the same. This
>> also doesn't diminish their intelligence or competence. Intelligent
>> and competent people work in very different ways, and it's actually
>> because of that fact that we can do such complex software works in a
>> multitude of systems. If we were all equal, we wouldn't need to
>> discuss anything. :)
>> 
>> Mehdi said very early, and repeated many times, on some of the threads,
>> something to the effect of: "Saying how hard or easy it is for you is
>> an invalid argument, we need more concrete facts".
>> 
>> I absolutely agree with that statement, but interpreting how easy or
>> hard concrete facts would be falls into the same fallacy, so it doesn't
>> bring us closer to consensus, it brings us closer to dissent.
>> 
>> That is why I think this thread has already surpassed its usefulness
>> (for a long time), and we need a concrete write up on the proposal. (I
>> hear it's in progress, let's wait for it).
>> 
>> From now on, I'd propose the discussion to be *just* about this
>> specific proposal, preferably over a Phabricator review on the
>> document. People that have strong opinions about it should wait for
>> the survey.
>> 
>> Just to reiterate, the survey is to collect opinions in a formal and
>> non-passionate manner. It will not be a "majority vote", and we're not
>> locked between these two solutions as they're absolutely drawn out in
>> the documents, nor are we forced to take any decision if the community
>> is clearly split. The last thing I want is to destroy part of the
>> community while trying to make it better.
>> 
>> But this long thread is not doing any good either.
>> 
>> cheers,
>> --renato
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chris Bieneman via llvm-dev

2016-Jul-29 17:04 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 29, 2016, at 8:53 AM, Daniel Sanders via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> > > Even then, are we seriously ignoring the fact that even if you did clone
> > > the whole repository including everything, that you can still build just
> > > the libc++ and sanitiser runtimes if you wanted to?
> >
> > Is it that easy to build a subset of a large checked-out tree?  I haven't
> > tried it but my impression is: not so much.
>  
> It's possible to disable subsets of an LLVM build by setting the various
> 'LLVM_TOOL_*_BUILD' options to 'OFF' in cmake. For example,
> 'LLVM_TOOL_COMPILER_RT_BUILD=OFF' will prevent the build for
> projects/compiler-rt, and 'LLVM_TOOL_CLANG_TOOLS_EXTRA_BUILD=OFF' will
> disable the tools/projects/clang/tools/extra (I've just double checked the
> latter). IIRC, these are the variables that disable the build for projects
> that aren't checked out.
>  
> Also, I haven’t tried this myself but we should still be able to do
> standalone builds of particular projects by picking the initial
> CMakeLists.txt like so:
>             cmake ../llvm/projects/compiler-rt
Just as a side point here. This only works for certain projects. The runtimes
projects all support standalone builds, as does Clang, and (I think) LLDB. LLD
and clang-tools-extra do not. I’m not sure about Polly or the new Parallel Libs
project.
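As a rough illustration of the 'LLVM_TOOL_*_BUILD' switches Daniel mentions above, here is a minimal Python sketch that assembles a cmake command line with selected subprojects turned off. The two option names it produces for compiler-rt and clang-tools-extra are the ones quoted in the thread; the general naming rule (upper-case the project name and replace '-' with '_') is only an assumption inferred from those two examples, and the helper names are hypothetical.

```python
from typing import List


def tool_build_flag(project: str, enabled: bool) -> str:
    """Build a -D flag following the LLVM_TOOL_<NAME>_BUILD pattern.

    Assumption: <NAME> is the project name upper-cased with '-' turned
    into '_'. This matches the two options quoted in the thread but is
    not checked against the LLVM build system here.
    """
    name = project.upper().replace("-", "_")
    value = "ON" if enabled else "OFF"
    return f"-DLLVM_TOOL_{name}_BUILD={value}"


def cmake_command(source_dir: str, disabled: List[str]) -> List[str]:
    """Return an argv list for cmake with the given subprojects disabled."""
    return ["cmake", source_dir] + [tool_build_flag(p, False) for p in disabled]


# Example: configure from a build directory next to the llvm checkout.
cmd = cmake_command("../llvm", ["compiler-rt", "clang-tools-extra"])
print(" ".join(cmd))
```

The list form can be handed to subprocess.run() directly; it is only printed here so the sketch has no side effects.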

One of the side discussions that would need to take place if the monorepo
approach is taken would be how people want the build system to behave. It could
be modified to disable all the currently optional subprojects and require build
options to enable them.

If we moved to a mono-repo where the projects are all side-by-side (as the
llvm-project is today), by default you would only get LLVM in your configuration.
You’d need to either symlink the other projects into the LLVM repository or
specify CMake options to enable them. Alternatively we could include at the root
of the mono-repo a CMakeLists file that sets up all the sub-projects for
convenience, which seems reasonable to me.

-Chris
>  
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
Bruce Hoult via llvm-dev
> Sent: 29 July 2016 15:52
> To: Robinson, Paul
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>  
>  
>  
> On Sat, Jul 30, 2016 at 2:26 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> > Even then, are we seriously ignoring the fact that even if you did clone
> > the whole repository including everything, that you can still build just
> > the libc++ and sanitiser runtimes if you wanted to?
> 
> Is it that easy to build a subset of a large checked-out tree?  I haven't
> tried it but my impression is: not so much.  Certainly the advertised
> tactics for configuring/building don't tell you how to do that.  Somebody
> figuring out what it takes would be very constructive here, instead of
> just asserting it can't possibly be that hard.
>  
> Right now, no. The build system assumes that if you checked something out
> then you want to build it.
>  
> This needs to change.
>  
> I believe David Chisnall up-thread cited a difference in checkout times
> on the order of a handful of seconds versus a couple of minutes.  While
> naively it might seem not a big deal, over time and depending on what you
> are trying to do yes it can be a big burden.
>  
> That's a one time cost, not every time you do an update.
>  

Michael Gottesman via llvm-dev

2016-Jul-30 06:56 UTC

[llvm-dev] [RFC] One or many git repositories?

I talked with Majnemer/Mehdi about developing this proposal on github. They said
that this was ok (after all we are moving to github). We can add facts to the
specific proposal via PRs which we can use to center the discussion.

I created a straw man repo and a scaffolding hacked from the swift-evolution
process for just this purpose. I hacked some words from jlebar's initial
email as just a starting point.

https://github.com/gottesmm/llvm-evolution/blob/master/proposals/0001-monorepo.md

What do you guys think?
Michael
> On Jul 29, 2016, at 8:51 AM, Michael Gottesman <mgottesman at apple.com> wrote:
> 
> Additionally we should reach out to individual stakeholders and get real
> data about:
> 
> 1. Given the current workflow, what would it take to change to this
> different workflow? Whether or not it is easy or hard should be left out.
> Just specific details.
> 
> 2. Once the workflow has been changed, how does this workflow change day by
> day living for their users? Again this should be specific and a judgement of
> ease or difficulty should be left out.
> 
> Only with impartial data gathering, followed by compilation of the data, can
> an effective discussion happen.
> 
> In fact I hope the proposal has an alternatives considered section that
> lists all alternatives and a section that lists all impacts on other people,
> rather than just the specific proposal.
> 
> Sent from my iPhone
> 
>> On Jul 29, 2016, at 8:22 AM, Michael Gottesman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> I agree with Renato here. From someone who is just beginning to
>> participate in this thread, the sheer amount of ad hominem argument thrown
>> about is disappointing and unhelpful. What we need is a specific proposal
>> to center the discussion and then line by line review that breaks out into
>> (potentially) more specific discussion on individual points if they are
>> contentious.
>> 
>> Sent from my iPhone
>> 
>>> On Jul 29, 2016, at 7:52 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> On 29 July 2016 at 15:26, Robinson, Paul via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> I believe David Chisnall up-thread cited a difference in checkout times
>>>> on the order of a handful of seconds versus a couple of minutes.  While
>>>> naively it might seem not a big deal, over time and depending on what you
>>>> are trying to do yes it can be a big burden.
>>> 
>>> TL;DR: This thread is dead. Let's move on.
>>> 
>>> I think the biggest fallacy in this thread is that changing process is cheap.
>>> 
>>> It is certainly cheap for me to do "git foo" instead of "git bar" from
>>> now on. It's moderately expensive to change my buildbot
>>> configurations, Zorg's builders and re-test everything for public CI.
>>> It's a lot more expensive to change how distributions build their
>>> hundreds of thousands of packages over multiple LTS releases, or how
>>> downstream users like Sony, Apple or ARM re-factor their entire build
>>> systems (which very likely link to a lot of non-LLVM stuff), and then
>>> some.
>>> 

Renato Golin via llvm-dev

2016-Jul-30 09:28 UTC

[llvm-dev] [RFC] One or many git repositories?

Michael,

Can you guys do a similar format and location as the other one?

http://llvm.org/docs/Proposals/GitHubSubMod.html

There are still changes that may be made to this one, but we should keep
their formats, location and contents similar, to help with the comparison.

Cheers,
Renato


Robinson, Paul via llvm-dev

2016-Jul-31 05:38 UTC

[llvm-dev] [RFC] One or many git repositories?

> The only thing a monorepo gets you that strictly isn’t possible without
> it is the ability to commit to multiple projects in a single commit.
> Personally I don’t think that is a big enough justification, but that is
> my opinion, not a fact.
Okay, I just bumped into r277008, in which commits to llvm, clang, and
clang-tools-extra all have the same SVN revision number.
I don't know how it happened but it did.  Is this just an artifact of
how somebody pasted together a bunch of git-svn projects, or is it
something that a top-level git repo with submodules would allow?
And if it is, then the "only thing a monorepo gets you" isn't something
that you need a monorepo to get.
Your befuddled correspondent,
--paulr

Michael Gottesman via llvm-dev

2016-Jul-31 07:24 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 31, 2016, at 12:06 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> And if it is, then the "only thing a monorepo gets you" isn't something
>> that you need a monorepo to get.
> 
> This is an *extremely important* point to understand, so let me try to
> be really clear about the current state of the world and the state of
> the world under the two "move to git" proposals.
> 
> Today, all commits ultimately end up in SVN.  Our SVN is effectively
> a monorepo, so today, a single commit can touch multiple subprojects.
> How you get the commit into SVN is your business.  Maybe you can hack
> git-svn somehow to do the atomic commit.  (If this is possible, it's
> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
> a git user, I wrote a hacky script [1] that cherry-picks commits from
> the existing monorepo mirror and commits them via SVN.  It's annoying
> to do, but it is possible today to atomically commit to multiple
> subprojects, as you observed.
> 
> Under the monorepo proposal, this becomes much easier.  It's just
> "git commit", no magic.
> 
> Under the multirepo git proposal, this becomes either impossible or
> much more complicated.  Under the proposal, we have separate git
> repositories for each subproject, and we push directly to these.
> There's then an umbrella repository, which includes the subproject
> repos as git submodules.  There's a script which periodically checks
> the subproject repos for updates.  When it sees an update, it creates
> a new commit in the umbrella repository.  The script is the only thing
> that can create commits in the umbrella repo.
> 
> In order to get atomic commits in the multirepo world, we would need
> some way to inform the script that two otherwise separate commits
> should appear in the umbrella repo as a single commit.  We'd probably
> need to agree on a protocol communicated via commit messages.  We'd
> also probably need client-side scripts to set the commit messages
> appropriately.
I have been thinking about this a little bit last night.

The natural way to synchronize a multi-commit update in a git repository is via
a merge commit. This suggests what we really want in this case are several
updates (one to each repository) on a branch that is then merged in one instant
into the umbrella repository. Then the only thing the bots would see would be
the merge commit and thus state is synchronized.

The natural way to do this would be via a multiple-repo PR. In such a case, the
CI would handle the merging for you after you have done your testing and thus
update the umbrella repo appropriately. In Swift land we are using PRs
extensively and are going to most likely do multi-repo PRs. Once that is done I
expect to implement what I just described so I have a nice repo to drive our
performance tracking (which is using something else that is unfortunate right
now).

I do not think it will be that complicated to implement.
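
To make the "protocol communicated via commit messages" idea quoted above a little more concrete, here is a minimal Python sketch of how an umbrella-repo bot might batch subproject commits. Everything in it is hypothetical: the "Atomic-Group:" trailer, the Commit record and the function names are illustrative assumptions, not anything proposed in this thread or supported by git or GitHub. It only shows the grouping logic, with no real git calls.

```python
from typing import Dict, List, NamedTuple, Optional


class Commit(NamedTuple):
    repo: str      # subproject the commit was pushed to, e.g. "llvm"
    sha: str       # new head of that subproject
    message: str   # full commit message


# Hypothetical trailer that client-side tooling would add to every
# commit belonging to the same cross-project change.
TAG = "Atomic-Group:"


def group_id(commit: Commit) -> Optional[str]:
    """Return the group named in the commit message, if any."""
    for line in commit.message.splitlines():
        if line.startswith(TAG):
            return line[len(TAG):].strip()
    return None


def plan_umbrella_commits(commits: List[Commit]) -> List[Dict[str, str]]:
    """Turn a stream of subproject commits into umbrella commits.

    Each returned dict maps subproject -> submodule hash and stands for
    one umbrella commit. Commits sharing a group id are folded into the
    umbrella commit created for the first member of the group.
    """
    plans: List[Dict[str, str]] = []
    open_groups: Dict[str, Dict[str, str]] = {}
    for commit in commits:
        gid = group_id(commit)
        if gid is None:
            plans.append({commit.repo: commit.sha})
        elif gid in open_groups:
            open_groups[gid][commit.repo] = commit.sha
        else:
            update = {commit.repo: commit.sha}
            open_groups[gid] = update
            plans.append(update)
    return plans


incoming = [
    Commit("llvm", "a1", "Change an API\n\nAtomic-Group: 42"),
    Commit("lld", "b1", "Unrelated fix"),
    Commit("clang", "c1", "Adapt to the API change\n\nAtomic-Group: 42"),
]
for update in plan_umbrella_commits(incoming):
    print(update)
```

A real bot would also have to decide how long to wait for the remaining members of a group and what to do when one never arrives, which is part of the complexity being debated here.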
Michael
> 
> I expect this would be so much of a hassle that, even if we managed to
> implement it on the server side, it would be prohibitively complex for
> most users.
> 
> In addition, under the multirepo, you only get synchronized subproject
> commits in your local checkout if you choose to use a git-submodules
> based workflow.  If you use the workflow that we currently have, then
> on the client side, there is no guarantee that your subprojects will
> be sync'ed.  (This is the same as most people's client-side git
> workflows today.)  *Even if we manage to atomically commit across
> subprojects*, that is of limited utility unless those commits show up
> atomically on developers' workstations.  But using a workflow based on
> git-submodules is highly complex as compared to the monorepo -- this
> was what I was trying to illustrate in my very first email on this
> thread.
> 
> When we say "the monorepo gets you atomic commits," that's an abbreviation for
> 
> 1) The monorepo makes it far simpler to make atomic commits from git
> as compared to the current SVN setup.
> 2) Atomic commits are definitely possible in the monorepo.  They are
> theoretically possible in the multirepo, with extensive tooling etc.
> 3) Under the basic monorepo workflow, your checkouts are always
> correct with respect to atomic commits.  Under the basic multirepo
> workflow, this is not true -- you have to engage with git submodules
> to get this property, and that is a giant pain.
> 
> Sorry for the wall of text, but this is important.
> 
> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
> made one commit with it so far.  :)
> 

Michael Gottesman via llvm-dev

2016-Jul-31 07:56 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 31, 2016, at 12:36 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> This suggests what we really want in this case are several updates (one
>> to each repository) on a branch that is then merged in one instant into the
>> umbrella repository. Then the only thing the bots would see would be the
>> merge commit and thus state is synchronized.
> 
> The script that updates the umbrella repo would need to know that the
> several updates should all go into one branch that is then merged in
> one instant into the umbrella repo's master.  At the point that the
> umbrella repo can know this, it might as well just make a single
> commit to master that updates all N subproject hashes at once (which
> is what I was suggesting) -- I don't see how having a branch makes the
> situation any less complicated.
Ok. Sure. I was thinking out loud.
> 
>> The natural way to do this would be via a multiple-repo PR.
> 
> That doesn't exist in github, right?  We would have to somehow create
> this multi-repo PR-management system and link it in with the script
> that is managing the umbrella repo?  That is what I was describing.
The script in this case would not be making the change. The change would be made
by a special trusted continuous integration bot that does the merging. In Swift
land we have been using our @swift-ci system to merge things into master after
testing with great success. In terms of the script, the script would see that it
couldn't push a change and would then just restart the loop.
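
The "see that it couldn't push, then restart the loop" behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fetch_heads and try_push are hypothetical callbacks standing in for whatever the real script would do with git, and the retry limit is an arbitrary choice.

```python
from typing import Callable, Dict

Heads = Dict[str, str]  # subproject name -> commit hash


def update_umbrella(
    fetch_heads: Callable[[], Heads],
    try_push: Callable[[Heads], bool],
    max_attempts: int = 5,
) -> int:
    """Push an umbrella update, starting over whenever the push is rejected.

    Returns the number of attempts used. Each attempt re-reads the
    subproject heads, so a merge landed by the CI bot in the meantime
    is picked up on the next pass.
    """
    for attempt in range(1, max_attempts + 1):
        heads = fetch_heads()
        if try_push(heads):
            return attempt
    raise RuntimeError("umbrella update rejected %d times" % max_attempts)


# Fake remote for demonstration: the first push is rejected because the
# CI bot merges a clang change between the fetch and the push.
remote = {"llvm": "aaa", "clang": "bbb"}
accepted = []


def fake_fetch() -> Heads:
    return dict(remote)


def fake_push(heads: Heads) -> bool:
    if heads != remote or not accepted and remote["clang"] == "bbb":
        remote["clang"] = "ccc"  # the bot's merge lands first
        return False
    accepted.append(heads)
    return True


attempts = update_umbrella(fake_fetch, fake_push)
print(attempts, accepted[-1])
```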
> 
> Again, I am not claiming this isn't possible.  And, I don't care a ton
> about complexity on the server-side.  But I do care about complexity
> on the client side.  I think it's highly unlikely that there exists a
> system for creating atomic commits and then checking out code in a way
> that respects that atomicity that is as simple as "git commit" and
> "git checkout" (which is what we'd have on the monorepo).
The key assumption here is that LLVM will not switch to a heavy PR model
(something which from what I understand is not a part of this specific
discussion and will be considered strictly after the move to github). In such a
case, I believe that it will be relatively simple to communicate this to the CI
and have the CI manage it for you. If on the swift side we implement such a
thing, I would be more than happy to provide guidance to you to help set up such
a system reusing the work on the Swift side.

Another thing that I do not understand about the mono-repo proposal (*note*
correct me if I am wrong) is that we can only avoid external synchronization
if we get /all/ projects that have build dependencies on the mono-repo into
the mono-repo. This suggests that (unless we are saying that synchronization
of those repositories is not important) we will need to invest in some sort
of synchronization regardless of the mono-repo proposal. In such a case the
mono-repo proposal is essentially just an attempt to make it convenient for a
large subset of the community to ease their workflows, rather than truly being
an alternative to the submodule proposal. Am I misunderstanding?

Michael
> 
> On Sun, Jul 31, 2016 at 12:25 AM, Justin Lebar <jlebar at google.com> wrote:
>> By the way, I've been using the existing read-only monorepo [1] for a
>> few days now.  The intent is to commit via the script I put together
>> [2], although I haven't committed anything other than a testing commit
>> [3].
>> 
>> All I can say is, *wow* is it nice.  I hid everything I don't care
>> about using a sparse checkout [4].  Many of my tools (e.g. ctrl-p [5]
>> [6], ycm [7]) suddenly work better now that there isn't an artificial
>> boundary between my clang and llvm repositories.  I can have patch
>> queues that include LLVM commits and clang commits arbitrarily
>> interspersed with one another -- something I didn't realize I wanted
>> until I made the switch and noticed I already had branches I could
>> merge (and something we can't do with Bogner's suggested multirepo
>> workflow).
>> 
>> [1] https://github.com/llvm-project/llvm-project
>> [2] https://github.com/jlebar/llvm-repo-tools
>> [3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
>> (which, embarrassingly, has a typo in the commit message)
>> [4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
>> [5] https://github.com/kien/ctrlp.vim
>> [6] https://github.com/jlebar/ctrlp-py-matcher
>> [7] https://github.com/Valloric/YouCompleteMe
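
For readers unfamiliar with the sparse checkout mentioned in [4], the sketch below generates the pattern file git reads when core.sparseCheckout is enabled. It is a minimal illustration, not Justin's actual setup: the helper name is hypothetical, and the assumption that each subproject is a top-level directory of the monorepo comes from the side-by-side layout described earlier in the thread.

```python
from typing import Iterable


def sparse_checkout_patterns(subprojects: Iterable[str]) -> str:
    """Return the text of a sparse-checkout file keeping only these directories.

    Assumes each subproject is a top-level directory of the monorepo.
    The leading slash anchors the pattern at the repository root.
    """
    lines = ["/%s/" % name.strip("/") for name in subprojects]
    return "\n".join(lines) + "\n"


# Keep only llvm and clang in the working tree.
print(sparse_checkout_patterns(["llvm", "clang"]), end="")
```

The resulting text would go into .git/info/sparse-checkout after running "git config core.sparseCheckout true", followed by "git read-tree -mu HEAD" to update the working tree.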
>> 
>> On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at
google.com> wrote:
>>>> And if it is, then the "only thing a monorepo gets
you" isn't something that you need a monorepo to get.
>>> 
>>> This is an *extremely important* point to understand, so let me try
to
>>> be really clear about the current state of the world and the state
of
>>> the world under the two "move to git" proposals.
>>> 
>>> Today, all commits ultimately end up in SVN.  Our SVN is a
effectively
>>> a monorepo, so today, a single commit can touch multiple
subprojects.
>>> How you get the commit into SVN is your business.  Maybe you can
hack
>>> git-svn somehow to do the atomic commit.  (If this is possible,
it's
>>> beyond my ken.)  Alternatively you can just commit via SVN.  If
you're
>>> a git user, I wrote a hacky script [1] that cherry-picks commits
from
>>> the existing monorepo mirror and commits them via SVN.  It's
annoying
>>> to do, but it is possible today to atomically commit to multiple
>>> subprojects, as you observed.
>>> 
>>> Under the monorepo proposal, this becomes much easier.  It's
just "git
>>> commit", no magic.
>>> 
>>> Under the multirepo git proposal, this becomes either impossible or
>>> much more complicated.  Under the proposal, we have separate git
>>> repositories for each subproject, and we push directly to these.
>>> There's then an umbrella repository, which includes the subproject
>>> repos as git submodules.  There's a script which periodically checks
>>> the subproject repos for updates.  When it sees an update, it creates
>>> a new commit in the umbrella repository.  The script is the only thing
>>> that can create commits in the umbrella repo.
>>>
>>> In order to get atomic commits in the multirepo world, we would need
>>> some way to inform the script that two otherwise separate commits
>>> should appear in the umbrella repo as a single commit.  We'd probably
>>> need to agree on a protocol communicated via commit messages.  We'd
>>> also probably need client-side scripts to set the commit messages
>>> appropriately.
>>>
>>> I expect this would be so much of a hassle that, even if we managed to
>>> implement it on the server side, it would be prohibitively complex for
>>> most users.
>>> 
>>> In addition, under the multirepo, you only get synchronized subproject
>>> commits in your local checkout if you choose to use a git-submodules
>>> based workflow.  If you use the workflow that we currently have, then
>>> on the client side, there is no guarantee that your subprojects will
>>> be sync'ed.  (This is the same as most people's client-side git
>>> workflows today.)  *Even if we manage to atomically commit across
>>> subprojects*, that is of limited utility unless those commits show up
>>> atomically on developers' workstations.  But using a workflow based on
>>> git-submodules is highly complex as compared to the monorepo -- this
>>> was what I was trying to illustrate in my very first email on this
>>> thread.
>>> 
>>> When we say "the monorepo gets you atomic commits," that's an
>>> abbreviation for
>>>
>>> 1) The monorepo makes it far simpler to make atomic commits from git
>>> as compared to the current SVN setup.
>>> 2) Atomic commits are definitely possible in the monorepo.  They are
>>> theoretically possible in the multirepo, with extensive tooling etc.
>>> 3) Under the basic monorepo workflow, your checkouts are always
>>> correct with respect to atomic commits.  Under the basic multirepo
>>> workflow, this is not true -- you have to engage with git submodules
>>> to get this property, and that is a giant pain.
>>>
>>> Sorry for the wall of text, but this is important.
>>>
>>> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
>>> made one commit with it so far.  :)
>>> 
>>> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>>>> The only thing a monorepo gets you that strictly isn’t possible without
>>>>> it is the ability to commit to multiple projects in a single commit.
>>>>> Personally I don’t think that is a big enough justification, but that is
>>>>> my opinion, not a fact.
>>>>
>>>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>>>> clang-tools-extra all have the same SVN revision number.
>>>> I don't know how it happened but it did.  Is this just an artifact of
>>>> how somebody pasted together a bunch of git-svn projects, or is it
>>>> something that a top-level git repo with submodules would allow?
>>>> And if it is, then the "only thing a monorepo gets you" isn't something
>>>> that you need a monorepo to get.
>>>> Your befuddled correspondent,
>>>> --paulr
>>>> 
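
The "protocol communicated via commit messages" that Justin mentions in the quoted text above can be made concrete with a small model. This is only a sketch under stated assumptions: the `Atomic-Group:` trailer, the dictionary layout, and the function names are all hypothetical, since the thread says only that some such protocol would have to be agreed on. It shows an umbrella-repo updater folding subproject commits that share a group id into one umbrella commit.

```python
"""Toy model of an umbrella-repo updater that honors an atomic-group marker."""
import re
from collections import OrderedDict

# Hypothetical trailer name; the thread does not define an actual protocol.
GROUP_TRAILER = re.compile(r"^Atomic-Group:\s*(\S+)\s*$", re.MULTILINE)


def group_key(commit):
    """Return the commit's group id, or a unique key for ungrouped commits."""
    match = GROUP_TRAILER.search(commit["message"])
    return match.group(1) if match else "solo:" + commit["hash"]


def umbrella_commits(pending):
    """Fold pending subproject commits into umbrella commits.

    `pending` is a list of dicts with keys "project", "hash" and "message",
    in the order the updater saw them. Commits sharing a group id become a
    single umbrella commit that bumps several submodule hashes at once.
    """
    grouped = OrderedDict()
    for commit in pending:
        grouped.setdefault(group_key(commit), {})[commit["project"]] = commit["hash"]
    return list(grouped.values())


pending = [
    {"project": "llvm", "hash": "a1", "message": "Change API\n\nAtomic-Group: g42"},
    {"project": "compiler-rt", "hash": "c7", "message": "Unrelated fix"},
    {"project": "clang", "hash": "b2", "message": "Adapt to API\n\nAtomic-Group: g42"},
]
result = umbrella_commits(pending)
print(result)
```

Even this toy version hints at the client-side cost described above: every contributor would have to write the same marker into each related commit, and a real updater would also have to decide how long to wait for the remaining members of a group to arrive.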

Michael Gottesman via llvm-dev

2016-Aug-01 02:22 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 31, 2016, at 12:56 AM, Michael Gottesman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>>
>> On Jul 31, 2016, at 12:36 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> This suggests what we really want in this case are several updates
>>> (one to each repository) on a branch that is then merged in one
>>> instant into the umbrella repository. Then the only thing the bots
>>> would see would be the merge commit and thus state is synchronized.
>>
>> The script that updates the umbrella repo would need to know that the
>> several updates should all go into one branch that is then merged in
>> one instant into the umbrella repo's master.  At the point that the
>> umbrella repo can know this, it might as well just make a single
>> commit to master that updates all N subproject hashes at once (which
>> is what I was suggesting) -- I don't see how having a branch makes the
>> situation any less complicated.
> 
> Ok. Sure. I was thinking out loud.
> 
>> 
>>> The natural way to do this would be via a multiple-repo PR.
>> 
>> That doesn't exist in github, right?  We would have to somehow create
>> this multi-repo PR-management system and link it in with the script
>> that is managing the umbrella repo?  That is what I was describing.
>
> The script in this case would not be making the change. The change would
> be made by a special trusted continuous integration bot that does the
> merging. In Swift land we have been using our @swift-ci system to merge
> things into master after testing with great success. In terms of the
> script, the script would see that it couldn't push a change and would
> then just restart the loop.
> 
>> 
>> Again, I am not claiming this isn't possible.  And, I don't care a ton
>> about complexity on the server-side.  But I do care about complexity
>> on the client side.  I think it's highly unlikely that there exists a
>> system for creating atomic commits and then checking out code in a way
>> that respects that atomicity that is as simple as "git commit" and "git
>> checkout" (which is what we'd have on the monorepo).
> 
> The key assumption here is that LLVM will not switch to a heavy PR model
> (something which from what I understand is not a part of this specific
> discussion and will be considered strictly after the move to github). In
> such a case, I believe that it will be relatively simple to communicate
> this to the CI and have the CI manage it for you. If on the Swift side we
> implement such a thing, I would be more than happy to provide guidance to
> you to help set up such a system reusing the work on the Swift side.
>
> Another thing that I do not understand about the mono-repo proposal
> (*note* correct me if I am wrong) is that we can only avoid external
> synchronization if we get /all/ projects that have build dependencies on
> the mono-repo into the mono-repo. This suggests that (unless we are
> saying that synchronization of those repositories is not important) we
> will need to invest in some sort of synchronization regardless of the
> mono-repo proposal. In such a case the mono-repo proposal is essentially
> just an attempt to make it convenient for a large subset of the community
> to ease their workflows, rather than truly being an alternative to the
> submodule proposal. Am I misunderstanding?
Just an FYI: I talked with jlebar on IRC and we advanced the conversation. I am
going to update the document when I get some time later tonight.

Michael
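
The client-side half of the argument quoted above (atomic commits only help if checkouts respect them) can be illustrated with a short check. This is a minimal sketch, assuming an umbrella history is available as a list of subproject-to-hash mappings; the hashes, `UMBRELLA_HISTORY`, and `matching_umbrella_commit` are made up for illustration.

```python
"""Check whether independently updated checkouts match any umbrella commit."""

# Hypothetical umbrella history: each entry pins every subproject to a hash.
UMBRELLA_HISTORY = [
    {"llvm": "a0", "clang": "b0"},
    {"llvm": "a1", "clang": "b1"},  # an atomic cross-project change
    {"llvm": "a2", "clang": "b1"},
]


def matching_umbrella_commit(checkout, history):
    """Return the index of the umbrella commit the checkout matches, or None."""
    for index, pinned in enumerate(history):
        if pinned == checkout:
            return index
    return None


# Both subprojects updated together: the state exists in the umbrella history.
in_sync = matching_umbrella_commit({"llvm": "a1", "clang": "b1"}, UMBRELLA_HISTORY)
# Only llvm updated: this combination was never a valid umbrella state.
skewed = matching_umbrella_commit({"llvm": "a1", "clang": "b0"}, UMBRELLA_HISTORY)
print(in_sync, skewed)
```

In a monorepo the skewed case cannot be checked out at all, because every commit is a complete tree. With separate repositories, a developer who pulls each one by hand can land in it unless a submodule-based workflow or extra tooling prevents it.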

Mehdi Amini via llvm-dev

2016-Aug-09 00:09 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>>
>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Thanks for your thoughts, Chris.
>>
>>> As supporting evidence of this, I was discussing this thread around
>>> the office yesterday and had quite a few people responding something
>>> along the lines of “they’re proposing what?”.
>>
>> I hope they'll join us in this thread.
>>
>> Ultimately a survey is going to be strongly biased in favor of "don't
>> change anything".  There is a strong psychological bias to weight
>> losses more than gains, so if one doesn't engage with the issue, it's
>> only natural to conclude "keep it as similar as possible to what it is
>> today -- that is safe."  But that line of thinking does not
>> necessarily lead us to the best outcome.
>
> I don’t agree with this assertion. I believe that if you put forth
> multiple proposals, and have an articulate discussion of the merits and
> costs of each solution you can create a survey that can help inform
> decision making. I suppose we can agree to disagree.
> 
>> 
>> We've heard in thread from a lot of developers about how a monorepo
>> would improve their workflow.  I would love to hear from some
>> developers who are actually affected in the way you describe, rather
>> than just considering the hypothetical.
>>
>> My expectation is that the effect of the monorepo on said developers
>> would be relatively small -- we're talking about 1gb of disk space.  I
>> understand that there's a "yuck" factor to this, but inasmuch as there
>> aren't other concrete effects, this is just change aversion.  And
>> essentially all of the other effects of the monorepo can be hidden via
>> sparse checkouts, as we've discussed.
>>
>> Maybe I am wrong.  But I don't think we're going to get to the bottom
>> of it without actually engaging with people who are actually affected
>> in the way you posit.
>
> Ok, let me describe a few workflows I’ve used in the last year that are
> (in my mind) adversely impacted by a mono-repo.
> 
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a
> period of a day. Once my Compiler-RT changes are done I rebase the
> compiler-rt repo, rebuild compiler-rt then commit.
>
> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That
> kinda sucks.
>
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a
> tight coupling that isn’t strictly required today.
> 
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
> in a while (or every other day as it were) I would make a change that
> would break a configuration that I wasn’t directly developing on. My
> workflow for handling those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Check out Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing
> to check out a full LLVM. Now, you might say I could check out the
> sub-project mirror, but then I can’t commit from the VM, which kinda
> sucks.

So for the “I spin a VM and want to make a commit but don’t want to download a
few hundred MBs with a git clone” story, it turns out that the GitHub bridge
with SVN helps to optimize with a “lean” checkout:

I fork the unified repo here:
https://github.com/joker-eph/llvm-project/commits/master
and then:

  svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt

So that’s a net “no regression” compared to the current state :)


— 
Mehdi
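
The "lean" checkout above follows a simple URL pattern: the fork's URL, then `trunk`, then the subproject directory. A minimal sketch of building that command for any subproject follows. The helper name `lean_checkout_command` and its `dest` parameter are hypothetical, and it assumes the GitHub SVN bridge exposes the default branch as `trunk`, as in the URL above; the bridge was a GitHub feature at the time of this thread and may no longer be available.

```python
"""Build the `svn co` command for a single-subproject checkout."""
import shlex


def lean_checkout_command(fork_url, subproject, dest=None):
    """Return the argv that checks out only `subproject` from trunk."""
    url = "{}/trunk/{}".format(fork_url.rstrip("/"), subproject)
    argv = ["svn", "co", url]
    if dest is not None:
        argv.append(dest)  # optional local directory name
    return argv


argv = lean_checkout_command(
    "https://github.com/joker-eph/llvm-project", "compiler-rt"
)
# Print a copy-pasteable command line instead of running it.
print(" ".join(shlex.quote(part) for part in argv))
```

To actually run the checkout on a fresh VM, the list could be passed to `subprocess.run(argv, check=True)`. The sketch only prints the command so that it has no network side effects.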





Chris Bieneman via llvm-dev

2016-Aug-09 01:02 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
> 
> So for the “I spin a VM and want to make a commit but don’t want to
> download a few hundred MBs with a git clone” story, it turns out that the
> github bridge with SVN helps to optimize with a “lean” checkout:
>
> I fork the unified repo here:
> https://github.com/joker-eph/llvm-project/commits/master and then:
> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>
> So that’s a net “no regression” compared to the current state :)

Is the GitHub SVN interface's "co" magically as fast as a git clone? If not, it
is a performance regression, because today I use git clone and git-svn on my VMs
just like on my physical machines. Either way, it adds some crazy complexity.

-Chris

> 
> 
> — 
> Mehdi
> 
> 
> 
> 
> 
>> 
>> 
>>> 
>>>> While admittedly you do get a linear history with using the
>>>> mono-repository, that isn’t the only way to solve the problem, and I
>>>> don’t really think that the benefit (not needing to write some tooling)
>>>> justifies the increased burden applied to contributors that don’t use
>>>> the full LLVM family of projects.
>>>
>>> I think the trade-off you're considering here (cost to developers who
>>> use llvm plus a version-locked subrepo vs. cost to developers who
>>> don't want an llvm clone) is the right one.
>>
>> I actually think there are *a lot* more considerations we need to be
>> making for an infrastructure change like this. While it is true that our
>> SCM hosting strategy primarily impacts developers, it also impacts our
>> users. We should be conscious of the impact to downstream users in making
>> infrastructure changes like this. That is part of why the idea of a
>> survey holds appeal to me; it would give us the opportunity to get
>> feedback from a much wider audience than the current “people on llvm-dev
>> who haven’t been scared away”.
>> 
>>> But as someone who has
>>> extensively used git submodules and repo (a wrapper script), I
>>> strongly disagree with the judgement that a monorepo would not be a
>>> significant improvement.
>>> 
>>> Our primary disagreement, I think, is over how much cost there is
to
>>> "writing some tooling".  To me, this is a significant
barrier standing
>>> in the way of developer productivity.  Here at Google I did a quick
>>> survey, and more than half of us don't have scripts of the sort
that
>>> Justin Bogner described.  We are all just floundering around
rebasing
>>> clang and llvm until it compiles.  It *sucks*.
>> 
>> I actually think we’re both talking about solutions that require
tooling, and while we *could* be disagreeing over how much effort each tooling
initiative would require (I think they’re pretty close, so I don’t care to have
that argument), my actual disagreement with your proposal is that it is a change
that impacts developers and users universally and I don’t think that it is
justified. Simply put, I don’t feel that the benefits are substantial enough to
warrant the kind of disruptive change you’re proposing.
>> 
>>> 
>>> I suggest that saying that all of these developers are "doing
it
>>> wrong" is not helpful.
>> 
>> Maybe I’m missing something, but I don’t think I said anyone was “doing
it wrong”. Bisecting across multiple git repositories isn’t a great experience.
But neither is bisecting across a half dozen separate folders in an SVN
repository. Both the submodule solution and the mono-repo solution solve this
problem equivalently well.
>> 
>>> Not everyone has the git and python/bash chops
>>> to write the necessary scripts.  Not everyone has the personality
to
>>> obsessively script around stuff, or the desire to maintain said
>>> scripts.  Not everyone works on llvm/clang so much that it's
worth
>>> adopting a special-snowflake workflow.  And some of us -- myself
>>> included -- have extensive git scripts which work with the standard
>>> git workflow but would be completely broken by adding a custom
level
>>> of indirection around git.
>>> 
>>> When put this way, maybe it's clear that it's actually a
niche set of
>>> people for whom "script around the brokenness" is a good
solution.
>> 
>> I’m not sure what “brokenness” you’re referring to. We have a
collection of loosely connected projects by design. As a result of that
intentional design certain workflows will be impacted. I don’t think that is
brokenness. I think our loose coupling is a feature even if it makes some
workflows harder.
>> 
>> -Chris
>> 
>>> 
>>> As I've said a bunch of times above, we have to weigh a cost
paid by
>>> all of us every time we type a command that starts with
"git" --
>>> something we do tens or hundreds of times a day -- versus the
one-time
>>> cost of asking people to download 1gb of data.
>>> 
>>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> I’m just now catching up on this massive thread after being on
vacation last
>>>> week, and I have a few thoughts I’d like to share.
>>>> 
>>>> First and foremost please don’t consider lack of dissent on the
thread as
>>>> presence of consensus. The various git-related threads on
LLVM-dev lately
>>>> have been so active and contentious that I think a lot of
people are zoning
>>>> out on the conversations. As supporting evidence of this, I was
discussing
>>>> this thread yesterday around the office yesterday and had quite
a few people
>>>> responding something along the lines of “they’re proposing
what?”.
>>>> 
>>>> I think it would be great for us to have several different
proposals for how
>>>> the git-transition could work, and have a survey to get
people’s opinions. I
>>>> know this has been discussed repeatedly, and I want to put in
my vote in
>>>> favor of having a survey that takes into account multiple
different
>>>> approaches.
>>>> 
>>>> WRT the actual proposal in this thread, I’m strongly opposed to
a
>>>> mono-repository. While I understand the argument that the full
clone’s cost
>>>> on disk space is minimal compared to an LLVM object directory,
what about
>>>> for contributors that contribute to the smaller runtimes
projects but *not*
>>>> to LLVM or Clang. A contributor that only contributes to libcxx
or
>>>> compiler-rt being forced to do a full clone of all the LLVM
projects in
>>>> order to push a patch kinda sucks.
>>>> 
>>>> I want to point out a few workflows people may not be
considering.
>>>> 
>>>> Clang can be built against an installed LLVM. I know this
workflow is used
>>>> by some people because I’ve broken it in the past and had to
fix it. With a
>>>> mono-repo this workflow gets a bit more complicated because
you’d need to do
>>>> sparse checkouts, and it probably means we should just nuke the
workflow
>>>> entirely because there is no real value added by having it.
>>>> 
>>>> Compiler-RT’s sanitizers are used with GCC; no LLVM required.
While for the
>>>> common use case maintaining sparse repository mirrors would
limit impact of
>>>> this on users, should any GCC user want to contribute to
Compiler-RT, you’re
>>>> forcing them to clone a much larger repository than necessary.
>>>> 
>>>> The same problem with Compiler-RT’s sanitizers also applies to
libcxx,
>>>> libcxxabi, libunwind, and potentially any other runtime library
projects
>>>> that we may create in the future.
>>>> 
>>>> Beyond all that I want to point out that the git
multi-repository story is
>>>> basically the same thing we have today with SVN except for the
absence of a
>>>> monotonically increasing number that corresponds across
repositories. While
>>>> admittedly you do get a linear history with using the
mono-repository, that
>>>> isn’t the only way to solve the problem, and I don’t really
think that the
>>>> benefit (not needing to write some tooling) justifies the
increased burden
>>>> applied to contributors that don’t use the full LLVM family of
projects.
>>>> 
>>>> I think we have some pretty strong evidence in the form of the
github fork
>>>> counts (https://github.com/llvm-mirror/) that most people
aren’t using all
>>>> of the LLVM projects. In fact, by that evidence Clang (the
second most
>>>> popular project) is forked less than 2/3 as many times as LLVM.
>>>> 
>>>> -Chris
>>>> 
>>>> 
>>>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> Even if it were possible, I would still keep my upstream
checkout
>>>> separate just as a safety measure, to keep from sending private
stuff
>>>> upstream by accident.
>>>> 
>>>> 
>>>> Just FYI, this is our (Azul's) workflow as well, and for
similar
>>>> reasons.
>>>> 
>>>> 
>>>> Same here.
>>>> 
>>>> cheers,
>>>> --renato
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> 
>>>> 
>>>> 
>> 

Mehdi Amini via llvm-dev

2016-Aug-09 17:08 UTC


[llvm-dev] [RFC] One or many git repositories?

> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
> 
> 
> 
> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> 
>>> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>>> 
>>>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>>>> 
>>>> Thanks for your thoughts, Chris.
>>>> 
>>>>> As supporting evidence of this, I was discussing this
thread yesterday around the office yesterday and had quite a few people
responding something along the lines of “they’re proposing what?”.
>>>> 
>>>> I hope they'll join us in this thread.
>>>> 
>>>> Ultimately a survey is going to be strongly biased in favor of
"don't
>>>> change anything".  There is a strong psychological bias to
weight
>>>> losses more than gains, so if one doesn't engage with the
issue, it's
>>>> only natural to conclude "keep it as similar as possible
to what it is
>>>> today -- that is safe."  But that line of thinking does
not
>>>> necessarily lead us to the best outcome.
>>> 
>>> I don’t agree with this assertion. I believe that if you put forth
multiple proposals, and have an articulate discussion of the merits and costs of
each solution you can create a survey that can help inform decision making. I
suppose we can agree to disagree.
>>> 
>>>> 
>>>> We've heard in thread from a lot of developers about how a
monorepo
>>>> would improve their workflow.  I would love to hear from some
>>>> developers who are actually affected in the way you describe,
rather
>>>> than just considering the hypothetical.
>>>> 
>>>> My expectation is that the effect of the monorepo on said
developers
>>>> would be relatively small -- we're talking about 1gb of
disk space.  I
>>>> understand that there's a "yuck" factor to this,
but inasmuch as there
>>>> aren't other concrete effects, this is just change
aversion.  And
>>>> essentially all of the other effects of the monorepo can be
hidden via
>>>> sparse checkouts, as we've discussed.
>>>> 
>>>> Maybe I am wrong.  But I don't think we're going to get
to the bottom
>>>> of it without actually engaging with people who are actually
affected
>>>> in the way you posit.
>>> 
>>> Ok, let me describe a few workflows I’ve used in the last year that
are (in my mind) adversely impacted by a mono-repo.
>>> 
>>> Case Study 1 - Simple development on a sub-project
>>> 
>>> I build LLVM + Clang + Compiler-RT using the just-built Clang to
build Compiler-RT. I iterate on some complicated Compiler-RT changes over a
period of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
repo, rebuild compiler-rt then commit.
>>> 
>>> With a mono-repo rebasing the checkout means rebasing the whole
tree. So, either I have to wrangle some crazy git or CMake foo, or when I run
“ninja compiler-rt” after the rebase it will rebuild LLVM and Clang too. That
kinda sucks.
>>> 
>>> What this example illustrates to me is that today we have loosely
coupled projects with an occasional rev lock. Moving to a mono-repo enforces a
tight coupling that isn’t strictly required today.
>>> 
>>> Case Study 2 - Working on a sub-project in isolation across many
platforms
>>> 
>>> I did a lot of work on Compiler-RT last year that had no direct
dependency on any other LLVM project. During the development I was working with
a Compiler-RT checkout and a build directory of just Compiler-RT. Every once in
a while (or every other day as it were) I would make a change that would break a
configuration that I wasn’t directly developing on. My workflow for handling
those cases was:
>>> 
>>> (1) Spin up a VM on a VPS that closely matched the configuration I
broke
>>> (2) Checkout Compiler-RT
>>> (3) Reproduce, debug, fix the failure
>>> (4) Commit the patch from the VM
>>> 
>>> In a mono-repository doing this would require checking out *all*
sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
workflow, but it is one I use that would be adversely impacted by needing to
checkout a full LLVM. Now, you might say I could check out the sub-project
mirror, but then I can’t commit from the VM, which kinda sucks.
>> 
>> So for the “I spin a VM and want to make a commit but don’t want to
download a few hundred MBs with a git clone” story, it turns out that the github
bridge with SVN helps to optimize with a “lean” checkout:
>> 
>> I fork the unified repo here:
>> https://github.com/joker-eph/llvm-project/commits/master and then:
>> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> 
>> So that’s a net “no regression” compared to the current state :)
> 
> Is the github SVN interface's "co" magically as fast as a git clone?

$ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
…
real	0m8.539s
user	0m0.919s
sys	0m1.917s
$ time git clone https://github.com/joker-eph/compiler-rt.git
real	0m5.487s
user	0m1.208s
sys	0m0.825s

> If not, it is a performance regression because today I use git clone and
git-svn on my VMs just like on my physical machines, and either way it adds some
crazy complexity.
No problem, I get it: exactly the same workflow as today:

# Clone from the single read-only git repo
$ git clone https://github.com/joker-eph/compiler-rt.git 
…
# Configure the SVN remote and initialize the svn metadata
$ cd compiler-rt
$ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=…
$ git config svn-remote.svn.fetch :refs/remotes/origin/master
$ git svn rebase -l
...
# Remove an empty file and commit with git
$ git rm empty
$ git commit -m "remove empty file"
# commit/push with svn to the unified git repo
$ git svn dcommit
Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt ...
	D	empty
Committed r354148

 
Here is the commit:
https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
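
For anyone who wants to repeat this on a fresh VM, here is a minimal sketch that wraps the setup steps from the transcript in one shell function. It assumes the same two fork URLs used in this thread. The names lean_checkout, run, DRY_RUN, MIRROR_BASE and UNIFIED_BASE are hypothetical and introduced only for illustration. The --username option from the transcript is left out because its value is not shown here.

```shell
#!/bin/sh
# Sketch only: the clone + git-svn setup from the transcript, as one function.
# The default URLs are the forks used in this thread; override them as needed.
MIRROR_BASE="${MIRROR_BASE:-https://github.com/joker-eph}"
UNIFIED_BASE="${UNIFIED_BASE:-https://github.com/joker-eph/llvm-project/trunk}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# lean_checkout PROJECT: clone the read-only per-project mirror, then point
# git-svn at the matching sub-directory of the unified repository.
lean_checkout() {
    project="$1"
    run git clone "$MIRROR_BASE/$project.git" || return 1
    run cd "$project" || return 1
    run git svn init "$UNIFIED_BASE/$project" || return 1
    run git config svn-remote.svn.fetch :refs/remotes/origin/master || return 1
    run git svn rebase -l
}
```

With DRY_RUN=1 set, `lean_checkout compiler-rt` only prints the five commands it would run, which is a cheap way to check the URLs before touching the network. Without it, the function leaves the shell inside the new checkout, ready for the usual `git commit` followed by `git svn dcommit`.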


— 
Mehdi
>> 
>> 
>> 
>> 
>> 
>>> 
>>> 
>>>> 
>>>>> While admittedly you do get a linear history with using the
mono-repository, that isn’t the only way to solve the problem, and I don’t
really think that the benefit (not needing to write some tooling) justifies the
increased burden applied to contributors that don’t use the full LLVM family of
projects.
>>>> 
>>>> I think the trade-off you're considering here (cost to
developers who
>>>> use llvm plus a version-locked subrepo vs. cost to developers
who
>>>> don't want an llvm clone) is the right one.  
>>> 
>>> I actually think there are *a lot* more considerations we need to
be making for an infrastructure change like this. While it is true that our SCM
hosting strategy primarily impacts developers, it also impacts our users. We
should be conscious of the impact to downstream users in making infrastructure
changes like this. That is part of why the idea of a survey holds appeal to me;
it would give us the opportunity to get feedback from a much wider audience than
the current “people on llvm-dev who haven’t been scared away”.
>>> 
>>>> But as someone who has
>>>> extensively used git submodules and repo (a wrapper script), I
>>>> strongly disagree with the judgement that a monorepo would not
be a
>>>> significant improvement.
>>>> 
>>>> Our primary disagreement, I think, is over how much cost there
is to
>>>> "writing some tooling".  To me, this is a significant
barrier standing
>>>> in the way of developer productivity.  Here at Google I did a
quick
>>>> survey, and more than half of us don't have scripts of the
sort that
>>>> Justin Bogner described.  We are all just floundering around
rebasing
>>>> clang and llvm until it compiles.  It *sucks*.
>>> 
>>> I actually think we’re both talking about solutions that require
tooling, and while we *could* be disagreeing over how much effort each tooling
initiative would require (I think they’re pretty close, so I don’t care to have
that argument), my actual disagreement with your proposal is that it is a change
that impacts developers and users universally and I don’t think that it is
justified. Simply put, I don’t feel that the benefits are substantial enough to
warrant the kind of disruptive change you’re proposing.
>>> 
>>>> 
>>>> I suggest that saying that all of these developers are
"doing it
>>>> wrong" is not helpful.
>>> 
>>> Maybe I’m missing something, but I don’t think I said anyone was
“doing it wrong”. Bisecting across multiple git repositories isn’t a great
experience. But neither is bisecting across a half dozen separate folders in an
SVN repository. Both the submodule solution and the mono-repo solution solve
this problem equivalently well.
>>> 
>>>> Not everyone has the git and python/bash chops
>>>> to write the necessary scripts.  Not everyone has the
personality to
>>>> obsessively script around stuff, or the desire to maintain said
>>>> scripts.  Not everyone works on llvm/clang so much that
it's worth
>>>> adopting a special-snowflake workflow.  And some of us --
myself
>>>> included -- have extensive git scripts which work with the
standard
>>>> git workflow but would be completely broken by adding a custom
level
>>>> of indirection around git.
>>>> 
>>>> When put this way, maybe it's clear that it's actually
a niche set of
>>>> people for whom "script around the brokenness" is a
good solution.
>>> 
>>> I’m not sure what “brokenness” you’re referring to. We have a
collection of loosely connected projects by design. As a result of that
intentional design certain workflows will be impacted. I don’t think that is
brokenness. I think our loose coupling is a feature even if it makes some
workflows harder.
>>> 
>>> -Chris
>>> 
>>>> 
>>>> As I've said a bunch of times above, we have to weigh a
cost paid by
>>>> all of us every time we type a command that starts with
"git" --
>>>> something we do tens or hundreds of times a day -- versus the
one-time
>>>> cost of asking people to download 1gb of data.
>>>> 
>>>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> I’m just now catching up on this massive thread after being
on vacation last
>>>>> week, and I have a few thoughts I’d like to share.
>>>>> 
>>>>> First and foremost please don’t consider lack of dissent on
the thread as
>>>>> presence of consensus. The various git-related threads on
LLVM-dev lately
>>>>> have been so active and contentious that I think a lot of
people are zoning
>>>>> out on the conversations. As supporting evidence of this, I
was discussing
>>>>> this thread yesterday around the office yesterday and had
quite a few people
>>>>> responding something along the lines of “they’re proposing
what?”.
>>>>> 
>>>>> I think it would be great for us to have several different
proposals for how
>>>>> the git-transition could work, and have a survey to get
people’s opinions. I
>>>>> know this has been discussed repeatedly, and I want to put
in my vote in
>>>>> favor of having a survey that takes into account multiple
different
>>>>> approaches.
>>>>> 
>>>>> WRT the actual proposal in this thread, I’m strongly
opposed to a
>>>>> mono-repository. While I understand the argument that the
full clone’s cost
>>>>> on disk space is minimal compared to an LLVM object
directory, what about
>>>>> for contributors that contribute to the smaller runtimes
projects but *not*
>>>>> to LLVM or Clang. A contributor that only contributes to
libcxx or
>>>>> compiler-rt being forced to do a full clone of all the LLVM
projects in
>>>>> order to push a patch kinda sucks.
>>>>> 
>>>>> I want to point out a few workflows people may not be
considering.
>>>>> 
>>>>> Clang can be built against an installed LLVM. I know this
workflow is used
>>>>> by some people because I’ve broken it in the past and had
to fix it. With a
>>>>> mono-repo this workflow gets a bit more complicated because
you’d need to do
>>>>> sparse checkouts, and it probably means we should just nuke
the workflow
>>>>> entirely because there is no real value added by having it.
>>>>> 
>>>>> Compiler-RT’s sanitizers are used with GCC; no LLVM
required. While for the
>>>>> common use case maintaining sparse repository mirrors would
limit impact of
>>>>> this on users, should any GCC user want to contribute to
Compiler-RT, you’re
>>>>> forcing them to clone a much larger repository than
necessary.
>>>>> 
>>>>> The same problem with Compiler-RT’s sanitizers also applies
to libcxx,
>>>>> libcxxabi, libunwind, and potentially any other runtime
library projects
>>>>> that we may create in the future.
>>>>> 
>>>>> Beyond all that I want to point out that the git
multi-repository story is
>>>>> basically the same thing we have today with SVN except for
the absence of a
>>>>> monotonically increasing number that corresponds across
repositories. While
>>>>> admittedly you do get a linear history with using the
mono-repository, that
>>>>> isn’t the only way to solve the problem, and I don’t really
think that the
>>>>> benefit (not needing to write some tooling) justifies the
increased burden
>>>>> applied to contributors that don’t use the full LLVM family
of projects.
>>>>> 
>>>>> I think we have some pretty strong evidence in the form of
the github fork
>>>>> counts (https://github.com/llvm-mirror/) that most people aren’t using all
>>>>> of the LLVM projects. In fact, by that evidence Clang (the
second most
>>>>> popular project) is forked less than 2/3 as many times as
LLVM.
>>>>> 
>>>>> -Chris
>>>>> 
>>>>> 
>>>>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> Even if it were possible, I would still keep my upstream
checkout
>>>>> separate just as a safety measure, to keep from sending
private stuff
>>>>> upstream by accident.
>>>>> 
>>>>> 
>>>>> Just FYI, this is our (Azul's) workflow as well, and
for similar
>>>>> reasons.
>>>>> 
>>>>> 
>>>>> Same here.
>>>>> 
>>>>> cheers,
>>>>> --renato
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 

Chris Bieneman via llvm-dev

2016-Aug-09 18:22 UTC


[llvm-dev] [RFC] One or many git repositories?

> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> 
>> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> 
>> 
>> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>>> 
>>>> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>>> 
>>>>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>>>>> 
>>>>> Thanks for your thoughts, Chris.
>>>>> 
>>>>>> As supporting evidence of this, I was discussing this
thread yesterday around the office yesterday and had quite a few people
responding something along the lines of “they’re proposing what?”.
>>>>> 
>>>>> I hope they'll join us in this thread.
>>>>> 
>>>>> Ultimately a survey is going to be strongly biased in favor
of "don't
>>>>> change anything".  There is a strong psychological
bias to weight
>>>>> losses more than gains, so if one doesn't engage with
the issue, it's
>>>>> only natural to conclude "keep it as similar as
possible to what it is
>>>>> today -- that is safe."  But that line of thinking
does not
>>>>> necessarily lead us to the best outcome.
>>>> 
>>>> I don’t agree with this assertion. I believe that if you put
forth multiple proposals, and have an articulate discussion of the merits and
costs of each solution you can create a survey that can help inform decision
making. I suppose we can agree to disagree.
>>>> 
>>>>> 
>>>>> We've heard in thread from a lot of developers about
how a monorepo
>>>>> would improve their workflow.  I would love to hear from
some
>>>>> developers who are actually affected in the way you
describe, rather
>>>>> than just considering the hypothetical.
>>>>> 
>>>>> My expectation is that the effect of the monorepo on said
developers
>>>>> would be relatively small -- we're talking about 1gb of
disk space.  I
>>>>> understand that there's a "yuck" factor to
this, but inasmuch as there
>>>>> aren't other concrete effects, this is just change
aversion.  And
>>>>> essentially all of the other effects of the monorepo can be
hidden via
>>>>> sparse checkouts, as we've discussed.
>>>>> 
>>>>> Maybe I am wrong.  But I don't think we're going to
get to the bottom
>>>>> of it without actually engaging with people who are
actually affected
>>>>> in the way you posit.
>>>> 
>>>> Ok, let me describe a few workflows I’ve used in the last year
that are (in my mind) adversely impacted by a mono-repo.
>>>> 
>>>> Case Study 1 - Simple development on a sub-project
>>>> 
>>>> I build LLVM + Clang + Compiler-RT using the just-built Clang
to build Compiler-RT. I iterate on some complicated Compiler-RT changes over a
period of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
repo, rebuild compiler-rt then commit.
>>>> 
>>>> With a mono-repo rebasing the checkout means rebasing the whole
tree. So, either I have to wrangle some crazy git or CMake foo, or when I run
“ninja compiler-rt” after the rebase it will rebuild LLVM and Clang too. That
kinda sucks.
>>>> 
>>>> What this example illustrates to me is that today we have
loosely coupled projects with an occasional rev lock. Moving to a mono-repo
enforces a tight coupling that isn’t strictly required today.
>>>> 
>>>> Case Study 2 - Working on a sub-project in isolation across
many platforms
>>>> 
>>>> I did a lot of work on Compiler-RT last year that had no direct
dependency on any other LLVM project. During the development I was working with
a Compiler-RT checkout and a build directory of just Compiler-RT. Every once in
a while (or every other day as it were) I would make a change that would break a
configuration that I wasn’t directly developing on. My workflow for handling
those cases was:
>>>> 
>>>> (1) Spin up a VM on a VPS that closely matched the
configuration I broke
>>>> (2) Checkout Compiler-RT
>>>> (3) Reproduce, debug, fix the failure
>>>> (4) Commit the patch from the VM
>>>> 
>>>> In a mono-repository doing this would require checking out
*all* sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
workflow, but it is one I use that would be adversely impacted by needing to
checkout a full LLVM. Now, you might say I could check out the sub-project
mirror, but then I can’t commit from the VM, which kinda sucks.
>>> 
>>> So for the “I spin a VM and want to make a commit but don’t want to
download a few hundred MBs with a git clone” story, it turns out that the github
bridge with SVN helps to optimize with a “lean” checkout:
>>> 
>>> I fork the unified repo here:
>>> https://github.com/joker-eph/llvm-project/commits/master and then:
>>> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>>> 
>>> So that’s a net “no regression” compared to the current state :)
>> 
>> Is the github SVN interface's "co" magically as fast as a
git clone?
> 
> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> …
> real	0m8.539s
> user	0m0.919s
> sys	0m1.917s
> $ time git clone https://github.com/joker-eph/compiler-rt.git
> real	0m5.487s
> user	0m1.208s
> sys	0m0.825s
That’s actually not terrible! Color me impressed.
> 
> 
>> If not, it is a performance regression because today I use git clone
and git-svn on my VMs just like on my physical machines, and either way it adds
some crazy complexity.
> 
> No problem, I get it, exactly same workflow as today:
Yep. Which isn’t bad. I do however have two concerns.

(1) What happens if we move to pull request-based workflows? Do we still support
this workflow?
(2) If I’m stuck using git-svn I kinda feel like there is no real point in
changing anything. I dislike this workflow less than the earlier proposals, but
I see no reason to move to this instead of staying on SVN (other than the
hosting issues which could be solved in other ways).

-Chris
> 
> # Clone from the single read-only git repo
> $ git clone https://github.com/joker-eph/compiler-rt.git
> …
> # Configure the SVN remote and initialize the svn metadata
> $ cd compiler-rt
> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=…
> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
> $ git svn rebase -l
> ...
> # Remove an empty file and commit with git
> $ git rm empty
> $ git commit -m "remove empty file"
> # commit/push with svn to the unified git repo
> $ git svn dcommit
> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt ...
> 	D	empty
> Committed r354148
> 
>  
> Here is the commit:
> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
> 
> 
> — 
> Mehdi
> 

Justin Lebar via llvm-dev

2016-Aug-09 18:27 UTC

[llvm-dev] [RFC] One or many git repositories?

> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
> changing anything.

No real point *for you specifically*.

But the vast majority of people would not be stuck using git-svn.  And
in addition the LLVM project would not be stuck using svn, with all
the baggage, hosting issues, workflow issues (for people other than
you), etc.

The bar by which this proposal should be measured is not "is it a net
gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
corner.

On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at apple.com> wrote:
>
> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
>
> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>
> Thanks for your thoughts, Chris.
>
> As supporting evidence of this, I was discussing this thread yesterday
> around the office yesterday and had quite a few people responding something
> along the lines of “they’re proposing what?”.
>
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't
> change anything".  There is a strong psychological bias to weight
> losses more than gains, so if one doesn't engage with the issue, it's
> only natural to conclude "keep it as similar as possible to what it is
> today -- that is safe."  But that line of thinking does not
> necessarily lead us to the best outcome.
>
>
> I don’t agree with this assertion. I believe that if you put forth multiple
> proposals, and have an articulate discussion of the merits and costs of each
> solution you can create a survey that can help inform decision making. I
> suppose we can agree to disagree.
>
>
> We've heard in thread from a lot of developers about how a monorepo
> would improve their workflow.  I would love to hear from some
> developers who are actually affected in the way you describe, rather
> than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers
> would be relatively small -- we're talking about 1gb of disk space.  I
> understand that there's a "yuck" factor to this, but inasmuch as there
> aren't other concrete effects, this is just change aversion.  And
> essentially all of the other effects of the monorepo can be hidden via
> sparse checkouts, as we've discussed.
>
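For readers unfamiliar with the sparse checkouts mentioned above, here is a minimal sketch. It builds a throwaway local repository standing in for a unified repo and materializes only `compiler-rt/` in the working tree. The directory layout and the `monorepo` and `work` paths are illustrative assumptions, not the layout proposed in the thread. Note that a sparse checkout limits the files written to disk, not the history that gets cloned.

```shell
#!/bin/sh
set -eu

# Scratch area; everything below is local and disposable.
scratch=$(mktemp -d)

# Stand-in for a unified repository with three top-level projects
# (hypothetical layout, for illustration only).
git init -q "$scratch/monorepo"
for project in llvm clang compiler-rt; do
    mkdir "$scratch/monorepo/$project"
    echo "placeholder for $project" > "$scratch/monorepo/$project/README"
done
git -C "$scratch/monorepo" add .
git -C "$scratch/monorepo" -c user.name=example -c user.email=example@example.invalid \
    commit -q -m "import projects"

# Clone without populating the working tree, then opt in to sparse checkout.
git clone -q --no-checkout "$scratch/monorepo" "$scratch/work"
git -C "$scratch/work" config core.sparseCheckout true
mkdir -p "$scratch/work/.git/info"
echo "compiler-rt/" > "$scratch/work/.git/info/sparse-checkout"

# Populate the index and working tree; paths outside the pattern are skipped.
git -C "$scratch/work" read-tree -mu HEAD

# Show what actually landed in the working tree.
ls "$scratch/work"
```

The index still records every path in the repository; only the working tree is narrowed, so commands such as `git log` continue to see the full history.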
> Maybe I am wrong.  But I don't think we're going to get to the bottom
> of it without actually engaging with people who are actually affected
> in the way you posit.
>
>
> Ok, let me describe a few workflows I’ve used in the last year that are (in
> my mind) adversely impacted by a mono-repo.
>
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
> repo, rebuild compiler-rt then commit.
>
> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
> sucks.
>
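One rough way to see how much of a unified tree a rebase touched is to list the top-level directories that differ between the old and new revisions. The sketch below is not part of any proposal in the thread; the helper name `changed_projects` and the toy repository are assumptions. Build tools such as ninja decide what to rebuild from file modification times and dependency edges, so this list is only an approximation of the rebuild cost.

```shell
#!/bin/sh
set -eu

# List the top-level directories that differ between two revisions.
# Usage: changed_projects <repo> <old-rev> <new-rev>   (hypothetical helper)
changed_projects() {
    git -C "$1" diff --name-only "$2" "$3" | cut -d/ -f1 | sort -u
}

# Toy unified repository to exercise the helper.
repo=$(mktemp -d)
git init -q "$repo"
commit() {
    git -C "$repo" add .
    git -C "$repo" -c user.name=example -c user.email=example@example.invalid \
        commit -q -m "$1"
}

mkdir "$repo/llvm" "$repo/clang" "$repo/compiler-rt"
echo a > "$repo/llvm/a.cpp"
echo b > "$repo/clang/b.cpp"
echo c > "$repo/compiler-rt/c.cpp"
commit "initial import"
old=$(git -C "$repo" rev-parse HEAD)

# A later upstream change that only touches clang.
echo b2 >> "$repo/clang/b.cpp"
commit "clang change"
new=$(git -C "$repo" rev-parse HEAD)

touched=$(changed_projects "$repo" "$old" "$new")
echo "$touched"
```

After a real rebase, `ORIG_HEAD` and `HEAD` could be passed as the two revisions, assuming nothing else has overwritten `ORIG_HEAD` in the meantime.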
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
> coupling that isn’t strictly required today.
>
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
> in a while (or every other day as it were) I would make a change that would
> break a configuration that I wasn’t directly developing on. My workflow for
> handling those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Checkout Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing to
> checkout a full LLVM. Now, you might say I could check out the sub-project
> mirror, but then I can’t commit from the VM, which kinda sucks.
>
>
> So for the “I spin a VM and want to make a commit but don’t want to download
> a few hundred MBs with a git clone” story, it turns out that the github
> bridge with SVN helps to optimize with a “lean” checkout:
>
> I fork the unified repo here:
> https://github.com/joker-eph/llvm-project/commits/master and then:  svn co
> https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>
> So that’s a net “no regression” compared to the current state :)
>
>
> Is the github SVN interface's "co" magically as fast as a git clone?
>
>
> $ time svn co  https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> ….
> real 0m8.539s user 0m0.919s  sys 0m1.917s
> $ time git clone https://github.com/joker-eph/compiler-rt.git
> real 0m5.487s user 0m1.208s sys 0m0.825s
>
>
> That’s actually not terrible! Color me impressed.
>
>
>
> If not, it is a performance regression because today I use git clone and
> git-svn on my VMs just like on my physical machines, and either way it adds
> some crazy complexity.
>
>
> No problem, I get it, exactly same workflow as today:
>
>
> Yep. Which isn’t bad. I do however have two concerns.
>
> (1) What happens if we move to pull request-based workflows? Do we still
> support this workflow?
> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
> changing anything. I dislike this workflow less than the earlier proposals,
> but I see no reason to move to this instead of staying on SVN (other than
> the hosting issues which could be solved in other ways).
>
> -Chris
>
>
> # Clone from the single read-only git repo
> $ git clone https://github.com/joker-eph/compiler-rt.git
> …
> # Configure the SVN remote and initialize the svn metadata
> $ cd compiler-rt
> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=…
> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
> $ git svn rebase -l
> ...
> # Remove an empty file and commit with git
> $ git rm empty
> $ git commit -m "remove empty file"
> # commit/push with svn to the unified git repo
> $ git svn dcommit
> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt ...
> D empty
> Committed r354148
>
>
> Here is the commit:
> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>
>
> —
> Mehdi
>
>
>
>
>
>
>
>
>
> While admittedly you do get a linear history with using the mono-repository,
> that isn’t the only way to solve the problem, and I don’t really think that
> the benefit (not needing to write some tooling) justifies the increased
> burden applied to contributors that don’t use the full LLVM family of
> projects.
>
>
> I think the trade-off you're considering here (cost to developers who
> use llvm plus a version-locked subrepo vs. cost to developers who
> don't want an llvm clone) is the right one.
>
>
> I actually think there are *a lot* more considerations we need to be making
> for an infrastructure change like this. While it is true that our SCM
> hosting strategy primarily impacts developers, it also impacts our users. We
> should be conscious of the impact to downstream users in making
> infrastructure changes like this. That is part of why the idea of a survey
> holds appeal to me; it would give us the opportunity to get feedback from a
> much wider audience than the current “people on llvm-dev who haven’t been
> scared away”.
>
> But as someone who has
> extensively used git submodules and repo (a wrapper script), I
> strongly disagree with the judgement that a monorepo would not be a
> significant improvement.
>
> Our primary disagreement, I think, is over how much cost there is to
> "writing some tooling".  To me, this is a significant barrier standing
> in the way of developer productivity.  Here at Google I did a quick
> survey, and more than half of us don't have scripts of the sort that
> Justin Bogner described.  We are all just floundering around rebasing
> clang and llvm until it compiles.  It *sucks*.
>
>
> I actually think we’re both talking about solutions that require tooling,
> and while we *could* be disagreeing over how much effort each tooling
> initiative would require (I think they’re pretty close, so I don’t care to
> have that argument), my actual disagreement with your proposal is that it is
> a change that impacts developers and users universally and I don’t think
> that it is justified. Simply put, I don’t feel that the benefits are
> substantial enough to warrant the kind of disruptive change you’re
> proposing.
>
>
> I suggest that saying that all of these developers are "doing it
> wrong" is not helpful.
>
>
> Maybe I’m missing something, but I don’t think I said anyone was “doing it
> wrong”. Bisecting across multiple git repositories isn’t a great experience.
> But neither is bisecting across a half dozen separate folders in an SVN
> repository. Both the submodule solution and the mono-repo solution solve
> this problem equivalently well.
>
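As an illustration of the kind of tooling the separate-repository approach tends to need for bisection, the sketch below takes a commit in one repository and picks the newest commit in a sibling repository that is not later by committer date. This is only one plausible approach and is not the script referred to in the thread; the helper name `matching_commit` and the toy repositories are assumptions. Committer dates are not guaranteed to be monotonic, so a real tool would have to cope with clock skew.

```shell
#!/bin/sh
set -eu

# Print the newest commit in <other-repo> whose committer date is not later
# than that of <rev> in <repo>.  (Hypothetical helper.)
# Usage: matching_commit <repo> <rev> <other-repo>
matching_commit() {
    stamp=$(git -C "$1" log -1 --format=%ct "$2")
    git -C "$3" rev-list -1 --before="$stamp" HEAD
}

# Create a commit in repository $1 with a fixed author/committer timestamp $2.
commit_at() {
    echo "$2" >> "$1/file"
    git -C "$1" add file
    GIT_AUTHOR_DATE="$2 +0000" GIT_COMMITTER_DATE="$2 +0000" \
        git -C "$1" -c user.name=example -c user.email=example@example.invalid \
        commit -q -m "change at $2"
}

# Two toy repositories standing in for separately hosted projects.
scratch=$(mktemp -d)
git init -q "$scratch/llvm"
git init -q "$scratch/clang"

commit_at "$scratch/clang" 1470000000
clang_old=$(git -C "$scratch/clang" rev-parse HEAD)
commit_at "$scratch/llvm" 1470000100
commit_at "$scratch/clang" 1470000200

# The llvm commit falls between the two clang commits, so the older one is picked.
picked=$(matching_commit "$scratch/llvm" HEAD "$scratch/clang")
echo "$picked"
```

In a single repository none of this mapping is needed, since one commit pins every project at once; that is the trade-off being weighed in the surrounding discussion.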
> Not everyone has the git and python/bash chops
> to write the necessary scripts.  Not everyone has the personality to
> obsessively script around stuff, or the desire to maintain said
> scripts.  Not everyone works on llvm/clang so much that it's worth
> adopting a special-snowflake workflow.  And some of us -- myself
> included -- have extensive git scripts which work with the standard
> git workflow but would be completely broken by adding a custom level
> of indirection around git.
>
> When put this way, maybe it's clear that it's actually a niche set of
> people for whom "script around the brokenness" is a good solution.
>
>
> I’m not sure what “brokenness” you’re referring to. We have a collection of
> loosely connected projects by design. As a result of that intentional design
> certain workflows will be impacted. I don’t think that is brokenness. I
> think our loose coupling is a feature even if it makes some workflows
> harder.
>
> -Chris
>
>
> As I've said a bunch of times above, we have to weigh a cost paid by
> all of us every time we type a command that starts with "git" --
> something we do tens or hundreds of times a day -- versus the one-time
> cost of asking people to download 1gb of data.
>
> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> I’m just now catching up on this massive thread after being on vacation last
> week, and I have a few thoughts I’d like to share.
>
> First and foremost please don’t consider lack of dissent on the thread as
> presence of consensus. The various git-related threads on LLVM-dev lately
> have been so active and contentious that I think a lot of people are zoning
> out on the conversations. As supporting evidence of this, I was discussing
> this thread yesterday around the office yesterday and had quite a few people
> responding something along the lines of “they’re proposing what?”.
>
> I think it would be great for us to have several different proposals for how
> the git-transition could work, and have a survey to get people’s opinions. I
> know this has been discussed repeatedly, and I want to put in my vote in
> favor of having a survey that takes into account multiple different
> approaches.
>
> WRT the actual proposal in this thread, I’m strongly opposed to a
> mono-repository. While I understand the argument that the full clone’s cost
> on disk space is minimal compared to an LLVM object directory, what about
> for contributors that contribute to the smaller runtimes projects but *not*
> to LLVM or Clang. A contributor that only contributes to libcxx or
> compiler-rt being forced to do a full clone of all the LLVM projects in
> order to push a patch kinda sucks.
>
> I want to point out a few workflows people may not be considering.
>
> Clang can be built against an installed LLVM. I know this workflow is used
> by some people because I’ve broken it in the past and had to fix it. With a
> mono-repo this workflow gets a bit more complicated because you’d need to do
> sparse checkouts, and it probably means we should just nuke the workflow
> entirely because there is no real value added by having it.
>
> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
> common use case maintaining sparse repository mirrors would limit impact of
> this on users, should any GCC user want to contribute to Compiler-RT, you’re
> forcing them to clone a much larger repository than necessary.
>
> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
> libcxxabi, libunwind, and potentially any other runtime library projects
> that we may create in the future.
>
> Beyond all that I want to point out that the git multi-repository story is
> basically the same thing we have today with SVN except for the absence of a
> monotonically increasing number that corresponds across repositories. While
> admittedly you do get a linear history with using the mono-repository, that
> isn’t the only way to solve the problem, and I don’t really think that the
> benefit (not needing to write some tooling) justifies the increased burden
> applied to contributors that don’t use the full LLVM family of projects.
>
> I think we have some pretty strong evidence in the form of the github fork
> counts (https://github.com/llvm-mirror/) that most people aren’t using all
> of the LLVM projects. In fact, by that evidence Clang (the second most
> popular project) is forked less than 2/3 as many times as LLVM.
>
> -Chris
>
>
> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Even if it were possible, I would still keep my upstream checkout
> separate just as a safety measure, to keep from sending private stuff
> upstream by accident.
>
>
> Just FYI, this is our (Azul's) workflow as well, and for similar
> reasons.
>
>
> Same here.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>

Chris Bieneman via llvm-dev

2016-Aug-09 20:13 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Aug 9, 2016, at 11:27 AM, Justin Lebar <jlebar at google.com> wrote:
> 
>> (2) If I’m stuck using git-svn I kinda feel like there is no real point
>> in changing anything.
> 
> No real point *for you specifically*.
> 
> But the vast majority of people would not be stuck using git-svn.

Maybe. Maybe some people prefer to keep existing workflows intact. We have no
idea how many people will fall on which end of this. Saying things like “vast
majority” implies you are in possession of some empirical data, which I don’t
believe is the case. Please correct me if I’m wrong here.

> And in addition the LLVM project would not be stuck using svn,

Playing devil’s advocate, some people would consider being “stuck” using svn a
good thing. Not I, but some people do like SVN.

> with all the baggage, hosting issues, workflow issues (for people other than

There are other solutions to the hosting issues, and even many workflow issues
could be solved in SVN.

> you), etc.
> 
> The bar by which this proposal should be measured is not "is it a net
> gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
> corner.

I apologize if you think I’m insinuating that your proposal be measured by
whether or not I like it. I’ve previously suggested that governance by loud
voices is not desirable (and I think I’m demonstrating my ability to be loud
here). I’m trying to voice my opinions so that the conversation continues to
evolve.

-Chris
> 
> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>> 
>> Thanks for your thoughts, Chris.
>> 
>> As supporting evidence of this, I was discussing this thread yesterday
>> around the office yesterday and had quite a few people responding something
>> along the lines of “they’re proposing what?”.
>> 
>> 
>> I hope they'll join us in this thread.
>> 
>> Ultimately a survey is going to be strongly biased in favor of "don't
>> change anything".  There is a strong psychological bias to weight
>> losses more than gains, so if one doesn't engage with the issue, it's
>> only natural to conclude "keep it as similar as possible to what it is
>> today -- that is safe."  But that line of thinking does not
>> necessarily lead us to the best outcome.
>> 
>> 
>> I don’t agree with this assertion. I believe that if you put forth multiple
>> proposals, and have an articulate discussion of the merits and costs of each
>> solution you can create a survey that can help inform decision making. I
>> suppose we can agree to disagree.
>> 
>> 
>> We've heard in thread from a lot of developers about how a monorepo
>> would improve their workflow.  I would love to hear from some
>> developers who are actually affected in the way you describe, rather
>> than just considering the hypothetical.
>> 
>> My expectation is that the effect of the monorepo on said developers
>> would be relatively small -- we're talking about 1gb of disk space.  I
>> understand that there's a "yuck" factor to this, but inasmuch as there
>> aren't other concrete effects, this is just change aversion.  And
>> essentially all of the other effects of the monorepo can be hidden via
>> sparse checkouts, as we've discussed.
>> 
>> Maybe I am wrong.  But I don't think we're going to get to the bottom
>> of it without actually engaging with people who are actually affected
>> in the way you posit.
>> 
>> 
>> Ok, let me describe a few workflows I’ve used in the last year that are (in
>> my mind) adversely impacted by a mono-repo.
>> 
>> Case Study 1 - Simple development on a sub-project
>> 
>> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
>> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
>> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
>> repo, rebuild compiler-rt then commit.
>> 
>> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
>> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
>> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
>> sucks.
>> 
>> What this example illustrates to me is that today we have loosely coupled
>> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
>> coupling that isn’t strictly required today.
>> 
>> Case Study 2 - Working on a sub-project in isolation across many platforms
>> 
>> I did a lot of work on Compiler-RT last year that had no direct dependency
>> on any other LLVM project. During the development I was working with a
>> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
>> in a while (or every other day as it were) I would make a change that would
>> break a configuration that I wasn’t directly developing on. My workflow for
>> handling those cases was:
>> 
>> (1) Spin up a VM on a VPS that closely matched the configuration I broke
>> (2) Checkout Compiler-RT
>> (3) Reproduce, debug, fix the failure
>> (4) Commit the patch from the VM
>> 
>> In a mono-repository doing this would require checking out *all*
>> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
>> workflow, but it is one I use that would be adversely impacted by needing to
>> checkout a full LLVM. Now, you might say I could check out the sub-project
>> mirror, but then I can’t commit from the VM, which kinda sucks.
>> 
>> 
>> So for the “I spin a VM and want to make a commit but don’t want to download
>> a few hundred MBs with a git clone” story, it turns out that the github
>> bridge with SVN helps to optimize with a “lean” checkout:
>> 
>> I fork the unified repo here:
>> https://github.com/joker-eph/llvm-project/commits/master and then:  svn co
>> https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> 
>> So that’s a net “no regression” compared to the current state :)
>> 
>> 
>> Is the github SVN interface's "co" magically as fast as a git clone?
>> 
>> 
>> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> ….
>> real 0m8.539s user 0m0.919s  sys 0m1.917s
>> $ time git clone https://github.com/joker-eph/compiler-rt.git
>> real 0m5.487s user 0m1.208s sys 0m0.825s
>> 
>> 
>> That’s actually not terrible! Color me impressed.
>> 
>> 
>> 
>> If not, it is a performance regression because today I use git clone and
>> git-svn on my VMs just like on my physical machines, and either way it adds
>> some crazy complexity.
>> 
>> 
>> No problem, I get it, exactly same workflow as today:
>> 
>> 
>> Yep. Which isn’t bad. I do however have two concerns.
>> 
>> (1) What happens if we move to pull request-based workflows? Do we still
>> support this workflow?
>> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
>> changing anything. I dislike this workflow less than the earlier proposals,
>> but I see no reason to move to this instead of staying on SVN (other than
>> the hosting issues which could be solved in other ways).
>> 
>> -Chris
>> 
>> 
>> # Clone from the single read-only git repo
>> $ git clone https://github.com/joker-eph/compiler-rt.git
>> …
>> # Configure the SVN remote and initialize the svn metadata
>> $ cd compiler-rt
>> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=…
>> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
>> $ git svn rebase -l
>> ...
>> # Remove an empty file and commit with git
>> $ git rm empty
>> $ git commit -m "remove empty file"
>> # commit/push with svn to the unified git repo
>> $ git svn dcommit
>> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt ...
>> D empty
>> Committed r354148
>> 
>> 
>> Here is the commit:
>> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>> 
>> 
>> —
>> Mehdi
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> While admittedly you do get a linear history with using the
mono-repository,
>> that isn’t the only way to solve the problem, and I don’t really think
that
>> the benefit (not needing to write some tooling) justifies the increased
>> burden applied to contributors that don’t use the full LLVM family of
>> projects.
>> 
>> 
>> I think the trade-off you're considering here (cost to developers
who
>> use llvm plus a version-locked subrepo vs. cost to developers who
>> don't want an llvm clone) is the right one.
>> 
>> 
>> I actually think there are *a lot* more considerations we need to be
making
>> for an infrastructure change like this. While it is true that our SCM
>> hosting strategy primarily impacts developers, it also impacts our
users. We
>> should be conscious of the impact to downstream users in making
>> infrastructure changes like this. That is part of why the idea of a
survey
>> holds appeal to me; it would give us the opportunity to get feedback
from a
>> much wider audience than the current “people on llvm-dev who haven’t
been
>> scared away”.
>> 
>> But as someone who has
>> extensively used git submodules and repo (a wrapper script), I
>> strongly disagree with the judgement that a monorepo would not be a
>> significant improvement.
>> 
>> Our primary disagreement, I think, is over how much cost there is to
>> "writing some tooling".  To me, this is a significant barrier
standing
>> in the way of developer productivity.  Here at Google I did a quick
>> survey, and more than half of us don't have scripts of the sort
that
>> Justin Bogner described.  We are all just floundering around rebasing
>> clang and llvm until it compiles.  It *sucks*.
>> 
>> 
>> I actually think we’re both talking about solutions that require
tooling,
>> and while we *could* be disagreeing over how much effort each tooling
>> initiative would require (I think they’re pretty close, so I don’t care
to
>> have that argument), my actual disagreement with your proposal is that
it is
>> a change that impacts developers and users universally and I don’t
think
>> that it is justified. Simply put, I don’t feel that the benefits are
>> substantial enough to warrant the kind of disruptive change you’re
>> proposing.
>> 
>> 
>> I suggest that saying that all of these developers are "doing it
>> wrong" is not helpful.
>> 
>> 
>> Maybe I’m missing something, but I don’t think I said anyone was “doing
it
>> wrong”. Bisecting across multiple git repositories isn’t a great
experience.
>> But neither is bisecting across a half dozen separate folders in an SVN
>> repository. Both the submodule solution and the mono-repo solution
solve
>> this problem equivalently well.
>> 
>> Not everyone has the git and python/bash chops
>> to write the necessary scripts.  Not everyone has the personality to
>> obsessively script around stuff, or the desire to maintain said
>> scripts.  Not everyone works on llvm/clang so much that it's worth
>> adopting a special-snowflake workflow.  And some of us -- myself
>> included -- have extensive git scripts which work with the standard
>> git workflow but would be completely broken by adding a custom level
>> of indirection around git.
>> 
>> When put this way, maybe it's clear that it's actually a niche
set of
>> people for whom "script around the brokenness" is a good
solution.
>> 
>> 
>> I’m not sure what “brokenness” you’re referring to. We have a
collection of
>> loosely connected projects by design. As a result of that intentional
design
>> certain workflows will be impacted. I don’t think that is brokenness. I
>> think our loose coupling is a feature even if it makes some workflows
>> harder.
>> 
>> -Chris
>> 
>> 
>> As I've said a bunch of times above, we have to weigh a cost paid
by
>> all of us every time we type a command that starts with "git"
--
>> something we do tens or hundreds of times a day -- versus the one-time
>> cost of asking people to download 1gb of data.
>> 
>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> I’m just now catching up on this massive thread after being on vacation
last
>> week, and I have a few thoughts I’d like to share.
>> 
>> First and foremost please don’t consider lack of dissent on the thread
as
>> presence of consensus. The various git-related threads on LLVM-dev
lately
>> have been so active and contentious that I think a lot of people are
zoning
>> out on the conversations. As supporting evidence of this, I was
discussing
>> this thread yesterday around the office yesterday and had quite a few
people
>> responding something along the lines of “they’re proposing what?”.
>> 
>> I think it would be great for us to have several different proposals
for how
>> the git-transition could work, and have a survey to get people’s
opinions. I
>> know this has been discussed repeatedly, and I want to put in my vote
in
>> favor of having a survey that takes into account multiple different
>> approaches.
>> 
>> WRT the actual proposal in this thread, I’m strongly opposed to a
>> mono-repository. While I understand the argument that the full clone’s
cost
>> on disk space is minimal compared to an LLVM object directory, what
about
>> for contributors that contribute to the smaller runtimes projects but
*not*
>> to LLVM or Clang. A contributor that only contributes to libcxx or
>> compiler-rt being forced to do a full clone of all the LLVM projects in
>> order to push a patch kinda sucks.
>> 
>> I want to point out a few workflows people may not be considering.
>> 
>> Clang can be built against an installed LLVM. I know this workflow is
used
>> by some people because I’ve broken it in the past and had to fix it.
With a
>> mono-repo this workflow gets a bit more complicated because you’d need
to do
>> sparse checkouts, and it probably means we should just nuke the
workflow
>> entirely because there is no real value added by having it.
>> 
>> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for
the
>> common use case maintaining sparse repository mirrors would limit
impact of
>> this on users, should any GCC user want to contribute to Compiler-RT,
you’re
>> forcing them to clone a much larger repository than necessary.
>> 
>> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>> libcxxabi, libunwind, and potentially any other runtime library
projects
>> that we may create in the future.
>> 
>> Beyond all that I want to point out that the git multi-repository story
is
>> basically the same thing we have today with SVN except for the absence
of a
>> monotonically increasing number that corresponds across repositories.
While
>> admittedly you do get a linear history with using the mono-repository,
that
>> isn’t the only way to solve the problem, and I don’t really think that
the
>> benefit (not needing to write some tooling) justifies the increased
burden
>> applied to contributors that don’t use the full LLVM family of
projects.
>> 
>> I think we have some pretty strong evidence in the form of the github
fork
>> counts (https://github.com/llvm-mirror/) that most people aren’t using
all
>> of the LLVM projects. In fact, by that evidence Clang (the second most
>> popular project) is forked less than 2/3 as many times as LLVM.
>> 
>> -Chris
>> 
>> 
>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> Even if it were possible, I would still keep my upstream checkout
>> separate just as a safety measure, to keep from sending private stuff
>> upstream by accident.
>> 
>> 
>> Just FYI, this is our (Azul's) workflow as well, and for similar
>> reasons.
>> 
>> 
>> Same here.
>> 
>> cheers,
>> --renato
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> 
>> 
>> 
>>

Chris Bieneman via llvm-dev

2016-Aug-09 23:32 UTC


[llvm-dev] [RFC] One or many git repositories?

Can we please stop with the attempts at persuading me to a mono-repo approach? I
really hate that my voicing criticism has resulted in a big “let’s sway beanz”
effort.

Justin, the reality is I don’t think the benefits of a mono-repo justify the
costs if it includes all projects. I think the cost of a mono-repo which
excludes the runtime projects is lower, so I dislike it less. I can even see
(and agree with) some arguments in favor of LLVM, Clang and LLD being in the
same repository, but I don’t see it as solving a problem that needs to be
solved.

Mehdi’s performance data and git-svn workflow do *nothing* to win me over to
your argument. All they say is that if we adopt your proposal I *might* be able
to keep the same git-svn workflows I use today. I say “might” because nobody has
addressed my original concerns about whether that workflow would be dropped if
we move to a PR-based model, or how we would support something similar in a PR
model. I also think that the mono-repo might discourage pull requests to the
runtimes projects from users that don’t use clang, which concerns me. Either way,
Mehdi’s information isn’t going to get me to support your idea over the other
proposal, which offers me actual workflow improvements.

It isn’t even going to get me to a “meh”. While I don’t think the proposal
should be judged on whether it benefits me specifically, if it provides me no
benefit, then from my perspective maintaining the status quo is better than a
change.

All that aside, I really don’t think anyone should be investing that much time
trying to appease me. While I am a loud voice, and I’m flattered that people
seem to want to change my opinions, they are in fact just opinions and
preferences.

At the end of the day we need some empirical data to drive this decision, and I
don’t consider the anecdotes on these threads to be that data.  While it is
useful to hear the first hand opinions of people, it would be great if we had
some actual data. Even with quantifiable data it is unlikely that I would
consider the mono-repo an ideal workflow for myself, but it *could* be the right
solution for the community.  I would like to think that as a community we will
all be able to put the “greater good” above our own preferences.

-Chris

> On Aug 9, 2016, at 2:12 PM, Justin Lebar via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 
>> Sorry, I was specifically replying to 'I think we'd be thrilled with a
>> "meh" from your corner.’.  I didn’t feel like that was helping the
>> conversation along.
> 
> Sorry if I offended anyone with this or sent the wrong message.  I was
> trying to say, beanz was originally a strong, categorical opponent to
> the monorepo.  After some discussion, he became not strongly opposed
> to a monorepo, so long as it didn't contain the runtime libraries.
> Now Mehdi had a proposal that I was hoping would take him to
> "not-strongly-opposed" to a monorepo that did contain the runtime
> libraries.  Given where we came from, I would be very happy with that
> outcome.
> 
> On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> 
>> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at
apple.com> wrote:
>> 
>> 
>> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> 
>> 
>> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> (2) If I’m stuck using git-svn I kinda feel like there is no real point
in
>> changing anything.
>> 
>> 
>> No real point *for you specifically*.
>> 
>> But the vast majority of people would not be stuck using git-svn.  And
>> in addition the LLVM project would not be stuck using svn, with all
>> the baggage, hosting issues, workflow issues (for people other than
>> you), etc.
>> 
>> The bar by which this proposal should be measured is not "is it a net
>> gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
>> corner.
>> 
>> Justin, I don’t think this conversation is really going anywhere.
>> 
>> 
>> I’m not sure what you’re referring to exactly, but in the context of "this
>> thread isn’t getting anywhere”, I strongly disagree.
>> 
>> Sorry, I was specifically replying to 'I think we'd be thrilled with a "meh"
>> from your corner.’.  I didn’t feel like that was helping the conversation
>> along.
>> 
>> 
>> OK, I agree with you then :)
>> 
>> 
>> I agree with everything else you say about actually talking about the
>> different proposals.  I hope my point is well received that we really
do
>> need to eventually describe the impact to daily workflow, once the
proposals
>> are far enough along to do so.
>> 
>> 
>> I agree with you also on this. I voiced in the past (on IRC toward
>> Justin/David probably) that the proposal should include examples of
workflow
>> and how they translate to whatever the proposal will be.
>> 
>> Cheers,
>> 
>> —
>> Mehdi
>> 
>> 
>> 
>> Pete
>> 
>> 
>> I believe that the recent workflow tests I performed (see my last emails in
>> this thread) are proof that this thread has been productive, and I believe
>> discussing here and hearing concerns from people (Chris and others) is
>> necessary before getting a proposal fleshed out and having a survey.
>> 
>> Having a survey without first settling *what* we want to survey about
>> is nonsense to me.
>> 
>> (That may miss your point, but your point wasn’t clear either…).
>> 
>> —
>> Mehdi
>> 
>> 
>> 
>> 
>> Renato already mentioned talking about this at the conference, and there has
>> also been talk of a survey.  I think we need those to see how the community
>> actually feels about the proposals here.
>> 
>> Chris may be the only vocal advocate of an alternative to your proposal, but
>> then there are people like me who are quiet because we are waiting for the
>> survey to appear.
>> 
>> I would have been much more vocal if I thought we were actually going to
>> adopt the monorepo, but for now I believe it is still only a proposal.
>> 
>> Full disclosure, I don’t want a monorepo.  I think it optimizes for the use
>> case where people want to bisect, and I don’t think it’s reasonable to push
>> a monorepo on everyone for the sake of those who want to bisect.  The
>> submodules repo has already been demonstrated as one potential solution to
>> this which would allow those who want to bisect to do so, while everyone else
>> can continue to work more or less as they do today.
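
The bisection use case mentioned above can be sketched as follows. This is only an illustration under assumptions: bisect_first_bad is a hypothetical helper name, and the test command is whatever script tells a good revision from a bad one. In a single repository one revision pins every project, so plain git bisect is enough; with separate repositories the same loop would also have to pick matching revisions of the other projects at each step.

```shell
# bisect_first_bad is a hypothetical helper:
#   bisect_first_bad <good-rev> <bad-rev> <test command...>
# It prints the hash of the first bad commit between the two revisions.
bisect_first_bad() {
    good=$1
    bad=$2
    shift 2
    # "git bisect start" takes the bad revision first, then the good one.
    git bisect start "$bad" "$good" >/dev/null || return 1
    # The test command must exit 0 on good revisions and non-zero
    # (1-127, except 125 which means "skip") on bad ones.
    git bisect run "$@" >/dev/null 2>&1
    # Once the run converges, refs/bisect/bad names the first bad commit.
    git rev-parse refs/bisect/bad
    # Return the checkout to where it was before bisecting.
    git bisect reset >/dev/null 2>&1
}
```

Usage would look like: bisect_first_bad <known-good-rev> HEAD ./run-my-test.sh, where run-my-test.sh is a placeholder for a local test script.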
>> 
>> In terms of the proposals, I think you, Mehdi, Chris, and a number of others
>> have proven that there is almost no technical solution beyond our reach.
>> What we do have are proposals which optimize for different use cases.  Given
>> this, I think the most useful thing from my point of view (and hopefully to
>> others) would be for those advocating each different solution to actually
>> give short examples of each of the different use cases and how to support
>> them.
>> 
>> For example:
>> 
>> Monorepo, pushing a change to compiler-rt:
>> 1: git commit …
>> 2: git pull --rebase
>> 3: test
>> 4a: git push /* no commits to any other project, so the push works */.  Goto 5
>> 4b: git push /* someone committed to some other project in the monorepo.  Goto 2 */
>> 5: Done
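
Read as a script, the numbered steps above amount to a retry loop. A minimal sketch, assuming the current branch already tracks the upstream branch; push_with_retry and its arguments are hypothetical names used only for illustration, not part of any proposal in this thread:

```shell
# Hypothetical helper: repeat "pull --rebase, test, push" until the push lands.
# Usage: push_with_retry <max_attempts> <test command...>
push_with_retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Step 2: replay local commits on top of whatever landed upstream.
        git pull --rebase || return 1
        # Step 3: run the caller-supplied test command; give up if it fails.
        "$@" || return 1
        # Step 4: the push is rejected if anyone pushed in the meantime,
        # even to an unrelated project in the same repository (case 4b).
        if git push; then
            return 0    # Step 5: done.
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, push_with_retry 3 ninja check-all would try up to three times, with ninja check-all standing in for whatever test step applies. The loop makes the cost of case 4b visible: every rejected push triggers another rebase and another test run.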
>> 
>> I know that this example appears negative in the case where someone else
>> committed to another project and a rebase is required, but that’s exactly the
>> point.  This is showing that this particular scenario is potentially a
>> problem compared to today and/or other proposals.  A similar workflow could
>> (should) be written for the sparse checkout monorepo, GitHub monorepo with
>> svn, and submodules cases.  The submodules case will likely show that
>> bisecting is more complex than on the monorepo, while pushing is simpler.
>> 
>> Similarly, the submodules workflow probably isn’t capable of a single commit
>> to llvm and clang in the revlock case while the monorepo is, but we as a
>> community need to decide whether we want to optimize for that or not.  I
>> don’t have any data to suggest that revlock commits are frequent/infrequent
>> or even a problem in general, and I don’t think we should optimize for that
>> case unless it’s worth doing so.
>> 
>> Only by actually showing the use cases we care about can the community make
>> an educated decision about what these proposals actually mean to our daily
>> workflow.  We can then choose what we are optimizing for.  I personally want
>> to have a very simple list of repos to clone from (or just one!) and for
>> pushing to be easy, because those are the actions I perform the most often.
>> Others will have different use cases they care about and they can choose the
>> proposal which suits them best.
>> 
>> Cheers,
>> Pete
>> 
>> 
>> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at
apple.com> wrote:
>> 
>> 
>> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> 
>> 
>> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com>
wrote:
>> 
>> 
>> 
>> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> 
>> 
>> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com>
wrote:
>> 
>> Thanks for your thoughts, Chris.
>> 
>> As supporting evidence of this, I was discussing this thread yesterday
>> around the office yesterday and had quite a few people responding
something
>> along the lines of “they’re proposing what?”.
>> 
>> 
>> I hope they'll join us in this thread.
>> 
>> Ultimately a survey is going to be strongly biased in favor of
"don't
>> change anything".  There is a strong psychological bias to weight
>> losses more than gains, so if one doesn't engage with the issue,
it's
>> only natural to conclude "keep it as similar as possible to what
it is
>> today -- that is safe."  But that line of thinking does not
>> necessarily lead us to the best outcome.
>> 
>> 
>> I don’t agree with this assertion. I believe that if you put forth
multiple
>> proposals, and have an articulate discussion of the merits and costs of
each
>> solution you can create a survey that can help inform decision making.
I
>> suppose we can agree to disagree.
>> 
>> 
>> We've heard in thread from a lot of developers about how a monorepo
>> would improve their workflow.  I would love to hear from some
>> developers who are actually affected in the way you describe, rather
>> than just considering the hypothetical.
>> 
>> My expectation is that the effect of the monorepo on said developers
>> would be relatively small -- we're talking about 1gb of disk space.  I
>> understand that there's a "yuck" factor to this, but inasmuch as there
>> aren't other concrete effects, this is just change aversion.  And
>> essentially all of the other effects of the monorepo can be hidden via
>> sparse checkouts, as we've discussed.
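
For readers unfamiliar with sparse checkouts, here is a rough sketch of what that could look like. It is an illustration only: sparse_clone is a hypothetical helper name, and the repository URL and subdirectory are whatever the final layout ends up being. Note that a sparse checkout only trims the working tree; the clone still downloads the full history, which is the disk-space cost discussed above.

```shell
# sparse_clone is a hypothetical helper:
#   sparse_clone <repo-url> <dest-dir> <subdir>
# It clones the repository but only populates <subdir> in the working tree.
sparse_clone() {
    repo_url=$1
    dest=$2
    subdir=$3
    # Fetch the history but do not populate the working tree yet.
    git clone --quiet --no-checkout "$repo_url" "$dest" || return 1
    (
        cd "$dest" || exit 1
        git config core.sparseCheckout true
        mkdir -p .git/info
        # Only paths matching this pattern are written to the working tree.
        printf '%s/\n' "$subdir" > .git/info/sparse-checkout
        # Populate the index and working tree, honoring the sparse patterns.
        git read-tree -mu HEAD
    )
}
```

Something like sparse_clone <monorepo-url> lean-checkout compiler-rt would then leave only compiler-rt/ on disk next to the .git directory.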
>> 
>> Maybe I am wrong.  But I don't think we're going to get to the
bottom
>> of it without actually engaging with people who are actually affected
>> in the way you posit.
>> 
>> 
>> Ok, let me describe a few workflows I’ve used in the last year that are (in
>> my mind) adversely impacted by a mono-repo.
>> 
>> Case Study 1 - Simple development on a sub-project
>> 
>> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
>> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
>> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
>> repo, rebuild compiler-rt then commit.
>> 
>> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
>> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
>> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
>> sucks.
>> 
>> What this example illustrates to me is that today we have loosely coupled
>> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
>> coupling that isn’t strictly required today.
>> 
>> Case Study 2 - Working on a sub-project in isolation across many platforms
>> 
>> I did a lot of work on Compiler-RT last year that had no direct dependency
>> on any other LLVM project. During the development I was working with a
>> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
>> in a while (or every other day as it were) I would make a change that would
>> break a configuration that I wasn’t directly developing on. My workflow for
>> handling those cases was:
>> 
>> (1) Spin up a VM on a VPS that closely matched the configuration I broke
>> (2) Checkout Compiler-RT
>> (3) Reproduce, debug, fix the failure
>> (4) Commit the patch from the VM
>> 
>> In a mono-repository doing this would require checking out *all*
>> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
>> workflow, but it is one I use that would be adversely impacted by needing to
>> checkout a full LLVM. Now, you might say I could check out the sub-project
>> mirror, but then I can’t commit from the VM, which kinda sucks.
>> 
>> 
>> So for the “I spin a VM and want to make a commit but don’t want to download
>> a few hundred MBs with a git clone” story, it turns out that the github
>> bridge with SVN helps to optimize with a “lean” checkout:
>> 
>> I fork the unified repo here:
>> https://github.com/joker-eph/llvm-project/commits/master and then:
>> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> 
>> So that’s a net “no regression” compared to the current state :)
>> 
>> 
>> Is the github SVN interface's "co" magically as fast as a git clone?
>> 
>> 
>> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> ….
>> real 0m8.539s user 0m0.919s  sys 0m1.917s
>> $ time git clone https://github.com/joker-eph/compiler-rt.git
>> real 0m5.487s user 0m1.208s sys 0m0.825s
>> 
>> 
>> That’s actually not terrible! Color me impressed.
>> 
>> 
>> 
>> If not, it is a performance regression because today I use git clone
and
>> git-svn on my VMs just like on my physical machines, and either way it
adds
>> some crazy complexity.
>> 
>> 
>> No problem, I get it, exactly same workflow as today:
>> 
>> 
>> Yep. Which isn’t bad. I do however have two concerns.
>> 
>> (1) What happens if we move to pull request-based workflows? Do we
still
>> support this workflow?
>> (2) If I’m stuck using git-svn I kinda feel like there is no real point
in
>> changing anything. I dislike this workflow less than the earlier
proposals,
>> but I see no reason to move to this instead of staying on SVN (other
than
>> the hosting issues which could be solved in other ways).
>> 
>> -Chris
>> 
>> 
>> # Clone from the single read-only git repo
>> $ git clone https://github.com/joker-eph/compiler-rt.git
>> …
>> # Configure the SVN remote and initialize the svn metadata
>> $ cd compiler-rt
>> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username
>> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
>> $ git svn rebase -l
>> ...
>> # Remove an empty file and commit with git
>> $ git rm empty
>> $ git commit -m "remove empty file"
>> # commit/push with svn to the unified git repo
>> $ git svn dcommit
>> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> ...
>> D empty
>> Committed r354148
>> 
>> 
>> Here is the commit:
>> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>> 
>> 
>> —
>> Mehdi
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> While admittedly you do get a linear history with using the
mono-repository,
>> that isn’t the only way to solve the problem, and I don’t really think
that
>> the benefit (not needing to write some tooling) justifies the increased
>> burden applied to contributors that don’t use the full LLVM family of
>> projects.
>> 
>> 
>> I think the trade-off you're considering here (cost to developers
who
>> use llvm plus a version-locked subrepo vs. cost to developers who
>> don't want an llvm clone) is the right one.
>> 
>> 
>> I actually think there are *a lot* more considerations we need to be
making
>> for an infrastructure change like this. While it is true that our SCM
>> hosting strategy primarily impacts developers, it also impacts our
users. We
>> should be conscious of the impact to downstream users in making
>> infrastructure changes like this. That is part of why the idea of a
survey
>> holds appeal to me; it would give us the opportunity to get feedback
from a
>> much wider audience than the current “people on llvm-dev who haven’t
been
>> scared away”.
>> 
>> But as someone who has
>> extensively used git submodules and repo (a wrapper script), I
>> strongly disagree with the judgement that a monorepo would not be a
>> significant improvement.
>> 
>> Our primary disagreement, I think, is over how much cost there is to
>> "writing some tooling".  To me, this is a significant barrier
standing
>> in the way of developer productivity.  Here at Google I did a quick
>> survey, and more than half of us don't have scripts of the sort
that
>> Justin Bogner described.  We are all just floundering around rebasing
>> clang and llvm until it compiles.  It *sucks*.
>> 
>> 
>> I actually think we’re both talking about solutions that require
tooling,
>> and while we *could* be disagreeing over how much effort each tooling
>> initiative would require (I think they’re pretty close, so I don’t care
to
>> have that argument), my actual disagreement with your proposal is that
it is
>> a change that impacts developers and users universally and I don’t
think
>> that it is justified. Simply put, I don’t feel that the benefits are
>> substantial enough to warrant the kind of disruptive change you’re
>> proposing.
>> 
>> 
>> I suggest that saying that all of these developers are "doing it
>> wrong" is not helpful.
>> 
>> 
>> Maybe I’m missing something, but I don’t think I said anyone was “doing
it
>> wrong”. Bisecting across multiple git repositories isn’t a great
experience.
>> But neither is bisecting across a half dozen separate folders in an SVN
>> repository. Both the submodule solution and the mono-repo solution
solve
>> this problem equivalently well.
>> 
>> Not everyone has the git and python/bash chops
>> to write the necessary scripts.  Not everyone has the personality to
>> obsessively script around stuff, or the desire to maintain said
>> scripts.  Not everyone works on llvm/clang so much that it's worth
>> adopting a special-snowflake workflow.  And some of us -- myself
>> included -- have extensive git scripts which work with the standard
>> git workflow but would be completely broken by adding a custom level
>> of indirection around git.
>> 
>> When put this way, maybe it's clear that it's actually a niche
set of
>> people for whom "script around the brokenness" is a good
solution.
>> 
>> 
>> I’m not sure what “brokenness” you’re referring to. We have a
collection of
>> loosely connected projects by design. As a result of that intentional
design
>> certain workflows will be impacted. I don’t think that is brokenness. I
>> think our loose coupling is a feature even if it makes some workflows
>> harder.
>> 
>> -Chris
>> 
>> 
>> As I've said a bunch of times above, we have to weigh a cost paid
by
>> all of us every time we type a command that starts with "git"
--
>> something we do tens or hundreds of times a day -- versus the one-time
>> cost of asking people to download 1gb of data.
>> 
>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> I’m just now catching up on this massive thread after being on vacation
last
>> week, and I have a few thoughts I’d like to share.
>> 
>> First and foremost please don’t consider lack of dissent on the thread
as
>> presence of consensus. The various git-related threads on LLVM-dev
lately
>> have been so active and contentious that I think a lot of people are
zoning
>> out on the conversations. As supporting evidence of this, I was
discussing
>> this thread yesterday around the office yesterday and had quite a few
people
>> responding something along the lines of “they’re proposing what?”.
>> 
>> I think it would be great for us to have several different proposals
for how
>> the git-transition could work, and have a survey to get people’s
opinions. I
>> know this has been discussed repeatedly, and I want to put in my vote
in
>> favor of having a survey that takes into account multiple different
>> approaches.
>> 
>> WRT the actual proposal in this thread, I’m strongly opposed to a
>> mono-repository. While I understand the argument that the full clone’s
cost
>> on disk space is minimal compared to an LLVM object directory, what
about
>> for contributors that contribute to the smaller runtimes projects but
*not*
>> to LLVM or Clang. A contributor that only contributes to libcxx or
>> compiler-rt being forced to do a full clone of all the LLVM projects in
>> order to push a patch kinda sucks.
>> 
>> I want to point out a few workflows people may not be considering.
>> 
>> Clang can be built against an installed LLVM. I know this workflow is
used
>> by some people because I’ve broken it in the past and had to fix it.
With a
>> mono-repo this workflow gets a bit more complicated because you’d need
to do
>> sparse checkouts, and it probably means we should just nuke the
workflow
>> entirely because there is no real value added by having it.
>> 
>> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for
the
>> common use case maintaining sparse repository mirrors would limit
impact of
>> this on users, should any GCC user want to contribute to Compiler-RT,
you’re
>> forcing them to clone a much larger repository than necessary.
>> 
>> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>> libcxxabi, libunwind, and potentially any other runtime library
projects
>> that we may create in the future.
>> 
>> Beyond all that I want to point out that the git multi-repository story
is
>> basically the same thing we have today with SVN except for the absence
of a
>> monotonically increasing number that corresponds across repositories.
While
>> admittedly you do get a linear history with using the mono-repository,
that
>> isn’t the only way to solve the problem, and I don’t really think that
the
>> benefit (not needing to write some tooling) justifies the increased
burden
>> applied to contributors that don’t use the full LLVM family of
projects.
>> 
>> I think we have some pretty strong evidence in the form of the github
fork
>> counts (https://github.com/llvm-mirror/) that most people aren’t using
all
>> of the LLVM projects. In fact, by that evidence Clang (the second most
>> popular project) is forked less than 2/3 as many times as LLVM.
>> 
>> -Chris
>> 
>> 
>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> Even if it were possible, I would still keep my upstream checkout
>> separate just as a safety measure, to keep from sending private stuff
>> upstream by accident.
>> 
>> 
>> Just FYI, this is our (Azul's) workflow as well, and for similar
>> reasons.
>> 
>> 
>> Same here.
>> 
>> cheers,
>> --renato
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 

Eric Fiselier via llvm-dev

2016-Aug-10 01:01 UTC

[llvm-dev] [RFC] One or many git repositories?

I don't see who would benefit from having libc++/libc++abi in the megarepo
since they are not coupled to either LLVM or Clang. Upstream changes to
other projects have no effect on libc++ and vice versa, so there is no need
to keep them in sync.

For this reason libc++ should stay a separate repository, which can be
included in the megarepo as a sub-module. This avoids concerns about
increasing the cost of checking out and building libc++.
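
As a rough illustration of the sub-module arrangement described above, here
is a sketch only. The directory names are hypothetical stand-ins created
locally in a temporary directory, not the real LLVM repositories, and the
commit identity is a placeholder.

```shell
#!/bin/sh
# Sketch only: every path, repository name and identity below is hypothetical.
set -eu

# Placeholder identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Stand-in for an independent libc++ repository with a single commit.
git init -q libcxx
git -C libcxx commit -q --allow-empty -m "libcxx: initial commit"

# Stand-in for the megarepo.
git init -q megarepo
cd megarepo
git commit -q --allow-empty -m "megarepo: initial commit"

# Record libcxx as a sub-module. Recent git versions refuse file-based
# sub-module clones unless protocol.file.allow permits them.
git -c protocol.file.allow=always submodule --quiet add "$work/libcxx" libcxx
git commit -q -m "Add libcxx as a sub-module"

# The megarepo now pins libcxx at one specific commit.
git submodule status
```

With this layout, someone who only works on libc++ clones the libcxx
repository alone, while a megarepo user runs `git submodule update --init`
after cloning to fetch the pinned libcxx commit.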

There has also been a secondary discussion about libc++ supporting
out-of-tree builds. I'm unconvinced by arguments about the cost of cloning
LLVM when you only want libc++. Building and testing libc++ already
requires an LLVM checkout somewhere on the machine for LIT and CMake
modules, so the additional cost is already there. I would be OK dropping
out-of-tree support for libc++ since such builds are hard to configure
correctly and offer little benefit over in-tree builds.

/Eric


On Tue, Aug 9, 2016 at 5:32 PM, Chris Bieneman via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Can we please stop with the attempts at persuading me to a mono-repo
> approach? I really hate that my voicing criticism has resulted in a big
> “let’s sway beanz” effort.
>
> Justin, the reality is I don’t think the benefits of a mono-repo justify
> the costs if it includes all projects. I think the cost of a mono-repo
> which excludes the runtime projects is lower, so I dislike it less. I can
> even see (and agree with) some arguments in favor of LLVM, Clang and LLD
> being in the same repository, but I don’t see it as solving a problem that
> needs to be solved.
>
> Mehdi’s performance data and git-svn workflow does *nothing* to win me
> over to your argument. All it says is that if we do your proposal I *might*
> be able to keep the same git-svn workflows I use today. I say “might”
> because nobody has addressed my original concerns about whether or not that
> workflow would be dropped if we move to a PR based model, or how we would
> support something similar in a PR model. I also think that the mono-repo
> might discourage pull requests to the runtimes projects from users that
> don’t use clang, which concerns me. Either way Mehdi’s information isn’t
> going to get me to support your idea over the other proposal which offers
> me actual workflow improvements.
>
> It isn’t even going to get me to a “meh”. While I don’t think the proposal
> should be valued based on whether or not it gives me specifically benefit,
> if it provides no benefit to me maintaining the status quo is better than a
> change from my perspective.
>
> All that aside, I really don’t think anyone should be investing that much
> time trying to appease me. While I am a loud voice, and I’m flattered that
> people seem to want to make me change my opinions they are in fact just
> opinions and preferences.
>
> At the end of the day we need some empirical data to drive this decision,
> and I don’t consider the anecdotes on these threads to be that data.  While
> it is useful to hear the first hand opinions of people, it would be great
> if we had some actual data. Even with quantifiable data it is unlikely that
> I would consider the mono-repo an ideal workflow for myself, but it *could*
> be the right solution for the community.  I would like to think that as a
> community we will all be able to put the “greater good” above our own
> preferences.
>
> -Chris
>
>
> > On Aug 9, 2016, at 2:12 PM, Justin Lebar via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
> >> Sorry, I was specially replying to 'I think we'd be thrilled with a
> >> "meh" from your corner.’.  I didn’t feel like that was helping the
> >> conversation along.
> >
> > Sorry if I offended anyone with this or sent the wrong message.  I was
> > trying to say, beanz was originally a strong, categorical opponent to
> > the monorepo.  After some discussion, he became not strongly opposed
> > to a monorepo, so long as it didn't contain the runtime libraries.
> > Now Mehdi had a proposal that I was hoping would take him to
> > "not-strongly-opposed" to a monorepo that did contain the runtime
> > libraries.  Given where we came from, I would be very happy with that
> > outcome.
> >
> > On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at apple.com>
> > wrote:
> >>
> >> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at apple.com>
> >> wrote:
> >>
> >>
> >> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at apple.com>
> >> wrote:
> >>
> >>
> >> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> (2) If I’m stuck using git-svn I kinda feel like there is no real
> >> point in changing anything.
> >>
> >>
> >> No real point *for you specifically*.
> >>
> >> But the vast majority of people would not be stuck using git-svn.  And
> >> in addition the LLVM project would not be stuck using svn, with all
> >> the baggage, hosting issues, workflow issues (for people other than
> >> you), etc.
> >>
> >> The bar by which this proposal should be measured is not "is it a net
> >> gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
> >> corner.
> >>
> >> Justin, I don’t think this conversation is really going anywhere.
> >>
> >>
> >> I’m not sure what you’re referring to exactly, but in the context of
> >> "this thread isn’t getting anywhere”, I strongly disagree.
> >>
> >> Sorry, I was specially replying to 'I think we'd be thrilled with a
> >> "meh" from your corner.’.  I didn’t feel like that was helping the
> >> conversation along.
> >>
> >>
> >> OK, I agree with you then :)
> >>
> >>
> >> I agree with everything else you say about actually talking about the
> >> different proposals.  I hope my point is well received that we really
> >> do need to eventually describe the impact to daily workflow, once the
> >> proposals are far enough along to do so.
> >>
> >>
> >> I agree with you also on this. I voiced in the past (on IRC toward
> >> Justin/David probably) that the proposal should include examples of
> >> workflow and how they translate to whatever the proposal will be.
> >>
> >> Cheers,
> >>
> >> —
> >> Mehdi
> >>
> >>
> >>
> >> Pete
> >>
> >>
> >> I believe that the recent workflow tests I performed (see my last
> emails in
> >> this thread) are proof that this thread has been productive, and I
> believe
> >> discussing here and hearing concerns from people (Chris and
others) are
> >> necessary before getting a proposal fleshed out and having a
survey.
> >>
> >> Having a survey without getting to the end of *what* we want to
survey
> about
> >> is non-sense to me.
> >>
> >> (That may miss your point, but your point wasn’t clear either…).
> >>
> >> —
> >> Mehdi
> >>
> >>
> >>
> >>
> >> Renato already mentioned talking about this at the conference, and
> there has
> >> also been talk of a survey.  I think we need those to see how the
> community
> >> actually feel about the proposals here.
> >>
> >> Chris may be the only vocal advocate of an alternative to your
> proposal, but
> >> then there are people like me who are quiet because we are waiting
for
> the
> >> survey to appear.
> >>
> >> I would have been much more vocal if I thought we were actually
going to
> >> adopt the monorepo, but for now I believe it is still only a
proposal.
> >>
> >> Full disclosure, I don’t want a monorepo.  I think it optimizes
for the
> use
> >> case where people want to bisect, and I don’t think its reasonable
to
> push
> >> on everyone to have a monorepo for those who want to bisect.  The
> submodules
> >> repo has already been demonstrated as one potential solution to
this
> which
> >> would allow those who want to bisect to do so, while everyone else
can
> >> continue to work more or less as they do today.
> >>
> >> In terms of the proposals, I think you, Mehdi, Chris, and a number
of
> others
> >> have proven that there is almost no technical solution beyond our
reach.
> >> What we do have are proposals which optimize for different use
cases.
> Given
> >> this, I think the most useful thing from my point of view (and
> hopefully to
> >> others) would be for those advocating each different solution to
actual
> give
> >> short examples of each of the different use cases and how to
support
> them.
> >>
> >> For example:
> >>
> >> Monorepo, pushing a change to compiler-rt:
> >> 1: Git commit …
> >> 2: Git pull --rebase
> >> 3: test
> >> 4 a: Git push /* no commits to any other project so the push works */.
> >>      Goto 5
> >> 4 b: Git push /* someone committed to some other project in the
> >>      monorepo. Goto 2 */
> >> 5: Done
> >>
> >> I know that this example appears negative in the case where
someone else
> >> committed to another project and a rebase is required, but thats
> exactly the
> >> point.  This is showing that this particular scenario is
potentially a
> >> problem compared to today and/or other proposals.  A similar
workflow
> could
> >> (should) be written for the sparse checkout monorepo, GitHub
monorepo
> with
> >> svn, and submodules cases.  The submodules case will likely show
that
> >> bisecting is more complex than on the monorepo, while pushing is
> simpler.
> >>
> >> Similarly, the submodules workflow probably isn’t capable of a
single
> commit
> >> to llvm and clang in the revlock case while the monorepo is, but
we as a
> >> community need to decide whether we want to optimize for that or
not.  I
> >> don’t have any data to suggest that revlock commits are
> frequent/infrequent
> >> or even a problem in general, and I don’t think we should optimize
for
> that
> >> case unless its worth doing so.
> >>
> >> Only by actually showing the use cases we care about can the
community
> make
> >> an educated decision about what these proposals actually mean to
our
> daily
> >> workflow.  We can then choose what we are optimizing for.  I
personally
> want
> >> to have a very simple list of repo’s to clone from (or just one!)
and
> for
> >> pushing to be easy, because those are the actions I perform the
most
> often.
> >> Others will have different use cases they care about and they can
> choose the
> >> proposal which suits them best.
> >>
> >> Cheers,
> >> Pete
> >>
> >>
> >> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at
apple.com>
> wrote:
> >>
> >>
> >> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
> >>
> >>
> >> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at
apple.com> wrote:
> >>
> >>
> >>
> >> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
> >>
> >>
> >> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at
google.com> wrote:
> >>
> >> Thanks for your thoughts, Chris.
> >>
> >> As supporting evidence of this, I was discussing this thread
yesterday
> >> around the office yesterday and had quite a few people responding
> something
> >> along the lines of “they’re proposing what?”.
> >>
> >>
> >> I hope they'll join us in this thread.
> >>
> >> Ultimately a survey is going to be strongly biased in favor of
"don't
> >> change anything".  There is a strong psychological bias to
weight
> >> losses more than gains, so if one doesn't engage with the
issue, it's
> >> only natural to conclude "keep it as similar as possible to
what it is
> >> today -- that is safe."  But that line of thinking does not
> >> necessarily lead us to the best outcome.
> >>
> >>
> >> I don’t agree with this assertion. I believe that if you put forth
> multiple
> >> proposals, and have an articulate discussion of the merits and
costs of
> each
> >> solution you can create a survey that can help inform decision
making. I
> >> suppose we can agree to disagree.
> >>
> >>
> >> We've heard in thread from a lot of developers about how a
monorepo
> >> would improve their workflow.  I would love to hear from some
> >> developers who are actually affected in the way you describe,
rather
> >> than just considering the hypothetical.
> >>
> >> My expectation is that the effect of the monorepo on said
developers
> >> would be relatively small -- we're talking about 1gb of disk
space.  I
> >> understand that there's a "yuck" factor to this, but
inasmuch as there
> >> aren't other concrete effects, this is just change aversion. 
And
> >> essentially all of the other effects of the monorepo can be hidden
via
> >> sparse checkouts, as we've discussed.
> >>
> >> Maybe I am wrong.  But I don't think we're going to get to
the bottom
> >> of it without actually engaging with people who are actually
affected
> >> in the way you posit.
> >>
> >>
> >> Ok, let me describe a few workflows I’ve used in the last year that
> >> are (in my mind) adversely impacted by a mono-repo.
> >>
> >> Case Study 1 - Simple development on a sub-project
> >>
> >> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> >> Compiler-RT. I iterate on some complicated Compiler-RT changes over a
> >> period of a day. Once my Compiler-RT changes are done I rebase the
> >> compiler-rt repo, rebuild compiler-rt then commit.
> >>
> >> With a mono-repo rebasing the checkout means rebasing the whole tree.
> >> So, either I have to wrangle some crazy git or CMake foo, or when I run
> >> “ninja compiler-rt” after the rebase it will rebuild LLVM and Clang
> >> too. That kinda sucks.
> >>
> >> What this example illustrates to me is that today we have loosely
> >> coupled projects with an occasional rev lock. Moving to a mono-repo
> >> enforces a tight coupling that isn’t strictly required today.
> >>
> >> Case Study 2 - Working on a sub-project in isolation across many
> >> platforms
> >>
> >> I did a lot of work on Compiler-RT last year that had no direct
> >> dependency on any other LLVM project. During the development I was
> >> working with a Compiler-RT checkout and a build directory of just
> >> Compiler-RT. Every once in a while (or every other day as it were) I
> >> would make a change that would break a configuration that I wasn’t
> >> directly developing on. My workflow for handling those cases was:
> >>
> >> (1) Spin up a VM on a VPS that closely matched the configuration I
> >>     broke
> >> (2) Checkout Compiler-RT
> >> (3) Reproduce, debug, fix the failure
> >> (4) Commit the patch from the VM
> >>
> >> In a mono-repository doing this would require checking out *all*
> >> sub-projects, not just Compiler-RT. I imagine this probably isn’t a
> >> common workflow, but it is one I use that would be adversely impacted
> >> by needing to checkout a full LLVM. Now, you might say I could check
> >> out the sub-project mirror, but then I can’t commit from the VM, which
> >> kinda sucks.
> >>
> >>
> >> So for the “I spin a VM and want to make a commit but don’t want to
> >> download a few hundred MBs with a git clone” story, it turns out that
> >> the github bridge with SVN helps to optimize with a “lean” checkout:
> >>
> >> I fork the unified repo here:
> >> https://github.com/joker-eph/llvm-project/commits/master and then:
> >> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> >>
> >> So that’s a net “no regression” compared to the current state :)
> >>
> >>
> >> Is the github SVN interface's "co" magically as fast as a git clone?
> >>
> >>
> >> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> >> ….
> >> real 0m8.539s user 0m0.919s  sys 0m1.917s
> >> $ time git clone https://github.com/joker-eph/compiler-rt.git
> >> real 0m5.487s user 0m1.208s sys 0m0.825s
> >>
> >>
> >> That’s actually not terrible! Color me impressed.
> >>
> >>
> >>
> >> If not, it is a performance regression because today I use git clone
> >> and git-svn on my VMs just like on my physical machines, and either way
> >> it adds some crazy complexity.
> >>
> >>
> >> No problem, I get it, exactly same workflow as today:
> >>
> >>
> >> Yep. Which isn’t bad. I do however have two concerns.
> >>
> >> (1) What happens if we move to pull request-based workflows? Do we
> >>     still support this workflow?
> >> (2) If I’m stuck using git-svn I kinda feel like there is no real point
> >>     in changing anything. I dislike this workflow less than the earlier
> >>     proposals, but I see no reason to move to this instead of staying on
> >>     SVN (other than the hosting issues which could be solved in other
> >>     ways).
> >>
> >> -Chris
> >>
> >>
> >> # Clone from the single read-only git repo
> >> $ git clone https://github.com/joker-eph/compiler-rt.git
> >> …
> >> # Configure the SVN remote and initialize the svn metadata
> >> $ cd compiler-rt
> >> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username
> >> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
> >> $ git svn rebase -l
> >> ...
> >> # Remove an empty file and commit with git
> >> $ git rm empty
> >> $ git commit -m "remove empty file"
> >> # commit/push with svn to the unified git repo
> >> $ git svn dcommit
> >> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> >> ...
> >> D empty
> >> Committed r354148
> >>
> >>
> >> Here is the commit:
> >> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
> >>
> >>
> >> —
> >> Mehdi
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> While admittedly you do get a linear history with using the
> mono-repository,
> >> that isn’t the only way to solve the problem, and I don’t really
think
> that
> >> the benefit (not needing to write some tooling) justifies the
increased
> >> burden applied to contributors that don’t use the full LLVM family
of
> >> projects.
> >>
> >>
> >> I think the trade-off you're considering here (cost to
developers who
> >> use llvm plus a version-locked subrepo vs. cost to developers who
> >> don't want an llvm clone) is the right one.
> >>
> >>
> >> I actually think there are *a lot* more considerations we need to
be
> making
> >> for an infrastructure change like this. While it is true that our
SCM
> >> hosting strategy primarily impacts developers, it also impacts our
> users. We
> >> should be conscious of the impact to downstream users in making
> >> infrastructure changes like this. That is part of why the idea of
a
> survey
> >> holds appeal to me; it would give us the opportunity to get
feedback
> from a
> >> much wider audience than the current “people on llvm-dev who
haven’t
> been
> >> scared away”.
> >>
> >> But as someone who has
> >> extensively used git submodules and repo (a wrapper script), I
> >> strongly disagree with the judgement that a monorepo would not be
a
> >> significant improvement.
> >>
> >> Our primary disagreement, I think, is over how much cost there is
to
> >> "writing some tooling".  To me, this is a significant
barrier standing
> >> in the way of developer productivity.  Here at Google I did a
quick
> >> survey, and more than half of us don't have scripts of the
sort that
> >> Justin Bogner described.  We are all just floundering around
rebasing
> >> clang and llvm until it compiles.  It *sucks*.
> >>
> >>
> >> I actually think we’re both talking about solutions that require
> tooling,
> >> and while we *could* be disagreeing over how much effort each
tooling
> >> initiative would require (I think they’re pretty close, so I don’t
care
> to
> >> have that argument), my actual disagreement with your proposal is
that
> it is
> >> a change that impacts developers and users universally and I don’t
think
> >> that it is justified. Simply put, I don’t feel that the benefits
are
> >> substantial enough to warrant the kind of disruptive change you’re
> >> proposing.
> >>
> >>
> >> I suggest that saying that all of these developers are "doing
it
> >> wrong" is not helpful.
> >>
> >>
> >> Maybe I’m missing something, but I don’t think I said anyone was
“doing
> it
> >> wrong”. Bisecting across multiple git repositories isn’t a great
> experience.
> >> But neither is bisecting across a half dozen separate folders in
an SVN
> >> repository. Both the submodule solution and the mono-repo solution
solve
> >> this problem equivalently well.
> >>
> >> Not everyone has the git and python/bash chops
> >> to write the necessary scripts.  Not everyone has the personality
to
> >> obsessively script around stuff, or the desire to maintain said
> >> scripts.  Not everyone works on llvm/clang so much that it's
worth
> >> adopting a special-snowflake workflow.  And some of us -- myself
> >> included -- have extensive git scripts which work with the
standard
> >> git workflow but would be completely broken by adding a custom
level
> >> of indirection around git.
> >>
> >> When put this way, maybe it's clear that it's actually a
niche set of
> >> people for whom "script around the brokenness" is a good
solution.
> >>
> >>
> >> I’m not sure what “brokenness” you’re referring to. We have a
> collection of
> >> loosely connected projects by design. As a result of that
intentional
> design
> >> certain workflows will be impacted. I don’t think that is
brokenness. I
> >> think our loose coupling is a feature even if it makes some
workflows
> >> harder.
> >>
> >> -Chris
> >>
> >>
> >> As I've said a bunch of times above, we have to weigh a cost
paid by
> >> all of us every time we type a command that starts with
"git" --
> >> something we do tens or hundreds of times a day -- versus the
one-time
> >> cost of asking people to download 1gb of data.
> >>
> >> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> I’m just now catching up on this massive thread after being on
vacation
> last
> >> week, and I have a few thoughts I’d like to share.
> >>
> >> First and foremost please don’t consider lack of dissent on the
thread
> as
> >> presence of consensus. The various git-related threads on LLVM-dev
> lately
> >> have been so active and contentious that I think a lot of people
are
> zoning
> >> out on the conversations. As supporting evidence of this, I was
> discussing
> >> this thread yesterday around the office yesterday and had quite a
few
> people
> >> responding something along the lines of “they’re proposing what?”.
> >>
> >> I think it would be great for us to have several different
proposals
> for how
> >> the git-transition could work, and have a survey to get people’s
> opinions. I
> >> know this has been discussed repeatedly, and I want to put in my
vote in
> >> favor of having a survey that takes into account multiple
different
> >> approaches.
> >>
> >> WRT the actual proposal in this thread, I’m strongly opposed to a
> >> mono-repository. While I understand the argument that the full
clone’s
> cost
> >> on disk space is minimal compared to an LLVM object directory,
what
> about
> >> for contributors that contribute to the smaller runtimes projects
but
> *not*
> >> to LLVM or Clang. A contributor that only contributes to libcxx or
> >> compiler-rt being forced to do a full clone of all the LLVM
projects in
> >> order to push a patch kinda sucks.
> >>
> >> I want to point out a few workflows people may not be considering.
> >>
> >> Clang can be built against an installed LLVM. I know this workflow
is
> used
> >> by some people because I’ve broken it in the past and had to fix
it.
> With a
> >> mono-repo this workflow gets a bit more complicated because you’d
need
> to do
> >> sparse checkouts, and it probably means we should just nuke the
workflow
> >> entirely because there is no real value added by having it.
> >>
> >> Compiler-RT’s sanitizers are used with GCC; no LLVM required.
While for
> the
> >> common use case maintaining sparse repository mirrors would limit
> impact of
> >> this on users, should any GCC user want to contribute to
Compiler-RT,
> you’re
> >> forcing them to clone a much larger repository than necessary.
> >>
> >> The same problem with Compiler-RT’s sanitizers also applies to
libcxx,
> >> libcxxabi, libunwind, and potentially any other runtime library
projects
> >> that we may create in the future.
> >>
> >> Beyond all that I want to point out that the git multi-repository
story
> is
> >> basically the same thing we have today with SVN except for the
absence
> of a
> >> monotonically increasing number that corresponds across
repositories.
> While
> >> admittedly you do get a linear history with using the
mono-repository,
> that
> >> isn’t the only way to solve the problem, and I don’t really think
that
> the
> >> benefit (not needing to write some tooling) justifies the
increased
> burden
> >> applied to contributors that don’t use the full LLVM family of
projects.
> >>
> >> I think we have some pretty strong evidence in the form of the
github
> fork
> >> counts (https://github.com/llvm-mirror/) that most people aren’t
using
> all
> >> of the LLVM projects. In fact, by that evidence Clang (the second
most
> >> popular project) is forked less than 2/3 as many times as LLVM.
> >>
> >> -Chris
> >>
> >>
> >> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Even if it were possible, I would still keep my upstream checkout
> >> separate just as a safety measure, to keep from sending private
stuff
> >> upstream by accident.
> >>
> >>
> >> Just FYI, this is our (Azul's) workflow as well, and for
similar
> >> reasons.
> >>
> >>
> >> Same here.
> >>
> >> cheers,
> >> --renato

Chris Bieneman via llvm-dev

2016-Aug-10 01:14 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Aug 9, 2016, at 6:01 PM, Eric Fiselier <eric at efcs.ca> wrote:
> 
> I don't see who would benefit from having libc++/libc++abi in the
> megarepo since they are not coupled to either LLVM or Clang. Upstream
> changes to other projects have no effect on libc++ and vice versa, so
> there is no need to keep them in sync.
>
> For this reason libc++ should stay a separate repository, which can be
> included in the megarepo as a sub-module. This avoids concerns about
> increasing the cost of checking out and building libc++.
>
> There has also been a secondary discussion about libc++ supporting
> out-of-tree builds. I'm unconvinced by arguments about the cost of
> cloning LLVM when you only want libc++. Building and testing libc++
> already requires an LLVM checkout somewhere on the machine for LIT and
> CMake modules, so the additional cost is already there. I would be OK
> dropping out-of-tree support for libc++ since such builds are hard to
> configure correctly and offer little benefit over in-tree builds.

There are two reasons libcxx needs to support out-of-tree builds:

(1) From a build system perspective, we need to be able to build libcxx
using the just-built clang, which means an out-of-tree style configuration
even if it seems in-tree to the user.
(2) Lots of people use libcxx without clang. Removing support for building
libcxx out of tree is unreasonable for end users. It might be reasonable
for developers, but end users need that ability.

-Chris
> 
> /Eric
> 
> 
>> On Tue, Aug 9, 2016 at 5:32 PM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> Can we please stop with the attempts at persuading me to a mono-repo
>> approach? I really hate that my voicing criticism has resulted in a big
>> “let’s sway beanz” effort.
>>
>> Justin, the reality is I don’t think the benefits of a mono-repo justify
>> the costs if it includes all projects. I think the cost of a mono-repo
>> which excludes the runtime projects is lower, so I dislike it less. I
>> can even see (and agree with) some arguments in favor of LLVM, Clang and
>> LLD being in the same repository, but I don’t see it as solving a
>> problem that needs to be solved.
>>
>> Mehdi’s performance data and git-svn workflow does *nothing* to win me
>> over to your argument. All it says is that if we do your proposal I
>> *might* be able to keep the same git-svn workflows I use today. I say
>> “might” because nobody has addressed my original concerns about whether
>> or not that workflow would be dropped if we move to a PR based model, or
>> how we would support something similar in a PR model. I also think that
>> the mono-repo might discourage pull requests to the runtimes projects
>> from users that don’t use clang, which concerns me. Either way Mehdi’s
>> information isn’t going to get me to support your idea over the other
>> proposal which offers me actual workflow improvements.
>>
>> It isn’t even going to get me to a “meh”. While I don’t think the
>> proposal should be valued based on whether or not it gives me
>> specifically benefit, if it provides no benefit to me maintaining the
>> status quo is better than a change from my perspective.
>>
>> All that aside, I really don’t think anyone should be investing that
>> much time trying to appease me. While I am a loud voice, and I’m
>> flattered that people seem to want to make me change my opinions they
>> are in fact just opinions and preferences.
>>
>> At the end of the day we need some empirical data to drive this
>> decision, and I don’t consider the anecdotes on these threads to be that
>> data.  While it is useful to hear the first hand opinions of people, it
>> would be great if we had some actual data. Even with quantifiable data
>> it is unlikely that I would consider the mono-repo an ideal workflow for
>> myself, but it *could* be the right solution for the community.  I would
>> like to think that as a community we will all be able to put the
>> “greater good” above our own preferences.
>> 
>> -Chris
>> 
>> 
>> > On Aug 9, 2016, at 2:12 PM, Justin Lebar via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
>> >
>> >> Sorry, I was specifically replying to 'I think we'd be thrilled with a
>> >> "meh" from your corner.’.  I didn’t feel like that was helping the
>> >> conversation along.
>> >
>> > Sorry if I offended anyone with this or sent the wrong message.  I was
>> > trying to say, beanz was originally a strong, categorical opponent to
>> > the monorepo.  After some discussion, he became not strongly opposed
>> > to a monorepo, so long as it didn't contain the runtime libraries.
>> > Now Mehdi had a proposal that I was hoping would take him to
>> > "not-strongly-opposed" to a monorepo that did contain the runtime
>> > libraries.  Given where we came from, I would be very happy with that
>> > outcome.
>> >
>> > On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> >>
>> >> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at
apple.com> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> (2) If I’m stuck using git-svn I kinda feel like there is no
real point in
>> >> changing anything.
>> >>
>> >>
>> >> No real point *for you specifically*.
>> >>
>> >> But the vast majority of people would not be stuck using
git-svn.  And
>> >> in addition the LLVM project would not be stuck using svn,
with all
>> >> the baggage, hosting issues, workflow issues (for people other
than
>> >> you), etc.
>> >>
>> >> The bar by which this proposal should be measured is not
"is it a net
>> >> gain for beanz?"  :)  I think we'd be thrilled with a
"meh" from your
>> >> corner.
>> >>
>> >> Justin, I don’t think this conversation is really going
anywhere.
>> >>
>> >>
>> >> I’m not sure what you’re referring to exactly, but in the
context of "this
>> >> thread isn’t getting anywhere”, I strongly disagree.
>> >>
>> >> Sorry, I was specially replying to 'I think we'd be
thrilled with a "meh"
>> >> from your corner.’.  I didn’t feel like that was helping the
conversation
>> >> along.
>> >>
>> >>
>> >> OK, I agree with you then :)
>> >>
>> >>
>> >> I agree with everything else you say about actually talking
about the
>> >> different proposals.  I hope my point is well received that we
really do
>> >> need to eventually describe the impact to daily workflow, once
the proposals
>> >> are far enough along to do so.
>> >>
>> >>
>> >> I agree with you also on this. I voiced in the past (on IRC
toward
>> >> Justin/David probably) that the proposal should include
examples of workflow
>> >> and how they translate to whatever the proposal will be.
>> >>
>> >> Cheers,
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >>
>> >> Pete
>> >>
>> >>
>> >> I believe that the recent workflow tests I performed (see my
last emails in
>> >> this thread) are proof that this thread has been productive,
and I believe
>> >> discussing here and hearing concerns from people (Chris and
others) are
>> >> necessary before getting a proposal fleshed out and having a
survey.
>> >>
>> >> Having a survey without getting to the end of *what* we want to survey
>> >> about is nonsense to me.
>> >>
>> >> (That may miss your point, but your point wasn’t clear
either…).
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >>
>> >>
>> >> Renato already mentioned talking about this at the conference,
and there has
>> >> also been talk of a survey.  I think we need those to see how
the community
>> >> actually feel about the proposals here.
>> >>
>> >> Chris may be the only vocal advocate of an alternative to your
proposal, but
>> >> then there are people like me who are quiet because we are
waiting for the
>> >> survey to appear.
>> >>
>> >> I would have been much more vocal if I thought we were
actually going to
>> >> adopt the monorepo, but for now I believe it is still only a
proposal.
>> >>
>> >> Full disclosure, I don’t want a monorepo.  I think it optimizes for the
>> >> use case where people want to bisect, and I don’t think it’s reasonable
>> >> to push a monorepo on everyone for the sake of those who want to bisect.
>> >> The submodules repo has already been demonstrated as one potential
>> >> solution to this, which would allow those who want to bisect to do so,
>> >> while everyone else can continue to work more or less as they do today.
>> >>
>> >> In terms of the proposals, I think you, Mehdi, Chris, and a number of
>> >> others have proven that there is almost no technical solution beyond our
>> >> reach.  What we do have are proposals which optimize for different use
>> >> cases.  Given this, I think the most useful thing from my point of view
>> >> (and hopefully to others) would be for those advocating each different
>> >> solution to actually give short examples of each of the different use
>> >> cases and how to support them.
>> >>
>> >> For example:
>> >>
>> >> Monorepo, pushing a change to compiler-rt:
>> >> 1: git commit …
>> >> 2: git pull --rebase
>> >> 3: test
>> >> 4 a: git push /* no commits to any other project, so the push works */.  Goto 5
>> >> 4 b: git push /* someone committed to some other project in the monorepo */.  Goto 2
>> >> 5: Done
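
The loop in steps 2 to 4 b above can be sketched as a small shell function. This is only an illustration of the retry behaviour being discussed, not something proposed in the thread: `push_with_rebase`, its retry limit, and the `GIT` override variable are hypothetical names, and the test command is whatever the contributor normally runs.

```shell
# Rebase, test, push; go back to the rebase step when the push is rejected.
# Arguments: maximum number of attempts (default 5), test command (default: true).
# GIT may be set to substitute another command for git (an assumption made
# here so the loop can be exercised without a real remote).
push_with_rebase() {
    max_attempts="${1:-5}"
    test_cmd="${2:-true}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Step 2: pick up whatever landed upstream, in any project.
        "${GIT:-git}" pull --rebase || return 1
        # Step 3: re-run the tests on the rebased tree.
        sh -c "$test_cmd" || return 2
        # Step 4: the push only succeeds if nobody else pushed in between.
        if "${GIT:-git}" push; then
            echo "pushed after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 3
}
```

A call such as `push_with_rebase 5 "ninja compiler-rt"` would retry up to five times. In a busy monorepo the number of trips through the loop depends on the commit rate of every project, which is the cost step 4 b is pointing at.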
>> >>
>> >> I know that this example appears negative in the case where someone else
>> >> committed to another project and a rebase is required, but that’s exactly
>> >> the point.  This is showing that this particular scenario is potentially a
>> >> problem compared to today and/or other proposals.  A similar workflow
>> >> could (should) be written for the sparse checkout monorepo, GitHub
>> >> monorepo with svn, and submodules cases.  The submodules case will likely
>> >> show that bisecting is more complex than on the monorepo, while pushing is
>> >> simpler.
>> >>
>> >> Similarly, the submodules workflow probably isn’t capable of a single
>> >> commit to llvm and clang in the revlock case while the monorepo is, but we
>> >> as a community need to decide whether we want to optimize for that or not.
>> >> I don’t have any data to suggest that revlock commits are
>> >> frequent/infrequent or even a problem in general, and I don’t think we
>> >> should optimize for that case unless it’s worth doing so.
>> >>
>> >> Only by actually showing the use cases we care about can the
community make
>> >> an educated decision about what these proposals actually mean
to our daily
>> >> workflow.  We can then choose what we are optimizing for.  I
personally want
>> >> to have a very simple list of repo’s to clone from (or just
one!) and for
>> >> pushing to be easy, because those are the actions I perform
the most often.
>> >> Others will have different use cases they care about and they
can choose the
>> >> proposal which suits them best.
>> >>
>> >> Cheers,
>> >> Pete
>> >>
>> >>
>> >> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at
apple.com> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> >>
>> >>
>> >> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at
apple.com> wrote:
>> >>
>> >>
>> >>
>> >> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> >>
>> >>
>> >> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at
google.com> wrote:
>> >>
>> >> Thanks for your thoughts, Chris.
>> >>
>> >> As supporting evidence of this, I was discussing this thread around the
>> >> office yesterday and had quite a few people responding something along the
>> >> lines of “they’re proposing what?”.
>> >>
>> >>
>> >> I hope they'll join us in this thread.
>> >>
>> >> Ultimately a survey is going to be strongly biased in favor of
"don't
>> >> change anything".  There is a strong psychological bias
to weight
>> >> losses more than gains, so if one doesn't engage with the
issue, it's
>> >> only natural to conclude "keep it as similar as possible
to what it is
>> >> today -- that is safe."  But that line of thinking does
not
>> >> necessarily lead us to the best outcome.
>> >>
>> >>
>> >> I don’t agree with this assertion. I believe that if you put
forth multiple
>> >> proposals, and have an articulate discussion of the merits and
costs of each
>> >> solution you can create a survey that can help inform decision
making. I
>> >> suppose we can agree to disagree.
>> >>
>> >>
>> >> We've heard in thread from a lot of developers about how a
monorepo
>> >> would improve their workflow.  I would love to hear from some
>> >> developers who are actually affected in the way you describe,
rather
>> >> than just considering the hypothetical.
>> >>
>> >> My expectation is that the effect of the monorepo on said
developers
>> >> would be relatively small -- we're talking about 1gb of
disk space.  I
>> >> understand that there's a "yuck" factor to this,
but inasmuch as there
>> >> aren't other concrete effects, this is just change
aversion.  And
>> >> essentially all of the other effects of the monorepo can be
hidden via
>> >> sparse checkouts, as we've discussed.
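
For readers who have not used sparse checkouts, here is a minimal sketch of what hiding the rest of a monorepo could look like with stock git. The function names are hypothetical, and the directory names assume the monorepo keeps each project in its own top-level directory, as in the `llvm-project` fork discussed elsewhere in this thread.

```shell
# Print one sparse-checkout pattern per requested top-level directory.
sparse_patterns() {
    for dir in "$@"; do
        printf '/%s/\n' "$dir"
    done
}

# Limit the working tree of the current clone to the given directories.
# Uses the classic core.sparseCheckout mechanism; run inside an existing clone.
sparse_only() {
    info_dir="$(git rev-parse --git-dir)/info" || return 1
    mkdir -p "$info_dir" &&
    git config core.sparseCheckout true &&
    sparse_patterns "$@" > "$info_dir/sparse-checkout" &&
    git read-tree -mu HEAD
}
```

Running `sparse_only compiler-rt` inside a monorepo clone would leave only `compiler-rt/` in the working tree. The full history is still downloaded, so this addresses the working-tree clutter but not the clone size.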
>> >>
>> >> Maybe I am wrong.  But I don't think we're going to
get to the bottom
>> >> of it without actually engaging with people who are actually
affected
>> >> in the way you posit.
>> >>
>> >>
>> >> Ok, let me describe a few workflows I’ve used in the last year
that are (in
>> >> my mind) adversely impacted by a mono-repo.
>> >>
>> >> Case Study 1 - Simple development on a sub-project
>> >>
>> >> I build LLVM + Clang + Compiler-RT using the just-built Clang
to build
>> >> Compiler-RT. I iterate on some complicated Compiler-RT changes
over a period
>> >> of a day. Once my Compiler-RT changes are done I rebase the
compiler-rt
>> >> repo, rebuild compiler-rt then commit.
>> >>
>> >> With a mono-repo rebasing the checkout means rebasing the
whole tree. So,
>> >> either I have to wrangle some crazy git or CMake foo, or when
I run “ninja
>> >> compiler-rt” after the rebase it will rebuild LLVM and Clang
too. That kinda
>> >> sucks.
>> >>
>> >> What this example illustrates to me is that today we have
loosely coupled
>> >> projects with an occasional rev lock. Moving to a mono-repo
enforces a tight
>> >> coupling that isn’t strictly required today.
>> >>
>> >> Case Study 2 - Working on a sub-project in isolation across
many platforms
>> >>
>> >> I did a lot of work on Compiler-RT last year that had no
direct dependency
>> >> on any other LLVM project. During the development I was
working with a
>> >> Compiler-RT checkout and a build directory of just
Compiler-RT. Every once
>> >> in a while (or every other day as it were) I would make a
change that would
>> >> break a configuration that I wasn’t directly developing on. My
workflow for
>> >> handling those cases was:
>> >>
>> >> (1) Spin up a VM on a VPS that closely matched the
configuration I broke
>> >> (2) Checkout Compiler-RT
>> >> (3) Reproduce, debug, fix the failure
>> >> (4) Commit the patch from the VM
>> >>
>> >> In a mono-repository doing this would require checking out
*all*
>> >> sub-projects, not just Compiler-RT. I imagine this probably
isn’t a common
>> >> workflow, but it is one I use that would be adversely impacted
by needing to
>> >> checkout a full LLVM. Now, you might say I could check out the
sub-project
>> >> mirror, but then I can’t commit from the VM, which kinda
sucks.
>> >>
>> >>
>> >> So for the “I spin a VM and want to make a commit but don’t
want to download
>> >> a few hundred MBs with a git clone” story, it turns out that
the github
>> >> bridge with SVN helps to optimize with a “lean” checkout:
>> >>
>> >> I fork the unified repo here:
>> >> https://github.com/joker-eph/llvm-project/commits/master
>> >> and then:
>> >> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> >>
>> >> So that’s a net “no regression” compared to the current state
:)
>> >>
>> >>
>> >> Is the github SVN interface's "co" magically as
fast as a git clone?
>> >>
>> >>
>> >> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> >> ….
>> >> real 0m8.539s user 0m0.919s sys 0m1.917s
>> >> $ time git clone https://github.com/joker-eph/compiler-rt.git
>> >> real 0m5.487s user 0m1.208s sys 0m0.825s
>> >>
>> >>
>> >> That’s actually not terrible! Color me impressed.
>> >>
>> >>
>> >>
>> >> If not, it is a performance regression because today I use git
clone and
>> >> git-svn on my VMs just like on my physical machines, and
either way it adds
>> >> some crazy complexity.
>> >>
>> >>
>> >> No problem, I get it, exactly same workflow as today:
>> >>
>> >>
>> >> Yep. Which isn’t bad. I do however have two concerns.
>> >>
>> >> (1) What happens if we move to pull request-based workflows?
Do we still
>> >> support this workflow?
>> >> (2) If I’m stuck using git-svn I kinda feel like there is no
real point in
>> >> changing anything. I dislike this workflow less than the
earlier proposals,
>> >> but I see no reason to move to this instead of staying on SVN
(other than
>> >> the hosting issues which could be solved in other ways).
>> >>
>> >> -Chris
>> >>
>> >>
>> >> # Clone from the single read-only git repo
>> >> $ git clone https://github.com/joker-eph/compiler-rt.git
>> >> …
>> >> # Configure the SVN remote and initialize the svn metadata
>> >> $ cd compiler-rt
>> >> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username …
>> >> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
>> >> $ git svn rebase -l
>> >> ...
>> >> # Remove an empty file and commit with git
>> >> $ git rm empty
>> >> $ git commit -m "remove empty file"
>> >> # commit/push with svn to the unified git repo
>> >> $ git svn dcommit
>> >> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt ...
>> >> D empty
>> >> Committed r354148
>> >>
>> >>
>> >> Here is the commit:
>> >>
https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>> >>
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> While admittedly you do get a linear history with using the
mono-repository,
>> >> that isn’t the only way to solve the problem, and I don’t
really think that
>> >> the benefit (not needing to write some tooling) justifies the
increased
>> >> burden applied to contributors that don’t use the full LLVM
family of
>> >> projects.
>> >>
>> >>
>> >> I think the trade-off you're considering here (cost to
developers who
>> >> use llvm plus a version-locked subrepo vs. cost to developers
who
>> >> don't want an llvm clone) is the right one.
>> >>
>> >>
>> >> I actually think there are *a lot* more considerations we need
to be making
>> >> for an infrastructure change like this. While it is true that
our SCM
>> >> hosting strategy primarily impacts developers, it also impacts
our users. We
>> >> should be conscious of the impact to downstream users in
making
>> >> infrastructure changes like this. That is part of why the idea
of a survey
>> >> holds appeal to me; it would give us the opportunity to get
feedback from a
>> >> much wider audience than the current “people on llvm-dev who
haven’t been
>> >> scared away”.
>> >>
>> >> But as someone who has
>> >> extensively used git submodules and repo (a wrapper script), I
>> >> strongly disagree with the judgement that a monorepo would not
be a
>> >> significant improvement.
>> >>
>> >> Our primary disagreement, I think, is over how much cost there
is to
>> >> "writing some tooling".  To me, this is a
significant barrier standing
>> >> in the way of developer productivity.  Here at Google I did a
quick
>> >> survey, and more than half of us don't have scripts of the
sort that
>> >> Justin Bogner described.  We are all just floundering around
rebasing
>> >> clang and llvm until it compiles.  It *sucks*.
>> >>
>> >>
>> >> I actually think we’re both talking about solutions that
require tooling,
>> >> and while we *could* be disagreeing over how much effort each
tooling
>> >> initiative would require (I think they’re pretty close, so I
don’t care to
>> >> have that argument), my actual disagreement with your proposal
is that it is
>> >> a change that impacts developers and users universally and I
don’t think
>> >> that it is justified. Simply put, I don’t feel that the
benefits are
>> >> substantial enough to warrant the kind of disruptive change
you’re
>> >> proposing.
>> >>
>> >>
>> >> I suggest that saying that all of these developers are
"doing it
>> >> wrong" is not helpful.
>> >>
>> >>
>> >> Maybe I’m missing something, but I don’t think I said anyone
was “doing it
>> >> wrong”. Bisecting across multiple git repositories isn’t a
great experience.
>> >> But neither is bisecting across a half dozen separate folders
in an SVN
>> >> repository. Both the submodule solution and the mono-repo
solution solve
>> >> this problem equivalently well.
>> >>
>> >> Not everyone has the git and python/bash chops
>> >> to write the necessary scripts.  Not everyone has the
personality to
>> >> obsessively script around stuff, or the desire to maintain
said
>> >> scripts.  Not everyone works on llvm/clang so much that
it's worth
>> >> adopting a special-snowflake workflow.  And some of us --
myself
>> >> included -- have extensive git scripts which work with the
standard
>> >> git workflow but would be completely broken by adding a custom
level
>> >> of indirection around git.
>> >>
>> >> When put this way, maybe it's clear that it's actually
a niche set of
>> >> people for whom "script around the brokenness" is a
good solution.
>> >>
>> >>
>> >> I’m not sure what “brokenness” you’re referring to. We have a
collection of
>> >> loosely connected projects by design. As a result of that
intentional design
>> >> certain workflows will be impacted. I don’t think that is
brokenness. I
>> >> think our loose coupling is a feature even if it makes some
workflows
>> >> harder.
>> >>
>> >> -Chris
>> >>
>> >>
>> >> As I've said a bunch of times above, we have to weigh a
cost paid by
>> >> all of us every time we type a command that starts with
"git" --
>> >> something we do tens or hundreds of times a day -- versus the
one-time
>> >> cost of asking people to download 1gb of data.
>> >>
>> >> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> I’m just now catching up on this massive thread after being on
vacation last
>> >> week, and I have a few thoughts I’d like to share.
>> >>
>> >> First and foremost please don’t consider lack of dissent on the thread as
>> >> presence of consensus. The various git-related threads on LLVM-dev lately
>> >> have been so active and contentious that I think a lot of people are
>> >> zoning out on the conversations. As supporting evidence of this, I was
>> >> discussing this thread around the office yesterday and had quite a few
>> >> people responding something along the lines of “they’re proposing what?”.
>> >>
>> >> I think it would be great for us to have several different
proposals for how
>> >> the git-transition could work, and have a survey to get
people’s opinions. I
>> >> know this has been discussed repeatedly, and I want to put in
my vote in
>> >> favor of having a survey that takes into account multiple
different
>> >> approaches.
>> >>
>> >> WRT the actual proposal in this thread, I’m strongly opposed
to a
>> >> mono-repository. While I understand the argument that the full
clone’s cost
>> >> on disk space is minimal compared to an LLVM object directory,
what about
>> >> for contributors that contribute to the smaller runtimes
projects but *not*
>> >> to LLVM or Clang. A contributor that only contributes to
libcxx or
>> >> compiler-rt being forced to do a full clone of all the LLVM
projects in
>> >> order to push a patch kinda sucks.
>> >>
>> >> I want to point out a few workflows people may not be
considering.
>> >>
>> >> Clang can be built against an installed LLVM. I know this
workflow is used
>> >> by some people because I’ve broken it in the past and had to
fix it. With a
>> >> mono-repo this workflow gets a bit more complicated because
you’d need to do
>> >> sparse checkouts, and it probably means we should just nuke
the workflow
>> >> entirely because there is no real value added by having it.
>> >>
>> >> Compiler-RT’s sanitizers are used with GCC; no LLVM required.
While for the
>> >> common use case maintaining sparse repository mirrors would
limit impact of
>> >> this on users, should any GCC user want to contribute to
Compiler-RT, you’re
>> >> forcing them to clone a much larger repository than necessary.
>> >>
>> >> The same problem with Compiler-RT’s sanitizers also applies to
libcxx,
>> >> libcxxabi, libunwind, and potentially any other runtime
library projects
>> >> that we may create in the future.
>> >>
>> >> Beyond all that I want to point out that the git
multi-repository story is
>> >> basically the same thing we have today with SVN except for the
absence of a
>> >> monotonically increasing number that corresponds across
repositories. While
>> >> admittedly you do get a linear history with using the
mono-repository, that
>> >> isn’t the only way to solve the problem, and I don’t really
think that the
>> >> benefit (not needing to write some tooling) justifies the
increased burden
>> >> applied to contributors that don’t use the full LLVM family of
projects.
>> >>
>> >> I think we have some pretty strong evidence in the form of the
github fork
>> >> counts (https://github.com/llvm-mirror/) that most people
aren’t using all
>> >> of the LLVM projects. In fact, by that evidence Clang (the
second most
>> >> popular project) is forked less than 2/3 as many times as
LLVM.
>> >>
>> >> -Chris
>> >>
>> >>
>> >> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Even if it were possible, I would still keep my upstream
checkout
>> >> separate just as a safety measure, to keep from sending
private stuff
>> >> upstream by accident.
>> >>
>> >>
>> >> Just FYI, this is our (Azul's) workflow as well, and for
similar
>> >> reasons.
>> >>
>> >>
>> >> Same here.
>> >>
>> >> cheers,
>> >> --renato
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>

Eric Fiselier via llvm-dev

2016-Aug-10 01:56 UTC


[llvm-dev] [RFC] One or many git repositories?

By "in-tree" all I mean is the libc++ source directory exists within
llvm/projects, nothing more.

> (1) from a build system perspective we need to be able to build libcxx
> using just-built clang which means an out-of-tree style configuration even
> if it seems in-tree to the user.

True. I'm currently working on supporting a just-built Clang and my plan
was for the in-tree libc++ to recursively add itself as an external project.
As you said this is technically "out-of-tree" but not from the user's
perspective.


> (2) lots of people use libcxx without clang. Removing support for libcxx
> building out of tree is unreasonable for end users. It might be reasonable
> for developers, but end users need that ability.

Not sure what Clang has to do with this. The only difference is that
end-users would have to checkout libc++ inside LLVM, not beside it.
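
To make the "inside LLVM, not beside it" distinction concrete, here is a rough sketch of the two layouts. The helper names are hypothetical, and the two repository URLs are assumptions based on the llvm-mirror organisation mentioned earlier in the thread. The script only prints the clone commands rather than running them.

```shell
# Assumed mirror URLs; substitute whatever the official location ends up being.
LLVM_URL="https://github.com/llvm-mirror/llvm.git"
LIBCXX_URL="https://github.com/llvm-mirror/libcxx.git"

# Print where the libc++ sources would live for a given layout.
#   in-tree: inside the LLVM checkout, under llvm/projects
#   beside:  as a sibling of the LLVM checkout
libcxx_dest() {
    layout="$1"
    root="${2:-.}"
    case "$layout" in
        in-tree) echo "$root/llvm/projects/libcxx" ;;
        beside)  echo "$root/libcxx" ;;
        *)       echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Print (rather than run) the clone commands for the chosen layout.
print_checkout_plan() {
    dest="$(libcxx_dest "$1" "${2:-.}")" || return 1
    echo "git clone $LLVM_URL ${2:-.}/llvm"
    echo "git clone $LIBCXX_URL $dest"
}
```

`print_checkout_plan in-tree ~/src` and `print_checkout_plan beside ~/src` differ only in the destination of the second clone. Either way an LLVM checkout is present on the machine, which is the point being made about LIT and the CMake modules.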


On Tue, Aug 9, 2016 at 7:14 PM, Chris Bieneman <beanz at apple.com> wrote:
>
>
> On Aug 9, 2016, at 6:01 PM, Eric Fiselier <eric at efcs.ca> wrote:
>
> I don't see who would benefit from having libc++/libc++abi in the megarepo
> since they are not coupled to either LLVM or Clang. Upstream changes to
> other projects have no effect on libc++ and vice versa so there is no need
> to keep them in sync.
>
> For this reason libc++ should stay a separate repository, which can be
> included in the megarepo as a sub-module. This avoids concerns about
> increasing the cost of checking out and building libc++.
>
> There has also been a secondary discussion about libc++ supporting
> out-of-tree builds. I'm unconvinced by arguments about the cost of cloning
> LLVM when you only want libc++. Building and testing libc++ already
> requires an LLVM checkout somewhere on the machine for LIT and CMake
> modules, so the additional cost is already there. I would be OK dropping
> out-of-tree support for libc++ since out-of-tree builds are hard to
> configure correctly and offer little benefit over in-tree builds.
>
>
> Two reasons libcxx needs to support out of tree builds.
>
> (1) from a build system perspective we need to be able to build libcxx
> using just-built clang which means an out-of-tree style configuration even
> if it seems in-tree to the user.
> (2) lots of people use libcxx without clang. Removing support for libcxx
> building out of tree is unreasonable for end users. It might be reasonable
> for developers, but end users need that ability.
>
> -Chris
>
>
> /Eric
>
>
> On Tue, Aug 9, 2016 at 5:32 PM, Chris Bieneman via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Can we please stop with the attempts at persuading me to a mono-repo
>> approach? I really hate that my voicing criticism has resulted in a big
>> “let’s sway beanz” effort.
>>
>> Justin, the reality is I don’t think the benefits of a mono-repo
justify
>> the costs if it includes all projects. I think the cost of a mono-repo
>> which excludes the runtime projects is lower, so I dislike it less. I
can
>> even see (and agree with) some arguments in favor of LLVM, Clang and
LLD
>> being in the same repository, but I don’t see it as solving a problem
that
>> needs to be solved.
>>
>> Mehdi’s performance data and git-svn workflow does *nothing* to win me
>> over to your argument. All it says is that if we do your proposal I
*might*
>> be able to keep the same git-svn workflows I use today. I say “might”
>> because nobody has addressed my original concerns about whether or not
that
>> workflow would be dropped if we move to a PR based model, or how we
would
>> support something similar in a PR model. I also think that the
mono-repo
>> might discourage pull requests to the runtimes projects from users that
>> don’t use clang, which concerns me. Either way Mehdi’s information
isn’t
>> going to get me to support your idea over the other proposal which
offers
>> me actual workflow improvements.
>>
>> It isn’t even going to get me to a “meh”. While I don’t think the
>> proposal should be valued based on whether or not it gives me
specifically
>> benefit, if it provides no benefit to me maintaining the status quo is
>> better than a change from my perspective.
>>
>> All that aside, I really don’t think anyone should be investing that
much
>> time trying to appease me. While I am a loud voice, and I’m flattered
that
>> people seem to want to make me change my opinions they are in fact just
>> opinions and preferences.
>>
>> At the end of the day we need some empirical data to drive this
decision,
>> and I don’t consider the anecdotes on these threads to be that data. 
While
>> it is useful to hear the first hand opinions of people, it would be
great
>> if we had some actual data. Even with quantifiable data it is unlikely
that
>> I would consider the mono-repo an ideal workflow for myself, but it
*could*
>> be the right solution for the community.  I would like to think that as
a
>> community we will all be able to put the “greater good” above our own
>> preferences.
>>
>> -Chris
>>
>>
>> > On Aug 9, 2016, at 2:12 PM, Justin Lebar via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> >> Sorry, I was specially replying to 'I think we'd be
thrilled with a
>> > "meh" from your corner.’.  I didn’t feel like that was
helping the
>> > conversation along.
>> >
>> > Sorry if I offended anyone with this or sent the wrong message.  I
was
>> > trying to say, beanz was originally a strong, categorical opponent
to
>> > the monorepo.  After some discussion, he became not strongly
opposed
>> > to a monorepo, so long as it didn't contain the runtime
libraries.
>> > Now Mehdi had a proposal that I was hoping would take him to
>> > "not-strongly-opposed" to a monorepo that did contain
the runtime
>> > libraries.  Given where we came from, I would be very happy with
that
>> > outcome.
>> >
>> > On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at
apple.com>
>> wrote:
>> >>
>> >> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at
apple.com>
>> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> (2) If I’m stuck using git-svn I kinda feel like there is no
real
>> point in
>> >> changing anything.
>> >>
>> >>
>> >> No real point *for you specifically*.
>> >>
>> >> But the vast majority of people would not be stuck using
git-svn.  And
>> >> in addition the LLVM project would not be stuck using svn,
with all
>> >> the baggage, hosting issues, workflow issues (for people other
than
>> >> you), etc.
>> >>
>> >> The bar by which this proposal should be measured is not
"is it a net
>> >> gain for beanz?"  :)  I think we'd be thrilled with a
"meh" from your
>> >> corner.
>> >>
>> >> Justin, I don’t think this conversation is really going
anywhere.
>> >>
>> >>
>> >> I’m not sure what you’re referring to exactly, but in the
context of
>> "this
>> >> thread isn’t getting anywhere”, I strongly disagree.
>> >>
>> >> Sorry, I was specially replying to 'I think we'd be
thrilled with a
>> "meh"
>> >> from your corner.’.  I didn’t feel like that was helping the
>> conversation
>> >> along.
>> >>
>> >>
>> >> OK, I agree with you then :)
>> >>
>> >>
>> >> I agree with everything else you say about actually talking about the
>> >> different proposals.  I hope my point is well received that we really do
>> >> need to eventually describe the impact to daily workflow, once the proposals
>> >> are far enough along to do so.
>> >>
>> >>
>> >> I agree with you also on this. I voiced in the past (on IRC toward
>> >> Justin/David probably) that the proposal should include examples of workflow
>> >> and how they translate to whatever the proposal will be.
>> >>
>> >> Cheers,
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >>
>> >> Pete
>> >>
>> >>
>> >> I believe that the recent workflow tests I performed (see my last emails in
>> >> this thread) are proof that this thread has been productive, and I believe
>> >> discussing here and hearing concerns from people (Chris and others) are
>> >> necessary before getting a proposal fleshed out and having a survey.
>> >>
>> >> Having a survey without getting to the end of *what* we want to survey about
>> >> is nonsense to me.
>> >>
>> >> (That may miss your point, but your point wasn’t clear either…).
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >>
>> >>
>> >> Renato already mentioned talking about this at the conference, and there has
>> >> also been talk of a survey.  I think we need those to see how the community
>> >> actually feels about the proposals here.
>> >>
>> >> Chris may be the only vocal advocate of an alternative to your proposal, but
>> >> then there are people like me who are quiet because we are waiting for the
>> >> survey to appear.
>> >>
>> >> I would have been much more vocal if I thought we were actually going to
>> >> adopt the monorepo, but for now I believe it is still only a proposal.
>> >>
>> >> Full disclosure, I don’t want a monorepo.  I think it optimizes for the use
>> >> case where people want to bisect, and I don’t think it’s reasonable to push
>> >> on everyone to have a monorepo for those who want to bisect.  The submodules
>> >> repo has already been demonstrated as one potential solution to this which
>> >> would allow those who want to bisect to do so, while everyone else can
>> >> continue to work more or less as they do today.
>> >>
>> >> In terms of the proposals, I think you, Mehdi, Chris, and a number of others
>> >> have proven that there is almost no technical solution beyond our reach.
>> >> What we do have are proposals which optimize for different use cases.  Given
>> >> this, I think the most useful thing from my point of view (and hopefully to
>> >> others) would be for those advocating each different solution to actually give
>> >> short examples of each of the different use cases and how to support them.
>> >>
>> >> For example:
>> >>
>> >> Monorepo, pushing a change to compiler-rt:
>> >> 1: Git commit …
>> >> 2: Git pull --rebase
>> >> 3: test
>> >> 4 a: Git push /* no commits to any other project so the push works */.  Goto 5
>> >> 4 b: Git push /* someone committed to some other project in the monorepo.  Goto 2 */
>> >> 5: Done
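
Steps 2 through 4 above form a retry loop, which could be sketched in shell
roughly as follows. This is only an illustration under stated assumptions:
push_with_rebase, run_tests and TEST_CMD are hypothetical names that do not
appear in the thread, and the cap on attempts is an arbitrary safeguard.

```shell
#!/bin/sh
# Sketch of the monorepo push loop (steps 2-4 of the list above).
# push_with_rebase, run_tests and TEST_CMD are hypothetical names.

# Step 3: run whatever test command the developer uses; defaults to a no-op.
run_tests() {
    ${TEST_CMD:-true}
}

# Usage: push_with_rebase [max_attempts]
push_with_rebase() {
    max_attempts=${1:-5}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        git pull --rebase || return 1   # step 2: replay local commits on top of upstream
        run_tests || return 1           # step 3: stop if the tests fail
        if git push; then               # step 4a: nothing new landed upstream
            echo "pushed after $attempt attempt(s)"
            return 0
        fi
        # step 4b: push failed (assumed here to be a non-fast-forward
        # rejection because someone else pushed first); go back to step 2
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_attempts attempts" >&2
    return 1
}
```

After committing locally (step 1), calling push_with_rebase 3 would run the
pull/test/push cycle at most three times. The more often other projects in
the same repository receive commits, the more often step 4b would trigger.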
>> >>
>> >> I know that this example appears negative in the case where someone else
>> >> committed to another project and a rebase is required, but that’s exactly the
>> >> point.  This is showing that this particular scenario is potentially a
>> >> problem compared to today and/or other proposals.  A similar workflow could
>> >> (should) be written for the sparse checkout monorepo, GitHub monorepo with
>> >> svn, and submodules cases.  The submodules case will likely show that
>> >> bisecting is more complex than on the monorepo, while pushing is simpler.
>> >>
>> >> Similarly, the submodules workflow probably isn’t capable of a single commit
>> >> to llvm and clang in the revlock case while the monorepo is, but we as a
>> >> community need to decide whether we want to optimize for that or not.  I
>> >> don’t have any data to suggest that revlock commits are frequent/infrequent
>> >> or even a problem in general, and I don’t think we should optimize for that
>> >> case unless it’s worth doing so.
>> >>
>> >> Only by actually showing the use cases we care about can the community make
>> >> an educated decision about what these proposals actually mean to our daily
>> >> workflow.  We can then choose what we are optimizing for.  I personally want
>> >> to have a very simple list of repos to clone from (or just one!) and for
>> >> pushing to be easy, because those are the actions I perform the most often.
>> >> Others will have different use cases they care about and they can choose the
>> >> proposal which suits them best.
>> >>
>> >> Cheers,
>> >> Pete
>> >>
>> >>
>> >> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at apple.com> wrote:
>> >>
>> >>
>> >> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> >>
>> >>
>> >> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
>> >>
>> >>
>> >>
>> >> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> >>
>> >>
>> >> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>> >>
>> >> Thanks for your thoughts, Chris.
>> >>
>> >> As supporting evidence of this, I was discussing this thread around the
>> >> office yesterday and had quite a few people responding something
>> >> along the lines of “they’re proposing what?”.
>> >>
>> >>
>> >> I hope they'll join us in this thread.
>> >>
>> >> Ultimately a survey is going to be strongly biased in favor of "don't
>> >> change anything".  There is a strong psychological bias to weight
>> >> losses more than gains, so if one doesn't engage with the issue, it's
>> >> only natural to conclude "keep it as similar as possible to what it is
>> >> today -- that is safe."  But that line of thinking does not
>> >> necessarily lead us to the best outcome.
>> >>
>> >>
>> >> I don’t agree with this assertion. I believe that if you put forth multiple
>> >> proposals, and have an articulate discussion of the merits and costs of each
>> >> solution you can create a survey that can help inform decision making. I
>> >> suppose we can agree to disagree.
>> >>
>> >>
>> >> We've heard in thread from a lot of developers about how a monorepo
>> >> would improve their workflow.  I would love to hear from some
>> >> developers who are actually affected in the way you describe, rather
>> >> than just considering the hypothetical.
>> >>
>> >> My expectation is that the effect of the monorepo on said developers
>> >> would be relatively small -- we're talking about 1gb of disk space.  I
>> >> understand that there's a "yuck" factor to this, but inasmuch as there
>> >> aren't other concrete effects, this is just change aversion.  And
>> >> essentially all of the other effects of the monorepo can be hidden via
>> >> sparse checkouts, as we've discussed.
>> >>
>> >> Maybe I am wrong.  But I don't think we're going to get to the bottom
>> >> of it without actually engaging with people who are actually affected
>> >> in the way you posit.
>> >>
>> >>
>> >> Ok, let me describe a few workflows I’ve used in the last year that are (in
>> >> my mind) adversely impacted by a mono-repo.
>> >>
>> >> Case Study 1 - Simple development on a sub-project
>> >>
>> >> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
>> >> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
>> >> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
>> >> repo, rebuild compiler-rt then commit.
>> >>
>> >> With a mono-repo rebasing the checkout means rebasing the whole tree. So,
>> >> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
>> >> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
>> >> sucks.
>> >>
>> >> What this example illustrates to me is that today we have loosely coupled
>> >> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
>> >> coupling that isn’t strictly required today.
>> >>
>> >> Case Study 2 - Working on a sub-project in isolation across many platforms
>> >>
>> >> I did a lot of work on Compiler-RT last year that had no direct dependency
>> >> on any other LLVM project. During the development I was working with a
>> >> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
>> >> in a while (or every other day as it were) I would make a change that would
>> >> break a configuration that I wasn’t directly developing on. My workflow for
>> >> handling those cases was:
>> >>
>> >> (1) Spin up a VM on a VPS that closely matched the configuration I broke
>> >> (2) Checkout Compiler-RT
>> >> (3) Reproduce, debug, fix the failure
>> >> (4) Commit the patch from the VM
>> >>
>> >> In a mono-repository doing this would require checking out *all*
>> >> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
>> >> workflow, but it is one I use that would be adversely impacted by needing to
>> >> checkout a full LLVM. Now, you might say I could check out the sub-project
>> >> mirror, but then I can’t commit from the VM, which kinda sucks.
>> >>
>> >>
>> >> So for the “I spin a VM and want to make a commit but don’t want to download
>> >> a few hundred MBs with a git clone” story, it turns out that the github
>> >> bridge with SVN helps to optimize with a “lean” checkout:
>> >>
>> >> I fork the unified repo here:
>> >> https://github.com/joker-eph/llvm-project/commits/master and then:
>> >> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> >>
>> >> So that’s a net “no regression” compared to the current state :)
>> >>
>> >>
>> >> Is the github SVN interface's "co" magically as fast as a git clone?
>> >>
>> >>
>> >> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> >> ….
>> >> real 0m8.539s user 0m0.919s sys 0m1.917s
>> >> $ time git clone https://github.com/joker-eph/compiler-rt.git
>> >> real 0m5.487s user 0m1.208s sys 0m0.825s
>> >>
>> >>
>> >> That’s actually not terrible! Color me impressed.
>> >>
>> >>
>> >>
>> >> If not, it is a performance regression because today I use git clone and
>> >> git-svn on my VMs just like on my physical machines, and either way it adds
>> >> some crazy complexity.
>> >>
>> >>
>> >> No problem, I get it, exactly same workflow as today:
>> >>
>> >>
>> >> Yep. Which isn’t bad. I do however have two concerns.
>> >>
>> >> (1) What happens if we move to pull request-based workflows? Do we still
>> >> support this workflow?
>> >> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
>> >> changing anything. I dislike this workflow less than the earlier proposals,
>> >> but I see no reason to move to this instead of staying on SVN (other than
>> >> the hosting issues which could be solved in other ways).
>> >>
>> >> -Chris
>> >>
>> >>
>> >> # Clone from the single read-only git repo
>> >> $ git clone https://github.com/joker-eph/compiler-rt.git
>> >> …
>> >> # Configure the SVN remote and initialize the svn metadata
>> >> $ cd compiler-rt
>> >> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username
>> >> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
>> >> $ git svn rebase -l
>> >> ...
>> >> # Remove an empty file and commit with git
>> >> $ git rm empty
>> >> $ git commit -m "remove empty file"
>> >> # commit/push with svn to the unified git repo
>> >> $ git svn dcommit
>> >> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>> >> ...
>> >> D empty
>> >> Committed r354148
>> >>
>> >>
>> >> Here is the commit:
>> >> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>> >>
>> >>
>> >> —
>> >> Mehdi
>> >>
>> >>
>> >> While admittedly you do get a linear history with using the mono-repository,
>> >> that isn’t the only way to solve the problem, and I don’t really think that
>> >> the benefit (not needing to write some tooling) justifies the increased
>> >> burden applied to contributors that don’t use the full LLVM family of
>> >> projects.
>> >>
>> >>
>> >> I think the trade-off you're considering here (cost to developers who
>> >> use llvm plus a version-locked subrepo vs. cost to developers who
>> >> don't want an llvm clone) is the right one.
>> >>
>> >>
>> >> I actually think there are *a lot* more considerations we need to be making
>> >> for an infrastructure change like this. While it is true that our SCM
>> >> hosting strategy primarily impacts developers, it also impacts our users. We
>> >> should be conscious of the impact to downstream users in making
>> >> infrastructure changes like this. That is part of why the idea of a survey
>> >> holds appeal to me; it would give us the opportunity to get feedback from a
>> >> much wider audience than the current “people on llvm-dev who haven’t been
>> >> scared away”.
>> >>
>> >> But as someone who has
>> >> extensively used git submodules and repo (a wrapper script), I
>> >> strongly disagree with the judgement that a monorepo would not be a
>> >> significant improvement.
>> >>
>> >> Our primary disagreement, I think, is over how much cost there is to
>> >> "writing some tooling".  To me, this is a significant barrier standing
>> >> in the way of developer productivity.  Here at Google I did a quick
>> >> survey, and more than half of us don't have scripts of the sort that
>> >> Justin Bogner described.  We are all just floundering around rebasing
>> >> clang and llvm until it compiles.  It *sucks*.
>> >>
>> >>
>> >> I actually think we’re both talking about solutions that require tooling,
>> >> and while we *could* be disagreeing over how much effort each tooling
>> >> initiative would require (I think they’re pretty close, so I don’t care to
>> >> have that argument), my actual disagreement with your proposal is that it is
>> >> a change that impacts developers and users universally and I don’t think
>> >> that it is justified. Simply put, I don’t feel that the benefits are
>> >> substantial enough to warrant the kind of disruptive change you’re
>> >> proposing.
>> >>
>> >>
>> >> I suggest that saying that all of these developers are "doing it
>> >> wrong" is not helpful.
>> >>
>> >>
>> >> Maybe I’m missing something, but I don’t think I said anyone was “doing it
>> >> wrong”. Bisecting across multiple git repositories isn’t a great experience.
>> >> But neither is bisecting across a half dozen separate folders in an SVN
>> >> repository. Both the submodule solution and the mono-repo solution solve
>> >> this problem equivalently well.
>> >>
>> >> Not everyone has the git and python/bash chops
>> >> to write the necessary scripts.  Not everyone has the personality to
>> >> obsessively script around stuff, or the desire to maintain said
>> >> scripts.  Not everyone works on llvm/clang so much that it's worth
>> >> adopting a special-snowflake workflow.  And some of us -- myself
>> >> included -- have extensive git scripts which work with the standard
>> >> git workflow but would be completely broken by adding a custom level
>> >> of indirection around git.
>> >>
>> >> When put this way, maybe it's clear that it's actually a niche set of
>> >> people for whom "script around the brokenness" is a good solution.
>> >>
>> >>
>> >> I’m not sure what “brokenness” you’re referring to. We have a collection of
>> >> loosely connected projects by design. As a result of that intentional design
>> >> certain workflows will be impacted. I don’t think that is brokenness. I
>> >> think our loose coupling is a feature even if it makes some workflows
>> >> harder.
>> >>
>> >> -Chris
>> >>
>> >>
>> >> As I've said a bunch of times above, we have to weigh a cost paid by
>> >> all of us every time we type a command that starts with "git" --
>> >> something we do tens or hundreds of times a day -- versus the one-time
>> >> cost of asking people to download 1gb of data.
>> >>
>> >> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> I’m just now catching up on this massive thread after being on vacation last
>> >> week, and I have a few thoughts I’d like to share.
>> >>
>> >> First and foremost please don’t consider lack of dissent on the thread as
>> >> presence of consensus. The various git-related threads on LLVM-dev lately
>> >> have been so active and contentious that I think a lot of people are zoning
>> >> out on the conversations. As supporting evidence of this, I was discussing
>> >> this thread around the office yesterday and had quite a few people
>> >> responding something along the lines of “they’re proposing what?”.
>> >>
>> >> I think it would be great for us to have several different proposals for how
>> >> the git-transition could work, and have a survey to get people’s opinions. I
>> >> know this has been discussed repeatedly, and I want to put in my vote in
>> >> favor of having a survey that takes into account multiple different
>> >> approaches.
>> >>
>> >> WRT the actual proposal in this thread, I’m strongly opposed to a
>> >> mono-repository. While I understand the argument that the full clone’s cost
>> >> on disk space is minimal compared to an LLVM object directory, what about
>> >> contributors that contribute to the smaller runtimes projects but *not*
>> >> to LLVM or Clang? A contributor that only contributes to libcxx or
>> >> compiler-rt being forced to do a full clone of all the LLVM projects in
>> >> order to push a patch kinda sucks.
>> >>
>> >> I want to point out a few workflows people may not be considering.
>> >>
>> >> Clang can be built against an installed LLVM. I know this workflow is used
>> >> by some people because I’ve broken it in the past and had to fix it. With a
>> >> mono-repo this workflow gets a bit more complicated because you’d need to do
>> >> sparse checkouts, and it probably means we should just nuke the workflow
>> >> entirely because there is no real value added by having it.
>> >>
>> >> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
>> >> common use case maintaining sparse repository mirrors would limit impact of
>> >> this on users, should any GCC user want to contribute to Compiler-RT, you’re
>> >> forcing them to clone a much larger repository than necessary.
>> >>
>> >> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>> >> libcxxabi, libunwind, and potentially any other runtime library projects
>> >> that we may create in the future.
>> >>
>> >> Beyond all that I want to point out that the git multi-repository story is
>> >> basically the same thing we have today with SVN except for the absence of a
>> >> monotonically increasing number that corresponds across repositories. While
>> >> admittedly you do get a linear history with using the mono-repository, that
>> >> isn’t the only way to solve the problem, and I don’t really think that the
>> >> benefit (not needing to write some tooling) justifies the increased burden
>> >> applied to contributors that don’t use the full LLVM family of projects.
>> >>
>> >> I think we have some pretty strong evidence in the form of the github fork
>> >> counts (https://github.com/llvm-mirror/) that most people aren’t using all
>> >> of the LLVM projects. In fact, by that evidence Clang (the second most
>> >> popular project) is forked less than 2/3 as many times as LLVM.
>> >>
>> >> -Chris
>> >>
>> >>
>> >> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Even if it were possible, I would still keep my upstream checkout
>> >> separate just as a safety measure, to keep from sending private stuff
>> >> upstream by accident.
>> >>
>> >>
>> >> Just FYI, this is our (Azul's) workflow as well, and for similar
>> >> reasons.
>> >>
>> >>
>> >> Same here.
>> >>
>> >> cheers,
>> >> --renato
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chris Bieneman via llvm-dev

2016-Aug-10 04:21 UTC

[llvm-dev] [RFC] One or many git repositories?

> On Aug 9, 2016, at 6:56 PM, Eric Fiselier <eric at efcs.ca> wrote:
> 
> By "in-tree" all I mean is the libc++ source directory exists
> within llvm/projects, nothing more.
> 
>> (1) from a build system perspective we need to be able to build libcxx
>> using just-built clang which means an out-of-tree style configuration even if it
>> seems in-tree to the user.
> 
> True. I'm currently working on supporting a just-built Clang and my
> plan was for the in-tree libc++ to recursively add itself as an external
> project.
> As you said this is technically "out-of-tree" but not from the
> user's perspective.

This almost works with the new LLVM runtimes directory. I have some WIP patches
to libcxx that I need to finish to get it fully there, but that isn't
relevant to the discussion at hand.

>> (2) lots of people use libcxx without clang. Removing support for libcxx
>> building out of tree is unreasonable for end users. It might be reasonable for
>> developers, but end users need that ability.
> 
> Not sure what Clang has to do with this. The only difference is that
> end-users would have to checkout libc++ inside LLVM, not beside it.

Today end users don't need to check out LLVM at all, and forcing users (like
OS distributions) to check out LLVM when they aren't using it would be
highly undesirable, and I don't see a justification.

-Chris
> 
>> On Tue, Aug 9, 2016 at 7:14 PM, Chris Bieneman <beanz at apple.com> wrote:
>> 
>> 
>>> On Aug 9, 2016, at 6:01 PM, Eric Fiselier <eric at efcs.ca> wrote:
>>> 
>>> I don't see who would benefit from having libc++/libc++abi in
>>> the megarepo since they are not coupled to either LLVM or Clang. Upstream
>>> changes to other projects have no effect on libc++ and vice versa so there is no
>>> need to keep them in sync.
>>> 
>>> For this reason libc++ should stay a separate repository, which can
>>> be included in the megarepo as a sub-module. This avoids concerns about
>>> increasing the cost of checking out and building libc++.
>>> 
>>> There has also been a secondary discussion about libc++ supporting
>>> out-of-tree builds. I'm unconvinced by arguments about the cost of cloning
>>> LLVM when you only want libc++. Building and testing libc++ already requires an
>>> LLVM checkout somewhere on the machine for LIT and CMake modules, so the
>>> additional cost is already there. I would be OK dropping out-of-tree support for
>>> libc++ since such builds are hard to configure correctly and offer little benefit
>>> over in-tree builds.
>> 
>> Two reasons libcxx needs to support out of tree builds.
>> 
>> (1) from a build system perspective we need to be able to build libcxx
>> using just-built clang which means an out-of-tree style configuration even if it
>> seems in-tree to the user.
>> (2) lots of people use libcxx without clang. Removing support for
>> libcxx building out of tree is unreasonable for end users. It might be
>> reasonable for developers, but end users need that ability.
>> 
>> -Chris
>> 
>>> 
>>> /Eric
>>> 
>>> 
>>>> On Tue, Aug 9, 2016 at 5:32 PM, Chris Bieneman via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> Can we please stop with the attempts at persuading me to a
>>>> mono-repo approach? I really hate that my voicing criticism has resulted in a
>>>> big “let’s sway beanz” effort.
>>>> 
>>>> Justin, the reality is I don’t think the benefits of a
>>>> mono-repo justify the costs if it includes all projects. I think the cost of a
>>>> mono-repo which excludes the runtime projects is lower, so I dislike it less. I
>>>> can even see (and agree with) some arguments in favor of LLVM, Clang and LLD
>>>> being in the same repository, but I don’t see it as solving a problem that needs
>>>> to be solved.
>>>> 
>>>> Mehdi’s performance data and git-svn workflow do *nothing* to
>>>> win me over to your argument. All they say is that if we do your proposal I
>>>> *might* be able to keep the same git-svn workflows I use today. I say “might”
>>>> because nobody has addressed my original concerns about whether or not that
>>>> workflow would be dropped if we move to a PR based model, or how we would
>>>> support something similar in a PR model. I also think that the mono-repo might
>>>> discourage pull requests to the runtimes projects from users that don’t use
>>>> clang, which concerns me. Either way Mehdi’s information isn’t going to get me
>>>> to support your idea over the other proposal which offers me actual workflow
>>>> improvements.
>>>> 
>>>> It isn’t even going to get me to a “meh”. While I don’t think
>>>> the proposal should be valued based on whether or not it gives me specifically
>>>> benefit, if it provides no benefit to me, maintaining the status quo is better
>>>> than a change from my perspective.
>>>> 
>>>> All that aside, I really don’t think anyone should be investing
>>>> that much time trying to appease me. While I am a loud voice, and I’m flattered
>>>> that people seem to want to make me change my opinions, they are in fact just
>>>> opinions and preferences.
>>>> 
>>>> At the end of the day we need some empirical data to drive this
>>>> decision, and I don’t consider the anecdotes on these threads to be that data.
>>>> While it is useful to hear the first hand opinions of people, it would be great
>>>> if we had some actual data. Even with quantifiable data it is unlikely that I
>>>> would consider the mono-repo an ideal workflow for myself, but it *could* be the
>>>> right solution for the community.  I would like to think that as a community we
>>>> will all be able to put the “greater good” above our own preferences.
>>>> 
>>>> -Chris
>>>> 
>>>> 
>>>> > On Aug 9, 2016, at 2:12 PM, Justin Lebar via llvm-dev
>>>> > <llvm-dev at lists.llvm.org> wrote:
>>>> >
>>>> >> Sorry, I was specifically replying to ‘I think we'd be thrilled with a
>>>> >> "meh" from your corner.’  I didn’t feel like that was helping the
>>>> >> conversation along.
>>>> >
>>>> > Sorry if I offended anyone with this or sent the wrong message.  I was
>>>> > trying to say, beanz was originally a strong, categorical opponent to
>>>> > the monorepo.  After some discussion, he became not strongly opposed
>>>> > to a monorepo, so long as it didn't contain the runtime libraries.
>>>> > Now Mehdi had a proposal that I was hoping would take him to
>>>> > "not-strongly-opposed" to a monorepo that did contain the runtime
>>>> > libraries.  Given where we came from, I would be very happy with that
>>>> > outcome.
>>>> >
>>>> > On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>> >>
>>>> >> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >>
>>>> >> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
>>>> >> changing anything.
>>>> >>
>>>> >>
>>>> >> No real point *for you specifically*.
>>>> >>
>>>> >> But the vast majority of people would not be stuck using git-svn.  And
>>>> >> in addition the LLVM project would not be stuck using svn, with all
>>>> >> the baggage, hosting issues, workflow issues (for people other than
>>>> >> you), etc.
>>>> >>
>>>> >> The bar by which this proposal should be measured is not "is it a net
>>>> >> gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
>>>> >> corner.
>>>> >>
>>>> >> Justin, I don’t think this conversation is really going anywhere.
>>>> >>
>>>> >>
>>>> >> I’m not sure what you’re referring to exactly, but in the context of “this
>>>> >> thread isn’t getting anywhere”, I strongly disagree.
>>>> >>
>>>> >> Sorry, I was specifically replying to ‘I think we'd be thrilled with a "meh"
>>>> >> from your corner.’  I didn’t feel like that was helping the conversation
>>>> >> along.
>>>> >>
>>>> >>
>>>> >> OK, I agree with you then :)
>>>> >>
>>>> >>
>>>> >> I agree with everything else you say about actually talking about the
>>>> >> different proposals.  I hope my point is well received that we really do
>>>> >> need to eventually describe the impact to daily workflow, once the proposals
>>>> >> are far enough along to do so.
>>>> >>
>>>> >>
>>>> >> I agree with you also on this. I voiced in the past (on IRC toward
>>>> >> Justin/David probably) that the proposal should include examples of workflow
>>>> >> and how they translate to whatever the proposal will be.
>>>> >>
>>>> >> Cheers,
>>>> >>
>>>> >> —
>>>> >> Mehdi
>>>> >>
>>>> >>
>>>> >>
>>>> >> Pete
>>>> >>
>>>> >>
>>>> >> I believe that the recent workflow tests I performed (see my last emails in
>>>> >> this thread) are proof that this thread has been productive, and I believe
>>>> >> discussing here and hearing concerns from people (Chris and others) are
>>>> >> necessary before getting a proposal fleshed out and having a survey.
>>>> >>
>>>> >> Having a survey without getting to the end of *what* we want to survey about
>>>> >> is nonsense to me.
>>>> >>
>>>> >> (That may miss your point, but your point wasn’t clear either…).
>>>> >>
>>>> >> —
>>>> >> Mehdi
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> Renato already mentioned talking about this at the conference, and there has
>>>> >> also been talk of a survey.  I think we need those to see how the community
>>>> >> actually feels about the proposals here.
>>>> >>
>>>> >> Chris may be the only vocal advocate of an alternative to your proposal, but
>>>> >> then there are people like me who are quiet because we are waiting for the
>>>> >> survey to appear.
>>>> >>
>>>> >> I would have been much more vocal if I thought we were actually going to
>>>> >> adopt the monorepo, but for now I believe it is still only a proposal.
>>>> >>
>>>> >> Full disclosure, I don’t want a monorepo.  I think it optimizes for the use
>>>> >> case where people want to bisect, and I don’t think it’s reasonable to push
>>>> >> on everyone to have a monorepo for those who want to bisect.  The submodules
>>>> >> repo has already been demonstrated as one potential solution to this which
>>>> >> would allow those who want to bisect to do so, while everyone else can
>>>> >> continue to work more or less as they do today.
>>>> >>
>>>> >> In terms of the proposals, I think you, Mehdi, Chris, and a number of others
>>>> >> have proven that there is almost no technical solution beyond our reach.
>>>> >> What we do have are proposals which optimize for different use cases.  Given
>>>> >> this, I think the most useful thing from my point of view (and hopefully to
>>>> >> others) would be for those advocating each different solution to actually give
>>>> >> short examples of each of the different use cases and how to support them.
>>>> >>
>>>> >> For example:
>>>> >>
>>>> >> Monorepo, pushing a change to compiler-rt:
>>>> >> 1: Git commit …
>>>> >> 2: Git pull --rebase
>>>> >> 3: test
>>>> >> 4 a: Git push /* no commits to any other project so the push works */.  Goto 5
>>>> >> 4 b: Git push /* someone committed to some other project in the monorepo.  Goto 2 */
>>>> >> 5: Done
>>>> >>
>>>> >> I know that this example appears negative in the case where someone else
>>>> >> committed to another project and a rebase is required, but that’s exactly
>>>> >> the point.  This is showing that this particular scenario is potentially
>>>> >> a problem compared to today and/or other proposals.  A similar workflow
>>>> >> could (should) be written for the sparse checkout monorepo, GitHub
>>>> >> monorepo with svn, and submodules cases.  The submodules case will likely
>>>> >> show that bisecting is more complex than on the monorepo, while pushing
>>>> >> is simpler.
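
Steps 2 to 4 of the monorepo workflow above (rebase, test, push, and go back to the rebase if someone else pushed first) could be scripted roughly as follows. This is only a sketch: the function name `push_with_retry` and the attempt limit are made up for illustration, and the test command is whatever the contributor normally runs.

```shell
# push_with_retry MAX_ATTEMPTS TEST_COMMAND [ARGS...]
# Repeats "pull --rebase, test, push" until the push is accepted or
# MAX_ATTEMPTS is reached. A rejected push (step 4b) sends us back to
# the rebase (step 2); a failing rebase or test aborts immediately.
push_with_retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        git pull --rebase || return 1   # step 2
        "$@" || return 1                # step 3: caller-supplied test command
        if git push; then               # step 4a: nobody else pushed
            echo "pushed after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))        # step 4b: rejected, go round again
    done
    echo "giving up after $max_attempts attempt(s)" >&2
    return 1
}
```

It would be called as `push_with_retry 5 <your test command>`. The loop itself is the cost being discussed: each rejected push means another rebase and another test run, even when the competing commit touched an unrelated project.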
>>>> >>
>>>> >> Similarly, the submodules workflow probably isn’t capable of a single
>>>> >> commit to llvm and clang in the revlock case while the monorepo is, but
>>>> >> we as a community need to decide whether we want to optimize for that or
>>>> >> not.  I don’t have any data to suggest that revlock commits are
>>>> >> frequent/infrequent or even a problem in general, and I don’t think we
>>>> >> should optimize for that case unless it’s worth doing so.
>>>> >>
>>>> >> Only by actually showing the use cases we care about can the community
>>>> >> make an educated decision about what these proposals actually mean to our
>>>> >> daily workflow.  We can then choose what we are optimizing for.  I
>>>> >> personally want to have a very simple list of repos to clone from (or
>>>> >> just one!) and for pushing to be easy, because those are the actions I
>>>> >> perform the most often.  Others will have different use cases they care
>>>> >> about and they can choose the proposal which suits them best.
>>>> >>
>>>> >> Cheers,
>>>> >> Pete
>>>> >>
>>>> >>
>>>> >> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman
<beanz at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Aug 9, 2016, at 10:08 AM, Mehdi Amini
<mehdi.amini at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz
at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Aug 8, 2016, at 5:09 PM, Mehdi Amini
<mehdi.amini at apple.com> wrote:
>>>> >>
>>>> >>
>>>> >> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via
llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >>
>>>> >> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar
at google.com> wrote:
>>>> >>
>>>> >> Thanks for your thoughts, Chris.
>>>> >>
>>>> >> As supporting evidence of this, I was discussing this thread around the
>>>> >> office yesterday and had quite a few people responding something along
>>>> >> the lines of “they’re proposing what?”.
>>>> >>
>>>> >>
>>>> >> I hope they'll join us in this thread.
>>>> >>
>>>> >> Ultimately a survey is going to be strongly biased in
favor of "don't
>>>> >> change anything".  There is a strong
psychological bias to weight
>>>> >> losses more than gains, so if one doesn't engage
with the issue, it's
>>>> >> only natural to conclude "keep it as similar as
possible to what it is
>>>> >> today -- that is safe."  But that line of
thinking does not
>>>> >> necessarily lead us to the best outcome.
>>>> >>
>>>> >>
>>>> >> I don’t agree with this assertion. I believe that if
you put forth multiple
>>>> >> proposals, and have an articulate discussion of the
merits and costs of each
>>>> >> solution you can create a survey that can help inform
decision making. I
>>>> >> suppose we can agree to disagree.
>>>> >>
>>>> >>
>>>> >> We've heard in thread from a lot of developers
about how a monorepo
>>>> >> would improve their workflow.  I would love to hear
from some
>>>> >> developers who are actually affected in the way you
describe, rather
>>>> >> than just considering the hypothetical.
>>>> >>
>>>> >> My expectation is that the effect of the monorepo on
said developers
>>>> >> would be relatively small -- we're talking about
1gb of disk space.  I
>>>> >> understand that there's a "yuck" factor
to this, but inasmuch as there
>>>> >> aren't other concrete effects, this is just change
aversion.  And
>>>> >> essentially all of the other effects of the monorepo
can be hidden via
>>>> >> sparse checkouts, as we've discussed.
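
For readers unfamiliar with sparse checkouts, here is a minimal sketch of what checking out only one sub-project from a monorepo could look like. It uses the `git sparse-checkout` command and `git clone --sparse`, which were added to Git well after this thread (Git 2.25); at the time the same effect needed `core.sparseCheckout` and a hand-written `.git/info/sparse-checkout` file. The function name `sparse_clone` is an illustration only, as is the monorepo URL in the usage line.

```shell
# sparse_clone URL DEST DIR [DIR...]
# Clone URL into DEST, but populate the working tree with only the
# listed top-level directories (for example just compiler-rt).
sparse_clone() {
    url=$1
    dest=$2
    shift 2
    # --sparse starts with only the files at the repository root checked out
    git clone --sparse "$url" "$dest" || return 1
    # then widen the working tree to the requested sub-project directories
    (cd "$dest" && git sparse-checkout set "$@")
}
```

Usage would be along the lines of `sparse_clone <monorepo-url> llvm-project compiler-rt`. A sparse checkout only trims the working tree: the full history is still downloaded, so the disk-space cost mentioned above is unchanged.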
>>>> >>
>>>> >> Maybe I am wrong.  But I don't think we're
going to get to the bottom
>>>> >> of it without actually engaging with people who are
actually affected
>>>> >> in the way you posit.
>>>> >>
>>>> >>
>>>> >> Ok, let me describe a few workflows I’ve used in the
last year that are (in
>>>> >> my mind) adversely impacted by a mono-repo.
>>>> >>
>>>> >> Case Study 1 - Simple development on a sub-project
>>>> >>
>>>> >> I build LLVM + Clang + Compiler-RT using the
just-built Clang to build
>>>> >> Compiler-RT. I iterate on some complicated Compiler-RT
changes over a period
>>>> >> of a day. Once my Compiler-RT changes are done I
rebase the compiler-rt
>>>> >> repo, rebuild compiler-rt then commit.
>>>> >>
>>>> >> With a mono-repo rebasing the checkout means rebasing
the whole tree. So,
>>>> >> either I have to wrangle some crazy git or CMake foo,
or when I run “ninja
>>>> >> compiler-rt” after the rebase it will rebuild LLVM and
Clang too. That kinda
>>>> >> sucks.
>>>> >>
>>>> >> What this example illustrates to me is that today we
have loosely coupled
>>>> >> projects with an occasional rev lock. Moving to a
mono-repo enforces a tight
>>>> >> coupling that isn’t strictly required today.
>>>> >>
>>>> >> Case Study 2 - Working on a sub-project in isolation
across many platforms
>>>> >>
>>>> >> I did a lot of work on Compiler-RT last year that had
no direct dependency
>>>> >> on any other LLVM project. During the development I
was working with a
>>>> >> Compiler-RT checkout and a build directory of just
Compiler-RT. Every once
>>>> >> in a while (or every other day as it were) I would
make a change that would
>>>> >> break a configuration that I wasn’t directly
developing on. My workflow for
>>>> >> handling those cases was:
>>>> >>
>>>> >> (1) Spin up a VM on a VPS that closely matched the
configuration I broke
>>>> >> (2) Checkout Compiler-RT
>>>> >> (3) Reproduce, debug, fix the failure
>>>> >> (4) Commit the patch from the VM
>>>> >>
>>>> >> In a mono-repository doing this would require checking out *all*
>>>> >> sub-projects, not just Compiler-RT. I imagine this probably isn’t a
>>>> >> common workflow, but it is one I use that would be adversely impacted by
>>>> >> needing to check out a full LLVM. Now, you might say I could check out
>>>> >> the sub-project mirror, but then I can’t commit from the VM, which kinda
>>>> >> sucks.
>>>> >>
>>>> >>
>>>> >> So for the “I spin a VM and want to make a commit but
don’t want to download
>>>> >> a few hundred MBs with a git clone” story, it turns
out that the github
>>>> >> bridge with SVN helps to optimize with a “lean”
checkout:
>>>> >>
>>>> >> I fork the unified repo here:
>>>> >> https://github.com/joker-eph/llvm-project/commits/master
>>>> >> and then:
>>>> >> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>>>> >>
>>>> >> So that’s a net “no regression” compared to the
current state :)
>>>> >>
>>>> >>
>>>> >> Is the github SVN interface's "co"
magically as fast as a git clone?
>>>> >>
>>>> >>
>>>> >> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>>>> >> ….
>>>> >> real 0m8.539s user 0m0.919s sys 0m1.917s
>>>> >> $ time git clone https://github.com/joker-eph/compiler-rt.git
>>>> >> real 0m5.487s user 0m1.208s sys 0m0.825s
>>>> >>
>>>> >>
>>>> >> That’s actually not terrible! Color me impressed.
>>>> >>
>>>> >>
>>>> >>
>>>> >> If not, it is a performance regression because today I
use git clone and
>>>> >> git-svn on my VMs just like on my physical machines,
and either way it adds
>>>> >> some crazy complexity.
>>>> >>
>>>> >>
>>>> >> No problem, I get it, exactly same workflow as today:
>>>> >>
>>>> >>
>>>> >> Yep. Which isn’t bad. I do however have two concerns.
>>>> >>
>>>> >> (1) What happens if we move to pull request-based
workflows? Do we still
>>>> >> support this workflow?
>>>> >> (2) If I’m stuck using git-svn I kinda feel like there
is no real point in
>>>> >> changing anything. I dislike this workflow less than
the earlier proposals,
>>>> >> but I see no reason to move to this instead of staying
on SVN (other than
>>>> >> the hosting issues which could be solved in other
ways).
>>>> >>
>>>> >> -Chris
>>>> >>
>>>> >>
>>>> >> # Clone from the single read-only git repo
>>>> >> $ git clone https://github.com/joker-eph/compiler-rt.git
>>>> >> …
>>>> >> # Configure the SVN remote and initialize the svn metadata
>>>> >> $ cd compiler-rt
>>>> >> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=…
>>>> >> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
>>>> >> $ git svn rebase -l
>>>> >> ...
>>>> >> # Remove an empty file and commit with git
>>>> >> $ git rm empty
>>>> >> $ git commit -m "remove empty file"
>>>> >> # Commit/push with svn to the unified git repo
>>>> >> $ git svn dcommit
>>>> >> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>>>> >> ...
>>>> >> D empty
>>>> >> Committed r354148
>>>> >>
>>>> >>
>>>> >> Here is the commit:
>>>> >>
https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>>>> >>
>>>> >>
>>>> >> —
>>>> >> Mehdi
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> While admittedly you do get a linear history with
using the mono-repository,
>>>> >> that isn’t the only way to solve the problem, and I
don’t really think that
>>>> >> the benefit (not needing to write some tooling)
justifies the increased
>>>> >> burden applied to contributors that don’t use the full
LLVM family of
>>>> >> projects.
>>>> >>
>>>> >>
>>>> >> I think the trade-off you're considering here
(cost to developers who
>>>> >> use llvm plus a version-locked subrepo vs. cost to
developers who
>>>> >> don't want an llvm clone) is the right one.
>>>> >>
>>>> >>
>>>> >> I actually think there are *a lot* more considerations
we need to be making
>>>> >> for an infrastructure change like this. While it is
true that our SCM
>>>> >> hosting strategy primarily impacts developers, it also
impacts our users. We
>>>> >> should be conscious of the impact to downstream users
in making
>>>> >> infrastructure changes like this. That is part of why
the idea of a survey
>>>> >> holds appeal to me; it would give us the opportunity
to get feedback from a
>>>> >> much wider audience than the current “people on
llvm-dev who haven’t been
>>>> >> scared away”.
>>>> >>
>>>> >> But as someone who has
>>>> >> extensively used git submodules and repo (a wrapper
script), I
>>>> >> strongly disagree with the judgement that a monorepo
would not be a
>>>> >> significant improvement.
>>>> >>
>>>> >> Our primary disagreement, I think, is over how much
cost there is to
>>>> >> "writing some tooling".  To me, this is a
significant barrier standing
>>>> >> in the way of developer productivity.  Here at Google
I did a quick
>>>> >> survey, and more than half of us don't have
scripts of the sort that
>>>> >> Justin Bogner described.  We are all just floundering
around rebasing
>>>> >> clang and llvm until it compiles.  It *sucks*.
>>>> >>
>>>> >>
>>>> >> I actually think we’re both talking about solutions
that require tooling,
>>>> >> and while we *could* be disagreeing over how much
effort each tooling
>>>> >> initiative would require (I think they’re pretty
close, so I don’t care to
>>>> >> have that argument), my actual disagreement with your
proposal is that it is
>>>> >> a change that impacts developers and users universally
and I don’t think
>>>> >> that it is justified. Simply put, I don’t feel that
the benefits are
>>>> >> substantial enough to warrant the kind of disruptive
change you’re
>>>> >> proposing.
>>>> >>
>>>> >>
>>>> >> I suggest that saying that all of these developers are
"doing it
>>>> >> wrong" is not helpful.
>>>> >>
>>>> >>
>>>> >> Maybe I’m missing something, but I don’t think I said
anyone was “doing it
>>>> >> wrong”. Bisecting across multiple git repositories
isn’t a great experience.
>>>> >> But neither is bisecting across a half dozen separate
folders in an SVN
>>>> >> repository. Both the submodule solution and the
mono-repo solution solve
>>>> >> this problem equivalently well.
>>>> >>
>>>> >> Not everyone has the git and python/bash chops
>>>> >> to write the necessary scripts.  Not everyone has the
personality to
>>>> >> obsessively script around stuff, or the desire to
maintain said
>>>> >> scripts.  Not everyone works on llvm/clang so much
that it's worth
>>>> >> adopting a special-snowflake workflow.  And some of us
-- myself
>>>> >> included -- have extensive git scripts which work with
the standard
>>>> >> git workflow but would be completely broken by adding
a custom level
>>>> >> of indirection around git.
>>>> >>
>>>> >> When put this way, maybe it's clear that it's
actually a niche set of
>>>> >> people for whom "script around the
brokenness" is a good solution.
>>>> >>
>>>> >>
>>>> >> I’m not sure what “brokenness” you’re referring to. We
have a collection of
>>>> >> loosely connected projects by design. As a result of
that intentional design
>>>> >> certain workflows will be impacted. I don’t think that
is brokenness. I
>>>> >> think our loose coupling is a feature even if it makes
some workflows
>>>> >> harder.
>>>> >>
>>>> >> -Chris
>>>> >>
>>>> >>
>>>> >> As I've said a bunch of times above, we have to
weigh a cost paid by
>>>> >> all of us every time we type a command that starts
with "git" --
>>>> >> something we do tens or hundreds of times a day --
versus the one-time
>>>> >> cost of asking people to download 1gb of data.
>>>> >>
>>>> >> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via
llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >> I’m just now catching up on this massive thread after
being on vacation last
>>>> >> week, and I have a few thoughts I’d like to share.
>>>> >>
>>>> >> First and foremost please don’t consider lack of dissent on the thread
>>>> >> as presence of consensus. The various git-related threads on LLVM-dev
>>>> >> lately have been so active and contentious that I think a lot of people
>>>> >> are zoning out on the conversations. As supporting evidence of this, I
>>>> >> was discussing this thread around the office yesterday and had quite a
>>>> >> few people responding something along the lines of “they’re proposing
>>>> >> what?”.
>>>> >>
>>>> >> I think it would be great for us to have several
different proposals for how
>>>> >> the git-transition could work, and have a survey to
get people’s opinions. I
>>>> >> know this has been discussed repeatedly, and I want to
put in my vote in
>>>> >> favor of having a survey that takes into account
multiple different
>>>> >> approaches.
>>>> >>
>>>> >> WRT the actual proposal in this thread, I’m strongly opposed to a
>>>> >> mono-repository. While I understand the argument that the full clone’s
>>>> >> cost on disk space is minimal compared to an LLVM object directory, what
>>>> >> about contributors that contribute to the smaller runtimes projects but
>>>> >> *not* to LLVM or Clang? A contributor that only contributes to libcxx or
>>>> >> compiler-rt being forced to do a full clone of all the LLVM projects in
>>>> >> order to push a patch kinda sucks.
>>>> >>
>>>> >> I want to point out a few workflows people may not be
considering.
>>>> >>
>>>> >> Clang can be built against an installed LLVM. I know
this workflow is used
>>>> >> by some people because I’ve broken it in the past and
had to fix it. With a
>>>> >> mono-repo this workflow gets a bit more complicated
because you’d need to do
>>>> >> sparse checkouts, and it probably means we should just
nuke the workflow
>>>> >> entirely because there is no real value added by
having it.
>>>> >>
>>>> >> Compiler-RT’s sanitizers are used with GCC; no LLVM
required. While for the
>>>> >> common use case maintaining sparse repository mirrors
would limit impact of
>>>> >> this on users, should any GCC user want to contribute
to Compiler-RT, you’re
>>>> >> forcing them to clone a much larger repository than
necessary.
>>>> >>
>>>> >> The same problem with Compiler-RT’s sanitizers also
applies to libcxx,
>>>> >> libcxxabi, libunwind, and potentially any other
runtime library projects
>>>> >> that we may create in the future.
>>>> >>
>>>> >> Beyond all that I want to point out that the git
multi-repository story is
>>>> >> basically the same thing we have today with SVN except
for the absence of a
>>>> >> monotonically increasing number that corresponds
across repositories. While
>>>> >> admittedly you do get a linear history with using the
mono-repository, that
>>>> >> isn’t the only way to solve the problem, and I don’t
really think that the
>>>> >> benefit (not needing to write some tooling) justifies
the increased burden
>>>> >> applied to contributors that don’t use the full LLVM
family of projects.
>>>> >>
>>>> >> I think we have some pretty strong evidence in the
form of the github fork
>>>> >> counts (https://github.com/llvm-mirror/) that most
people aren’t using all
>>>> >> of the LLVM projects. In fact, by that evidence Clang
(the second most
>>>> >> popular project) is forked less than 2/3 as many times
as LLVM.
>>>> >>
>>>> >> -Chris
>>>> >>
>>>> >>
>>>> >> On Jul 26, 2016, at 11:31 AM, Renato Golin via
llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>> >>
>>>> >> Even if it were possible, I would still keep my
upstream checkout
>>>> >> separate just as a safety measure, to keep from
sending private stuff
>>>> >> upstream by accident.
>>>> >>
>>>> >>
>>>> >> Just FYI, this is our (Azul's) workflow as well,
and for similar
>>>> >> reasons.
>>>> >>
>>>> >>
>>>> >> Same here.
>>>> >>
>>>> >> cheers,
>>>> >> --renato
>>>> >> _______________________________________________
>>>> >> LLVM Developers mailing list
>>>> >> llvm-dev at lists.llvm.org
>>>> >>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> >>
>>>> >>
>>>> >>

Simon Taylor via llvm-dev

2016-Aug-10 09:05 UTC

[llvm-dev] [RFC] One or many git repositories?

> On 10 Aug 2016, at 00:32, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> […] I say “might” because nobody has addressed my original concerns about
> whether or not that workflow would be dropped if we move to a PR based model,
> or how we would support something similar in a PR model. I also think that the
> mono-repo might discourage pull requests to the runtimes projects from users
> that don’t use clang, which concerns me. Either way Mehdi’s information isn’t
> going to get me to support your idea over the other proposal which offers me
> actual workflow improvements.

From any source of truth, there are ways to project to an unlimited number of
other read-only views of subsets of the repository. Some of those views may be
independent repos for each project, some may combine a subset of those
independent projects together again using submodules, some may be a subset of
projects in monorepo form (clang, llvm, lld for example).

One thought I had is that even if the projections are read-only it may still be
possible to accept pull requests on any of them. It should be a relatively
simple bot that scrapes those read-only pull requests, applies the path changes
so they can be used on the writable “source of truth” (or splits it into
multiple PRs if the “source of truth” is multiple repos and the PR is to some
combined projection), opens new pull request(s) on that writable
repository(ies), and closes the original PR with a reference to the one that
could actually be applied.
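
To make the "applies the path changes" step concrete, here is a rough sketch of how such a bot might move a patch written against a per-project projection (paths like `lib/x.c`) under the matching monorepo directory (`compiler-rt/lib/x.c`). The function name `prefix_patch_paths` is invented for illustration. In practice the existing `--directory` option of `git apply` and `git am` does this more robustly; the sed version below only rewrites the header lines of a plain git-style diff and would mis-handle a removed content line that itself starts with `-- a/`, or a prefix containing `#` or `&`.

```shell
# prefix_patch_paths PREFIX
# Read a git-style unified diff on stdin and write it to stdout with
# every file path moved under PREFIX/ (e.g. lib/x.c -> compiler-rt/lib/x.c).
prefix_patch_paths() {
    prefix=$1
    sed \
        -e "s#^diff --git a/\(.*\) b/\(.*\)\$#diff --git a/${prefix}/\1 b/${prefix}/\2#" \
        -e "s#^--- a/#--- a/${prefix}/#" \
        -e "s#^+++ b/#+++ b/${prefix}/#"
}
```

A bot could pipe the patch from the read-only pull request through `prefix_patch_paths compiler-rt` (or simply run `git am --directory=compiler-rt`) in a checkout of the writable repository, then open the new pull request from the result.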

I’m not attempting to argue that a complete monorepo with all projects is the
ideal source of truth, but it does seem to me that where revlock is important or
where changes touching multiple projects are common, life is simplified by
having the upstream be a monorepo containing those projects. With a PR model
(as with bisecting, branching, patching, CI integration, etc.) it is much
easier to apply cross-cutting changes by accepting a single PR than by
specifying that this PR on clang depends on PR xyz on llvm being applied too.

The main debate here is what the ultimate “source of truth” is: which repos are
actually writable and have new commits added to them, and what other
projections can be defined from these. Given that projections are possible, the
next question is how usable these projections are for downstream developers.
Consumption is obviously straightforward; using them for contributing is where
most of the objections stem from, and the best workflows here are still somewhat
unclear (Mehdi’s svn bridge suggestion is certainly a reasonable workflow,
though relying on svn for the commit does feel a little weird).

So I guess my main point is that if a workflow for accepting contributions from
read-only projections can be made simple, then the ultimate upstream
representation doesn’t matter all that much.

All sorts of structures are possible, with “all submodules” and “all monorepo”
simply the two possible extremes.

Simon

Renato Golin via llvm-dev

2016-Aug-10 12:30 UTC

[llvm-dev] [RFC] One or many git repositories?

Folks,

Can we get the proposal settled instead of discussing the merits of it?
Those in favour should write a document in /Proposals, propose on
phabricator, and get it ready for the survey.

I really, really don't want to overshoot the survey, and not having the two
proposals up would be a major blow to everything we're trying to achieve.

We need data. Discussions on the list won't give us that, the survey will.
The monorepo can be discussed to death and we won't have final agreement,
so please, let's get something written down and let's leave the opinions
for the survey and the US meeting.

It's already the 10th, we really need the survey to be out in ~20 days max.
Can we all focus on getting that rolling?

Thanks!
Renato