thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]


Justin Lebar via llvm-dev

2016-Jul-27 20:32 UTC

[llvm-dev] [RFC] One or many git repositories?

Thanks for elaborating, Chris.

> Case Study 1 - Simple development on a sub-project

I explicitly addressed this workflow in my original e-mail.  I know it
was a while ago, but it sounds like it may be worth a read if you
haven't checked it out.

In the mail I described how to use sparse checkouts to create a
repository structure that functions virtually identically to what you
have today.  It takes a few copy-pastable commands to set up.  If
these few commands are a pain, we can write a script and check it in
to llvm.
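
For reference, the classic sparse-checkout recipe (the `core.sparseCheckout` setting plus a pattern file at `.git/info/sparse-checkout`) can be sketched as a small Python helper. This is a sketch under stated assumptions, not the commands from the original e-mail: the monorepo URL is hypothetical, and the layout is assumed to have one top-level directory per sub-project (`llvm/`, `clang/`, `compiler-rt/`, ...). Note that a sparse checkout only limits which files appear in the working tree; the clone still downloads the full history.

```python
import subprocess
from pathlib import Path


def sparse_patterns(subprojects):
    """Build sparse-checkout patterns: one root-anchored directory per sub-project."""
    return "".join(f"/{name}/\n" for name in subprojects)


def sparse_clone(url, dest, subprojects):
    """Clone `url` into `dest`, materializing only the listed top-level directories.

    Assumes a (hypothetical) monorepo layout with one top-level directory
    per sub-project.
    """
    dest = Path(dest)
    # Fetch all history, but leave the working tree empty for now.
    subprocess.run(["git", "clone", "--no-checkout", url, str(dest)], check=True)
    subprocess.run(["git", "-C", str(dest), "config", "core.sparseCheckout", "true"],
                   check=True)
    # git reads the patterns from $GIT_DIR/info/sparse-checkout.
    info_dir = dest / ".git" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    (info_dir / "sparse-checkout").write_text(sparse_patterns(subprojects))
    # Populate the working tree from HEAD, honoring the sparse patterns.
    subprocess.run(["git", "-C", str(dest), "read-tree", "-mu", "HEAD"], check=True)
```

Usage would be something like `sparse_clone(url, "llvm-mono", ["compiler-rt"])` for a runtimes-only checkout, or `["llvm", "clang", "compiler-rt"]` for the Case Study 1 setup.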

> Case Study 2 - Working on a sub-project in isolation across many platforms

I am less clear on what exactly this is about, but it seems to me that
a sparse checkout would mitigate most or all of the issues you raise
here, as well.  Again, a sparse checkout is three copy-pastable
commands.

> We should be conscious of the impact to downstream users in making
> infrastructure changes like this.

I agree.  The proposal to continue the read-only llvm-mirror
repositories will help minimize the effect on read-only downstream
consumers.

> I think our loose coupling is a feature even if it makes some workflows
> harder.

If this is something that you want in your checkout of the monorepo,
it is something you can have using sparse checkouts.  It takes a small
amount of one-time work on your part when you clone the repo.  If it's
a problem, we can reduce it to running a single command.

I understand that running a single command still isn't zero cost to
you.  I also understand that you may not see the benefit that others
see in the monorepo.  That's cool.  But those of us who do want a
monorepo have no way to get it today, whereas those who want a
multirepo can get something that behaves very similarly by configuring
their monorepo.

On Wed, Jul 27, 2016 at 12:50 PM, Chris Bieneman <beanz at apple.com> wrote:
>
>> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Thanks for your thoughts, Chris.
>>
>>> As supporting evidence of this, I was discussing this thread
>>> around the office yesterday and had quite a few people responding
>>> something along the lines of “they’re proposing what?”.
>>
>> I hope they'll join us in this thread.
>>
>> Ultimately a survey is going to be strongly biased in favor of "don't
>> change anything".  There is a strong psychological bias to weight
>> losses more than gains, so if one doesn't engage with the issue, it's
>> only natural to conclude "keep it as similar as possible to what it is
>> today -- that is safe."  But that line of thinking does not
>> necessarily lead us to the best outcome.
>
> I don’t agree with this assertion. I believe that if you put forth multiple
> proposals, and have an articulate discussion of the merits and costs of each
> solution, you can create a survey that can help inform decision making. I
> suppose we can agree to disagree.
>
>>
>> We've heard in thread from a lot of developers about how a monorepo
>> would improve their workflow.  I would love to hear from some
>> developers who are actually affected in the way you describe, rather
>> than just considering the hypothetical.
>>
>> My expectation is that the effect of the monorepo on said developers
>> would be relatively small -- we're talking about 1GB of disk space.  I
>> understand that there's a "yuck" factor to this, but inasmuch as there
>> aren't other concrete effects, this is just change aversion.  And
>> essentially all of the other effects of the monorepo can be hidden via
>> sparse checkouts, as we've discussed.
>>
>> Maybe I am wrong.  But I don't think we're going to get to the bottom
>> of it without actually engaging with people who are actually affected
>> in the way you posit.
>
> Ok, let me describe a few workflows I’ve used in the last year that are (in
> my mind) adversely impacted by a mono-repo.
>
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt repo,
> rebuild compiler-rt then commit.
>
> With a mono-repo, rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake foo, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
> sucks.
>
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
> coupling that isn’t strictly required today.
>
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once in
> a while (or every other day as it were) I would make a change that would
> break a configuration that I wasn’t directly developing on. My workflow for
> handling those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Check out Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing to
> check out a full LLVM. Now, you might say I could check out the sub-project
> mirror, but then I can’t commit from the VM, which kinda sucks.
>
>
>>
>>> While admittedly you do get a linear history with using the
>>> mono-repository, that isn’t the only way to solve the problem, and I don’t
>>> really think that the benefit (not needing to write some tooling) justifies
>>> the increased burden applied to contributors that don’t use the full LLVM
>>> family of projects.
>>
>> I think the trade-off you're considering here (cost to developers who
>> use llvm plus a version-locked subrepo vs. cost to developers who
>> don't want an llvm clone) is the right one.
>
> I actually think there are *a lot* more considerations we need to be making
> for an infrastructure change like this. While it is true that our SCM hosting
> strategy primarily impacts developers, it also impacts our users. We should be
> conscious of the impact to downstream users in making infrastructure changes
> like this. That is part of why the idea of a survey holds appeal to me; it
> would give us the opportunity to get feedback from a much wider audience than
> the current “people on llvm-dev who haven’t been scared away”.
>
>> But as someone who has
>> extensively used git submodules and repo (a wrapper script), I
>> strongly disagree with the judgement that a monorepo would not be a
>> significant improvement.
>>
>> Our primary disagreement, I think, is over how much cost there is to
>> "writing some tooling".  To me, this is a significant barrier standing
>> in the way of developer productivity.  Here at Google I did a quick
>> survey, and more than half of us don't have scripts of the sort that
>> Justin Bogner described.  We are all just floundering around rebasing
>> clang and llvm until it compiles.  It *sucks*.
>
> I actually think we’re both talking about solutions that require tooling,
> and while we *could* be disagreeing over how much effort each tooling
> initiative would require (I think they’re pretty close, so I don’t care to
> have that argument), my actual disagreement with your proposal is that it is
> a change that impacts developers and users universally and I don’t think that
> it is justified. Simply put, I don’t feel that the benefits are substantial
> enough to warrant the kind of disruptive change you’re proposing.
>
>>
>> I suggest that saying that all of these developers are "doing it
>> wrong" is not helpful.
>
> Maybe I’m missing something, but I don’t think I said anyone was “doing it
> wrong”. Bisecting across multiple git repositories isn’t a great experience.
> But neither is bisecting across a half dozen separate folders in an SVN
> repository. Both the submodule solution and the mono-repo solution solve this
> problem equivalently well.
>
>>  Not everyone has the git and python/bash chops
>> to write the necessary scripts.  Not everyone has the personality to
>> obsessively script around stuff, or the desire to maintain said
>> scripts.  Not everyone works on llvm/clang so much that it's worth
>> adopting a special-snowflake workflow.  And some of us -- myself
>> included -- have extensive git scripts which work with the standard
>> git workflow but would be completely broken by adding a custom level
>> of indirection around git.
>>
>> When put this way, maybe it's clear that it's actually a niche set of
>> people for whom "script around the brokenness" is a good solution.
>
> I’m not sure what “brokenness” you’re referring to. We have a collection of
> loosely connected projects by design. As a result of that intentional design
> certain workflows will be impacted. I don’t think that is brokenness. I think
> our loose coupling is a feature even if it makes some workflows harder.
>
> -Chris
>
>>
>> As I've said a bunch of times above, we have to weigh a cost paid by
>> all of us every time we type a command that starts with "git" --
>> something we do tens or hundreds of times a day -- versus the one-time
>> cost of asking people to download 1GB of data.
>>
>> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I’m just now catching up on this massive thread after being on vacation
>>> last week, and I have a few thoughts I’d like to share.
>>>
>>> First and foremost, please don’t consider lack of dissent on the
>>> thread as presence of consensus. The various git-related threads on
>>> LLVM-dev lately have been so active and contentious that I think a lot
>>> of people are zoning out on the conversations. As supporting evidence
>>> of this, I was discussing this thread around the office yesterday and
>>> had quite a few people responding something along the lines of
>>> “they’re proposing what?”.
>>>
>>> I think it would be great for us to have several different proposals for
>>> how the git-transition could work, and have a survey to get people’s
>>> opinions. I know this has been discussed repeatedly, and I want to put in
>>> my vote in favor of having a survey that takes into account multiple
>>> different approaches.
>>>
>>> WRT the actual proposal in this thread, I’m strongly opposed to a
>>> mono-repository. While I understand the argument that the full clone’s
>>> cost on disk space is minimal compared to an LLVM object directory, what
>>> about contributors who contribute to the smaller runtimes projects but
>>> *not* to LLVM or Clang? A contributor who only contributes to libcxx or
>>> compiler-rt being forced to do a full clone of all the LLVM projects in
>>> order to push a patch kinda sucks.
>>>
>>> I want to point out a few workflows people may not be considering.
>>>
>>> Clang can be built against an installed LLVM. I know this workflow is
>>> used by some people because I’ve broken it in the past and had to fix it.
>>> With a mono-repo this workflow gets a bit more complicated because you’d
>>> need to do sparse checkouts, and it probably means we should just nuke the
>>> workflow entirely because there is no real value added by having it.
>>>
>>> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for
>>> the common use case maintaining sparse repository mirrors would limit the
>>> impact of this on users, should any GCC user want to contribute to
>>> Compiler-RT, you’re forcing them to clone a much larger repository than
>>> necessary.
>>>
>>> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
>>> libcxxabi, libunwind, and potentially any other runtime library projects
>>> that we may create in the future.
>>>
>>> Beyond all that I want to point out that the git multi-repository story
>>> is basically the same thing we have today with SVN except for the absence
>>> of a monotonically increasing number that corresponds across repositories.
>>> While admittedly you do get a linear history with using the
>>> mono-repository, that isn’t the only way to solve the problem, and I don’t
>>> really think that the benefit (not needing to write some tooling) justifies
>>> the increased burden applied to contributors that don’t use the full LLVM
>>> family of projects.
>>>
>>> I think we have some pretty strong evidence in the form of the github
>>> fork counts (https://github.com/llvm-mirror/) that most people aren’t
>>> using all of the LLVM projects. In fact, by that evidence Clang (the second
>>> most popular project) is forked less than 2/3 as many times as LLVM.
>>>
>>> -Chris
>>>
>>>
>>> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Even if it were possible, I would still keep my upstream checkout
>>> separate just as a safety measure, to keep from sending private stuff
>>> upstream by accident.
>>>
>>>
>>> Just FYI, this is our (Azul's) workflow as well, and for similar
>>> reasons.
>>>
>>>
>>> Same here.
>>>
>>> cheers,
>>> --renato
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Robinson, Paul via llvm-dev

2016-Jul-27 21:24 UTC

[llvm-dev] [RFC] One or many git repositories?

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Justin Lebar via llvm-dev
> Sent: Wednesday, July 27, 2016 1:32 PM
> To: Chris Bieneman
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
> 
> Thanks for elaborating, Chris.
> 
> > Case Study 1 - Simple development on a sub-project
> 
> I explicitly addressed this workflow in my original e-mail.  I know it
> was a while ago, but it sounds like it may be worth a read if you
> haven't checked it out.
> 
> In the mail I described how to use sparse checkouts to create a
> repository structure that functions virtually identically to what you
> have today.  It takes a few copy-pastable commands to set up.  If
> these few commands are a pain, we can write a script and check it in
> to llvm.
If I try to push from my sparse checkout, but something that I don't
have in the checkout has changed, do I still need to rebase first?
--paulr

Justin Lebar via llvm-dev

2016-Jul-27 21:29 UTC

[llvm-dev] [RFC] One or many git repositories?

> If I try to push from my sparse checkout, but something that I don't
> have in the checkout has changed, do I still need to rebase first?
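Without having tried it: yes, because git commits are git commits.  A
commit records the whole tree, not just the paths in your sparse
checkout, so the push still has to fast-forward the upstream branch.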

But you do not have to rebuild and retest everything, nor does the act
of rebasing necessarily update your llvm checkout, if you've made it
independent.  And the rebase will have conflicts if and only if you
would have had rebase conflicts in the multirepo world.
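
The "if and only if" point can be illustrated with a toy model: a rebase can only hit a content conflict in a file that was changed both locally and upstream. If all local commits stay under one sub-project directory, that overlap is confined to the same directory, so it matches what you would get rebasing the standalone sub-project repository. The file lists below are hypothetical, and the model deliberately ignores rename/delete subtleties.

```python
def conflict_candidates(local, upstream):
    """Files changed on both sides; only these can produce a content conflict."""
    return set(local) & set(upstream)


def as_subrepo(paths, subproject):
    """Multirepo view: keep paths inside `subproject/` and strip that prefix."""
    prefix = subproject + "/"
    return {p[len(prefix):] for p in paths if p.startswith(prefix)}


# Hypothetical monorepo paths: local work touches only compiler-rt.
local = {"compiler-rt/lib/asan/asan_rtl.cc"}
upstream = {
    "llvm/lib/IR/Verifier.cpp",
    "clang/lib/Sema/Sema.cpp",
    "compiler-rt/lib/asan/asan_rtl.cc",
}

mono = conflict_candidates(local, upstream)
multi = conflict_candidates(as_subrepo(local, "compiler-rt"),
                            as_subrepo(upstream, "compiler-rt"))

# Upstream llvm/ and clang/ changes cannot overlap with compiler-rt/ work,
# so both views flag the same single file.
print(sorted(mono))   # ['compiler-rt/lib/asan/asan_rtl.cc']
print(sorted(multi))  # ['lib/asan/asan_rtl.cc']
```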

One could have a script that does the rebase and push in one shot if
typing two commands on submit is really a big deal.

FWIW git svn dcommit does the rebase implicitly -- in the monorepo or
multirepo worlds, we are all going to have to type "git rebase" more
often (unless we use a script).
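
A one-shot "rebase and push" script could be as small as the following Python sketch. It assumes the upstream remote is named `origin` and the target branch is `master` (hypothetical defaults): `git pull --rebase` replays local commits onto the latest upstream, and the push then fast-forwards the remote branch.

```python
import subprocess


def rebase_and_push_commands(remote="origin", branch="master"):
    """The git commands the script runs, in order."""
    return [
        # Fetch upstream and replay local commits on top of it.
        ["git", "pull", "--rebase", remote, branch],
        # Push the rebased HEAD; this is a fast-forward of the remote branch.
        ["git", "push", remote, f"HEAD:{branch}"],
    ]


def rebase_and_push(remote="origin", branch="master"):
    for argv in rebase_and_push_commands(remote, branch):
        # check=True raises CalledProcessError on failure (e.g. a rebase
        # conflict), so a half-finished rebase is never pushed.
        subprocess.run(argv, check=True)
```

If someone else pushes between the two steps, the push is rejected as a non-fast-forward and the script simply needs to be run again.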

Bruce Hoult via llvm-dev

2016-Jul-27 21:59 UTC

[llvm-dev] [RFC] One or many git repositories?

On Thu, Jul 28, 2016 at 9:24 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> If I try to push from my sparse checkout, but something that I don't
> have in the checkout has changed, do I still need to rebase first?
>
That's assuming you're pushing directly to master. Very few people should
be doing that!

If llvm is run like other github projects then you push to a branch in your
own forked copy of llvm. No one else can push to that, so nothing can have
changed.

If you're developing your branch over a long time period then you should
pull from master and rebase on that from time to time, and certainly just
before making a pull request.
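
The fork-based flow above (periodically rebasing a branch in your own fork onto upstream master) might look like this as a Python sketch. The remote names `upstream` and `fork` and the branch name are hypothetical. Because rebasing rewrites commits that were already pushed to your fork, the push has to be forced; `--force-with-lease` refuses to overwrite the fork's branch if it no longer matches your remote-tracking copy.

```python
import subprocess


def refresh_branch_commands(branch, upstream="upstream", fork="fork"):
    """Commands to rebase `branch` onto upstream master and update the fork."""
    return [
        ["git", "fetch", upstream],
        # Switches to `branch`, then replays its commits onto upstream master.
        ["git", "rebase", f"{upstream}/master", branch],
        # Rebasing rewrote already-pushed commits, so a plain push would be
        # rejected; --force-with-lease guards against clobbering other updates.
        ["git", "push", "--force-with-lease", fork, f"{branch}:{branch}"],
    ]


def refresh_branch(branch, upstream="upstream", fork="fork"):
    for argv in refresh_branch_commands(branch, upstream, fork):
        subprocess.run(argv, check=True)
```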

The person responding to your pull request will rebase on the latest master
and kick it back to you if there are problems.

If gerrit is used for code review then the UI tells you if your patch is
out of date compared to master and there's a big fat painless "rebase"
button right there.
