thr3ads.net - llvm dev - [llvm-dev] [RFC] One or many git repositories? [Jul 2016]

Dean Michael Berris via llvm-dev

2016-Jul-29 12:04 UTC

[llvm-dev] [RFC] One or many git repositories?

> On 29 Jul 2016, at 21:58, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
>
> On 29 Jul 2016, at 12:35, Dean Michael Berris <dean.berris at gmail.com> wrote:
>>
>> I understand this, but why isn't "the repo you're interested in" just the
>> megarepo (or monorepo) where every LLVM project resides?
>
> Your assumption is a downstream user of LLVM.  As previously pointed out,
> we have downstream users of libc++ and the sanitizer runtimes who compile with
> gcc.  For a downstream user of LLVM, the cost of getting everything else is in
> the noise.  For a downstream user of libc++ who may want to contribute upstream,
> the overhead is huge.
>

Even then, are we seriously ignoring the fact that even if you did clone the
whole repository including everything, that you can still build just the libc++
and sanitiser runtimes if you wanted to? Why is this "noise" of any
importance to the users who get what they want and then some?

I know some people use only numbered releases of LLVM and the projects. They can
keep using those as long as LLVM provides them.

Is it really impossible to just build non-LLVM dependent versions of libc++ or
the sanitiser runtimes if they reside in one git megarepo?

Renato Golin via llvm-dev

2016-Jul-29 12:47 UTC

On 29 July 2016 at 13:04, Dean Michael Berris via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Is it really impossible to just build non-LLVM dependent versions of libc++
> or the sanitiser runtimes if they reside in one git megarepo?

The more intricate the relationship between the components, the less
we'll test for the alternative solutions.

My use is solely from a toolchain point of view. For me, having it all
in one blob would be perfect, and I would never have to worry about
integrations again. (in a perfect world, etc...)

But a good number of projects (and products) use LLVM trunk (not
releases) and they use it in slightly different ways. This has driven a
lot of refactoring around the libraries over the last few years and I
think it's a positive thing. A good number of *upstream* developers
contribute to LLVM under those premises, and the harder we make it for
them, the fewer of them we'll have. I don't think that's a wise move.

Furthermore, losing the ability to clearly separate things makes them
become one disparate group, rather than two independent ones.

cheers,
--renato

Robinson, Paul via llvm-dev

2016-Jul-29 14:26 UTC

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Dean Michael Berris via llvm-dev
> Sent: Friday, July 29, 2016 5:04 AM
> To: David Chisnall
> Cc: LLVM Developers; Bruce Hoult
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>
> > On 29 Jul 2016, at 21:58, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
> >
> > On 29 Jul 2016, at 12:35, Dean Michael Berris <dean.berris at gmail.com> wrote:
> >>
> >> I understand this, but why isn't "the repo you're interested in" just
> >> the megarepo (or monorepo) where every LLVM project resides?
> >
> > Your assumption is a downstream user of LLVM.  As previously pointed
> > out, we have downstream users of libc++ and the sanitizer runtimes who
> > compile with gcc.  For a downstream user of LLVM, the cost of getting
> > everything else is in the noise.  For a downstream user of libc++ who may
> > want to contribute upstream, the overhead is huge.
> >
>
> Even then, are we seriously ignoring the fact that even if you did clone
> the whole repository including everything, that you can still build just
> the libc++ and sanitiser runtimes if you wanted to?

Is it that easy to build a subset of a large checked-out tree?  I haven't
tried it but my impression is: not so much.  Certainly the advertised
tactics for configuring/building don't tell you how to do that.  Somebody
figuring out what it takes would be very constructive here, instead of
just asserting it can't possibly be that hard.
> Why is this "noise" of
> any importance to the users who get what they want and then some?
You want to drive to work? Here, have this semi-trailer; everything
you want and then some.

I believe David Chisnall up-thread cited a difference in checkout times
on the order of a handful of seconds versus a couple of minutes.  While
naively it might seem not a big deal, over time and depending on what you
are trying to do yes it can be a big burden.

For example right now I have a glitch somewhere in my merge process.
It's taking an extra 10-12 seconds longer to do something than I think
it should, per commit.  NBD right?  Except when you're 100 commits behind
and trying to catch up, now you're talking about >15 minutes wasted.
Again in the grand scheme of things 15 minutes doesn't seem like much
but it seriously affects my productivity; it's actually hard to come up
with tasks that small that I can context-switch to and back easily.
Interruptions like that really are bad for your ability to concentrate
on the intellectual task of getting your patch to work.
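
To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. It only uses the figures above (100 commits, 10-12 extra seconds each); the helper name `wasted_minutes` is made up for illustration.

```python
# Figures taken from the paragraph above.
COMMITS_BEHIND = 100
EXTRA_SECONDS_PER_COMMIT = (10, 12)  # low and high estimate


def wasted_minutes(commits: int, extra_seconds: float) -> float:
    """Total extra time, in minutes, for a fixed per-commit slowdown."""
    return commits * extra_seconds / 60


low, high = (wasted_minutes(COMMITS_BEHIND, s) for s in EXTRA_SECONDS_PER_COMMIT)
print(f"{low:.1f} to {high:.1f} minutes")  # prints "16.7 to 20.0 minutes"
```

So ">15 minutes" is, if anything, a conservative way of putting it: the range works out to roughly 17 to 20 minutes.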
--paulr
> 
> I know some people use only numbered releases of LLVM and the projects.
> They can keep using those as long as LLVM provides them.
> 
> Is it really impossible to just build non-LLVM dependent versions of
> libc++ or the sanitiser runtimes if they reside in one git megarepo?

Bruce Hoult via llvm-dev

2016-Jul-29 14:51 UTC

On Sat, Jul 30, 2016 at 2:26 AM, Robinson, Paul via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> > Even then, are we seriously ignoring the fact that even if you did clone
> > the whole repository including everything, that you can still build just
> > the libc++ and sanitiser runtimes if you wanted to?
>
> Is it that easy to build a subset of a large checked-out tree?  I haven't
> tried it but my impression is: not so much.  Certainly the advertised
> tactics for configuring/building don't tell you how to do that.  Somebody
> figuring out what it takes would be very constructive here, instead of
> just asserting it can't possibly be that hard.
>
Right now, no. The build system assumes that if you checked something out
then you want to build it.

This needs to change.

> I believe David Chisnall up-thread cited a difference in checkout times
> on the order of a handful of seconds versus a couple of minutes.  While
> naively it might seem not a big deal, over time and depending on what you
> are trying to do yes it can be a big burden.
>
That's a one time cost, not every time you do an update.
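
As a rough illustration of why that distinction matters, the sketch below spreads a one-time clone cost over later updates. The 5-second and 120-second figures are assumed stand-ins for "a handful of seconds" and "a couple of minutes", not measurements, and `amortized_extra_seconds` is a hypothetical helper.

```python
# Assumed, illustrative clone times in seconds (not measured).
SMALL_CLONE_S = 5.0     # "a handful of seconds" for a single-project clone
LARGE_CLONE_S = 120.0   # "a couple of minutes" for a clone of everything


def amortized_extra_seconds(small_clone_s: float, large_clone_s: float,
                            updates: int) -> float:
    """Extra clone time of the larger repo, spread evenly over later updates."""
    if updates <= 0:
        raise ValueError("updates must be a positive integer")
    return (large_clone_s - small_clone_s) / updates


for updates in (1, 10, 100, 1000):
    extra = amortized_extra_seconds(SMALL_CLONE_S, LARGE_CLONE_S, updates)
    print(f"{updates:>5} updates: {extra:.2f} s of extra clone time per update")
```

Under these assumed numbers the extra cost drops to about a second per update after 100 updates, which is the sense in which a clone is a one-time cost. A slowdown that recurs on every update would not amortize this way.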

Renato Golin via llvm-dev

2016-Jul-29 14:52 UTC

On 29 July 2016 at 15:26, Robinson, Paul via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I believe David Chisnall up-thread cited a difference in checkout times
> on the order of a handful of seconds versus a couple of minutes. While
> naively it might seem not a big deal, over time and depending on what you
> are trying to do yes it can be a big burden.

TL;DR: This thread is dead. Let's move on.

I think the biggest fallacy in this thread is that changing process is cheap.

It is certainly cheap for me to do "git foo" instead of "git bar" from
now on. It's moderately expensive to change my buildbot
configurations, Zorg's builders and re-test everything for public CI.
It's a lot more expensive to change how distributions build their
hundreds of thousands of packages over multiple LTS releases, or how
downstream users like Sony, Apple or ARM re-factor their entire build
systems (which very likely link to a lot of non-LLVM stuff), and then
some.

None of that is impossible, most of that is a "one off". Most of the
companies and big projects "could" afford to do that.

But there are two big points that people like me, Paul and David have
been unsuccessfully trying to make obvious:

1. Not every LLVM user is as big as FreeBSD, Sony or Apple. There are
a lot of very interesting projects (hobbyists, academia, professional)
using Clang, LLVM, libc++, etc. that don't have the staff to do that
move. Being a hobbyist myself, I know too well that when a library
radically changes the way it behaves (as Boost did with every new
release about 10 years ago), I will stop using it.

2. Changes in complex systems have unwanted larger consequences. Build
systems are some of the most complex systems in existence because
they're mostly irrational and patched together with duct tape and
paper clips. What may be very simple for some build systems could be
impossible for others, and that's not the other team's fault.

So, if you have a complex build system yourself, and you spent some
time and have figured out that it would be easy, you *cannot* assume
it should be easy for everyone with a less or equally complex build
system.

If you find it simple to change your own workflow towards this or that
solution, you *cannot* assume everyone else should feel the same. This
also doesn't diminish their intelligence or competence. Intelligent
and competent people work in very different ways, and it's actually
because of that fact that we can do such complex software works in a
multitude of systems. If we were all equal, we wouldn't need to
discuss anything. :)

Mehdi said very early, and repeated many times on some of the threads,
something to the effect of: "Saying how hard or easy it is for you is
an invalid argument, we need more concrete facts".

I absolutely agree with that statement, but interpreting how easy or
hard the concrete facts would be falls into the same fallacy, so it doesn't
bring us closer to consensus; it brings us closer to dissent.

That is why I think this thread outlived its usefulness a long time
ago, and we need a concrete write-up of the proposal. (I hear it's in
progress, let's wait for it.)

From now on, I'd propose the discussion be *just* about this
specific proposal, preferably over a Phabricator review of the
document. People that have strong opinions about it should wait for
the survey.

Just to reiterate, the survey is to collect opinions in a formal and
non-passionate manner. It will not be a "majority vote", and we're not
locked between these two solutions exactly as they're drawn out in
the documents, nor are we forced to take any decision if the community
is clearly split. The last thing I want is to destroy part of the
community while trying to make it better.

But this long thread is not doing any good either.

cheers,
--renato

Mehdi Amini via llvm-dev

2016-Jul-29 17:06 UTC

> On Jul 29, 2016, at 7:26 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> -----Original Message-----
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>> Dean Michael Berris via llvm-dev
>> Sent: Friday, July 29, 2016 5:04 AM
>> To: David Chisnall
>> Cc: LLVM Developers; Bruce Hoult
>> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>>
>>> On 29 Jul 2016, at 21:58, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
>>>
>>> On 29 Jul 2016, at 12:35, Dean Michael Berris <dean.berris at gmail.com> wrote:
>>>>
>>>> I understand this, but why isn't "the repo you're interested in" just
>>>> the megarepo (or monorepo) where every LLVM project resides?
>>>
>>> Your assumption is a downstream user of LLVM.  As previously pointed
>>> out, we have downstream users of libc++ and the sanitizer runtimes who
>>> compile with gcc.  For a downstream user of LLVM, the cost of getting
>>> everything else is in the noise.  For a downstream user of libc++ who may
>>> want to contribute upstream, the overhead is huge.
>>>
>>
>> Even then, are we seriously ignoring the fact that even if you did clone
>> the whole repository including everything, that you can still build just
>> the libc++ and sanitiser runtimes if you wanted to?
>
> Is it that easy to build a subset of a large checked-out tree?  I haven't
> tried it but my impression is: not so much.

If the layout is flat, what difficulty do you expect compared to today’s
situation?


>  Certainly the advertised
> tactics for configuring/building don't tell you how to do that.  Somebody
> figuring out what it takes would be very constructive here, instead of
> just asserting it can't possibly be that hard.
> 
>> Why is this "noise" of
>> any importance to the users who get what they want and then some?
> 
> You want to drive to work? Here, have this semi-trailer; everything
> you want and then some.
> 
> I believe David Chisnall up-thread cited a difference in checkout times
> on the order of a handful of seconds versus a couple of minutes.  While
> naively it might seem not a big deal, over time and depending on what you
> are trying to do yes it can be a big burden.
There are still the read-only views and shallow clones, which address the
non-committer cases.
For committers, this is a one-time cost; I have difficulty seriously considering
it a “burden”.


> 
> For example right now I have a glitch somewhere in my merge process.
> It's taking an extra 10-12 seconds longer to do something than I think
> it should, per commit.  NBD right?  Except when you're 100 commits behind
> and trying to catch up, now you're talking about >15 minutes wasted.
I don’t really understand what you’re talking about, but for downstream users with
complex integration (like ours), a single monorepo should be an important
simplification of the process.
(Otherwise you can still integrate from the read-only repos anyway.)

— 
Mehdi

> Again in the grand scheme of things 15 minutes doesn't seem like much
> but it seriously affects my productivity; it's actually hard to come up
> with tasks that small that I can context-switch to and back easily.
> Interruptions like that really are bad for your ability to concentrate
> on the intellectual task of getting your patch to work.
> --paulr
> 
>> 
>> I know some people use only numbered releases of LLVM and the projects.
>> They can keep using those as long as LLVM provides them.
>> 
>> Is it really impossible to just build non-LLVM dependent versions of
>> libc++ or the sanitiser runtimes if they reside in one git megarepo?

Dean Michael Berris via llvm-dev

2016-Jul-30 06:10 UTC

> On 29 Jul 2016, at 22:47, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 29 July 2016 at 13:04, Dean Michael Berris via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Is it really impossible to just build non-LLVM dependent versions of
>> libc++ or the sanitiser runtimes if they reside in one git megarepo?
>
> The more intricate the relationship between the components, the less
> we'll test for the alternative solutions.
>
I agree with this gem of an insight, thank you.

But that doesn't mean we wouldn't test for those -- just that we should
be vigilant about it and do make sure we support the various use-cases we
already do, and then some.
> My use is solely from a toolchain point of view. For me, having it all
> in one blob would be perfect, and I would never have to worry about
> integrations again. (in a perfect world, etc...)
>
> But a good number of projects (and products) use LLVM trunk (not
> releases) and they use it in slightly different ways. This has driven a
> lot of refactoring around the libraries over the last few years and I
> think it's a positive thing. A good number of *upstream* developers
> contribute to LLVM under those premises, and the harder we make it for
> them, the fewer of them we'll have. I don't think that's a wise move.
>
I don't see how making it a mono-repo would make it harder for them (LLVM
developers) to keep things un-broken for this use-case *if* we have
infrastructure already testing the standalone builds (which, AFAICT, we do,
because I've broken them a couple of times now :D). Note this is predicated
on making sure we do have explicit tests for these situations and I 100% agree
that we should have those.

But that is beside the point of whether we have a mega-repo or 100 different
smaller ones. (I exaggerate, we only have ~10 or so, I've already lost
count). In fact I think having the many "independent" repositories
makes it harder to test (as is already the case).
> Furthermore, losing the ability to clearly separate things makes them
> become one disparate group, rather than two independent ones.
> 
Can you elaborate more on why keeping things separate is beneficial to:

- Current and future LLVM vertical developers (those working on LLVM, Clang,
compiler-rt, parallel_libs, etc.)
- Downstream users who have to keep track of the separate projects and
repositories in their local workflows
- Casual contributors who find bugs and want to help clean something up

As someone who's new to all this LLVM development, I'd really like to
understand why it _seems_ like we really want to keep the status quo of
"too hard to make changes and maintain". I understand the
"engineering tradeoff" of not changing something that's already
working, but there's also the principle of "continuous
improvement" -- i.e. if a megarepo makes the development process simpler
and enables us to support *more* downstream users *better*, maybe that's a
strictly better situation than what we have now?

Cheers

Dean Michael Berris via llvm-dev

2016-Jul-30 06:19 UTC

> On 30 Jul 2016, at 00:26, Robinson, Paul <paul.robinson at sony.com> wrote:
>
>> -----Original Message-----
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>> Dean Michael Berris via llvm-dev
>> Sent: Friday, July 29, 2016 5:04 AM
>> To: David Chisnall
>> Cc: LLVM Developers; Bruce Hoult
>> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>>
>>> On 29 Jul 2016, at 21:58, David Chisnall <david.chisnall at cl.cam.ac.uk> wrote:
>>>
>>> On 29 Jul 2016, at 12:35, Dean Michael Berris <dean.berris at gmail.com> wrote:
>>>>
>>>> I understand this, but why isn't "the repo you're interested in" just
>>>> the megarepo (or monorepo) where every LLVM project resides?
>>>
>>> Your assumption is a downstream user of LLVM.  As previously pointed
>>> out, we have downstream users of libc++ and the sanitizer runtimes who
>>> compile with gcc.  For a downstream user of LLVM, the cost of getting
>>> everything else is in the noise.  For a downstream user of libc++ who may
>>> want to contribute upstream, the overhead is huge.
>>>
>>
>> Even then, are we seriously ignoring the fact that even if you did clone
>> the whole repository including everything, that you can still build just
>> the libc++ and sanitiser runtimes if you wanted to?
>
> Is it that easy to build a subset of a large checked-out tree?  I haven't
> tried it but my impression is: not so much.

I tried it for compiler-rt hosted in an LLVM checkout and it works just fine. I
can't speak for other libraries/tools in LLVM, but if it doesn't work for them then
that's something worth fixing (if that's something that's explicitly
supported by the community).
>  Certainly the advertised
> tactics for configuring/building don't tell you how to do that.  Somebody
> figuring out what it takes would be very constructive here, instead of
> just asserting it can't possibly be that hard.
>
Sorry, I wasn't asserting anything, I was conjecturing (if that's even a
word).
>> Why is this "noise" of
>> any importance to the users who get what they want and then some?
> 
> You want to drive to work? Here, have this semi-trailer; everything
> you want and then some.
> 
I think that's a tenuous analogy -- if I only have to drive to work *once*
and then get a new faster and easier to navigate mode of transport once I get
there (maybe because the workplace will provide a better way once you've
gotten there), then sure, I'll take that trailer and haul some stuff along
the way too. :)
> I believe David Chisnall up-thread cited a difference in checkout times
> on the order of a handful of seconds versus a couple of minutes.  While
> naively it might seem not a big deal, over time and depending on what you
> are trying to do yes it can be a big burden.
> 
Sorry, not checkout times -- clone times.

Or did you mean to be using SVN still? In that case then you should still be
able to use the per-project mirror on GitHub using the SVN interface.

If you were going to use git, then you clone it once and then run
'git pull --rebase upstream master' or something similar.
> For example right now I have a glitch somewhere in my merge process.
> It's taking an extra 10-12 seconds longer to do something than I think
> it should, per commit.  NBD right?  Except when you're 100 commits behind
> and trying to catch up, now you're talking about >15 minutes wasted.
> Again in the grand scheme of things 15 minutes doesn't seem like much
> but it seriously affects my productivity; it's actually hard to come up
> with tasks that small that I can context-switch to and back easily.
> Interruptions like that really are bad for your ability to concentrate
> on the intellectual task of getting your patch to work.
I find this too if I do "git svn rebase". What I do now, though,
is pull from the git mirrors, then run "git svn rebase -l" -- the git
pull takes a lot less time, and the local rebase just works like lightning.

Your mileage may vary.

Cheers
