thr3ads.net - llvm dev - [llvm-dev] Add more projects into Git monorepo [May 2017]

David Chisnall via llvm-dev

2017-May-09 12:47 UTC

[llvm-dev] Add more projects into Git monorepo

On 8 May 2017, at 20:51, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> 2017-05-07 1:01 GMT-07:00 David Chisnall via llvm-dev <llvm-dev at lists.llvm.org>:
> Is this intended to be the monorepo that eventually becomes the official
> repo, because if so I strongly object to putting libunwind, libc++ and
> libc++abi in it.  I have recently been working on bring-up for libc++ and
> libunwind on a new platform and the integration of libunwind with the LLVM
> build system is already annoying (you can’t build it unless you have a
> working C++ standard library implementation for your target, even though
> it’s a dependency for libc++); having to have a complete LLVM checkout
> would be even more overhead.
>
> Please clarify the overhead.

My clone of libunwind is around 4MB.  A clone of LLVM is 2-3 orders of magnitude
bigger.  The clone on my local system doesn’t matter too much (though it would
be an annoying waste), because I have spare disk space, but each project, once
it’s working, also gets cloned to our CI system, which is always short on disk
space because it archives build artefacts.  Network bandwidth is also an issue.

There’s also the secondary issue that it is valuable to be able to build these
components out of tree, yet this is currently fragile and is likely to be broken
even more if we’re insisting on the monorepo.

We are currently able to target our platform from LLVM (as a cross-compiler),
but not build LLVM to run on it, so it is unhelpful to have stuff that we
compile for x86 and stuff that we compile for our target in the same repo,
because we aggregate the stuff that we build for the target (libunwind, libc++,
and so on) when we build images.

Finally, there’s the philosophical / software engineering issue.  There should
be no tight coupling between libunwind and anything else in the LLVM tree. 
Libunwind implements a set of well-documented and stable APIs.  These are used
by other components, but are equally useful in other contexts (i.e. any compiler
for any language that uses the Itanium unwind model).  From the perspective of
someone hacking on libunwind, LLVM is an unrelated project (though one that
shares coding conventions - an analogy would be two projects under the Apache
umbrella) and there is absolutely no reason to insist that libunwind developers
should clone a massive unrelated project to work on the code that they want to
work on.
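
As a rough illustration of how self-contained that API surface is, here is a minimal sketch that walks the current call stack using only `<unwind.h>`. The names `record_frame` and `capture_backtrace` are hypothetical helpers, not part of libunwind. Note also that `_Unwind_Backtrace` is, as far as I know, a widely implemented extension (provided by both libgcc and LLVM's libunwind) rather than part of the base Itanium specification.

```cpp
#include <unwind.h>   // _Unwind_Backtrace, _Unwind_GetIP
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
// Hypothetical callback: invoked once per stack frame by the unwinder.
_Unwind_Reason_Code record_frame(struct _Unwind_Context *context, void *arg) {
  auto *frames = static_cast<std::vector<std::uintptr_t> *>(arg);
  // Stop walking instead of reallocating inside the unwinder callback.
  if (frames->size() == frames->capacity())
    return _URC_END_OF_STACK;
  frames->push_back(static_cast<std::uintptr_t>(_Unwind_GetIP(context)));
  return _URC_NO_REASON;  // continue to the next frame
}
}  // namespace

// Hypothetical helper: collects up to max_frames instruction pointers
// from the calling thread's stack.
std::vector<std::uintptr_t> capture_backtrace(std::size_t max_frames = 64) {
  std::vector<std::uintptr_t> frames;
  frames.reserve(max_frames);
  _Unwind_Backtrace(record_frame, &frames);
  return frames;
}
```

Nothing here depends on LLVM or clang headers, so a caller only needs an unwinder library that provides these symbols at link time.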

All of this applies to libc++ and libc++abi as well.

David

Mehdi AMINI via llvm-dev

2017-May-09 14:58 UTC

2017-05-09 5:47 GMT-07:00 David Chisnall <David.Chisnall at cl.cam.ac.uk>:
> On 8 May 2017, at 20:51, Mehdi AMINI <joker.eph at gmail.com> wrote:
> >
> > 2017-05-07 1:01 GMT-07:00 David Chisnall via llvm-dev <llvm-dev at lists.llvm.org>:
> > Is this intended to be the monorepo that eventually becomes the official
> > repo, because if so I strongly object to putting libunwind, libc++ and
> > libc++abi in it.  I have recently been working on bring-up for libc++ and
> > libunwind on a new platform and the integration of libunwind with the LLVM
> > build system is already annoying (you can’t build it unless you have a
> > working C++ standard library implementation for your target, even though
> > it’s a dependency for libc++); having to have a complete LLVM checkout
> > would be even more overhead.
> >
> > Please clarify the overhead.
>
> My clone of libunwind is around 4MB.  A clone of LLVM is 2-3 orders of
> magnitude bigger.  The clone on my local system doesn’t matter too much
> (though it would be an annoying waste), because I have spare disk space,
> but each project, once it’s working, also gets cloned to our CI system,
> which is always short on disk space because it archives build artefacts.
> Network bandwidth is also an issue.
>
I'd expect any CI system to be able to cache this.
Also, if your issue is archiving a lot of build artifacts, the constant
cost of the checkout isn't gonna matter that much.
Finally, the read-only individual repo can still be used by CI, which
addresses this entirely.


> There’s also the secondary issue that it is valuable to be able to build
> these components out of tree, yet this is currently fragile and is likely
> to be broken even more if we’re insisting on the monorepo.
>
I don't see any rationale for this.
Whatever has a CI is gonna continue to work. This is already the case
today: if you care about a configuration, provide CI for it and it'll
continue to work.


>
> We are currently able to target our platform from LLVM (as a
> cross-compiler), but not build LLVM to run on it, so it is unhelpful to
> have stuff that we compile for x86 and stuff that we compile for our target
> in the same repo, because we aggregate the stuff that we build for the
> target (libunwind, libc++, and so on) when we build images.
>
> Finally, there’s the philosophical / software engineering issue.  There
> should be no tight coupling between libunwind and anything else in the LLVM
> tree.  Libunwind implements a set of well-documented and stable APIs.
> These are used by other components, but are equally useful in other
> contexts (i.e. any compiler for any language that uses the Itanium unwind
> model).  From the perspective of someone hacking on libunwind, LLVM is an
> unrelated project (though one that shares coding conventions - an analogy
> would be two projects under the Apache umbrella) and there is absolutely no
> reason to insist that libunwind developers should clone a massive unrelated
> project to work on the code that they want to work on.
>
There is another philosophical perspective: encouraging communities to get
closer together. You talk about "libunwind developers", and there are
"lldb developers" as well; I'd rather get closer to "we're working on the
same project", with shared practices and goals. And ultimately, to come
back to your software engineering practices, encouraging code motion and
code reuse between sub-projects.

> All of this applies to libc++ and libc++abi as well.
>
Ultimately I don't know about libunwind, and if it has to live separately
it is not a big deal. The others (libc++ and libc++abi for instance) are
more tied to the rest of the project though.
We duplicate the demangler from libc++abi in llvm for instance, and this is
quite an important software engineering issue to me.


-- 
Mehdi

David Chisnall via llvm-dev

2017-May-09 15:17 UTC

On 9 May 2017, at 15:58, Mehdi AMINI <joker.eph at gmail.com> wrote:
> I'd expect any CI system to be able to cache this.
> Also, if your issue is archiving a lot of build artifacts, the constant
> cost of the checkout isn't gonna matter that much.
> Finally, the read-only individual repo can still be used by CI, which
> addresses this entirely.

If we want to pull in new libunwind fixes from upstream, we’ll also pull in
irrelevant LLVM, clang, lldb, lld, and so on changes.  This translates to extra
bandwidth and storage requirements for *every* copy of the libunwind repo that
we need.

If we follow the monorepo approach downstream and merge these independent repos,
then we add extra merges for everyone downstream because people committing
improvements to our LLVM and clang trees will require rebase pulls for anyone
working on libc++ or libunwind, even though the changes were to a component that
they do not need to build, let alone modify.

> There is another philosophical perspective: encouraging communities to get
> closer together. You talk about "libunwind developers", and there are
> "lldb developers" as well; I'd rather get closer to "we're working on the
> same project", with shared practices and goals. And ultimately, to come
> back to your software engineering practices, encouraging code motion and
> code reuse between sub-projects.

I disagree, as someone who wears hats as a libunwind, libc++, clang and LLVM
developer: I am no more engaged between the different groups by having the repos
combined, but I am inconvenienced by having to carry around clones of unrelated
code when I am working on one component and by having to rebase my libunwind
repo because someone committed to clang.

Combining the clang and LLVM repos is a necessary evil.  If we could have clean
layering and well-defined APIs for the LLVM APIs needed for clang, then I would
be opposed to this as well, but unfortunately this has too high an engineering
cost and so we need to be able to perform atomic commits of LLVM and LLVM-using
projects (this, unfortunately, means that we often don’t see the cost that this
imposes on developers of other front ends).  In contrast, if we need to perform
an atomic commit between libc++ and clang or libunwind and clang then this tells
us that we have a bug: a new version of clang may introduce a feature that
relies on a new libc++ or libunwind, but a new libunwind or libc++ should always
work with an old clang (or an old gcc, or any other compiler that targets it).

>> All of this applies to libc++ and libc++abi as well.
>
> Ultimately I don't know about libunwind, and if it has to live separately
> it is not a big deal. The others (libc++ and libc++abi for instance) are
> more tied to the rest of the project though.
> We duplicate the demangler from libc++abi in llvm for instance, and this is
> quite an important software engineering issue to me.

The requirements for a libc++abi demangler and a generic LLVM one are very
different.  For libc++abi, the requirements are:

 - Must be small (the binary size of libc++abi is very important)

 - Must be tolerant of out-of-memory conditions (it is used for generating error
messages when an out-of-memory exception is thrown)

 - Must use malloc() / realloc() for providing the demangled string (a
requirement of the Itanium ABI public APIs)
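
To make the last requirement concrete, here is a minimal sketch of a caller using the Itanium ABI entry point `abi::__cxa_demangle` from `<cxxabi.h>`. The wrapper name `demangle_or_original` is hypothetical. The point is that the result is a plain malloc()-allocated C string that the caller must release with free(), which is why the runtime demangler cannot simply switch to another allocator.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged if demangling fails.
std::string demangle_or_original(const char *mangled) {
  int status = 0;
  // A null output buffer asks the runtime to malloc() the result itself.
  char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return mangled;
  std::string result(demangled);
  std::free(demangled);  // the ABI requires the caller to free() the buffer
  return result;
}
```

For example, `demangle_or_original("_Z3foov")` yields `foo()`. The only output is a flat string, so a client that wants the base name or argument list has to re-parse it, which is the gap the second list of requirements describes.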

In contrast, the demangler for the rest of LLVM:

 - Must be flexible (e.g. lldb wants to be able to get the base name of a
demangled function, so that it can insert breakpoints on all overloads)

 - Must be fast (e.g. lldb wants to demangle every symbol in a library in a
UI-critical path)

 - Must provide structured information about the demangled symbol, not just a
string as output.

 - Must integrate with other memory allocation mechanisms (e.g. support
std::allocator)

Copying the demangler was a quick way of getting something to work portably, but
it wasn’t a good solution given the different requirements (the libc++abi
demangler doesn’t do a good job of meeting either set of requirements), so this
is a very bad justification for merging the repos.

David
