David Chisnall via llvm-dev
2017-May-09  12:47 UTC
[llvm-dev] Add more projects into Git monorepo
On 8 May 2017, at 20:51, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> 2017-05-07 1:01 GMT-07:00 David Chisnall via llvm-dev <llvm-dev at lists.llvm.org>:
>> Is this intended to be the monorepo that eventually becomes the official repo, because if so I strongly object to putting libunwind, libc++ and libc++abi in it. I have recently been working on bring-up for libc++ and libunwind on a new platform and the integration of libunwind with the LLVM build system is already annoying (you can’t build it unless you have a working C++ standard library implementation for your target, even though it’s a dependency for libc++); having to have a complete LLVM checkout would be even more overhead.
>
> Please clarify the overhead.

My clone of libunwind is around 4MB. A clone of LLVM is 2-3 orders of magnitude bigger. The clone on my local system doesn’t matter too much (though it would be an annoying waste), because I have spare disk space, but each project, once it’s working, also gets cloned to our CI system, which is always short on disk space because it archives build artefacts. Network bandwidth is also an issue.

There’s also the secondary issue that it is valuable to be able to build these components out of tree, yet this is currently fragile and is likely to break even more if we insist on the monorepo.

We are currently able to target our platform from LLVM (as a cross-compiler), but not build LLVM to run on it, so it is unhelpful to have stuff that we compile for x86 and stuff that we compile for our target in the same repo, because we aggregate the stuff that we build for the target (libunwind, libc++, and so on) when we build images.

Finally, there’s the philosophical / software engineering issue. There should be no tight coupling between libunwind and anything else in the LLVM tree. Libunwind implements a set of well-documented and stable APIs. These are used by other components, but are equally useful in other contexts (i.e. any compiler for any language that uses the Itanium unwind model). From the perspective of someone hacking on libunwind, LLVM is an unrelated project (though one that shares coding conventions; an analogy would be two projects under the Apache umbrella) and there is absolutely no reason to insist that libunwind developers should clone a massive unrelated project to work on the code that they want to work on.

All of this applies to libc++ and libc++abi as well.

David
Mehdi AMINI via llvm-dev
2017-May-09  14:58 UTC
[llvm-dev] Add more projects into Git monorepo
2017-05-09 5:47 GMT-07:00 David Chisnall <David.Chisnall at cl.cam.ac.uk>:
> On 8 May 2017, at 20:51, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>> 2017-05-07 1:01 GMT-07:00 David Chisnall via llvm-dev <llvm-dev at lists.llvm.org>:
>>> Is this intended to be the monorepo that eventually becomes the official repo, because if so I strongly object to putting libunwind, libc++ and libc++abi in it. I have recently been working on bring-up for libc++ and libunwind on a new platform and the integration of libunwind with the LLVM build system is already annoying (you can’t build it unless you have a working C++ standard library implementation for your target, even though it’s a dependency for libc++); having to have a complete LLVM checkout would be even more overhead.
>>
>> Please clarify the overhead.
>
> My clone of libunwind is around 4MB. A clone of LLVM is 2-3 orders of magnitude bigger. The clone on my local system doesn’t matter too much (though it would be an annoying waste), because I have spare disk space, but each project, once it’s working, also gets cloned to our CI system, which is always short on disk space because it archives build artefacts. Network bandwidth is also an issue.

I'd expect any CI system to be able to cache this. Also, if your issue is archiving a lot of build artifacts, the constant cost of the checkout isn't gonna matter that much. Finally, the read-only individual repo can still be used by CI, which addresses this entirely.

> There’s also the secondary issue that it is valuable to be able to build these components out of tree, yet this is currently fragile and is likely to break even more if we insist on the monorepo.

I don't see any rationale for this. Whatever has a CI is gonna continue to work. This is already the case today: if you care about a configuration, provide CI for it and it'll continue to work.

> We are currently able to target our platform from LLVM (as a cross-compiler), but not build LLVM to run on it, so it is unhelpful to have stuff that we compile for x86 and stuff that we compile for our target in the same repo, because we aggregate the stuff that we build for the target (libunwind, libc++, and so on) when we build images.
>
> Finally, there’s the philosophical / software engineering issue. There should be no tight coupling between libunwind and anything else in the LLVM tree. Libunwind implements a set of well-documented and stable APIs. These are used by other components, but are equally useful in other contexts (i.e. any compiler for any language that uses the Itanium unwind model). From the perspective of someone hacking on libunwind, LLVM is an unrelated project (though one that shares coding conventions; an analogy would be two projects under the Apache umbrella) and there is absolutely no reason to insist that libunwind developers should clone a massive unrelated project to work on the code that they want to work on.

There is another philosophical perspective: encouraging communities to get closer together. You're talking about "libunwind developers", and there are "lldb developers" as well; I'd rather get closer to "we're working on the same project", with shared practices and goals. And ultimately, to come back to your software engineering point, it encourages code motion and code reuse between sub-projects.

> All of this applies to libc++ and libc++abi as well.

Ultimately I don't know about libunwind, and if it has to live separately it is not a big deal. The others (libc++ and libc++abi, for instance) are more tied to the rest of the project, though. We duplicate the demangler from libc++abi in LLVM, for instance, and this is quite an important software engineering issue to me.
--
Mehdi
David Chisnall via llvm-dev
2017-May-09  15:17 UTC
[llvm-dev] Add more projects into Git monorepo
On 9 May 2017, at 15:58, Mehdi AMINI <joker.eph at gmail.com> wrote:
> I'd expect any CI system to be able to cache this.
> Also, if your issue is archiving a lot of build artifacts, the constant cost of the checkout isn't gonna matter that much.
> Finally, the read-only individual repo can still be used by CI, which addresses this entirely.

If we want to pull in new libunwind fixes from upstream, we’ll also pull in irrelevant LLVM, clang, lldb, lld, and so on changes. This translates to extra bandwidth and storage requirements for *every* copy of the libunwind repo that we need. If we follow the monorepo approach downstream and merge these independent repos, then we add extra merges for everyone downstream, because people committing improvements to our LLVM and clang trees will force rebase pulls on anyone working on libc++ or libunwind, even though the changes were to a component that they don’t need to build, let alone modify.

> There is another philosophical perspective: encouraging communities to get closer together. You're talking about "libunwind developers", and there are "lldb developers" as well; I'd rather get closer to "we're working on the same project", with shared practices and goals. And ultimately, to come back to your software engineering point, it encourages code motion and code reuse between sub-projects.

I disagree. As someone who wears libunwind, libc++, clang and LLVM developer hats, I am no more engaged with the different groups by having the repos combined, but I am inconvenienced by having to carry around clones of unrelated code when I am working on one component, and by having to rebase my libunwind repo because someone committed to clang.

Combining the clang and LLVM repos is a necessary evil.
If we could have clean layering and well-defined interfaces for the parts of LLVM that clang needs, then I would be opposed to this as well, but unfortunately that has too high an engineering cost, and so we need to be able to perform atomic commits of LLVM and LLVM-using projects (which, unfortunately, means that we often don’t see the cost that this imposes on developers of other front ends). In contrast, if we need to perform an atomic commit across libc++ and clang, or libunwind and clang, then this tells us that we have a bug: a new version of clang may introduce a feature that relies on a new libc++ or libunwind, but a new libunwind or libc++ should always work with an old clang (or an old gcc, or any other compiler that targets it).

>> All of this applies to libc++ and libc++abi as well.
>
> Ultimately I don't know about libunwind, and if it has to live separately it is not a big deal. The others (libc++ and libc++abi, for instance) are more tied to the rest of the project, though.
> We duplicate the demangler from libc++abi in LLVM, for instance, and this is quite an important software engineering issue to me.

The requirements for a libc++abi demangler and a generic LLVM one are very different. For libc++abi, the requirements are:

- Must be small (the binary size of libc++abi is very important)
- Must be tolerant of out-of-memory conditions (it is used for generating error messages when an out-of-memory exception is thrown)
- Must use malloc() / realloc() for providing the demangled string (a requirement of the Itanium ABI public APIs)

In contrast, the demangler for the rest of LLVM:

- Must be flexible (e.g. lldb wants to be able to get the base name of a demangled function, so that it can insert breakpoints on all overloads)
- Must be fast (e.g. lldb wants to demangle every symbol in a library in a UI-critical path)
- Must provide structured information about the demangled symbol, not just a string as output
- Must integrate with other memory allocation mechanisms (e.g. support std::allocator)

Copying the demangler was a quick way of getting something to work portably, but it wasn’t a good solution given the different requirements (the libc++abi demangler doesn’t do a good job of meeting either set of requirements), so this is a very bad justification for merging the repos.

David