Chris Bieneman via llvm-dev
2018-Nov-01 17:26 UTC
[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo
I just want to point out that the issue of incompatible history is not new. This has been getting discussed all the way back in July 2016.

http://lists.llvm.org/pipermail/llvm-dev/2016-July/102657.html

As James said in that email:

> That we'll be getting incompatible history has been glossed over, and it is
> indeed really important to make it clear and have a good plan there. This
> doesn't only affect actual "forks", it also affects every single developer
> with a local git clone which contains unfinished work.

So, what is the plan with the existing mono-repo implementation? If there isn't one, then we should strongly consider alternative implementations of the mono-repo.

I also strongly believe we should not allow a schedule to force us to ignore significant problems in the proposals and implementations. Especially ones that we've known about for years.

-Chris

> On Nov 1, 2018, at 6:27 AM, Alexander Richardson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Thu, 1 Nov 2018 at 08:45, Mikael Holmén via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> Thanks for starting this discussion Justin!
>>
>> On 10/31/18 5:22 PM, Justin Bogner via llvm-dev wrote:
>>> Hi all,
>>>
>>> I've spent some time in the last couple of days trying to figure out how
>>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>>> TLDR: I'm not convinced that this prototype is the right approach to
>>> converting to the monorepo, and I have a possible alternative.
>>>
>>> The main problems I'm running into stem from the fact that this
>>> prototype rewrites all of history from scratch rather than leverage the
>>> existing [official git mirrors]. This makes migrating out-of-tree work
>>> from the official git mirrors to this repo very difficult, since there
>>> is no shared history. Some efforts have gone into [documenting how to
>>> port in-progress patches], but this doesn't attempt to discuss how to
>>> handle more substantial out of tree work.
>>>
>>> Issues with integrating the prototype
>>> -------------------------------------
>>>
>>> As far as I can tell, my options for trying to integrate with this
>>> monorepo are fairly limited.
>>>
>>> If I merge my trees directly into the monorepo prototype at head, I end
>>> up with two copies of every commit, one of which is a monorepo style
>>> commit and one with the singular repo history. These commits are
>>> completely unrelated to each other, and exist in two separate parallel
>>> histories, making it difficult to correlate one to the other or even to
>>> tell which is which.
>>>
>>> An arguably cleaner solution would be try to recreate all of my trees'
>>> history artificially as if they were based on the monorepo prototype
>>> history all along, but this has two problems. First, it's a very
>>> significant tooling effort to do this - I'd need to match up several
>>> years of merge points to their corresponding spots in the monorepo
>>> prototype and somehow redo all of the merges in the same ways. Tools
>>> like "rebase --preserve-merges" don't really help here, since they abort
>>> on merge conflicts and ask a human to resolve them again. Even if I were
>>> to come up with tooling that managed this, I'm still left with a
>>> completely new set of hashes for commits and no easy way to map them to
>>> existing references in emails, bug trackers, and release notes.
>>>
>>> Finally, there's the option of throwing away all of my history and
>>> applying my out of tree work in a single patch. This makes git-log and
>>> git-blame useless for investigating issues in my codebase for a few
>>> years. It also means that when fixes go into older branches they can't
>>> be merged forward and need to be redone by hand.
>>>
>>> All of these have very significant drawbacks, and none of them really
>>> sounds like a good option at all.
>>>
>>
>> We're in this situation. We have over 7 years of git history for our
>> out-of-tree target and it would be a huge pain and drawback if we were
>> to lose that history by e.g. needing to apply all our changes as a
>> single patch to the new monorepo.
>>
>> We haven't started moving to the monorepo yet so while we haven't hit
>> the issues in practice yet, we will. Preserving the history from the git
>> mirrors would surely be beneficial.
>>
>
> We are also in the same situation for our out-of-tree CHERI backend
> (https://github.com/CTSRD-CHERI/llvm
> https://github.com/CTSRD-CHERI/clang
> https://github.com/CTSRD-CHERI/lld). I am aware there were some
> attempts at converting our repos to a monorepo structure a few years
> ago according to
> <http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html>.
> However, I'm not sure if the script mentioned there can be reused with
> the new git monorepo and it seems that it only handles clang. We would
> have to also include our forks of llvm,lld,libunwind and libc++.
>
> Thanks,
> Alex
>
>>> An alternative approach
>>> -----------------------
>>>
>>> All of these problems could be mitigated if we could preserve the
>>> history of the existing git mirrors when generating the monorepo. There
>>> are two ways to do this.
>>>
>>> 1. Start the monorepo by subtree-merging the various repos together at
>>>    an arbitrary point in time.
>>>
>>> 2. "Zip" together the commits in each official git mirror repo by
>>>    merging them into a combined view after each commit.
>>>
>>> While I personally don't see a problem with (1), I've heard people claim
>>> that they want to use the monorepo to bisect arbitrarily far back into
>>> history. If this is the case, we'd prefer an approach like (2).
>>>
>>> A zippered repository gives us a lot of the benefits of the prototype,
>>> without a lot of the issues that are caused by rewriting history:
>>>
>>> - The commits from the official git mirrors exist as they are now, and
>>>   we don't need to deal with changing hashes.
>>>
>>> - Out-of-tree branches have all of their history whether they opt in to
>>>   creating a monorepo style history or not
>>>
>>> - All of the repo's history is visible as a monorepo by looking only at
>>>   the merge commits. Bisect scripts can easily filter to these.
>>>
>>> - The monorepo commits and individual repo commits are easily
>>>   discernible and have a direct link between them in git's DAG, making
>>>   it easy to find one from the other.
>>>
>>> To demonstrate this approach, I've put up a snapshot of what LLVM might
>>> look like if we did this, using some scripts that Duncan wrote a while
>>> back to experiment with the idea:
>>>
>>> https://github.com/bogner/llvm-zipper-prototype
>>
>> I took a quick look at the zipper prototype and I think it looks awesome!
>>
>> (Then unfortunately gitk flipped out and after 40 minutes it ate 42GB of
>> memory (and continued grabbing more) but I don't know if that's a
>> problem that is perhaps solved in a more recent git version than I'm
>> running or what the problem really is.)
>>
>> Thanks,
>> Mikael
>>
>>>
>>> Note that this is just a demo/prototype. It has some minor issues, isn't
>>> being automatically updated, and I may regenerate it at some point.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> -- Justin Bogner
>>>
>>> [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
>>> [official git mirrors]: https://git.llvm.org/git/llvm.git
>>> [documenting how to port in-progress patches]: https://reviews.llvm.org/D53414
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
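
For readers who want a concrete picture of the "zip" idea in option (2) of the quoted proposal, here is a minimal sketch in Python. It is only a toy model and not Duncan's scripts or the zipper prototype itself: the `Commit` and `MergeCommit` types, the `zip_histories` and `first_parent_log` helpers, and the parent ordering (previous merge as first parent, untouched mirror commit as second parent) are all assumptions made for illustration. It also assumes commits are interleaved by timestamp, which the original message does not specify.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """A commit in one per-project mirror (hypothetical model)."""
    repo: str       # e.g. "llvm" or "clang"
    sha: str        # original hash; the zip never rewrites it
    timestamp: int  # used only to decide the interleaving order


@dataclass(frozen=True)
class MergeCommit:
    """A monorepo-style merge in the zipped history (hypothetical model)."""
    first_parent: Optional["MergeCommit"]  # previous combined view, if any
    second_parent: Commit                  # the untouched mirror commit
    tree: Tuple[Tuple[str, str], ...]      # (repo, sha) pairs in this view


def zip_histories(histories: Dict[str, List[Commit]]) -> Optional[MergeCommit]:
    """Interleave per-repo histories, adding one merge after each commit.

    Each list must already be in that repo's commit order. heapq.merge only
    ever takes the next commit from the front of a list, so per-repo order
    is preserved even when timestamps are not strictly increasing.
    """
    head: Optional[MergeCommit] = None
    latest: Dict[str, str] = {}  # repo name -> most recent sha seen so far
    for commit in heapq.merge(*histories.values(), key=lambda c: c.timestamp):
        latest[commit.repo] = commit.sha
        head = MergeCommit(head, commit, tuple(sorted(latest.items())))
    return head


def first_parent_log(head: Optional[MergeCommit]) -> List[MergeCommit]:
    """Walk only the merge commits, oldest first."""
    log: List[MergeCommit] = []
    while head is not None:
        log.append(head)
        head = head.first_parent
    log.reverse()
    return log


if __name__ == "__main__":
    llvm = [Commit("llvm", "a1", 10), Commit("llvm", "a2", 30)]
    clang = [Commit("clang", "b1", 20)]
    head = zip_histories({"llvm": llvm, "clang": clang})
    for merge in first_parent_log(head):
        print(merge.second_parent.sha, dict(merge.tree))
```

In this model the mirror commits keep their original hashes and stay reachable as second parents, while the chain of first parents is the monorepo view. Under the same parent-ordering assumption, `git log --first-parent` would be one way to look only at the merge commits in a real repository.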
via llvm-dev
2018-Nov-01 18:00 UTC
[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo
While my team doesn't have one, it's clear that out-of-tree backends are an important long-standing valuable use-case for downstream consumers of LLVM, and the new monorepo should try very hard NOT to make their lives difficult.

--paulr
Chris Bieneman via llvm-dev
2018-Nov-01 18:08 UTC
[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo
Agreed. I also would argue that this problem isn't unique to out-of-tree backends. Generally it could impact any fork that has out-of-tree changes. I think out-of-tree backends is probably the most common type of use case for that, however it will also likely impact a variety of forks of LLVM projects. For example this will likely have impact on the Swift project's forks of LLVM & Clang which have out-of-tree modifications.

-Chris