David Greene via llvm-dev
2019-Jan-29 18:33 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:

> In the new monorepo UC1 may or may not be a parent to UL1.
> We could actually have something like this:
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>
> Our DL1 commit should preferably have UL1 as parent after
> conversion
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>                  ...->DL1
>
> but since it also includes DC1 (via submodule reference) we
> want to zip in DC1 before DL1, right?
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>             ...->DC1->DL1
>
> The problem is that DC1 is based on UC1, so we would get something
> like this
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |         |
>             ...->DC1->DL1        |
>                   ^              |
>                   |              |
>                    --------------
>
> Which is not correct, since then we also get the UL0 commit
> as predecessor to DL1.

To be clear, is DC1 a commit that updates the clang submodule to UC1
and DL1 a separate local commit to llvm that merges in UL1?

When zip-downstream-fork.py runs, it *always* uses the exact trees in
use by each downstream commit, whether from submodules or the umbrella
itself. It tries very hard to maintain the state of the trees as they
appeared in the umbrella repository.

Since in your case llvm isn't a submodule (it's the "umbrella"), DL1
will absolutely have the tree from UL1, not UL0. This is how
migrate-downstream-fork.py works and zip-downstream-fork.py won't touch
the llvm tree since it's not a submodule. The commit DL1 doesn't update
any submodules so it will just use the clang tree from DC1.

I haven't tested this case explicitly but I would expect the resulting
history graph to look as you diagrammed above (reformatted to make it
clear there isn't a cycle):

  UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                       |         |
                        \        |
                         `-----------.
                                 |    \
                          ... ->DC1->DL1 <- zip/master

The "redundant" edge here is indicating that the state of the llvm tree
at DL1 is based on UL1, not UL0. All other projects will be in the
state at UC1 (assuming you don't have other submodules under llvm). I
know it looks strange but this is the best I could come up with because
in general there is no guarantee that submodule updates were in any way
correlated with when upstream commits were made (as you discovered!).
There's some discussion of this issue in the documentation I posted
[1], as well as in header comments in zip-downstream-fork.py.

The difficulty with this is that going forward, if you merge from
monorepo/master git will think you already have the changes from UL0.
There are at least two ways to work around this issue. The first is to
just manually apply the llvm diff from UL1 to UL0 on top of zip/master
and then merge from monorepo/master after that. The other way is to
freeze your local split repositories and merge from the upstream split
masters for all subprojects before running migrate-downstream-fork.py
and zip-downstream-fork.py. Then everything will have the most
up-to-date trees and you should be fine going forward.

Doing such a merge isn't possible for everyone at the time they want to
migrate, but the manual diff/patch method should suffice for those
situations. You just have to somehow remember to do it before the next
merge from upstream. Creating an auxiliary branch with the patch
applied is one way to remember.

I haven't really thought of a better way to handle situations like this
so I'm open to ideas!

                      -David

[1] https://reviews.llvm.org/D56550
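To illustrate why a later merge from monorepo/master would skip UL0, here
is a minimal sketch that models the expected history graph as a plain
parent map and computes reachability. It is a toy model, not code from
zip-downstream-fork.py: the `parents` dictionary and the `ancestors`
helper are hypothetical names, and the downstream history before DC1 is
left out.

```python
# Toy model of the expected history graph. Commit names follow the thread;
# the dict layout and helper name are assumptions made for illustration.
# Each commit maps to the list of its parents.
parents = {
    "UL4": [],
    "UC2": ["UL4"],
    "UL3": ["UC2"],
    "UL2": ["UL3"],
    "UL1": ["UL2"],
    "UL0": ["UL1"],
    "UC1": ["UL0"],          # monorepo/master
    "DC1": ["UC1"],          # downstream clang commit based on UC1
    "DL1": ["DC1", "UL1"],   # zip/master; its llvm tree is based on UL1
}


def ancestors(commit, parents):
    """Return every commit reachable from `commit` via parent edges,
    including `commit` itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return seen


zip_master = ancestors("DL1", parents)
monorepo_master = ancestors("UC1", parents)

# Commits a merge of monorepo/master into zip/master would still bring in.
missing = monorepo_master - zip_master

print("UL0 reachable from zip/master:", "UL0" in zip_master)
print("commits left to merge:", sorted(missing))
```

UL0 is reachable from DL1 through DC1 and UC1, so nothing is left to
merge even though the llvm tree recorded in DL1 does not contain UL0's
changes. That is the gap the manual diff/patch workaround closes. On a
real repository, `git merge-base --is-ancestor` answers the same
reachability question.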
Björn Pettersson A via llvm-dev
2019-Jan-30 19:52 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
> -----Original Message-----
> From: David Greene <dag at cray.com>
> Sent: 29 January 2019 19:33
> To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>
> Cc: llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org;
> openmp-dev at lists.llvm.org; clangd-dev at lists.llvm.org;
> libclc-dev at lists.llvm.org; libcxx-dev at lists.llvm.org;
> lldb-dev at lists.llvm.org
> Subject: Re: [monorepo] Much improved downstream zipping tool available
>
> Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:
>
> > In the new monorepo UC1 may or may not be a parent to UL1.
> > We could actually have something like this:
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >
> > Our DL1 commit should preferably have UL1 as parent after
> > conversion
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |
> >                  ...->DL1
> >
> > but since it also includes DC1 (via submodule reference) we
> > want to zip in DC1 before DL1, right?
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |
> >             ...->DC1->DL1
> >
> > The problem is that DC1 is based on UC1, so we would get something
> > like this
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |         |
> >             ...->DC1->DL1        |
> >                   ^              |
> >                   |              |
> >                    --------------
> >
> > Which is not correct, since then we also get the UL0 commit
> > as predecessor to DL1.
>
> To be clear, is DC1 a commit that updates the clang submodule to UC1
> and DL1 a separate local commit to llvm that merges in UL1?

In llvm (split) we have:

  UL4->UL3->UL2->UL1->UL0
                    \
           ...->DL2->DL1

In clang (split) we have:

  UC4->UC3->UC2->UC1->UC0
                    \
           ...->DC2->DC1

DL1 is a commit that updates the clang submodule to DC1 (and in this
scenario at the same time merges UL1 and DL2 in llvm).

> When zip-downstream-fork.py runs, it *always* uses the exact trees in
> use by each downstream commit, whether from submodules or the umbrella
> itself. It tries very hard to maintain the state of the trees as they
> appeared in the umbrella repository.
>
> Since in your case llvm isn't a submodule (it's the "umbrella"), DL1
> will absolutely have the tree from UL1, not UL0. This is how
> migrate-downstream-fork.py works and zip-downstream-fork.py won't touch
> the llvm tree since it's not a submodule. The commit DL1 doesn't update
> any submodules so it will just use the clang tree from DC1.
>
> I haven't tested this case explicitly but I would expect the resulting
> history graph to look as you diagrammed above (reformatted to make it
> clear there isn't a cycle):
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
>                        |         |
>                         \        |
>                          `-----------.
>                                  |    \
>                           ... ->DC1->DL1 <- zip/master
>
> The "redundant" edge here is indicating that the state of the llvm tree
> at DL1 is based on UL1, not UL0. All other projects will be in the
> state at UC1 (assuming you don't have other submodules under llvm). I
> know it looks strange but this is the best I could come up with because
> in general there is no guarantee that submodule updates were in any way
> correlated with when upstream commits were made (as you discovered!).
> There's some discussion of this issue on the documentation I posted
> [1], as well as in header comments in zip-downstream-fork.py.

How does git know that it should follow the parent relation from
DL1 to UL1 for the llvm subdir, and not the UL0->UC1->DC1->DL1
path? I mean, if I check out commit DC1 I will see the contribution
from UL0 in the llvm subdir, and DL1 includes the changes from DC1.

(I understand that we never really want to check out the old clang
commits after the migration. It is the DLx commits that matter, and
they should have a synced view between the different subdirs. It is
also the DLx commits that may have old release labels in our
downstream release track.)
David Greene via llvm-dev
2019-Jan-30 21:43 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:

> In llvm (split) we have:
>
>   UL4->UL3->UL2->UL1->UL0
>                     \
>            ...->DL2->DL1
>
> In clang (split) we have:
>
>   UC4->UC3->UC2->UC1->UC0
>                     \
>            ...->DC2->DC1
>
>
> DL1 is a commit that updates the clang submodule to DC1 (and in this
> scenario at the same time merges UL1 and DL2 in llvm).

Ok, in that case I would expect the resulting history to look like
this:

  UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                        |          \
                         \          `---.
                          `------------. \
                                        \|
                             ... ->DL2->DL1/DC1 <- zip/master
                                        /
                            ... ->DC2--'

As a submodule update, DC1 is "inlined" into DL1 and its commit message
is appended to that of DL1. I'm presuming here that llvm never updated
the clang submodule to DC2, so it remains an independent commit.

The inlining is done assuming that submodule updates represent a single
logical change. Submodule updates are assumed to be related to whatever
changes happen in the umbrella so they all get smushed together into
one commit.

The edge UC1->DL1 represents the use of UC1 tree for every project
*except* llvm, because clang was a submodule of llvm (and updated to
DC1 which merged UC1) and no other project was a submodule in llvm.
DL1 still has the llvm tree from UL1 plus any local changes you may
have made.

Admittedly, this is tricky to understand. Believe me, there were a lot
of headaches involved trying to figure out what the right thing to do
is. This is my best stab at that.

I don't think I have a test that creates this kind of graph. It would
be interesting to see if it works. :) At the moment I'm busy with
other things. Give it a try and see if it does what you expect.

> How does git know that it should follow the parent relation from
> DL1 to UL1 for the llvm subdir, and not the UL0->UC1->DC1->DL1
> path? I mean, if I check out commit DC1 I will see the contribution
> from UL0 in the llvm subdir, and DL1 includes the changes from DC1.

With the history above this is no longer an issue since you can't check
out DC1 as such. It's related to the llvm tree in DL1.

Let's say we have a commit DC3 and commit DL3 updated llvm's clang
submodule to DC3. Commit DC4 was never referenced in a submodule
update. The graph should then look like this:

  UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                        |          \
                         \          `-------.
                          `----------------. \
                                            \|
                        ... ->DL3/DC3->DL2->DL1/DC1 <- zip/master
                               /\            /
                   ... ->DC4--'  `--->DC2----'

DC3 is related to DL3 so it got inlined. DC2 has an llvm tree based on
DL3.

Hopefully, this is now clear as mud. :)

                    -David
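As a rough illustration of how the tree of a zipped commit such as
DL1/DC1 is described in this thread, the sketch below assembles a
top-level tree from three sources: the umbrella commit, the submodule
commits it references, and the upstream monorepo commit for everything
else. It is a toy model under stated assumptions, not the code in
zip-downstream-fork.py. The function name `compose_zipped_tree`, the
tree labels such as "llvm@DL1", and the use of lld as a stand-in for
"every other project" are all hypothetical.

```python
def compose_zipped_tree(upstream_tree, umbrella_project, umbrella_tree,
                        submodule_trees):
    """Assemble the top-level tree of one zipped commit (toy model).

    upstream_tree:    project -> tree label at the upstream monorepo commit
    umbrella_project: name of the umbrella project ("llvm" in the thread)
    umbrella_tree:    tree label recorded by the downstream umbrella commit
    submodule_trees:  project -> tree label of the referenced submodule commit
    """
    # Start from the upstream state: projects that are neither the umbrella
    # nor a submodule keep the tree of the upstream commit.
    tree = dict(upstream_tree)
    # The umbrella keeps exactly the tree its downstream commit recorded.
    tree[umbrella_project] = umbrella_tree
    # Each submodule uses the tree of the commit the umbrella pointed at.
    tree.update(submodule_trees)
    return tree


# Hypothetical labels: the monorepo state at UC1 holds llvm as of UL0.
upstream_at_uc1 = {"llvm": "llvm@UL0", "clang": "clang@UC1", "lld": "lld@UC1"}

dl1 = compose_zipped_tree(
    upstream_at_uc1,
    umbrella_project="llvm",
    umbrella_tree="llvm@DL1",               # based on UL1 plus local changes
    submodule_trees={"clang": "clang@DC1"},
)
print(dl1)
```

In this model the content of each subdirectory comes from the tree
recorded in the commit, not from walking parent edges, which is why DL1
can list UC1 as a parent and still carry an llvm tree that lacks UL0.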