David Greene via llvm-dev
2019-Jan-29 16:01 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
Hi all,

I've updated the downstream fork zipping tool that I posted about last November [1]. It is much improved in every way. The most important enhancements are:

- Does a better job of simplifying history
- Handles nested submodules
- Will put non-submodule-update content in a subdirectory of the monorepo
- Updates tags

In addition there are plenty of the requisite bug fixes. The latest version of the tool can be found here:

https://github.com/greened/llvm-git-migration/tree/zip

With the nested submodules and the subdirectory features, the tool can now take a downstream llvm repository with submodules (e.g. clang in tools/clang and so on) as an umbrella and order the commits according to changes in llvm and its submodules.

Björn, this new version may well be able to handle the tasks you outlined in December [2].

I've written some recipes as proposed additions to the GitHub migration proposal [3]. If you have a different scenario, please comment there and if it seems like a common case I can add a recipe for it so we can all benefit from the learning.

Much of the bugfixing work was the result of some artificial histories I created to shake out problems. I believe it is ready for some testing in the wild. If you do try it, please let me know how it worked for you and any problems you run into. I will try to fix them. It's easiest if you can provide me with a test repository showing the problem but even a verbal description of what is happening can help.

I hope this tool is helpful to the community.

-David

[1] http://lists.llvm.org/pipermail/llvm-dev/2018-November/127704.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128620.html
[3] https://reviews.llvm.org/D56550
Björn Pettersson A via llvm-dev
2019-Jan-29 16:41 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
Thanks for working on this, David.

One problem that I've found with our downstream repos (and nested submodule structure) is that we haven't always been in sync when updating the llvm and clang repos (considering svn-id:s).

Basically we can have a commit DL1 on our downstream llvm branch being based on commit UL1 from the upstream llvm branch, pointing out the submodule commit DC1 in clang (from downstream branch), and that one could be based on UC1 from the upstream clang branch.

In the new monorepo UC1 may or may not be a parent to UL1. We could actually have something like this:

UL4->UC2->UL3->UL2->UL1->UL0->UC1

Our DL1 commit should preferably have UL1 as parent after conversion

UL4->UC2->UL3->UL2->UL1->UL0->UC1
                     |
               ...->DL1

but since it also includes DC1 (via submodule reference) we want to zip in DC1 before DL1, right?

UL4->UC2->UL3->UL2->UL1->UL0->UC1
                     |
          ...->DC1->DL1

The problem is that DC1 is based on UC1, so we would get something like this

UL4->UC2->UL3->UL2->UL1->UL0->UC1
                     |         |
          ...->DC1->DL1        |
                ^              |
                |              |
                 --------------

Which is not correct, since then we also get the UL0 commit as predecessor to DL1.

This makes me wonder if zipping really is that interesting for us. When checking out an old downstream commit like DL1 in the monorepo I would not be certain that I see the same version of clang as in the old split repos (with submodule updates). Often it would be correct, but not always.

I'll take a look at your updated script to see if it would make any sense for us to use it (to get some kind of zipped history). Although, I've got a feeling that doing the octopus merge might be the simple solution for us. If we ever want to build something old, we would use our old split repos. The octopus merge would indicate how far back we can do bisects etc. in the monorepo. Even with some kind of zipping it would be hard to build/bisect older commits (on our downstream branches).
Regards,
Björn
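To make the ancestry problem in Björn's diagrams concrete, here is a toy Python sketch. It is not part of the original messages and says nothing about how zip-downstream-fork.py is implemented. It assumes the arrows in the diagrams read oldest to newest (so UC1 is the upstream tip), and the `parents` mappings and the `ancestors` helper are hypothetical names used only for illustration. Earlier downstream commits (the "..." in the diagrams) are left out.

```python
def ancestors(parents, commit):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = list(parents.get(commit, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# Upstream monorepo history, read oldest to newest (an assumption):
# UL4 -> UC2 -> UL3 -> UL2 -> UL1 -> UL0 -> UC1
upstream = {
    "UC2": ["UL4"],
    "UL3": ["UC2"],
    "UL2": ["UL3"],
    "UL1": ["UL2"],
    "UL0": ["UL1"],
    "UC1": ["UL0"],
}

# Preferred result: DL1 has only UL1 as its upstream parent.
preferred = dict(upstream, DL1=["UL1"])

# Zipped result: DC1 is based on UC1, and DL1 follows DC1 while also
# pointing at UL1.
zipped = dict(upstream, DC1=["UC1"], DL1=["DC1", "UL1"])

print("UL0" in ancestors(preferred, "DL1"))  # False
print("UL0" in ancestors(zipped, "DL1"))     # True
```

In the zipped graph UL0 is reachable from DL1 through DC1 and UC1, which is the unwanted predecessor described above.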
David Greene via llvm-dev
2019-Jan-29 18:33 UTC
[llvm-dev] [monorepo] Much improved downstream zipping tool available
Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:

> In the new monorepo UC1 may or may not be a parent to UL1.
> We could actually have something like this:
>
> UL4->UC2->UL3->UL2->UL1->UL0->UC1
>
> Our DL1 commit should preferably have UL1 as parent after
> conversion
>
> UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                      |
>                ...->DL1
>
> but since it also includes DC1 (via submodule reference) we
> want to zip in DC1 before DL1, right?
>
> UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                      |
>           ...->DC1->DL1
>
> The problem is that DC1 is based on UC1, so we would get something
> like this
>
> UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                      |         |
>           ...->DC1->DL1        |
>                 ^              |
>                 |              |
>                  --------------
>
> Which is not correct, since then we also get the UL0 commit
> as predecessor to DL1.

To be clear, is DC1 a commit that updates the clang submodule to UC1 and DL1 a separate local commit to llvm that merges in UL1?

When zip-downstream-fork.py runs, it *always* uses the exact trees in use by each downstream commit, whether from submodules or the umbrella itself. It tries very hard to maintain the state of the trees as they appeared in the umbrella repository.

Since in your case llvm isn't a submodule (it's the "umbrella"), DL1 will absolutely have the tree from UL1, not UL0. This is how migrate-downstream-fork.py works and zip-downstream-fork.py won't touch the llvm tree since it's not a submodule.

The commit DL1 doesn't update any submodules so it will just use the clang tree from DC1.

I haven't tested this case explicitly but I would expect the resulting history graph to look as you diagrammed above (reformatted to make it clear there isn't a cycle):

UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                     |         |
                      \        |
                       `-----------.
                               |    \
                        ... ->DC1->DL1 <- zip/master

The "redundant" edge here is indicating that the state of the llvm tree at DL1 is based on UL1, not UL0. All other projects will be in the state at UC1 (assuming you don't have other submodules under llvm).
I know it looks strange but this is the best I could come up with because in general there is no guarantee that submodule updates were in any way correlated with when upstream commits were made (as you discovered!).

There's some discussion of this issue in the documentation I posted [1], as well as in header comments in zip-downstream-fork.py.

The difficulty with this is that going forward, if you merge from monorepo/master, git will think you already have the changes from UL0. There are at least two ways to work around this issue.

The first is to just manually apply the llvm diff from UL1 to UL0 on top of zip/master and then merge from monorepo/master after that.

The other way is to freeze your local split repositories and merge from the upstream split masters for all subprojects before running migrate-downstream-fork.py and zip-downstream-fork.py. Then everything will have the most up-to-date trees and you should be fine going forward.

Doing such a merge isn't possible for everyone at the time they want to migrate, but the manual diff/patch method should suffice for those situations. You just have to somehow remember to do it before the next merge from upstream. Creating an auxiliary branch with the patch applied is one way to remember.

I haven't really thought of a better way to handle situations like this so I'm open to ideas!

-David

[1] https://reviews.llvm.org/D56550
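The manual diff/patch workaround described above could be scripted roughly as follows. This is a hedged sketch and not something taken from zip-downstream-fork.py: the function name `diff_patch_commands`, the branch name `llvm-catchup`, and the commit message are made up for illustration, the llvm tree is assumed to live under `llvm/` in the monorepo, and UL1/UL0 stand in for real commit hashes. The function only builds shell command strings; it does not run git.

```python
import shlex


def diff_patch_commands(older, newer, base="zip/master",
                        aux_branch="llvm-catchup", subdir="llvm"):
    """Build shell commands that apply the `subdir` diff older..newer on `base`.

    All defaults are illustrative assumptions, not names used by the tools.
    """
    # Create an auxiliary branch so the patch is not forgotten before the
    # next merge from upstream.
    checkout = shlex.join(["git", "checkout", "-b", aux_branch, base])
    # Produce the diff restricted to the llvm subdirectory ...
    diff = shlex.join(["git", "diff", older, newer, "--", subdir])
    # ... and apply it to both the index and the working tree.
    apply = shlex.join(["git", "apply", "--index"])
    commit = shlex.join(
        ["git", "commit", "-m", f"Apply {subdir} diff {older}..{newer}"]
    )
    return [checkout, f"{diff} | {apply}", commit]


for command in diff_patch_commands("UL1", "UL0"):
    print(command)
```

If the downstream llvm changes overlap with the upstream diff, the patch may not apply cleanly; `git apply --3way` might be worth trying in that case, with conflicts resolved by hand before committing.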