David Greene via llvm-dev
2018-Nov-12 21:26 UTC
[llvm-dev] [monorepo] Downstream branch zipping tool available
Building on the great work that James Knight did on
migrate-downstream-fork.py (Thanks, James!) [1], I've created a simple
tool to take migrated downstream fork branches and zip them into a
single history given a history containing submodule updates of
subprojects [2].

With migrate-downstream-fork.py, one is left with a set of unrelated
histories, one per subproject:

  llvm                           clang                      compiler-rt
  * V Add my fancy LLVM feature  * G Fix my dumb clang bug  * Z Merge from upstream compiler-rt

One can do an octopus merge to unify them:

  *-- Merge llvm, clang and compiler-rt
  |\ \
  * \ \ V Add my fancy LLVM feature
  | * | G Fix my dumb clang bug
  | | * Z Merge from upstream compiler-rt

Unfortunately, that doesn't show the logical history of development,
where changes were effectively applied to subprojects in a linear
fashion. This makes it more difficult to do bisects, among other things
because none of the downstream integration happens until the octopus
merge.

Let's say that downstream you have a local mirror for each LLVM
subproject you work on. Suppose also that you have an "umbrella"
repository that holds submodule references to all those local mirrors.
Various commits in the umbrella update submodule references:

  * Update llvm submodule to V
  * Update clang submodule to G
  * Don't update any submodules, fix scripts or something
  * Update compiler-rt submodule to Z
  |

zip-downstream-fork.py will take these submodule updates and "inline"
them into the umbrella history, making it appear that the downstream
commits were applied against the monorepo in the order implied by the
umbrella history:

  * A Add my fancy LLVM feature
  * B Fix my dumb clang bug
  * C Merge from upstream compiler-rt
  |

Parent relationships for merges from upstream are preserved, though as
top-level comments in zip-downstream-fork.py explain, the history graph
can look a little strange. Commits that don't update submodules are
skipped on the assumption that they modify things uninteresting to a
monorepo history. Such commits could be preserved but doing so has some
caveats as explained in the comments. Perhaps your umbrella repository
holds your build scripts. You'd probably want to migrate that to the
zipped history. If there's strong demand for this I could look into
doing it.

There are various other limitations to the tool explained in the
comments. It was enough to get us going and I'm hopeful it will be
useful for others. It seems to do the right thing with our repositories
but YMMV. Feel free to open PRs with bug fixes. :)

To get this to work, you'll need to apply a PR for
migrate-downstream-fork.py to fix issues with --revmap-out [3].

                         -David

[1] https://github.com/jyknight/llvm-git-migration/blob/master/migrate-downstream-fork.py
[2] https://github.com/jyknight/llvm-git-migration/pull/2/commits/a3b44a294c20f1762cb42b5794e6130c5b27f22d
[3] https://github.com/jyknight/llvm-git-migration/pull/1
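The zipping idea described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the code in zip-downstream-fork.py: the names UmbrellaCommit and zip_history are hypothetical, no git repository is touched, and each submodule update is assumed to bring in exactly one subproject commit (the real tool has to handle ranges of commits and merges from upstream).

```python
from typing import Dict, List, NamedTuple


class UmbrellaCommit(NamedTuple):
    """Hypothetical model of one umbrella commit (not the script's own type)."""
    message: str
    submodules: Dict[str, str]  # submodule path -> subproject commit id


def zip_history(umbrella: List[UmbrellaCommit],
                subproject_messages: Dict[str, str]) -> List[str]:
    """Inline submodule updates into one linear history, oldest first."""
    zipped: List[str] = []
    prev: Dict[str, str] = {}
    for commit in umbrella:
        if commit.submodules == prev:
            # No submodule changed: skip the commit, mirroring the
            # behaviour described in the message above.
            continue
        for path in sorted(commit.submodules):
            new_id = commit.submodules[path]
            if prev.get(path) != new_id:
                # Assumption: one subproject commit per submodule update.
                zipped.append(f"[{path}] {subproject_messages[new_id]}")
        prev = dict(commit.submodules)
    return zipped


# Umbrella history from the example above, listed oldest first.
umbrella = [
    UmbrellaCommit("Update compiler-rt submodule to Z",
                   {"compiler-rt": "Z"}),
    UmbrellaCommit("Don't update any submodules, fix scripts or something",
                   {"compiler-rt": "Z"}),
    UmbrellaCommit("Update clang submodule to G",
                   {"compiler-rt": "Z", "clang": "G"}),
    UmbrellaCommit("Update llvm submodule to V",
                   {"compiler-rt": "Z", "clang": "G", "llvm": "V"}),
]
messages = {
    "V": "Add my fancy LLVM feature",
    "G": "Fix my dumb clang bug",
    "Z": "Merge from upstream compiler-rt",
}

history = zip_history(umbrella, messages)
for entry in reversed(history):  # newest first, like the graphs above
    print("*", entry)
```

The script-only umbrella commit produces no entry, so the result has three commits in the order implied by the umbrella history.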
Björn Pettersson A via llvm-dev
2018-Dec-18 10:23 UTC
[llvm-dev] [monorepo] Downstream branch zipping tool available
Hi David.

Thanks for sharing your branch zipping migration script.

Unfortunately I think our situation is a little bit more complicated.

We have used llvm as the umbrella repo, so in llvm we have a "master"
branch (from the git single repo version of llvm) and a couple of
downstream branches (let's call them "down0", "down1") containing our
downstream work (with frequent merges from "master").
The downstream branches have tools/clang and runtimes/compiler-rt as
submodules, as well as a couple of downstream submodules.

In our downstream version of clang we have a similar structure.
A "master" branch (mapping to the git single repo version of clang),
and a couple of downstream branches. The downstream branches have
tools/extra (i.e. clang-tools-extra) as a submodule.

I can also mention that the clang, compiler-rt and clang-tools-extra
submodules aren't present from the beginning of history. They have
been added later on.

I doubt that zip-downstream-fork.py will work out-of-the-box.
Hopefully I'll be able to patch it for our scenario. Any guidelines
might be helpful. But maybe it isn't even worth trying to adapt
zip-downstream-fork.py to do something useful for our scenario?

If someone else has a similar scenario, let me know. Perhaps we can
make a joint effort in adapting the zipper script.

Regards,
Björn
David Greene via llvm-dev
2018-Dec-18 15:45 UTC
[llvm-dev] [monorepo] Downstream branch zipping tool available
Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:

> We have used llvm as the umbrella repo, so in llvm we have a "master"
> branch (from the git single repo version of llvm) and a couple of
> downstream branches (let's call them "down0", "down1") containing our
> downstream work (with frequent merges from "master").

Ok.

> The downstream branches has tools/clang and runtimes/compiler-rt as
> submodules, as well as a couple of downstream submodules.

Ok.

> In our downstream version of clang we have a similar structure.
> A "master" branch (mapping to the git single repo version clang),
> and a couple of downstream branches. The downstream branches has
> tools/extra (i.e. clang-tools-extra) as a submodule.

So the clang submodule in llvm has a submodule itself? I wasn't even
aware that was possible.

> I can also mention that the clang, compiler-rt and clang-tools-extra
> submodules aren't present from the beginning of history. They have
> been added later on.

That shouldn't be a problem for the script. We have the same sort of
history.

> I doubt that zip-downstream-fork.py will work out-of-the-box.
> Hopefully I'll be able to patch it for our scenario. Any guidelines
> might be helpful. But maybe it isn't even worth trying to adapt
> zip-downstream-fork.py to do something useful for our scenario?

Yeah, non-submodule-update commits in the llvm repository would be
dropped per this comment:

  # - The script assumes that any commits in the umbrella history that
  #   do not update submodules should be discarded.  It is not clear
  #   what should happen if such a commit happens to touch files with
  #   the same name as those in the monorepo (README files are typical).
  #   Adding support to keep these commits should be straightforward,
  #   but because decisions are likely to vary based on particular
  #   setups, we just punt for now.

This happens around line 288 in zip-downstream-fork.py:

    if self.prev_submodules == submodules:
      # This is a commit that modified some file in the umbrella and
      # didn't update any submodules..  Assume we don't want it.
      self.debug('No submodule updates')
      return self.substitute_commit(commit, githash)

If you return commit here instead of doing substitute_commit it should
retain the commit unaltered. That's not quite what you want for the
monorepo: you want commits to llvm to appear under the llvm directory in
the monorepo. The code to do that is in migrate-downstream-fork.py
around line 106 in commit_filter:

    # OK -- NOT an upstream commit: move the tree under the correct subdir, and
    # preserve everything outside that subdir. The tricky part is figuring out
    # *which* parent to get the rest of the tree (other than the named subproject)
    # from, in case of a merge.

You could try to copy this verbatim into zip-downstream-fork.py or it
could be factored out into a common library. If a significant number of
people have a setup similar to yours, it may very well be worth doing
that. You'd also need to add the check for upstream commits.

Now that I think about it, what you really want is something that runs
migrate-downstream-fork.py on the commits in llvm and something that
runs zip-downstream-fork.py on commits in other projects, but they have
to run simultaneously to keep the commits in the proper order.

If both migrate-downstream-fork.py and zip-downstream-fork.py were
refactored to put most of their code in a package/library, then a third
tool could be created to do what you need. Obviously, that will take
some work to accomplish. You'd also want James' guidance on changing
migrate-downstream-fork.py. There are certain enhancements to
zip-downstream-fork.py that I didn't make because I didn't want to mess
with migrate-downstream-fork.py (see the comments at the top of
zip-downstream-fork.py).

zip-downstream-fork.py also doesn't consider submodules of other
submodules. You can maybe get that to work by altering how
find_submodules looks for submodule commits. It would have to recurse
over the submodules it finds.

> If someone else got a similar scenario, let me know. Perhaps we can
> do some joint effort in adapting the zipper script.

Unfortunately, I don't have any bandwidth to hack on this right now.
I'm happy to answer questions, though.

                         -David
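Two of the adaptations discussed above, moving a retained llvm commit's tree under the llvm directory and recursing into submodules of submodules, can be sketched on a toy in-memory model. Everything here is an assumption made for illustration: move_under_subdir and find_submodules_recursive are hypothetical names, the data structures are invented, and neither function reflects the actual signatures in migrate-downstream-fork.py or zip-downstream-fork.py.

```python
from typing import Dict, Tuple

# Hypothetical model: a tree maps file paths to blob ids.
Tree = Dict[str, str]
# Hypothetical model: (repo, commit) -> {submodule path: (repo, commit)}.
Gitlinks = Dict[Tuple[str, str], Dict[str, Tuple[str, str]]]


def move_under_subdir(tree: Tree, subdir: str) -> Tree:
    """Re-root every path of a retained umbrella commit under `subdir`.

    Simplification: a real migration would also have to leave submodule
    entries out of the moved tree and pick the right parent for merges.
    """
    return {f"{subdir}/{path}": blob for path, blob in tree.items()}


def find_submodules_recursive(gitlinks: Gitlinks, repo: str, commit: str,
                              prefix: str = "") -> Dict[str, Tuple[str, str]]:
    """Collect submodules of (repo, commit), including nested ones."""
    found: Dict[str, Tuple[str, str]] = {}
    for path, (sub_repo, sub_commit) in gitlinks.get((repo, commit), {}).items():
        full_path = prefix + path
        found[full_path] = (sub_repo, sub_commit)
        # Recurse: the submodule commit may itself reference submodules.
        found.update(find_submodules_recursive(
            gitlinks, sub_repo, sub_commit, full_path + "/"))
    return found


# Layout from the thread: llvm is the umbrella, clang nests tools/extra.
# The commit ids L1, C1, R1 and E1 are made up.
gitlinks: Gitlinks = {
    ("llvm", "L1"): {
        "tools/clang": ("clang", "C1"),
        "runtimes/compiler-rt": ("compiler-rt", "R1"),
    },
    ("clang", "C1"): {
        "tools/extra": ("clang-tools-extra", "E1"),
    },
}

nested = find_submodules_recursive(gitlinks, "llvm", "L1")
moved = move_under_subdir({"README.txt": "b1", "lib/IR/Value.cpp": "b2"}, "llvm")
```

With this layout, clang-tools-extra is found at tools/clang/tools/extra, which is the nested case the script does not currently consider.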